Maximum entropy in dynamic complex networks

Noam Abadi, Franco Ruzzenenti

Integrated Research on Energy, Environment and Society, Faculty of Science and Engineering, University of Groningen

(July 3, 2024)

Abstract

The field of complex networks studies a wide variety of interacting systems by representing them as networks. To understand their properties and mutual relations, the randomisation of network connections is a commonly used tool. However, information theoretic-based randomisation methods with well-established foundations mostly provide a stationary description of these systems, while stochastic randomisation methods that account for their dynamic nature lack such general foundations and require extensive repetition of the stochastic process to measure statistical properties. In this work, we extend the applicability of information-theoretic methods beyond stationary network models. By using the information-theoretic principle of maximum caliber we construct dynamic network ensemble distributions based on constraints representing statistical properties with known values throughout the evolution. We focus on the particular cases of dynamics constrained by the average number of connections of the whole network and each node, comparing each evolution to simulations of stochastic randomisation that obey the same constraints. We find that ensemble distributions estimated from simulations match those calculated with maximum caliber and that the equilibrium distributions to which they converge agree with known results of maximum entropy given the same constraints. Finally, we discuss further the connections to other maximum entropy approaches to network dynamics and conclude by proposing some possible avenues of future research.

Complex networks, maximum entropy, network dynamics


I Introduction

The study of complex networks is a growing field of research that covers a wide variety of interacting systems, ranging from molecular [1, 2] to socio-economic scales [3, 4]. Individual components of the system (e.g. atoms, people, or companies) are generally referred to as nodes, while interactions between pairs of nodes (e.g. forces, language, or money) are called links. Although the theoretical foundations to explain these systems are often incomplete, statistical techniques offer a valuable alternative for understanding their properties. One such technique is network randomisation, the reconfiguration of which pairs of nodes are linked and how, which is used to bring out relations between network properties beyond the details of particular case studies [5, 6, 7].

Statistical properties over an ensemble of networks obtained by randomisation can be estimated from samples of the ensemble. However, a network distribution, essentially the probability of each network in an ensemble, is useful both for analytically calculating statistical properties and for drawing samples from this distribution. The introduction of techniques from information theory has yielded a rigorous method for constructing network distributions based on properties of the network to be randomised, establishing a formal framework for networks analogous to statistical mechanics [8, 9]. Instead of estimating the distribution from networks where connections have been explicitly modified, it relies on analytically finding network distributions that maximise their Shannon entropy given specified constraints, that is, average values over the distribution. Constraints reflect properties that are shared between the distribution on average and the original network to be randomised, such as the number of links in the whole network. Meanwhile, the fact that these distributions maximise entropy allows them to be interpreted as being maximally random, or more precisely unbiased, with respect to properties that are not specified. Samples drawn from this distribution can then be understood as randomised networks which, on average, retain the properties of the original network specified by constraints, but are maximally random otherwise. Applications are found in many areas, for example network construction [10], reconstruction from incomplete data [11, 12] and pattern detection [13] among others. However, as maximum entropy distributions are guaranteed to be unique, they cannot account for the variability needed to describe evolving systems.

In contrast, explicit network randomisation uses a stochastic process that modifies the network configuration, i.e. which pairs of nodes are connected and which are not, in discrete steps, and therefore accounts for evolution naturally. For example, consider a process that, at every step, randomly chooses a connection in the network and places it between some random disconnected pair of nodes. From any initial network, the randomisation step can be successively applied, defining a particular trajectory of a time-dependent network. Network configurations obtained from the same trajectory can then be interpreted as states that a dynamic network takes as it evolves. Network configurations from different trajectories after a fixed number of steps, meanwhile, correspond to samples of a randomised network where the level of randomisation can be tuned by the number of steps. In particular, it is known that distributions estimated from randomisation processes at a large number of steps, i.e. when the distribution becomes stationary, match the maximally random results from maximum entropy in some cases. Explicit randomisation has provided significant insight into the structural properties of real-world networks. Some examples include generating small-world networks [14], the power-law degree distributions of preferential attachment mechanisms [15] and the statistical analysis of social networks [16, 17, 18, 19].

While explicit randomisation has the advantage that it can account for the fundamentally dynamic (due to their interactions) nature of complex systems, the construction of randomisation steps does not rest on foundations as rigorous and general as information theory. Additionally, the need to carry out large numbers of realisations of the process to obtain enough samples to measure statistical properties can quickly become a problem, for example in large networks. This problem is avoided when the distribution of these samples is available, as is the direct result of maximum entropy-based methods. However, the dynamic aspect is not covered by most information-theoretic applications to complex networks. This calls for an integrated information-theoretic method that both accommodates dynamic distributions of networks evolving by an underlying randomisation process and leads to a maximum entropy distribution in the stationary regime.

Maximum caliber is the main tool of information theory to consider non-stationary processes [20, 21, 22, 23, 24]. Its main foundation is maximising Shannon entropy given certain constraints, giving the same interpretation of a distribution that is maximally unbiased with respect to properties that are not specified. While it is still guaranteed to produce a unique distribution, it contemplates evolution by studying probabilities of full trajectories in a dynamic process as opposed to individual states. Constraints then represent properties of the trajectories averaged over the distribution, for example the average number of connections in the whole evolution. As such, it is a strong contender for an information-theoretic method that captures the dynamic aspect of complex networks. However, literature on the application of maximum caliber to dynamic networks is not easy to find. Entropic dynamics [25, 26, 27] might be considered as an exception, having been used in the study of dynamic networks from an information theoretic perspective. Nevertheless, its version of entropy is presented ad hoc and is therefore somewhat disconnected from both the stationary results of maximum entropy and the dynamic point of view of maximum caliber.

The rest of the paper is structured as follows. In the next section, we introduce maximum caliber in the context of networks and establish how to obtain dynamic network configuration distributions for generic constraints. In appendix A we connect the formulation to entropic dynamics and show that it can be stated as a principle of maximum information entropy production, analogous to the thermodynamic theory for non-stationary processes [28, 29]. In the two sections that follow we consider specific constraint choices and randomisation steps, comparing the results of maximum caliber to distributions estimated from stochastic simulations to determine whether the information-theoretic method captures the explicit randomisation process.

II Maximum caliber networks

A network is composed of a set of nodes, representing the components of a system, and a set of links, representing their pairwise interactions. In some cases, interactions are directed from a source to a target, so links are associated with pairs in the set of all ordered pairs of nodes. These are known as directed networks, and examples include forces by particles on others and messages from people to their neighbours. In other cases, interactions are undirected, associating links to pairs in the set of unordered pairs of nodes. These are called undirected networks and are the cases of the potential energy of pairs of particles or telephone lines between pairs of houses.

The adjacency matrix of a network is a matrix $W$ where each value $w_{ij}$ describes the link of the pair $ij$ . Note that if the network is undirected then $w_{ij}=w_{ji}~{}\forall~{}ij$ as both $w_{ij}$ and $w_{ji}$ describe the same link, meaning that the adjacency matrix is symmetric. For the results presented in sections III and IV we consider undirected and directed networks respectively, but in both we will assume links are only either present or absent, represented by $w_{ij}=1$ and $w_{ij}=0$ respectively. These networks are therefore known as binary. The choice underscores that we will be describing the network structure, representing for example the presence of a one or two-way communication channel, and not properties of the links such as the capacity of the channel. However, networks have found applications in describing both this binary structure and weighted connections [30] (e.g. when $w_{ij}\in\mathbf{R}_{\geq 0}$ or $w_{ij}\in\mathbf{Z}_{\geq 0}$ ), and the framework of maximum caliber does not require specifying whether links are binary or not, suggesting that future work could make use of the methods presented here with weighted links.

An evolving network can be described by a dynamic adjacency matrix $W(t)$ , that is, one with time-dependent links $w_{ij}(t)$ . A sequence of $T$ steps in the network trajectory is then denoted by $W_{T}=(W(0),W(1),W(2),\,...\,,W(t),\,...\,,W(T))$ , and we aim to calculate the probability of these trajectories $P(W_{T})$ using maximum caliber. To do so, two steps are needed. The first is to specify constraints establishing desired average properties, which in our case represent average values over the distribution of network evolutions,

\sum_{W_{T}}F_{n}(W_{T})P(W_{T})=f_{n}\,.

(1)

For example, the average number of links over the whole evolution is established by choosing a constraint function $F_{1}(W_{T})=\sum_{0\leq t\leq T}\sum_{ij}w_{ij}(t)$ and a constraint value $f_{1}$ which represents the numerical value of the average. The second step is to find, out of all trajectory distributions which have these average values, the distribution that maximises the functional

S[P]=-\sum_{W_{T}}P(W_{T})\ln(P(W_{T}))\,.

(2)

As discussed, the constraints in eq. 1 enforce certain properties of the distribution, while maximisation of eq. 2 ensures that it is maximally unbiased with respect to properties that are not imposed. The distribution with these properties can be found analytically by introducing Lagrange multipliers $\lambda_{n}$ for each of the constraints and maximising the Lagrangian

\mathcal{L}[P]=S[P]+\sum_{n}\lambda_{n}\left(f_{n}-\sum_{W_{T}}F_{n}(W_{T})P(W_{T})\right)\,.

(3)

The introduction of the Lagrange multipliers allows one to ignore any dependence between the values of $P(W_{T})$ for different $W_{T}$ in the maximisation. The distribution that achieves the supremum can then be obtained by differentiating the Lagrangian with respect to a generic $P(W_{T})$ , setting the derivative to zero, and solving for $P(W_{T})$ as in standard calculus.

\begin{aligned}
\frac{\partial\mathcal{L}[P]}{\partial P(W_{T})}&=-1-\ln(P(W_{T}))-\sum_{n}\lambda_{n}F_{n}(W_{T})=0\\
\Rightarrow P(W_{T})&=\exp\left(-1-\sum_{n}\lambda_{n}F_{n}(W_{T})\right)
\end{aligned}

(4)

While the result depends on the Lagrange multipliers $\lambda_{n}$ , it can then be inserted into eq. 1 to solve for the dependence of the multipliers on each of the constraint values $f_{m}$ , namely

\sum_{W_{T}}F_{m}(W_{T})\exp\left(-1-\sum_{n}\lambda_{n}F_{n}(W_{T})\right)=f_{m}\,.

(5)

As there is one multiplier for each constraint, eq. 5 constitutes a system of as many equations as unknowns. However, the non-linear character of this equation often calls for numerical or graphical methods to solve.
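As an illustration of the numerical solution mentioned above, the following minimal sketch solves for a single multiplier by bisection. It is not the authors' code. It assumes a static toy case (binary undirected networks on three nodes, one constraint on the average number of links) and imposes normalisation explicitly through an extra constraint with $F_{0}=1$, so that the prefactor $\exp(-1-\lambda_{0})$ becomes $1/Z$. The function names `mean_links` and `solve_multiplier` are hypothetical.

```python
import itertools
import math


def mean_links(lam, n_pairs=3):
    """Average number of links when P(W) is proportional to exp(-lam * L(W))."""
    weights = []
    link_counts = []
    # Enumerate every binary configuration of the n_pairs unordered node pairs.
    for w in itertools.product((0, 1), repeat=n_pairs):
        n_links = sum(w)
        link_counts.append(n_links)
        weights.append(math.exp(-lam * n_links))
    z = sum(weights)  # normalisation (assumed extra constraint with F_0 = 1)
    return sum(c * wt for c, wt in zip(link_counts, weights)) / z


def solve_multiplier(target, n_pairs=3, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on lam; mean_links is strictly decreasing in lam."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_links(mid, n_pairs) > target:
            lo = mid  # too many links on average: lam must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Constrain the average number of links to 1 out of 3 possible pairs.
lam = solve_multiplier(target=1.0)
```

In this toy case each pair is connected with probability $1/(1+e^{\lambda})$, so a target of one link out of three pairs corresponds to $\lambda=\ln 2$, which the bisection recovers numerically.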

II.1 Transition probabilities

At this point, maximum caliber applications usually introduce a particular constraint on the normalisation of the probability distribution, $\sum_{W_{T}}P(W_{T})=1$ (note that this is obtained with $F_{0}(W_{T})=1$ ). Instead, consider the history distribution $P(W_{T-1})$ of the first $T-1$ steps of the network trajectory distribution $W_{T}$ . It is required that marginalisation of $P(W_{T})$ over the last state $W(T)$ is such that

P(W_{T-1})=\sum_{W(T)}P(W_{T})\,.

(6)

For a generic trajectory of $T-1$ steps $W_{T-1}^{\prime}=(W(0)^{\prime},W(1)^{\prime},W(2)^{\prime},...,W(T-1)^{\prime})$ , i.e. not necessarily the first $T-1$ steps of $W_{T}$ , eq. 6 can be written as a sum over all possible trajectories $W_{T}$ by introducing $\delta_{W_{T-1}^{\prime},W_{T-1}}=1$ if and only if $W(t)^{\prime}=W(t)~{}~{}\forall~{}0\leq t\leq T-1$ and $0$ otherwise,

P(W_{T-1}^{\prime})=\sum_{W_{T}}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W_{T})\,.

(7)

This means that the sum over all trajectories $W_{T}$ only counts those where their history $W_{T-1}$ matches a specific trajectory of $T-1$ steps denoted by $W_{T-1}^{\prime}$ . This way, marginalisation takes the form of the constraints in eq. 1, and there is one such constraint for each possible $T-1$ step trajectory $W_{T-1}^{\prime}$ . While it may seem that the history distribution $P(W_{T-1}^{\prime})$ should vary with the $T$ step trajectory distribution on maximisation, it is in fact fixed and arbitrary. Consider two randomisation experiments, one carried out for $T-1$ steps and one for $T$ steps. As long as both are the same until $T-1$ , the distribution measured at the $T-1$ -th step of the $T$ step experiment must be the same as measured at the end of the $T-1$ step experiment, making the history fixed. This is true regardless of which specific process is applied up to $T-1$ , allowing an arbitrary distribution. Additionally, normalisation is no longer required as, if the history distribution $P(W_{T-1}^{\prime})$ is normalised, then so is the $T$ step trajectory distribution as can be seen by summing over $W_{T-1}^{\prime}$ in eq. 7. Introducing marginalisation constraint functions explicitly into eq. 4 along with corresponding multipliers $\lambda_{W_{T-1}^{\prime}}$ , but leaving room for constraints that are still unspecified, the distribution becomes

\begin{aligned}
P(W_{T})&=\exp\left(-1-\sum_{W_{T-1}^{\prime}}\lambda_{W_{T-1}^{\prime}}\delta_{W_{T-1}^{\prime},W_{T-1}}\right)\times\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T})\right)\\
&=\exp\left(-1-\lambda_{W_{T-1}}-\sum_{n}\lambda_{n}F_{n}(W_{T})\right)\,.
\end{aligned}

(8)

In order to solve for $\lambda_{W_{T-1}}$ we must combine eq. 8 with eq. 7 as in eq. 5.

\begin{aligned}
P(W_{T-1}^{\prime})&=\sum_{W_{T}}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W_{T})\\
&=\sum_{W_{T-1}}\sum_{W(T)}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W_{T-1},W(T))\\
&=\sum_{W}\sum_{W_{T-1}}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W_{T-1},W)\\
&=\sum_{W}P(W_{T-1}^{\prime},W)\\
&=\exp\left(-1-\lambda_{W_{T-1}^{\prime}}\right)\sum_{W}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1}^{\prime},W)\right)
\end{aligned}

(9)

In the second line we have separated the sum over all possible sequences $W_{T}$ into a sum over their histories $W_{T-1}$ and another over their final states $W(T)$ . Additionally, the notation of the distribution and constraint functions is modified to make this dependence explicit, namely $P(W_{T})=P(W_{T-1},W(T))$ and $F_{n}(W_{T})=F_{n}(W_{T-1},W(T))$ . In the third line, we use the fact that summing over all possible trajectories $W_{T}$ implies that any network configuration is possible at any step of the sequence. In particular, the final state of the sequence $W(T)$ must cover all possible adjacency matrices $W$ , which in turn means we can also sum first over histories $W_{T-1}$ and then over $W$ .

As the final result of eq. 9 is valid for any $W_{T-1}^{\prime}$ , it holds in particular for the history $W_{T-1}$ in eq. 8, so that

\exp\left(-1-\lambda_{W_{T-1}}\right)=\frac{P(W_{T-1})}{\sum_{W}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1},W)\right)}\,.

(10)

When introduced into eq. 8, this yields

P(W_{T})=\frac{\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1},W(T))\right)}{\sum_{W}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1},W)\right)}P(W_{T-1})\,.

(11)

With eq. 11 we see that the conditional distribution of network configurations at the final step given the history results from a transition probability

\begin{aligned}
M_{T}:&=\frac{P(W_{T})}{P(W_{T-1})}=P(W_{T}\mid W_{T-1})=P(W(T)\mid W_{T-1})\\
&=\frac{\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1},W(T))\right)}{\sum_{W}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W_{T-1},W)\right)}\,,
\end{aligned}

(12)

which depends only on the chosen constraints beyond marginalisation. The network distribution at the last step can then be obtained from said transitions and the history distribution,

P(W(T))=\sum_{W_{T-1}}M_{T}P(W_{T-1})\,.

(13)

As the history distribution $P(W_{T-1})$ is the arbitrary history imposed in eq. 7, we can assume that it too is the maximum caliber trajectory distribution of $T-1$ step trajectories. Repeating the same process applied to $P(W_{T})$ to obtain $P(W_{T-1})$ we see that the problem can be recursively reduced to the initial distribution $P(W_{0})=P(W(0))$ . Additionally, as eq. 13 is valid for any $T$ , we can consider $P(W(T))$ as a dynamic distribution on a network ensemble from which samples of randomised networks can be drawn at different times $T$ , with the dynamics of the distribution determined by the choice of constraints beyond marginalisation. However, before showing that this distribution matches distributions estimated from stochastic simulations if constraints are chosen accordingly, we produce two useful results obtained by requiring the constraint functions to obey certain additional properties and explain how the comparison between simulations and analytical results is carried out.

II.2 Markov processes

While eq. 12 presents, in general, a non-Markovian evolution of an ensemble distribution, the only dependence in the full history $W_{T-1}$ is through the constraint functions $F_{n}(W_{T})$ . This means that if all $F_{n}(W_{T})$ depend on $W_{T}$ only through $W(T)$ and $W(T-1)$ , then

\begin{aligned}
P(W(T))&=\sum_{W_{T-1}}M_{T}P(W_{T-1})\\
&=\sum_{W(T-1)}M_{T}\sum_{W_{T-2}}P(W_{T-1})\\
&=\sum_{W(T-1)}M_{T}P(W(T-1))\,,
\end{aligned}

(14)

meaning that the dynamics becomes Markovian. Note that if constraints depend on the full history $W_{T-1}$ then the sum over $W_{T-2}$ in the second line cannot factor out $M_{T}$ . In previous literature it has been established that Markov processes emerge when constraints specify the state of the system at individual instants in time [21, 31] (essentially each $F_{n}(W_{T})$ depends on a single $W(t)$ in $W_{T}$ ), so this represents an extension of that condition. This is discussed in greater detail in appendix B, where it is shown how some constraints placed on the entire sequence of states (as is common in maximum caliber) can hide constraints that specify the state of the system at each instant.
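The Markovian update can be sketched numerically for a state space small enough to enumerate. The following is an illustration under stated assumptions, not the procedure used in the paper: the networks are binary and undirected on three nodes, the exponent $\sum_{n}\lambda_{n}F_{n}$ is supplied as a hypothetical function `cost(prev, nxt)` of the last two states only, and the value of `beta` is made up.

```python
import itertools
import math


def transition_matrix(states, cost):
    """Eq. 12 with Markovian constraints: normalised exp(-cost) over next states."""
    M = {}
    for prev in states:
        weights = {nxt: math.exp(-cost(prev, nxt)) for nxt in states}
        z = sum(weights.values())
        M[prev] = {nxt: wt / z for nxt, wt in weights.items()}
    return M


def evolve(dist, M):
    """Eq. 14: P(W(T)) = sum over W(T-1) of M_T * P(W(T-1))."""
    new = {s: 0.0 for s in dist}
    for prev, p_prev in dist.items():
        for nxt, m in M[prev].items():
            new[nxt] += m * p_prev
    return new


# All binary undirected networks on 3 nodes, stored as (w_01, w_02, w_12).
states = list(itertools.product((0, 1), repeat=3))

# Hypothetical constraint term: beta times the number of pairs that change state.
beta = 1.5


def cost(prev, nxt):
    return beta * sum(abs(a - b) for a, b in zip(prev, nxt))


M = transition_matrix(states, cost)

# Start from a single known network and apply five Markov steps.
dist = {s: 0.0 for s in states}
dist[(1, 0, 0)] = 1.0
for _ in range(5):
    dist = evolve(dist, M)
```

Because `cost` only sees the previous and next states, the same matrix `M` can be reused at every step. A constraint depending on the full history would require recomputing the transition for each history.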

II.3 Independent links

Consider now the possibility that the constraint functions in the transition matrix can be expressed as a linear combination of functions, each depending on the sequence of states composing the trajectory of a particular link in the network $w_{ij}^{T}:=(w_{ij}(0),w_{ij}(1),w_{ij}(2),...,w_{ij}(T))$ ,

\sum_{n}\lambda_{n}F_{n}(W_{T})=\sum_{ij}\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T})\,.

(15)

In this case, eq. 12 becomes

\begin{aligned}
M_{T}&=\frac{\exp\left(-\sum_{ij}\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij}(T))\right)}{\sum_{W}\exp\left(-\sum_{ij}\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij})\right)}\\
&=\frac{\prod_{ij}\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij}(T))\right)}{\sum_{kl}\sum_{w_{kl}}\prod_{ij}\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij})\right)}\\
&=\frac{\prod_{ij}\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij}(T))\right)}{\prod_{ij}\sum_{w_{ij}}\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij})\right)}\,.
\end{aligned}

(16)

The denominator of the second line of eq. 16 is obtained by decomposing the sum over all networks $W$ into sums over the states $w_{kl}$ of each link $kl$ . In the third line, the denominator results from carrying out the sum over states of each link $w_{ij}$ after factoring out the products that correspond to all other links $kl\neq ij$ , just as would be done to show that a distribution of independent events $x_{i}$ is normalised, $\sum_{i,x_{i}}\prod_{j}P(x_{j})=\prod_{j}\sum_{x_{j}}P(x_{j})=\prod_{j}1=1$ . Note that in both the sum over links in eq. 15 and the product over them in eq. 16, whether these correspond to unordered or ordered pairs of nodes for undirected and directed networks respectively must be taken into account.

In eq. 16 the whole transition matrix can be written as a product of transition probabilities $P_{ij}(w_{ij}(T)|w_{ij}^{T-1})$ corresponding to the trajectory of each link,

\begin{aligned}
P_{ij}(w_{ij}(T)\mid w_{ij}^{T-1}):&=\frac{\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij}(T))\right)}{\sum_{w_{ij}}\exp\left(-\sum_{m}\lambda_{ij}^{m}G^{m}_{ij}(w_{ij}^{T-1},w_{ij})\right)}\\
M_{T}&=\prod_{ij}P_{ij}(w_{ij}(T)\mid w_{ij}^{T-1})\,.
\end{aligned}

(17)

If the transition matrix takes the form of eq. 17 and the distribution of network trajectories of length $T-1$ is a product of independent link trajectory distributions $P_{ij}(w_{ij}^{T-1})$ , that is $P(W_{T-1})=\prod_{ij}P_{ij}(w_{ij}^{T-1})$ , then so is the distribution of trajectories of $T$ steps, $P(W_{T})=\prod_{ij}P_{ij}(w_{ij}(T)|w_{ij}^{T-1})P_{ij}(w_{ij}^{T-1})=\prod_{ij}P_{ij}(w_{ij}^{T})$ . Additionally, whenever the network trajectory distribution is a product of independent link trajectory distributions, the network distribution $P(W(T))$ is a product of individual link probabilities $P_{ij}(w_{ij}(T))$ since, following the same logic as the second and third lines of eq. 16,

\begin{aligned}
P(W(T))&=\sum_{W_{T-1}}P(W_{T})\\
&=\sum_{kl}\sum_{w_{kl}^{T-1}}\prod_{ij}P_{ij}(w_{ij}^{T})\\
&=\prod_{ij}\sum_{w_{ij}^{T-1}}P_{ij}(w_{ij}^{T})\\
&=\prod_{ij}P_{ij}(w_{ij}(T))\,.
\end{aligned}

(18)

Due to the recursive nature of the results on link independence, the ability to factorise network distributions into independent link probabilities can be traced back to the choice of initial network distribution $P(W(0))$ .
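The factorisation of the transition matrix into per-link transitions can be checked by brute force on a small example. The sketch below is only an illustration: it assumes three links, a Markovian per-link term, and made-up multipliers `theta` and `kappa` with a hypothetical function `link_cost`. It compares the first line of eq. 16, normalised over all networks, against the product of link transitions in eq. 17.

```python
import itertools
import math


def link_cost(prev, nxt, theta, kappa):
    """Hypothetical per-link exponent, standing in for sum_m lambda^m_ij G^m_ij."""
    return theta * nxt + kappa * (1 if nxt != prev else 0)


def link_transition(prev, nxt, theta, kappa):
    """Eq. 17: transition probability of a single link."""
    z = sum(math.exp(-link_cost(prev, w, theta, kappa)) for w in (0, 1))
    return math.exp(-link_cost(prev, nxt, theta, kappa)) / z


def network_transition(prev, nxt, multipliers):
    """First line of eq. 16: normalise explicitly over all networks W."""

    def weight(candidate):
        total = sum(
            link_cost(p, c, theta, kappa)
            for p, c, (theta, kappa) in zip(prev, candidate, multipliers)
        )
        return math.exp(-total)

    z = sum(weight(c) for c in itertools.product((0, 1), repeat=len(prev)))
    return weight(nxt) / z


# One (theta, kappa) pair per link; the values are arbitrary.
multipliers = [(0.3, 1.0), (-0.2, 0.5), (0.0, 2.0)]
prev = (1, 0, 1)

max_gap = 0.0
for nxt in itertools.product((0, 1), repeat=3):
    full = network_transition(prev, nxt, multipliers)
    product = math.prod(
        link_transition(p, n, theta, kappa)
        for p, n, (theta, kappa) in zip(prev, nxt, multipliers)
    )
    max_gap = max(max_gap, abs(full - product))
```

The two computations should agree up to floating-point rounding, while the product form needs a number of operations that grows linearly rather than exponentially with the number of links.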

II.4 Comparing stochastic simulations and maximum caliber

To assess whether maximum caliber can capture the dynamic distribution of an ensemble of networks undergoing a stochastic process, we consider an ensemble of $R$ networks $\{W^{1}(0),W^{2}(0),...,W^{r}(0),...,W^{R}(0)\}$ drawn from a distribution $P(W(0))$ . A network is sampled from a binary network distribution with independent links $P(W(0))=\prod_{ij}P(w_{ij}(0))$ , as is assumed to be the case from here on, by starting with a fully disconnected network with the same nodes as the network distribution. A random number $x_{ij}$ uniformly distributed between $0$ and $1$ is then drawn for each different pair $ij$ , accounting for whether the network is directed or not. The sample is then constructed by connecting $ij$ if $x_{ij}<P(w_{ij}(0)=1)$ . The distribution $P(W(0))$ will be used as the initial condition for maximum caliber while networks in the ensemble are initial conditions for different realisations of the randomisation process. This ensures that, at least initially, the distribution of maximum caliber represents the explicitly randomised ensemble. Note that if the initial distribution is of the type $P(W(0))=\delta_{W(0),V}=\prod_{ij}\delta_{w_{ij}(0),v_{ij}}$ , where $v_{ij}$ is the entry $ij$ in the adjacency matrix of a network $V$ , then all samples drawn from the distribution are $V$ and therefore all randomisation trajectories start from the same network.
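The sampling procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the probability matrix is stored as a nested list, and self-loops are assumed to be excluded.

```python
import random


def sample_network(prob, directed=False, rng=random):
    """Draw one binary adjacency matrix from independent link probabilities prob[i][j]."""
    n = len(prob)
    w = [[0] * n for _ in range(n)]  # start from a fully disconnected network
    for i in range(n):
        for j in range(n):
            # Skip self-loops (assumption) and visit each unordered pair only once.
            if i == j or (not directed and j < i):
                continue
            if rng.random() < prob[i][j]:  # connect ij if x_ij < P(w_ij(0) = 1)
                w[i][j] = 1
                if not directed:
                    w[j][i] = 1  # keep the adjacency matrix symmetric
    return w


def sample_ensemble(prob, size, directed=False, seed=0):
    """Draw `size` independent networks using a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_network(prob, directed, rng) for _ in range(size)]


# Delta-like initial condition: probabilities are 0 or 1, so every sample equals V.
V = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ensemble = sample_ensemble(V, size=5)
```

Since `random.random()` returns values in the half-open interval from 0 to 1, a probability of exactly 1 always connects the pair and a probability of exactly 0 never does, which reproduces the delta-like case.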

Given the ensemble and the distribution it is drawn from, we choose a set of constraints for maximum caliber and a randomisation step for stochastic simulations. The constraints of maximum caliber allow us to calculate the transition from the initial probability distribution $P(W(0))$ to $P(W(1))$ , while the randomisation step is applied to each network $W^{r}(0)$ drawn from the initial distribution of maximum caliber, obtaining once-randomised samples $W^{r}(1)$ .

Once samples have been obtained by the first step of explicit randomisation, we can estimate their distribution and compare it to the one updated by maximum caliber to find out whether they are the same. Both for the estimation and the comparison of binary network distributions with independent links, it is useful to define a probability matrix $P$ with values $p_{ij}:=P_{ij}(w_{ij}=1)$ that describe the probability that each pair $ij$ of a network distribution is connected. The probability matrix is enough to fully capture the distributions of the networks considered, as connected pairs of nodes have probability $P_{ij}(w_{ij}=1)=p_{ij}$ by definition and disconnected ones have probability $P_{ij}(w_{ij}=0)=1-p_{ij}$ because of normalisation. Probability matrices thus facilitate the estimation of sample distributions, as the sample probability matrix is easy to estimate directly. From network samples $W^{r}$ , the estimated probability that a given pair $ij$ is connected is the average value of connections in that pair, $p_{ij}=\sum_{r}w^{r}_{ij}/R$ , and therefore the estimated probability matrix is simply the average adjacency matrix of the samples, $P=\sum_{r}W^{r}/R$ . The values of the probability matrix corresponding to maximum caliber result from the network distribution by definition.
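A minimal sketch of the estimation step follows, with adjacency matrices stored as nested lists. The function names are hypothetical, and the entry-wise maximum gap used to compare two probability matrices is an assumed metric chosen for illustration, not necessarily the comparison used in this work.

```python
def estimate_probability_matrix(samples):
    """Average adjacency matrix over R samples: p_ij = sum_r w^r_ij / R."""
    R = len(samples)
    n = len(samples[0])
    return [[sum(w[i][j] for w in samples) / R for j in range(n)] for i in range(n)]


def max_abs_difference(P_est, P_calc):
    """Assumed comparison metric: largest entry-wise gap between two matrices."""
    return max(
        abs(a - b)
        for row_est, row_calc in zip(P_est, P_calc)
        for a, b in zip(row_est, row_calc)
    )


# Four samples of a two-node network; the pair is connected in three of them.
samples = [
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 1], [1, 0]],
]
P_est = estimate_probability_matrix(samples)
gap = max_abs_difference(P_est, [[0.0, 0.5], [0.5, 0.0]])
```

Here the estimated connection probability of the single pair is 3/4, and the gap to a calculated value of 1/2 is 1/4. With finitely many samples, the gap between estimated and calculated matrices is not expected to vanish exactly even when the two processes agree.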

After the probability matrices have been obtained by each method at the first step, the transitions of maximum caliber once again update the maximum caliber distribution, and a randomisation step is applied to each once-rewired network sample $W^{r}(1)$ , obtaining a twice rewired $W^{r}(2)$ . The estimation and calculation of the probability matrices is repeated, allowing for a new iteration. After $T$ repetitions, the methods are compared by examining if the estimated and calculated probability matrices are the same at each time up to $T$ . This concludes the description of the procedure to test whether maximum caliber distributions can capture stochastic network evolution, represented graphically in fig. 1.

Figure 1: On the right side, an initial distribution is evolved by maximum caliber for $T$ steps. On the left side, networks drawn from the initial distribution used by maximum caliber are evolved according to a stochastic randomisation simulation for $T$ steps. At each step, maximum caliber produces a probability matrix and the network simulations estimate one, which are the same if the processes agree.

Throughout the cases considered in the following sections, we consider eight initial conditions, each one defining an initial distribution for maximum caliber and therefore to draw samples from for realisations of explicit randomisation. The first four of these are what we refer to as ensemble-like initial conditions, shown as heatmaps of their probability matrices in the first column from left to right of fig. 2, and consist of

ER - $10$ node Erdős-Rényi: the maximum entropy network ensemble distribution resulting from constraining the total number of connections, in this case $10$ , in a binary undirected network of $10$ nodes. The connection probability of any unordered pair is the same value $2/9$ .

RG - $25$ node regular grid: neighbouring nodes on a two-dimensional $5\times 5$ grid are connected with probability $1$ and otherwise are connected with probability $0$ .

BM - $40$ node block model: two blocks of $10$ and $30$ nodes with uniform probabilities of $0.8$ and $0.3$ between pairs of nodes in each block respectively and a connection probability of $0$ for pairs of nodes belonging to different blocks.

CM - $100$ node binary undirected configuration model: the maximum entropy network ensemble distribution resulting from constraining the degree sequence of a network, the number of connections of each node. The resulting probability of any particular node pair is $p_{ij}=1/(1+\exp(\lambda_{i}+\lambda_{j}))$ . In this case, for simplicity, the values of $\lambda$ were drawn independently from a uniform distribution between $0$ and $3$ instead of resulting from a particular degree sequence.
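The four ensemble-like probability matrices can be constructed as in the following sketch. It is an illustration under assumptions the text does not state: the grid has open (non-periodic) boundaries with four-neighbour connectivity, nodes are indexed row by row, self-loops have probability zero, and the random seed and all function names are arbitrary choices.

```python
import math
import random


def erdos_renyi(n=10, links=10):
    p = links / (n * (n - 1) / 2)  # 10 links over 45 pairs gives 2/9
    return [[0.0 if i == j else p for j in range(n)] for i in range(n)]


def regular_grid(side=5):
    n = side * side
    P = [[0.0] * n for _ in range(n)]
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < side and cc < side:  # open boundaries (assumption)
                    j = rr * side + cc
                    P[i][j] = P[j][i] = 1.0
    return P


def block_model(sizes=(10, 30), probs=(0.8, 0.3)):
    n = sum(sizes)
    block = [b for b, size in enumerate(sizes) for _ in range(size)]
    return [
        [probs[block[i]] if i != j and block[i] == block[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def configuration_model(n=100, low=0.0, high=3.0, seed=0):
    rng = random.Random(seed)
    lam = [rng.uniform(low, high) for _ in range(n)]  # one multiplier per node
    return [
        [0.0 if i == j else 1.0 / (1.0 + math.exp(lam[i] + lam[j])) for j in range(n)]
        for i in range(n)
    ]


def expected_links(P):
    """Sum of connection probabilities over unordered pairs."""
    n = len(P)
    return sum(P[i][j] for i in range(n) for j in range(i + 1, n))
```

As a sanity check, `expected_links(erdos_renyi())` recovers the constrained value of $10$ connections up to rounding, and because all $\lambda$ values are non-negative every configuration model probability is at most $1/2$.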

The other four initial conditions are determined by drawing a network $V$ from each of the aforementioned network distributions and defining the distributions $\delta_{W(0),V}$ for each one. We refer to these as sample-like initial conditions, and note that the probability matrices of these distributions, shown in the second column of fig. 2, are equal to the adjacency matrices of the networks $V$ that give rise to them. The graph of each is shown in the third column of the same figure. For each initial condition, the number of nodes $N$ in the network determines the total number of steps in the evolution, which is $20N$ in all cases, and the number of realisations used to estimate explicitly randomised distributions, which is $10N$ except for the configuration model ensemble-like and sample-like initial conditions, for which $N$ samples are used.

In the next two sections, we consider particular cases of randomisation steps and maximum caliber constrained dynamics. In section III these are Watts-Strogatz rewiring [14] and the conservation of the number of links, along with some variations. In section IV they are degree-preserving rewiring [16, 17, 18, 19] and the conservation of the degree sequence. From the initial conditions described, we compare the distribution of each method over time, showing that maximum caliber dynamics captures the evolving distribution of explicitly randomised network ensembles.

III Maximum entropy Watts-Strogatz rewiring

A single Watts-Strogatz rewiring step consists of choosing, uniformly and at random, one among $L$ connections of a binary undirected network and, with a replacement probability $p$ , placing it among the $N(N-1)/2-L$ disconnected pairs of nodes, also uniformly at random (with probability $1-p$ , no change is made). Note that, on average, a given value of $p$ requires $1/p$ times the number of steps to achieve the same randomisation. Similarly, directed networks where the link $ij$ is considered different from $ji$ require randomising twice the number of connections. We will therefore focus on the case where the replacement probability is $p=1$ and undirected links.
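A single rewiring step with $p=1$ on an undirected network can be sketched as below. This is a minimal illustration, not the simulation code used for the results: the network is stored as a set of unordered pairs `(i, j)` with `i < j`, and the function name and seed are arbitrary.

```python
import itertools
import random


def rewire_step(links, n, rng=random):
    """Move one uniformly chosen connection to a uniformly chosen disconnected pair."""
    all_pairs = set(itertools.combinations(range(n), 2))
    empty = sorted(all_pairs - links)  # the N(N-1)/2 - L disconnected pairs
    if not links or not empty:
        return set(links)  # nothing can be moved
    removed = rng.choice(sorted(links))  # one of the L connections, uniformly
    added = rng.choice(empty)  # one disconnected pair, uniformly
    new_links = set(links)
    new_links.remove(removed)
    new_links.add(added)
    return new_links


rng = random.Random(1)
links = {(0, 1), (1, 2), (2, 3)}
after = rewire_step(links, n=5, rng=rng)
```

The target pair is drawn from the pairs that were disconnected before the removal, so each step disconnects exactly one pair and connects exactly one other pair, leaving the number of connections $L$ unchanged.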

By construction, Watts-Strogatz rewiring conserves the number of connections $L$ in a network throughout its application. As constraints capture characteristic traits of the network trajectory evolution and due to the recursive nature of maximum caliber, we impose the conservation of the average number of connections in the ensemble with respect to the previous step,

\sum_{W_{T}}\sum_{i,j>i}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})=0\,.

(19)

Note that, as the network is undirected, the sum over unordered pairs $ij$ is carried out by summing only the upper triangular adjacency matrix $i,j>i$ .

Additionally, another constraint is needed. Watts-Strogatz rewiring as described defines a single step by making exactly two changes in the configuration of connections in the network, one pair of nodes being disconnected and another connected. The number of such changes can be measured by counting the pairs of nodes that change states, regardless of whether they change from $w_{ij}(T-1)=0$ to $w_{ij}(T)=1$ or $w_{ij}(T-1)=1$ to $w_{ij}(T)=0$ . For this the constraint is

\sum_{W_{T}}\sum_{i,j>i}|w_{ij}(T)-w_{ij}(T-1)|P(W_{T})=2\,.

(20)

Note that the process defined by these constraints is Markov as both depend only on the two last states $W(T-1)$ and $W(T)$ in the network trajectory. Also, introducing multipliers $\alpha$ and $\beta$ for eq. 19 and eq. 20 respectively, we have

\sum_{n}\lambda_{n}F_{n}(W_{T})=\sum_{i,j>i}\left[\alpha\left(w_{ij}(T)-w_{ij}(T-1)\right)+\beta|w_{ij}(T)-w_{ij}(T-1)|\right]

(21)

meaning that eq. 15 is valid with $\lambda^{0}_{ij}=\alpha$ , $\lambda^{1}_{ij}=\beta$ , $G^{0}_{ij}(w^{T}_{ij})=w_{ij}(T)-w_{ij}(T-1)$ and $G^{1}_{ij}(w^{T}_{ij})=|w_{ij}(T)-w_{ij}(T-1)|$ . By the results from section II the network transition matrix $M_{T}$ is a product of independent Markovian link transitions,

\begin{aligned}
P_{ij}(w_{ij}(T)|w_{ij}^{T-1})&=P_{ij}(w_{ij}(T)|w_{ij}(T-1))\\
&=\frac{\exp\left(-\alpha\left(w_{ij}(T)-w_{ij}(T-1)\right)-\beta|w_{ij}(T)-w_{ij}(T-1)|\right)}{\sum_{w_{ij}}\exp\left(-\alpha\left(w_{ij}-w_{ij}(T-1)\right)-\beta|w_{ij}-w_{ij}(T-1)|\right)}\,.
\end{aligned}

(22)

As the network links are binary, the link transitions define the annihilation probabilities

\begin{aligned}
a_{ij}&:=P_{ij}(w_{ij}(T)=0|w_{ij}(T-1)=1)\\
&=\frac{\exp\left(\alpha-\beta\right)}{\sum_{w_{ij}=0}^{1}\exp\left(-\alpha\left(w_{ij}-1\right)-\beta|w_{ij}-1|\right)}\\
&=\frac{1}{1+\exp\left(-\alpha+\beta\right)}
\end{aligned}

(23)

and creation probabilities

\begin{aligned}
c_{ij}&:=P_{ij}(w_{ij}(T)=1|w_{ij}(T-1)=0)\\
&=\frac{\exp\left(-\alpha-\beta\right)}{\sum_{w_{ij}=0}^{1}\exp\left(-\alpha w_{ij}-\beta|w_{ij}|\right)}\\
&=\frac{1}{1+\exp\left(\alpha+\beta\right)}\,.
\end{aligned}

(24)
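The closed forms of eqs. 23 and 24 can be checked numerically by evaluating eq. 22 on the two states of a binary link. The following sketch assumes arbitrary illustrative values for the multipliers $\alpha$ and $\beta$; the function name `link_transitions` is hypothetical.

```python
import numpy as np

def link_transitions(alpha, beta):
    """Array m[new, old] = P(w(T) = new | w(T-1) = old) for one binary link (eq. 22)."""
    states = np.array([0, 1])
    new, old = np.meshgrid(states, states, indexing="ij")
    weights = np.exp(-alpha * (new - old) - beta * np.abs(new - old))
    return weights / weights.sum(axis=0, keepdims=True)   # normalise over the new state

alpha, beta = 0.3, 2.0                      # hypothetical multiplier values
m = link_transitions(alpha, beta)
a = 1.0 / (1.0 + np.exp(-alpha + beta))     # annihilation probability, eq. 23
c = 1.0 / (1.0 + np.exp(alpha + beta))      # creation probability, eq. 24
```

Here `m[0, 1]` is the probability of a connected pair becoming disconnected and `m[1, 0]` that of a disconnected pair becoming connected, so they should coincide with `a` and `c` respectively.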

The link transitions can define link-specific transition matrices

\begin{aligned}
m_{ij}&:=P_{ij}(w_{ij}(T)|w_{ij}(T-1))\\
&=\begin{pmatrix}1-c_{ij}&a_{ij}\\ c_{ij}&1-a_{ij}\end{pmatrix}\\
&=\begin{pmatrix}\frac{1}{1+\exp(-\alpha-\beta)}&\frac{1}{1+\exp(-\alpha+\beta)}\\ \frac{1}{1+\exp(\alpha+\beta)}&\frac{1}{1+\exp(\alpha-\beta)}\end{pmatrix}
\end{aligned}

(25)

where the entries with values $1-c_{ij}$ and $1-a_{ij}$ result from marginalisation. As the multipliers are independent of each specific link $ij$ , we find that the link transition matrix is the same for each link in the network, that is $c_{ij}=c=1/(1+\exp(\alpha+\beta))$ and $a_{ij}=a=1/(1+\exp(-\alpha+\beta))$ for all $ij$ . The values of these probabilities can be found analytically by imposing constraints eqs. 19 and 20,

\begin{aligned}
0&=\sum_{W_{T}}\sum_{i,j>i}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})\\
&=\sum_{W_{T}}\sum_{i,j>i}\left(w_{ij}(T)-w_{ij}(T-1)\right)\prod_{i,j>i}P_{ij}(w_{ij}^{T})\\
&=\sum_{i,j>i}\sum_{\begin{subarray}{c}w_{ij}(T)\\ w_{ij}(T-1)\end{subarray}}\left(w_{ij}(T)-w_{ij}(T-1)\right)m_{ij}P_{ij}(w_{ij}(T-1))\\
&=\sum_{i,j>i}\left[c\left(1-p_{ij}(T-1)\right)-a\,p_{ij}(T-1)\right]\\
&=(N(N-1)/2-L)c-aL
\end{aligned}

(26)

and

\begin{aligned}
2&=\sum_{W_{T}}\sum_{i,j>i}|w_{ij}(T)-w_{ij}(T-1)|P(W_{T})\\
&=\sum_{W_{T}}\sum_{i,j>i}|w_{ij}(T)-w_{ij}(T-1)|\prod_{i,j>i}P_{ij}(w_{ij}^{T})\\
&=\sum_{i,j>i}\sum_{\begin{subarray}{c}w_{ij}(T)\\ w_{ij}(T-1)\end{subarray}}|w_{ij}(T)-w_{ij}(T-1)|m_{ij}P_{ij}(w_{ij}(T-1))\\
&=\sum_{i,j>i}\left[c\left(1-p_{ij}(T-1)\right)+a\,p_{ij}(T-1)\right]\\
&=(N(N-1)/2-L)c+aL\,.
\end{aligned}

(27)

In both cases, the second line is obtained by expanding the network trajectory distribution into a product of independent link trajectory distributions. The third results from summing over the trajectories of all links in the product that are not multiplied by the corresponding $G_{ij}^{m}(w_{ij})$, all of which are normalised. The fourth evaluates the remaining sum over the two states of each link using the creation and annihilation probabilities $c$ and $a$. The last line explicitly introduces the value $L=\sum_{i,j>i}p_{ij}(T-1)$ of the average number of links of the network distribution $P(W(T-1))$. Adding and subtracting eqs. 26 and 27, the creation and annihilation probabilities result in

c=\frac{1}{N(N-1)/2-L}\,,\qquad a=\frac{1}{L}\,.

(28)
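As a quick consistency check, the probabilities of eq. 28 can be substituted back into the last lines of eqs. 26 and 27, and the multipliers $\alpha$ and $\beta$ recovered by inverting eqs. 23 and 24. The sketch below assumes a hypothetical network of 20 nodes and 40 links; the function names are illustrative only.

```python
import math

def rewiring_probabilities(n_nodes, n_links):
    """Creation and annihilation probabilities of eq. 28."""
    n_pairs = n_nodes * (n_nodes - 1) // 2
    c = 1.0 / (n_pairs - n_links)
    a = 1.0 / n_links
    return c, a

def multipliers(c, a):
    """Invert c = 1/(1+exp(alpha+beta)) and a = 1/(1+exp(-alpha+beta))."""
    s = math.log((1.0 - c) / c)      # equals alpha + beta
    d = math.log((1.0 - a) / a)      # equals beta - alpha
    return 0.5 * (s - d), 0.5 * (s + d)

n_nodes, n_links = 20, 40            # hypothetical network size and link count
n_pairs = n_nodes * (n_nodes - 1) // 2
c, a = rewiring_probabilities(n_nodes, n_links)
alpha, beta = multipliers(c, a)

net_change = (n_pairs - n_links) * c - a * n_links      # last line of eq. 26
total_changes = (n_pairs - n_links) * c + a * n_links   # last line of eq. 27
```

The inversion requires $L>1$ and at least two disconnected pairs, so that both probabilities lie strictly between $0$ and $1$.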

Having described the explicit randomisation process and obtained the transition probabilities according to maximum caliber, we will now compare the evolution of the resulting network distributions from the different initial conditions represented in fig. 2. For this, we choose a set of links $ij$ for each initial condition and show the evolution of their connection probabilities according to the value of their respective entries in the probability matrix of explicit randomisation and maximum caliber. In fig. 3 we show the evolution of connection probabilities starting from sample-like initial conditions, while fig. 4 shows those starting from ensemble-like initial conditions. Values obtained from explicit randomisation are shown as circular markers at regular intervals of $15$ and $10$ time steps for sample-like and ensemble-like initial conditions respectively, while full lines represent maximum caliber.

From sample-like initial conditions such as those shown in fig. 3, connection probability values all start at $1$ or $0$, as expected since the initial probability matrix matches the adjacency matrix of each sample. We therefore show the evolution of two links, an initially connected one starting with a connection probability of $1$ and an initially disconnected one starting with a connection probability of $0$. We have verified that other links present the same evolution (for each initial condition), as expected since the creation and annihilation probabilities do not depend on specific $ij$ pairs. Additionally, the evolution of all entries converges to the same probability at long times (for each network), matching the equilibrium distribution of Watts-Strogatz rewiring. This corresponds to the Erdös-Rényi distribution, the maximum entropy distribution constrained by the number of connections in the network.

The evolution from ensemble-like initial conditions is presented in fig. 4. In Erdös-Rényi initial conditions (ER) all links have the same initial probability of being connected, and therefore the evolution of a single entry in the probability matrix is representative of the whole network ensemble. Additionally, because the Erdös-Rényi random graph is the equilibrium state of the rewiring process and the maximum entropy distribution, the network distribution is unchanged by the rewiring process. For the regular grid (RG), neighbouring nodes have a probability of $1$ of being connected while all others have probability $0$, so the observed evolution is identical to the one in fig. 3. The block model (BM) takes three possible connection probabilities corresponding to connections within the small block, within the large block, or between them, resulting in three different evolutions from the three possible initial probability values. Finally, the configuration model (CM) practically takes a continuum of initial probability values, so we have chosen $7$ links with approximately evenly spaced initial probabilities to compare their behaviour over time.

In both the case of sample-like and ensemble-like initial conditions, the evolution according to maximum caliber matches that of explicit randomisation with high accuracy. The constraints that define the analytic dynamics reflect properties of the underlying randomisation process and the constraints of the equilibrium maximum entropy distribution. This indicates that the method is well suited to replace realisations of dynamical processes on networks, yielding the distribution of trajectories based on constraints and an initial condition, deriving the evolution analogously to how traditional maximum entropy methods yield the distribution of equilibrium states.
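The comparison described in this section can be reproduced on a small scale. The sketch below is not the code used for the figures: it assumes a hypothetical network of 10 nodes and 15 links, stores the state of each unordered pair in a flat boolean vector, estimates connection probabilities from 1000 repetitions of explicit rewiring, and compares them to the maximum caliber evolution obtained by iterating the link transition matrix with the probabilities of eq. 28.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_links = 10, 15                  # hypothetical sizes
n_pairs = n_nodes * (n_nodes - 1) // 2
n_steps, n_runs = 30, 1000

# Sample-like initial condition: the first n_links pairs are connected.
start = np.zeros(n_pairs, dtype=bool)
start[:n_links] = True

# Explicit randomisation: repeat the rewiring process and average the outcomes.
counts = np.zeros((n_steps + 1, n_pairs))
for _ in range(n_runs):
    state = start.copy()
    counts[0] += state
    for t in range(1, n_steps + 1):
        connected = np.flatnonzero(state)
        disconnected = np.flatnonzero(~state)
        old = connected[rng.integers(connected.size)]
        new = disconnected[rng.integers(disconnected.size)]
        state[old], state[new] = False, True
        counts[t] += state
p_sim = counts / n_runs

# Maximum caliber: iterate p(T) = (1 - a) p(T-1) + c (1 - p(T-1)) with eq. 28.
c, a = 1.0 / (n_pairs - n_links), 1.0 / n_links
p_mc = np.zeros((n_steps + 1, n_pairs))
p_mc[0] = start
for t in range(1, n_steps + 1):
    p_mc[t] = (1.0 - a) * p_mc[t - 1] + c * (1.0 - p_mc[t - 1])

max_gap = np.abs(p_sim - p_mc).max()       # statistical error of the simulation estimate
p_star = n_links / n_pairs                 # uniform density, fixed point of the iteration
```

The value `p_star` is the uniform connection probability with the same average number of links, which the iteration leaves unchanged. The remaining gap between `p_sim` and `p_mc` is expected to shrink as the number of repetitions grows.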

III.1 Variation of the average number of links

The conservation of the average number of links in eq. 19 can be extended to a more general case. Notice that the constraint function on the left-hand side of the equation establishes what property of the dynamics is imposed, in this case the change in the average number of links between time $T-1$ and $T$ , while the constraint value on the right-hand side sets the numerical value of said property. In the particular case presented, this value is $0$ for all $T$ , reflecting the conservation of the average number of links throughout the evolution. If, on the other hand, the value is different from $0$ , the constraint function still represents the change in the average number of links, but the value no longer indicates a conservation law. Nevertheless, no mention of the constraint values is made until the Lagrange multipliers are found by setting the functional form of transitions $M_{T}$ into the constraints. The only difference between a conservation law of a certain property and the case where the change of the same property is specified but non-zero is in this last step.

To test maximum caliber constraints beyond conservation laws, we consider two variations of the results already presented in this section. The first considers an explicit randomisation process identical to the described Watts-Strogatz rewiring with the modification that, in addition to the replacement of a connected link, every $\tau$ steps a connection is created between a randomly chosen disconnected pair of nodes. For maximum caliber, the change of the average number of connections can be described by $\Delta_{c}(T)=1$ if $T\text{ mod }\tau=0$ and $\Delta_{c}(T)=0$ otherwise, converting eq. 19 to

\sum_{W_{T}}\sum_{i,j>i}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})=\Delta_{c}(T)\,,

(29)

and the number of changes made in the network configuration, eq. 20, becomes

\sum_{W_{T}}\sum_{i,j>i}|w_{ij}(T)-w_{ij}(T-1)|P(W_{T})=2+\Delta_{c}(T)\,.

(30)

Because the constraint functions are still the same, the network trajectory transition matrix $M_{T}$ is a product of independent link transition matrices $m_{ij}$ with annihilation and creation probabilities $a_{ij}=a$ and $c_{ij}=c$ that are the same for every pair of nodes in the network. Their values can be found by introducing the transitions into eqs. 29 and 30. Following the same steps as in eqs. 26 and 27 we have

\begin{aligned}
\Delta_{c}(T)&=(N(N-1)/2-L)c-aL\\
2+\Delta_{c}(T)&=(N(N-1)/2-L)c+aL
\end{aligned}

(31)

which results in

c=\frac{1+\Delta_{c}(T)}{N(N-1)/2-L}\,,\qquad a=\frac{1}{L}\,.

(32)
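The effect of eq. 32 on the average number of links can be illustrated by iterating the link transitions directly. The sketch below assumes hypothetical values ($12$ nodes, $20$ initial links, $\tau=3$) and recomputes $c$ and $a$ at every step from the current average number of links, since $L$ is no longer constant.

```python
import numpy as np

def step_probabilities(n_pairs, avg_links, delta_c):
    """Creation and annihilation probabilities of eq. 32."""
    c = (1.0 + delta_c) / (n_pairs - avg_links)
    a = 1.0 / avg_links
    return c, a

n_nodes, tau, n_steps = 12, 3, 30          # hypothetical values
n_pairs = n_nodes * (n_nodes - 1) // 2
p = np.zeros(n_pairs)
p[:20] = 1.0                               # sample-like start with 20 links

avg_links = [p.sum()]
for t in range(1, n_steps + 1):
    delta_c = 1.0 if t % tau == 0 else 0.0
    c, a = step_probabilities(n_pairs, p.sum(), delta_c)
    p = (1.0 - a) * p + c * (1.0 - p)      # same update for every pair of nodes
    avg_links.append(p.sum())
avg_links = np.array(avg_links)
```

With these values the average number of links grows by one every $\tau$ steps and stays constant otherwise, as imposed by eq. 29.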

In fig. 5 we show the evolution of the same probability matrix values presented in fig. 4 starting from ensemble-like initial conditions with varying average number of connections according to simulations and maximum caliber. The values of $\tau$ are $\tau_{\text{ER}}=6$ , $\tau_{\text{RG}}=2$ , $\tau_{\text{BM}}=2$ and $\tau_{\text{CM}}=1$ for the Erdös-Rényi, regular grid, block model, and configuration model initial conditions respectively. As in the case of conserved average number of links, probabilities obtained from maximum caliber and simulations are found to match.

The second scenario with varying average number of connections considers a sinusoidal signal $S(t):=K\sin(\omega t)$ which, when positive, indicates that links are added and, when negative, that links are removed. In explicit randomisation by simulations, the number of connections added or removed at each step $T$ is an integer drawn from a binomial distribution with as many trials as there are disconnected or connected pairs of nodes respectively, and mean $|S(T)|$. For maximum caliber, the number of added links, averaged over simulations, is $\Delta_{c}(T)=|S(T)|$ if $S(T)>0$ and $\Delta_{c}(T)=0$ otherwise. The number of removed links is $\Delta_{a}(T)=|S(T)|$ if $S(T)<0$ and $\Delta_{a}(T)=0$ otherwise. Therefore the constraint on the average change in the number of connections can be written as

\sum_{W_{T}}\sum_{i,j>i}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})=\Delta_{c}(T)-\Delta_{a}(T)\,,

(33)

and the number of changes made in the network configuration becomes

\sum_{W_{T}}\sum_{i,j>i}|w_{ij}(T)-w_{ij}(T-1)|P(W_{T})=2+\Delta_{c}(T)+\Delta_{a}(T)\,.

(34)

Following the same steps as before, the transitions are translated into annihilation and creation probabilities that are the same for every pair of nodes in the network, with values that can be found by introducing them explicitly into eqs. 33 and 34. This results in the system of equations

\begin{aligned}
\Delta_{c}(T)-\Delta_{a}(T)&=(N(N-1)/2-L)c-aL\\
2+\Delta_{c}(T)+\Delta_{a}(T)&=(N(N-1)/2-L)c+aL
\end{aligned}

(35)

which yields

c=\frac{1+\Delta_{c}(T)}{N(N-1)/2-L}\,,\qquad a=\frac{1+\Delta_{a}(T)}{L}\,.

(36)
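The sinusoidal scenario can be sketched in the same way. The values below ($20$ nodes, $60$ links on average, $K=1$) are hypothetical and chosen only so that all probabilities stay between $0$ and $1$; the helper `draw_change` illustrates one way of drawing the binomial number of added or removed links in an explicit simulation, and is an assumption about the implementation rather than the code behind fig. 6.

```python
import numpy as np

def deltas(signal):
    """Average numbers of added and removed links for a signal value S(T)."""
    return max(signal, 0.0), max(-signal, 0.0)

def draw_change(rng, n_trials, mean):
    """Binomial draw with n_trials trials and the requested mean (explicit simulation)."""
    return rng.binomial(n_trials, mean / n_trials)

n_nodes, amplitude, omega, n_steps = 20, 1.0, 2 * np.pi / 70, 140   # hypothetical values
n_pairs = n_nodes * (n_nodes - 1) // 2
p = np.full(n_pairs, 60.0 / n_pairs)       # ensemble-like start with 60 links on average

avg_links = [p.sum()]
for t in range(1, n_steps + 1):
    delta_c, delta_a = deltas(amplitude * np.sin(omega * t))
    total = p.sum()
    c = (1.0 + delta_c) / (n_pairs - total)   # eq. 36
    a = (1.0 + delta_a) / total
    p = (1.0 - a) * p + c * (1.0 - p)
    avg_links.append(p.sum())
avg_links = np.array(avg_links)

# The average number of links should follow the accumulated signal.
expected = 60.0 + np.cumsum(amplitude * np.sin(omega * np.arange(1, n_steps + 1)))

rng = np.random.default_rng(2)
draws = [draw_change(rng, n_pairs - 60, 0.8) for _ in range(20000)]
```

Under these assumptions the average number of links changes by $S(T)$ at every step, in agreement with eq. 33, while the binomial draws reproduce the requested mean only on average over realisations.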

In fig. 6 we show the evolution of selected probability matrix values starting from ensemble-like initial conditions with a sinusoidal variation of the average number of connections according to simulations and maximum caliber. The values of $K$ are half of the initial average number of links of each ensemble-like initial condition and $\omega=2\pi/70$ in all cases. We show the evolution up to a maximum of $500$ steps, as the long-term behaviour, in which all connection probabilities are the same but oscillate in time, is already reached by then. As in the first case of varying average number of links, probabilities obtained from maximum caliber and simulations are found to match.

Note that while samples drawn from an ensemble of networks with a constant average number of connections allow for a certain variation in the number of links of each sample, these are random fluctuations. By changing constraint values, on the other hand, we can control the dynamics of the ensemble average beyond conservation laws, leading to changes in the number of links of samples due to this dynamic control and fluctuations.

IV Maximum entropy degree-preserving rewiring

Consider now a rewiring process such that the degree sequence, that is the number of connections of each node, is conserved. For simplicity of both the explicit randomisation and maximum caliber, as well as the flexibility thereof, we will now consider directed networks where the link $ij$ is considered different from $ji$ . A single step of explicit randomisation consists of selecting a group of two pairs of connected nodes $ij$ and $kl$ in the network such that all nodes in the group are different and both $il$ and $kj$ are disconnected. Then the connections between $ij$ and $kl$ are removed while $il$ and $kj$ are connected.
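A single degree-preserving step on a directed network can be sketched as follows. The sketch assumes the network is stored as a 0/1 NumPy array with `adj[i, j] = 1` for a link from $i$ to $j$, and selects the two links by rejection sampling, which is an implementation choice made here for brevity; the function name and the random toy network are hypothetical.

```python
import numpy as np

def degree_preserving_step(adj, rng, max_tries=1000):
    """Replace the links i->j and k->l by i->l and k->j.

    Candidate pairs of links are drawn by rejection; if no valid pair is found
    within max_tries attempts, an unchanged copy of the network is returned.
    """
    sources, targets = np.nonzero(adj)
    out = adj.copy()
    for _ in range(max_tries):
        first, second = rng.choice(sources.size, size=2, replace=False)
        i, j = sources[first], targets[first]
        k, l = sources[second], targets[second]
        all_distinct = len({i, j, k, l}) == 4
        if all_distinct and adj[i, l] == 0 and adj[k, j] == 0:
            out[i, j] = out[k, l] = 0
            out[i, l] = out[k, j] = 1
            break
    return out

rng = np.random.default_rng(3)
n = 10
adj = (rng.random((n, n)) < 0.3).astype(int)
np.fill_diagonal(adj, 0)                   # no self-loops in the toy network
swapped = degree_preserving_step(adj, rng)
```

Each accepted step changes the state of four ordered pairs of nodes while leaving every row sum (out-degree) and column sum (in-degree) of the adjacency matrix unchanged.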

From the perspective of maximum caliber, the conservation of the number of connections of each node requires a constraint for each node $j$ defining the change of its in degree,

\sum_{W_{T}}\sum_{i}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})=0\,,

(37)

and one constraint for each node $i$ establishing the change in its out degree

\sum_{W_{T}}\sum_{j}\left(w_{ij}(T)-w_{ij}(T-1)\right)P(W_{T})=0\,.

(38)

The number of changes in the network configuration in a single step of the randomisation is now $4$ instead of $2$ as two connections are removed and two are added. The corresponding constraint is

\sum_{W_{T}}\sum_{i,j}|w_{ij}(T)-w_{ij}(T-1)|P(W_{T})=4\,.

(39)

Just as for Watts-Strogatz rewiring, the constraints produce Markov transitions as the constraint functions depend only on the two last network configurations. Introducing Lagrange multipliers $\alpha^{\text{in}}_{j}$ for eq. 37, $\alpha^{\text{out}}_{i}$ for eq. 38 and $\beta$ for eq. 39, we have

\sum_{n}\lambda_{n}F_{n}(W_{T})=\sum_{i,j}\left[(\alpha^{\text{in}}_{j}+\alpha^{\text{out}}_{i})(w_{ij}(T)-w_{ij}(T-1))+\beta|w_{ij}(T)-w_{ij}(T-1)|\right]

(40)

meaning that eq. 15 is valid with $\lambda^{0}_{ij}=\alpha^{\text{out}}_{i}+\alpha^{\text{in}}_{j}$ , $\lambda^{1}_{ij}=\beta$ , $G^{0}_{ij}(w^{T}_{ij})=w_{ij}(T)-w_{ij}(T-1)$ and $G^{1}_{ij}(w^{T}_{ij})=|w_{ij}(T)-w_{ij}(T-1)|$ . By the results from section II the network transition matrix $M_{T}$ is a product of independent Markovian link transitions which, following the same steps as in section III, results in link transition matrices

\begin{aligned}
m_{ij}&:=P_{ij}(w_{ij}(T)|w_{ij}(T-1))\\
&=\begin{pmatrix}1-c_{ij}&a_{ij}\\ c_{ij}&1-a_{ij}\end{pmatrix}\\
&=\begin{pmatrix}\frac{1}{1+\exp(-\alpha^{\text{out}}_{i}-\alpha^{\text{in}}_{j}-\beta)}&\frac{1}{1+\exp(-\alpha^{\text{out}}_{i}-\alpha^{\text{in}}_{j}+\beta)}\\ \frac{1}{1+\exp(\alpha^{\text{out}}_{i}+\alpha^{\text{in}}_{j}+\beta)}&\frac{1}{1+\exp(\alpha^{\text{out}}_{i}+\alpha^{\text{in}}_{j}-\beta)}\end{pmatrix}
\end{aligned}

(41)

Unlike in the Watts-Strogatz rewiring process, the transition matrix of each link is different. This makes it extremely difficult to solve the constraint equations analytically for the annihilation and creation probabilities in terms of the imposed average values. However, as in the binary configuration model, that is the equilibrium case of imposing the degree of each node, eq. 41 gives a functional form of the annihilation and creation probabilities in terms of the Lagrange multipliers, which can be adjusted numerically by imposing the constraints of eqs. 37, 38 and 39.
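For reference, the system to be solved numerically can be written out by repeating the steps of eqs. 26 and 27 with link-specific probabilities. The block below is a sketch of that substitution, assuming that $p_{ij}(T-1)$ denotes the connection probability of the ordered pair $ij$ at the previous step, as in section III.

```latex
\begin{aligned}
0&=\sum_{i}\left[c_{ij}\left(1-p_{ij}(T-1)\right)-a_{ij}\,p_{ij}(T-1)\right]\quad\text{for each node }j\,,\\
0&=\sum_{j}\left[c_{ij}\left(1-p_{ij}(T-1)\right)-a_{ij}\,p_{ij}(T-1)\right]\quad\text{for each node }i\,,\\
4&=\sum_{i,j}\left[c_{ij}\left(1-p_{ij}(T-1)\right)+a_{ij}\,p_{ij}(T-1)\right]\,,\\
c_{ij}&=\frac{1}{1+\exp(\alpha^{\text{out}}_{i}+\alpha^{\text{in}}_{j}+\beta)}\,,\qquad
a_{ij}=\frac{1}{1+\exp(-\alpha^{\text{out}}_{i}-\alpha^{\text{in}}_{j}+\beta)}\,.
\end{aligned}
```

Note that the multipliers enter only through the sums $\alpha^{\text{out}}_{i}+\alpha^{\text{in}}_{j}$, so shifting all $\alpha^{\text{out}}_{i}$ by a constant and all $\alpha^{\text{in}}_{j}$ by its negative leaves the transitions unchanged. Correspondingly, the in-degree and out-degree equations both sum to the same total, so one of them is redundant.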

In fig. 7 we show the evolution of connection probabilities over time according to explicit randomisation simulations, in circular markers, and maximum caliber, in full lines, for $5$ pairs of nodes from sample-like initial conditions.

The most notable difference between this case and Watts-Strogatz rewiring is that, as the link transition matrices are different for each pair of nodes, trajectories can start from the same point and still behave differently. There is also a larger difference (due to the number of realisations required) between the simulation and maximum caliber results. We have verified that the stationary connection probability values achieved by individual links (the asymptotic values of fig. 7) agree with the distribution expected by the directed binary configuration model, $p^{\text{eq}}_{ij}=(1+\exp(\lambda^{\text{out}}_{i}+\lambda^{\text{in}}_{j}))^{-1}$ such that $\sum_{i}p^{\text{eq}}_{ij}=k^{\text{in}}_{j}$ is the in-degree of node $j$ and $\sum_{j}p^{\text{eq}}_{ij}=k^{\text{out}}_{i}$ the out-degree of node $i$ obtained from the initial network. This again shows that the distributions resulting from traditional entropy maximisation correspond to equilibrium distributions of a dynamic process defined by the analogous conservation constraints in the context of maximum caliber.
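The equilibrium probabilities of the directed binary configuration model can be obtained numerically from a degree sequence. The sketch below uses a simple fixed-point iteration in the variables $x_{i}=\exp(-\lambda^{\text{out}}_{i})$ and $y_{j}=\exp(-\lambda^{\text{in}}_{j})$; this solver, the exclusion of self-loops, and the degree sequence are assumptions made for illustration, and other numerical schemes could be used instead.

```python
import numpy as np

def fit_directed_configuration_model(k_out, k_in, n_iter=5000):
    """Fixed-point iteration for p_ij = x_i y_j / (1 + x_i y_j), without self-loops.

    x_i = exp(-lambda_out_i) and y_j = exp(-lambda_in_j).
    """
    k_out = np.asarray(k_out, dtype=float)
    k_in = np.asarray(k_in, dtype=float)
    off_diag = 1.0 - np.eye(k_out.size)
    x = k_out / np.sqrt(k_out.sum())       # rough initial guess
    y = k_in / np.sqrt(k_in.sum())
    for _ in range(n_iter):
        denom = 1.0 + np.outer(x, y)
        x = k_out / (off_diag * y[None, :] / denom).sum(axis=1)
        denom = 1.0 + np.outer(x, y)
        y = k_in / (off_diag * x[:, None] / denom).sum(axis=0)
    xy = np.outer(x, y)
    return off_diag * xy / (1.0 + xy)

k_out = [2, 2, 3, 1, 2, 2]     # hypothetical out-degrees
k_in = [2, 3, 1, 2, 2, 2]      # hypothetical in-degrees
p_eq = fit_directed_configuration_model(k_out, k_in)
```

Once the iteration has converged, the row and column sums of `p_eq` reproduce the imposed out-degrees and in-degrees. These are the values against which the asymptotic connection probabilities of the dynamics can be compared.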

V Discussion and conclusions

In this work, we have applied the principle of maximum caliber of Jaynes to construct the evolution of random network configuration probabilities from constraints representing statistical properties of the evolution. The method is an approach to constructing dynamic processes in the same way that stationary network distributions are obtained by maximising Shannon entropy. The main difference between the method presented here and other applications of maximum caliber is that it obtains individual transition probabilities at different times in the evolution instead of probabilities of entire trajectories. The transition probabilities can then be used to obtain the evolution of a dynamic distribution from desired initial conditions.

In section II we show how to obtain such transitions from maximum caliber by replacing the requirement of probability normalisation with a marginalisation, essentially imposing the history of the network evolution at its inception. This only requires that other constraints are average values of the trajectory distribution, in principle allowing for memory-dependent processes. We then highlight specific conditions under which the constraints result in Markov processes and the transitions of the whole network can be described by those of individual links. In appendix B we extend the conditions for Markov processes, and in appendix A we also show that this formulation of maximum caliber can be interpreted as an information-theoretic analogue of the maximum entropy production principle in the field of thermodynamics, and can be used to strengthen the theoretical basis of the method of entropic dynamics. Next, we focus on particular choices of constraints under which conditions for Markov processes and evolution by individual links apply. In section III we start with the conservation of the average number of connections in a network, showing that the dynamic distributions that result predict the same connection probabilities as estimated from repetitions of explicit simulations of the rewiring process of Watts and Strogatz. We then modify the constraints in order to represent a controlled variation of the average number of connections, obtaining the same results from simulations of modified Watts-Strogatz randomisation which produce the same average variation. This leads us to conclude that constraints control the evolution of imposed properties rather generally, with conservation resulting as a particular case. In section IV we apply the same procedure to the conservation of the number of connections of each node in a network, comparing to explicit simulations implementing the same process and once again showing that connection probabilities match.

Our results allow us to conclude that maximum caliber can serve as a useful tool to obtain the evolution of network ensembles undergoing randomisation processes without requiring explicit simulations. It establishes the evolution on the statistical basis of information theory, allowing for the flexibility given by imposing arbitrary constraints that represent properties required of the network evolution. As for future work, three unexplored topics take the spotlight. First, the method as presented here is discrete in time, a limitation that needs to be overcome for the method to be applied to systems continuous in time. Second, in terms of weighted networks, it has been shown that equilibrium maximum entropy networks are better reconstructed by imposing constraints on binary and weighted properties simultaneously. Such constraints in a dynamic context might be applied to the interplay between the network structure of dynamical systems and the dynamics on that structure. Lastly, in terms of memory dependence, the fact that the method naturally incorporates non-Markov processes suggests that it is worthwhile to take a closer look at the perspective provided, especially in the context of complex systems where such effects are paramount. Addressing these challenges would allow for the method to be applied to the study of many real-world complex networks.

Acknowledgements.

We would like to thank Martin Kuffer for discussions and revisions of calculations in this work. We also thank Professor Leonid Martyushev and Professor Mario Abadi for their insightful comments and critiques, which accompanied us throughout the process of developing the present analysis and writing this article.

Appendix A Connection to maximum entropy production and entropic dynamics

In section II transitions $W_{T-1}\rightarrow W_{T}$ which essentially update the configuration of the network $W(T)$ were established by maximising the Shannon entropy of the trajectory distribution

S=-\sum_{W_{T}}P(W_{T})\ln(P(W_{T}))

(42)

subject to constraints

\sum_{W_{T}}F_{n}(W_{T})P(W_{T})=f_{n}

(43)

of which one is marginalisation $\sum_{W_{T}}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W_{T})=P(W_{T-1}^{\prime})$ . Although the constraint value $P(W_{T-1}^{\prime})$ is the trajectory distribution at the previous number of steps, it is also arbitrary and for all purposes of the maximisation, fixed. One can then also maximise the entropy production

\begin{aligned}
\Delta S_{T}&=-\sum_{W_{T}}P(W_{T})\ln(P(W_{T}))+\sum_{W_{T-1}}P(W_{T-1})\ln(P(W_{T-1}))\\
&=-\sum_{W(T),W_{T-1}}P(W(T),W_{T-1})\ln(P(W(T),W_{T-1}))+\sum_{W(T),W_{T-1}}P(W(T),W_{T-1})\ln(P(W_{T-1}))\\
&=-\sum_{W,W_{T-1}}P(W|W_{T-1})P(W_{T-1})\ln(P(W|W_{T-1}))\,.
\end{aligned}

(44)
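The identity in eq. 44 can be verified numerically on a toy example. The sketch below assumes a hypothetical system with $5$ possible histories $W_{T-1}$ and $3$ possible new states $W$, with randomly generated probabilities, and compares the difference of trajectory entropies with the conditional form in the last line of eq. 44.

```python
import numpy as np

rng = np.random.default_rng(4)
n_prev, n_new = 5, 3                      # hypothetical numbers of histories and new states

p_prev = rng.random(n_prev)
p_prev /= p_prev.sum()                    # P(W_{T-1})
cond = rng.random((n_new, n_prev))
cond /= cond.sum(axis=0, keepdims=True)   # P(W | W_{T-1}), each column normalised
joint = cond * p_prev                     # P(W(T), W_{T-1}) = P(W_T)

entropy_now = -(joint * np.log(joint)).sum()
entropy_before = -(p_prev * np.log(p_prev)).sum()
production = entropy_now - entropy_before            # first line of eq. 44
conditional = -(joint * np.log(cond)).sum()          # last line of eq. 44
```

Both expressions coincide up to floating-point error, and the result is non-negative because it is an average of entropies of the transition probabilities.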

As $P(W(T)|W_{T-1})$ holds $P(W_{T-1})$ fixed, we can maximise the entropy production with respect to the transition probability instead of the trajectory distribution, writing the constraints as

\begin{aligned}
\sum_{W,W_{T-1}}P(W|W_{T-1})F_{n}(W,W_{T-1})P(W_{T-1})&=f_{n}\\
\sum_{W,W_{T-1}}\delta_{W_{T-1}^{\prime},W_{T-1}}P(W|W_{T-1})=\sum_{W}P(W|W_{T-1}^{\prime})&=1\,.
\end{aligned}

(45)

This defines a Lagrangian

\begin{aligned}
\mathcal{L}=&-\sum_{W,W_{T-1}}P(W|W_{T-1})P(W_{T-1})\ln(P(W|W_{T-1}))\\
&+\sum_{W_{T-1}}\lambda_{W_{T-1}}\left(1-\sum_{W}P(W|W_{T-1})\right)\\
&+\sum_{n}\lambda_{n}\left(f_{n}-\sum_{\begin{subarray}{c}W\\ W_{T-1}\end{subarray}}P(W|W_{T-1})F_{n}(W,W_{T-1})P(W_{T-1})\right)
\end{aligned}

(46)

maximised by finding the roots of

\frac{\partial\mathcal{L}}{\partial P(W|W_{T-1})}=-\lambda_{W_{T-1}}-P(W_{T-1})\left[\ln(P(W|W_{T-1}))+1+\sum_{n}\lambda_{n}F_{n}(W,W_{T-1})\right]

(47)

in terms of $P(W|W_{T-1})$ . This yields

P(W|W_{T-1})=\exp\left(-\left[\lambda_{W_{T-1}}/P(W_{T-1})+1+\sum_{n}\lambda_{n}F_{n}(W,W_{T-1})\right]\right)\,.

(48)

Imposing marginalisation $\sum_{W}P(W|W_{T-1})=1$ ,

\begin{aligned}
&\exp\left(\lambda_{W_{T-1}}/P(W_{T-1})+1\right)=\sum_{W}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W,W_{T-1})\right)\\
\Rightarrow\quad&P(W|W_{T-1})=\frac{\exp\left(-\sum_{n}\lambda_{n}F_{n}(W,W_{T-1})\right)}{\sum_{W^{\prime}}\exp\left(-\sum_{n}\lambda_{n}F_{n}(W^{\prime},W_{T-1})\right)}\,.
\end{aligned}

(49)

As eq. 49 matches the result obtained in eq. 12, maximum caliber in combination with marginalisation constraints is equivalent to maximisation of the entropy production defined in eq. 44. Moreover, the same functional form is reached if the "entropy production" is arbitrarily defined as

-\sum_{W,W_{T-1}}P(W|W_{T-1})\ln(P(W|W_{T-1}))

(50)

with constraints

\sum_{W,W_{T-1}}P(W|W_{T-1})F_{n}(W,W_{T-1})=g_{n}

(51)

independently of the constraint values $g_{n}$ . We also know that there must exist constraint values $g_{n}$ such that the resulting transition probabilities, and not just the functional form, are the same as derived from maximum caliber. These can be constructed by combining the functional form of the transitions, which is independent of the method used, with the Lagrange multipliers resulting from maximum caliber. This fully defines the transition probabilities on the left-hand side of eq. 51, which allows the construction of the constraint values on the right-hand side.

Note that for the case of Markov processes, we can write $P(W(T)|W_{T-1})=P(W(T)|W(T-1))$. In particular, eqs. 50 and 51 under this condition result in the formulation used in entropic dynamics. This shows that the latter method yields the correct results, and sets it on the more solid foundations of maximum caliber.

Appendix B Maximum caliber and Markov processes

We have established in section II that when constraint functions depend only on two successive states $W(T-1)$ and $W(T)$ of the network, the resulting transitions $M_{T}$ define a Markov process. However, this condition can be somewhat loosened by considering linear combinations of constraints. For example, consider a more typical constraint of maximum caliber defining the average number of connections $C(T)$ over a trajectory

C(T)=\sum_{W_{T}}\sum_{ij}\sum_{0\leq t\leq T}w_{ij}(t)P(W_{T})\,.

(52)

For trajectories one step shorter, the same constraint is

\begin{aligned}
C(T-1)&=\sum_{W_{T-1}}\sum_{ij}\sum_{0\leq t\leq T-1}w_{ij}(t)P(W_{T-1})\\
&=\sum_{W_{T}}\sum_{ij}\sum_{0\leq t\leq T-1}w_{ij}(t)P(W_{T})\,,
\end{aligned}

(53)

which can be subtracted from eq. 52 to yield the average number of connections $L(T)$ at time $T$

C(T)-C(T-1)=\sum_{W_{T}}\sum_{ij}w_{ij}(T)P(W_{T})=L(T)\,.

(54)

Considering eq. 54 for $T-1$ and $T$ , these can also be subtracted, obtaining the constraints used for section III

L(T)-L(T-1)=\sum_{W_{T}}\sum_{ij}(w_{ij}(T)-w_{ij}(T-1))P(W_{T})\,.

(55)

On the other hand, consider the case where constraints over trajectories include coefficients that depend on the length of the trajectory, for example

\sum_{W_{T}}\sum_{ij}\left(\sum_{t=0}^{T}A_{T}e^{-t}w_{ij}(t)\right)P(W_{T})=C(T)\,.

(56)

When we attempt to construct the instantaneous constraint by difference of two successive times, we find that the result again depends on the whole trajectory,

C(T)-C(T-1)=\sum_{W_{T}}\sum_{ij}\left[A_{T}e^{-T}w_{ij}(T)+\sum_{t=0}^{T-1}(A_{T}-A_{T-1})e^{-t}w_{ij}(t)\right]P(W_{T})\,,

(57)

and the same is true for higher-order differences, suggesting that the resulting process is not Markov.
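The contrast between eqs. 52 and 56 can be illustrated with a toy sequence of average link numbers $L(t)$. The sketch below assumes the hypothetical normalisation $A_{T}=1/(T+1)$ and randomly generated values of $L(t)$; it checks that increments of the unweighted constraint depend only on the last state, while increments of the weighted constraint change when only the initial state is modified.

```python
import numpy as np

rng = np.random.default_rng(5)
n_steps = 8
links = rng.integers(5, 15, size=n_steps + 1).astype(float)   # toy values of L(t)

# Unweighted constraint, eq. 52: C(T) is a running sum, so C(T) - C(T-1) = L(T).
cumulative = np.cumsum(links)
increments = np.diff(cumulative)

def weighted_constraint(values, T):
    """Eq. 56 with the hypothetical normalisation A_T = 1 / (T + 1)."""
    t = np.arange(T + 1)
    return (np.exp(-t) * values[: T + 1]).sum() / (T + 1)

def weighted_increment(values, T):
    """Difference C(T) - C(T-1) for the weighted constraint, as in eq. 57."""
    return weighted_constraint(values, T) - weighted_constraint(values, T - 1)

# Change only the initial state and compare the increments at the final time.
altered = links.copy()
altered[0] += 3.0
history_effect = weighted_increment(altered, n_steps) - weighted_increment(links, n_steps)
```

The non-zero value of `history_effect` comes from the term proportional to $A_{T}-A_{T-1}$ in eq. 57, which keeps the whole trajectory in the instantaneous constraint.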

References

Bertz [1981] S. H. Bertz, The first general index of molecular complexity, Journal of the American Chemical Society 103, 3599 (1981).
Abadi and Ruzzenenti [2023] N. Abadi and F. Ruzzenenti, Complex networks and interacting particle systems, Entropy 25, 1490 (2023).
Pedersen et al. [2021] T. T. Pedersen, M. Victoria, M. G. Rasmussen, and G. B. Andresen, Modeling all alternative solutions for highly renewable energy systems, Energy 234, 121294 (2021).
Merz et al. [2023] E. Merz, E. Saberski, L. J. Gilarranz, P. D. Isles, G. Sugihara, C. Berger, and F. Pomati, Disruption of ecological networks in lakes by climate change and nutrient fluctuations, Nature Climate Change 13, 389 (2023).
Newman [2003] M. E. Newman, Mixing patterns in networks, Physical Review E 67, 026126 (2003).
Dadashi et al. [2010] M. Dadashi, I. Barjasteh, and M. Jalili, Rewiring dynamical networks with prescribed degree distribution for enhancing synchronizability, Chaos: An Interdisciplinary Journal of Nonlinear Science 20 (2010).
Bertotti and Modanese [2020] M. L. Bertotti and G. Modanese, Network rewiring in the r-k plane, Entropy 22, 653 (2020).
Park and Newman [2004] J. Park and M. E. Newman, Statistical mechanics of networks, Physical Review E 70, 066117 (2004).
Cimini et al. [2019] G. Cimini, T. Squartini, F. Saracco, D. Garlaschelli, A. Gabrielli, and G. Caldarelli, The statistical physics of real-world networks, Nature Reviews Physics 1, 58 (2019).
Garlaschelli and Loffredo [2008] D. Garlaschelli and M. I. Loffredo, Maximum likelihood: Extracting unbiased information from complex networks, Physical Review E 78, 015101 (2008).
Squartini et al. [2011a] T. Squartini, G. Fagiolo, and D. Garlaschelli, Randomizing world trade. I. A binary network analysis, Physical Review E 84, 046117 (2011a).
Squartini et al. [2011b] T. Squartini, G. Fagiolo, and D. Garlaschelli, Randomizing world trade. II. A weighted network analysis, Physical Review E 84, 046118 (2011b).
Squartini and Garlaschelli [2017] T. Squartini and D. Garlaschelli, Maximum-entropy networks: Pattern detection, network reconstruction and graph combinatorics (Springer, 2017).
Watts and Strogatz [1998] D. J. Watts and S. H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393, 440 (1998).
Barabási and Albert [1999] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286, 509 (1999).
Katz and Powell [1957] L. Katz and J. H. Powell, Probability distributions of random variables associated with a structure of the sample space of sociometric investigations, The Annals of Mathematical Statistics 28, 442 (1957).
Holland and Leinhardt [1976] P. W. Holland and S. Leinhardt, Local structure in social networks, Sociological Methodology 7, 1 (1976).
Rao et al. [1996] A. R. Rao, R. Jana, and S. Bandyopadhyay, A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals, Sankhyā: The Indian Journal of Statistics, Series A, 225 (1996).
Roberts Jr [2000] J. M. Roberts Jr, Simple methods for simulating sociomatrices with given marginal totals, Social Networks 22, 273 (2000).
Jaynes [1980] E. T. Jaynes, The minimum entropy production principle, Annual Review of Physical Chemistry 31, 579 (1980).
Ge et al. [2012] H. Ge, S. Pressé, K. Ghosh, and K. A. Dill, Markov processes follow from the principle of maximum caliber, The Journal of Chemical Physics 136 (2012).
Pressé et al. [2013] S. Pressé, K. Ghosh, J. Lee, and K. A. Dill, Principles of maximum entropy and maximum caliber in statistical physics, Reviews of Modern Physics 85, 1115 (2013).
Dixit et al. [2018] P. D. Dixit, J. Wagoner, C. Weistuch, S. Pressé, K. Ghosh, and K. A. Dill, Perspective: Maximum caliber is a general variational principle for dynamical systems, The Journal of Chemical Physics 148 (2018).
Ghosh et al. [2020] K. Ghosh, P. D. Dixit, L. Agozzino, and K. A. Dill, The maximum caliber variational principle for nonequilibria, Annual Review of Physical Chemistry 71, 213 (2020).
Caticha [2011] A. Caticha, Entropic dynamics, time and quantum theory, Journal of Physics A: Mathematical and Theoretical 44, 225303 (2011).
Caticha [2015] A. Caticha, Entropic dynamics, Entropy 17, 6110 (2015).
Pessoa et al. [2021] P. Pessoa, F. X. Costa, and A. Caticha, Entropic dynamics on Gibbs statistical manifolds, Entropy 23, 494 (2021).
Martyushev and Seleznev [2006] L. M. Martyushev and V. D. Seleznev, Maximum entropy production principle in physics, chemistry and biology, Physics Reports 426, 1 (2006).
Martyushev [2021] L. M. Martyushev, Maximum entropy production principle: History and current status, Physics-Uspekhi 64, 558 (2021).
Farahani et al. [2019] F. V. Farahani, W. Karwowski, and N. R. Lighthall, Application of graph theory for identifying connectivity patterns in human brain networks: a systematic review, Frontiers in Neuroscience 13, 585 (2019).
Davis and González [2015] S. Davis and D. González, Hamiltonian formalism and path entropy maximization, Journal of Physics A: Mathematical and Theoretical 48, 425003 (2015).

$$
\begin{aligned}
\mathcal{L} = {} & -\sum_{W,W_{T-1}} P(W|W_{T-1})\,P(W_{T-1})\,\ln P(W|W_{T-1}) \\
& + \sum_{W_{T-1}} \lambda_{W_{T-1}} \left(1-\sum_{W} P(W|W_{T-1})\right) \\
& + \sum_{n} \lambda_{n} \left(f_{n}-\sum_{W,W_{T-1}} P(W|W_{T-1})\,F_{n}(W,W_{T-1})\,P(W_{T-1})\right)
\end{aligned}
\tag{46}
$$