Modularity Based Community Detection in Hypergraphs

Bogumił Kamiński, Paweł Misiorek, Paweł Prałat, François Théberge

Bogumił Kamiński: Decision Analysis and Support Unit, SGH Warsaw School of Economics, Warsaw, Poland
Paweł Misiorek: Institute of Computer Sciences, Poznan University of Technology, Poznan, Poland
Paweł Prałat: Department of Mathematics, Toronto Metropolitan University, Toronto, ON, Canada
François Théberge: Tutte Institute for Mathematics and Computing, Ottawa, ON, Canada

Abstract

In this paper, we propose h–Louvain, a scalable community detection algorithm that uses a hypergraph modularity function. It is an adaptation of the classical Louvain algorithm to the context of hypergraphs. We observe that a direct application of the Louvain algorithm to optimize the hypergraph modularity function often fails to find meaningful communities. We propose a solution to this issue by adjusting the initial stage of the algorithm via a carefully and dynamically tuned linear combination of the graph modularity function of the corresponding two-section graph and the desired hypergraph modularity function. The process is guided by Bayesian optimization of the hyper-parameters of the proposed procedure. Various experiments on synthetic as well as real-world networks show that this process yields improved results in various regimes.

1 Introduction

Many networks that are currently modelled as graphs would be more accurately modelled as hypergraphs. This includes the collaboration network [5], in which nodes correspond to researchers and each hyperedge corresponds to a paper and consists of the nodes associated with the researchers who co-authored it. Social events may include more than two people, which is not equivalent to social interactions among all pairs of people participating in the event. Hypergraphs have shown promise in modeling systems such as protein complexes and metabolic reactions [18]. Other natural examples are co-purchase hypergraphs, and there are plenty of other real-world hypergraphs.

After many years of intense research using graph theory in modelling and mining complex networks [17, 23, 30, 45], hypergraphs have started gaining considerable traction [4, 5, 6, 8]. A lot of higher-order network data has been collected in recent years (see, for example, [5]). It has become clear to both researchers and practitioners that dyadic relationships are insufficient in many real-world scenarios. Higher-order network analysis, using the ideas of hypergraphs, simplicial complexes, multilinear and tensor algebra, and more, is needed to study complex systems and to make an impact across many important applications [6, 36, 50, 40]. Indeed, the inherent expressiveness of hypergraphs has led to their applications across a diverse range of fields such as recommendation systems [52], computer vision [42], natural language processing [16], social network analysis [44], financial analysis [54], bioinformatics [18], and circuit design [22]. Standard but important questions in network science are currently being revisited in the context of hypergraphs. However, hypergraphs also create brand new questions which did not have their counterparts for graphs. For example, how do hyperedges overlap in empirical hypergraphs [41]? Or how do the existing patterns in a hypergraph affect the formation of new hyperedges [24]?

In this paper we concentrate on the classical problem of community detection in networks that can be represented using hypergraphs [1, 7, 11, 12, 27, 28, 34, 35, 55, 56]. Community detection is a challenging, NP-hard problem even for graphs [10, 19, 47], so obtaining an optimal solution becomes computationally infeasible, even for small networks represented as graphs. Dealing with hypergraphs is clearly much more difficult so, despite the lively current discussion around hypergraphs, the theory and tools are still not sufficiently developed to tackle this problem directly within this context. Indeed, researchers and practitioners, due to the lack of proper solutions for hypergraphs, often create the 2-section graph of a hypergraph of interest (that is, replace each hyperedge with a clique, a process also known as clique expansion). Given the 2-section graph representation, we can directly apply a graph clustering algorithm such as Louvain [9] or Leiden [51]. Another approach is to perform agglomerative clustering via some definition of distance between nodes, such as the derivative graph defined in Contreras-Aso et al. [15], and then select the partition that maximizes the 2-section graph modularity. However, with the 2-section graph, one clearly loses some information about hyperedges of size greater than two. In the experiments presented in Section 5, we use the Louvain algorithm on the 2-section graph representations as our basis of comparison for hypergraph-based algorithms.

As mentioned earlier, there are some recent attempts to deal with hypergraphs in the context of clustering. For example, Kumar et al. [34, 35] still reduce the problem to graphs but use original hypergraphs to iteratively adjust weights to encourage some hyperedges to be included in some cluster but discourage other ones (this process can be viewed as separating signal from noise). In Chodrow et al. [12], a hypergraph stochastic block model is defined, leading to a Louvain-type clustering algorithm, in particular, for the “all or nothing” regime (AON), where edges must have all nodes from the same community to improve the objective function. We provide more details about these two algorithms at the beginning of Section 5.

Many of the successful graph clustering algorithms use the modularity function to benchmark partitions and to guide the associated optimization heuristics. Two widely used algorithms from this family are the Louvain and Leiden algorithms mentioned earlier. Motivated by the spectacular success of the modularity function, a number of extensions of the classical graph modularity function to hypergraphs have been proposed [27, 28] that can potentially be used by true hypergraph algorithms. In this paper, we concentrate on this approach.

Unfortunately, there are many ways in which such an extension of the modularity function to hypergraphs can be done, depending on how often nodes in one community share hyperedges with nodes from other communities. We believe that the underlying process that governs the purity of community hyperedges varies between networks and potentially also depends on the hyperedge sizes. Let us come back to the collaboration network we discussed earlier. Hyperedges associated with papers written by mathematicians might be more homogeneous and smaller in comparison with those written by medical doctors, who tend to work in large and multidisciplinary teams. Moreover, in general, papers with a large number of co-authors tend to be less homogeneous, and other patterns can be identified [24]. The algorithm we propose in this paper, h–Louvain, is flexible and can use any such hypergraph modularity function. In other words, there is no unique way of extending the concept of modularity from graphs to hypergraphs. For this reason we consider a family of such extensions parametrized by the user’s preference over homogeneity of within-community hyperedges. At the same time we recognize that there can be situations in which it might not be clear to a user what homogeneity level is desired. Therefore, in Section 5.1 we provide some suggestions to help the user make the right choice.

A significant challenge in optimizing modularity functions is that these objective functions have their domains defined over all partitions of the set of nodes and they are known to be extremely difficult to optimize. As already mentioned, one of the most popular and efficient heuristic methods for modularity optimization for graphs is the Louvain algorithm [9]. In this paper, we show how this algorithm can be adapted to optimize hypergraph modularity. One of the main challenges is the fact that, when hyperedges of size two (edges) or three are not present in the hypergraph, the Louvain algorithm immediately gets stuck in a local optimum. Moreover, even if there are a few hyperedges of size two or three, the algorithm may still get stuck almost immediately, and yield a solution that is heavily biased toward small edges. Hence, in such situations, one cannot simply start optimizing the hypergraph modularity right from the beginning. More importantly, we observe that even if hyperedges of size two are present in the hypergraph, the algorithm often converges to a local optimum that is of low quality. In order to address these two problems, we propose a method that works reasonably well in practice, in which we optimize a weighted average of the 2-section graph modularity function and the hypergraph modularity function. For that we adjust the Louvain algorithm in such a way that the weight of the hypergraph modularity function increases during the optimization process. The pace of this weight change is governed by two hyperparameters of the procedure, which we tune using Bayesian optimization.

The paper is structured as follows. We first introduce the necessary notation; in particular, we state the definitions of graph and hypergraph modularity functions (Section 2). Synthetic as well as real-world hypergraphs that are used in our experiments are introduced in Section 3. Section 4 is devoted to explaining the details behind the proposed algorithm, h–Louvain. First, we discuss the classical Louvain algorithm for graphs (Subsection 4.1) and explain why it is difficult to adjust it to directly optimize hypergraph modularity (Subsection 4.2). Following this, we describe our solution, which uses a linear combination of the 2-section graph modularity and the hypergraph modularity as the objective function (Subsection 4.3), and explain its implementation challenges (Subsection 4.4). In particular, the main challenge is to tune the two hyperparameters responsible for the speed of convergence to the hypergraph modularity function. To find a “sweet spot” in an unsupervised way, Bayesian optimization is used (Subsection 4.5). Section 5 highlights the results of numerical experiments with the proposed algorithm on synthetic hypergraphs (Subsections 5.2 and 5.3) as well as real-world hypergraphs (Subsection 5.4). We also highlight important implications of the choice of the modularity function to optimize (Subsection 5.1). The paper concludes with a summary and an outlook for further research in this area (Section 6).

Finally, let us mention that this paper is an extended, journal version of the short proceedings paper [25] that contained some preliminary experiments with a much simpler algorithm. The algorithm as well as notebooks containing all experiments included in this paper can be found in the GitHub repository at https://github.com/pawelwm/h-louvain.

2 Modularity Functions

Let us start with some basic definitions. In a hypergraph $H=(V,E)$, each hyperedge $e \in E$ is a multiset of elements of $V$ of any cardinality $d \in \mathbb{N}$, called its size. Multisets in the context of hypergraphs are a natural generalization of loops in the context of graphs. Hypergraphs are a natural generalization of graphs, in which each edge is a multiset of size two. Even though $H$ does not always contain multisets, it is convenient to allow them as they may appear in the random hypergraph that will be used as the null model to “benchmark” the edge contribution component of the modularity function. It will be convenient to partition the hyperedge set $E$ into $\{E_1, E_2, \ldots\}$, where $E_d$ consists of hyperedges of size $d$. As a result, hypergraph $H$ can be expressed as the disjoint union of $d$-uniform hypergraphs $H = \bigcup H_d$, where $H_d = (V, E_d)$. As for graphs, $\deg_H(v)$ is the degree of node $v$, that is, the number of hyperedges $v$ is a part of (taking into account the fact that hyperedges are multisets). Finally, the volume of a subset of nodes $A \subseteq V$ is $\textrm{vol}_H(A) = \sum_{v \in A} \deg_H(v)$.
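The definitions above can be illustrated with a minimal Python sketch. The toy hypergraph and the helper names `degrees` and `volume` are hypothetical, and the sketch assumes that a node repeated within a multiset hyperedge contributes once per occurrence to its degree.

```python
from collections import Counter

def degrees(hyperedges):
    """Degree of each node, counting repeated membership in a multiset hyperedge."""
    deg = Counter()
    for e in hyperedges:
        deg.update(e)  # each occurrence of a node in e adds 1 to its degree
    return deg

def volume(deg, nodes):
    """vol_H(A): sum of the degrees of the nodes in A."""
    return sum(deg[v] for v in nodes)

# Hypothetical toy hypergraph; tuples are used so that a node may repeat.
H = [("a", "b"), ("a", "b", "c"), ("c", "d", "d"), ("a", "c", "d", "e")]

deg = degrees(H)
sizes = Counter(len(e) for e in H)   # |E_d| for each hyperedge size d
vol_ab = volume(deg, {"a", "b"})     # volume of the subset {a, b}
vol_V = volume(deg, deg.keys())      # volume of the whole node set

print(dict(deg), dict(sizes), vol_ab, vol_V)
```

With this convention the total volume equals the sum of the hyperedge sizes, which mirrors the identity vol(V) = 2|E| for graphs.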

Graph Modularity

The definition of modularity for graphs was first introduced by Newman and Girvan [48]. Despite some known issues with this function such as the “resolution limit” reported in [20], many popular algorithms for partitioning nodes of large graphs use it [14, 38, 46] and perform very well. The two prominent ones from this family are Louvain [9] and Leiden [51]. The modularity function favours partitions of the set of nodes of a graph $G$ in which a large proportion of the edges fall entirely within the parts (often called clusters), but benchmarks it against the expected number of edges one would see in those parts in the corresponding Chung-Lu random graph model [13], which generates random graphs with the expected degree sequence following exactly the degree sequence in $G$.

Formally, for a graph $G=(V,E)$ and a given partition $\mathbf{A} = \{A_1, A_2, \ldots, A_k\}$ of $V$, the modularity function is defined as follows:

$$q_G(\mathbf{A}) = \sum_{A_i \in \mathbf{A}} \frac{e_G(A_i)}{|E|} - \sum_{A_i \in \mathbf{A}} \left( \frac{\textrm{vol}_G(A_i)}{\textrm{vol}_G(V)} \right)^2, \tag{1}$$

where $e_G(A_i)$ is the number of edges in the subgraph of $G$ induced by set $A_i$. The first term in (1), $\sum_{A_i \in \mathbf{A}} e_G(A_i)/|E|$, is called the edge contribution and it computes the fraction of edges that fall within one of the parts. The second one, $\sum_{A_i \in \mathbf{A}} (\textrm{vol}_G(A_i)/\textrm{vol}_G(V))^2$, is called the degree tax and it computes the expected fraction of edges that do the same in the corresponding random graph (the null model). The modularity measures the deviation between the two.

The maximum modularity $q^*(G)$ is defined as the maximum of $q_G(\mathbf{A})$ over all possible partitions $\mathbf{A}$ of $V$; that is, $q^*(G) = \max_{\mathbf{A}} q_G(\mathbf{A})$. In order to maximize $q_G(\mathbf{A})$ one wants to find a partition with large edge contribution subject to small degree tax. If $q^*(G)$ approaches 1 (which is the trivial upper bound), we observe a strong community structure; conversely, if $q^*(G)$ is close to zero (which is the trivial lower bound), there is no community structure. The definition in (1) can be generalized to weighted edges (with weight function $w: E \to \mathbb{R}_+$), by replacing edge counts with sums of the corresponding edge weights.
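As an illustration of formula (1), the following Python sketch computes the edge contribution and the degree tax for an unweighted graph. The function name `graph_modularity` and the toy graph (two triangles joined by one edge) are hypothetical choices made for this example only.

```python
from collections import defaultdict

def graph_modularity(edges, partition):
    """Modularity (1) of an unweighted graph; partition is a list of node sets."""
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    m = len(edges)
    inside = defaultdict(int)  # e_G(A_i): edges with both ends in part i
    vol = defaultdict(int)     # vol_G(A_i): sum of degrees in part i
    for u, v in edges:
        vol[part_of[u]] += 1
        vol[part_of[v]] += 1
        if part_of[u] == part_of[v]:
            inside[part_of[u]] += 1
    edge_contribution = sum(inside.values()) / m
    degree_tax = sum((x / (2 * m)) ** 2 for x in vol.values())  # vol_G(V) = 2|E|
    return edge_contribution - degree_tax

# Two triangles joined by a single edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_triangles = graph_modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_trivial = graph_modularity(edges, [{0, 1, 2, 3, 4, 5}])
q_singletons = graph_modularity(edges, [{v} for v in range(6)])

print(q_triangles, q_trivial, q_singletons)
```

The trivial partition with a single part always has modularity 0, which is why zero is a lower bound on the maximum $q^*(G)$, whereas an individual partition (here, all singletons) may well have negative modularity.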

Using Graph Modularity for Hypergraphs

Given a hypergraph $H=(V,E)$, it is common to transform its hyperedges into complete graphs (cliques), a process known as clique expansion or as forming the 2-section of $H$; the result is the graph $H_{[2]}$ on the same set of nodes as $H$. For each hyperedge $e \in E$ with $|e| \geq 2$ having weight $w(e)$, $\binom{|e|}{2}$ edges are formed, each of them with weight $w(e)/\binom{|e|}{2}$. This choice preserves the total weight. There are other natural choices for the weight; for example, the weighting scheme in which each edge receives weight $w(e)/(|e|-1)$ ensures that the degree distribution of the created graph matches the one of the original hypergraph $H$ [35, 34]. As hyperedges in $H$ usually overlap, this process creates a multigraph. In order for $H_{[2]}$ to be a simple graph, if the same pair of vertices appears in multiple hyperedges, the corresponding edge weights are summed.
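The clique expansion described above can be sketched in a few lines of Python. This is only an illustration of the weight-preserving scheme $w(e)/\binom{|e|}{2}$; the function name `two_section` is hypothetical, and the sketch assumes that every hyperedge consists of distinct nodes (no multisets).

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def two_section(hyperedges, weights=None):
    """Weighted 2-section graph as a dict {frozenset({u, v}): weight}."""
    if weights is None:
        weights = [1.0] * len(hyperedges)
    graph = defaultdict(float)
    for e, w in zip(hyperedges, weights):
        if len(e) < 2:
            continue  # hyperedges of size 1 produce no graph edges
        share = w / comb(len(e), 2)  # split w(e) evenly over all pairs in e
        for u, v in combinations(e, 2):
            graph[frozenset((u, v))] += share  # parallel edges are merged by summing
    return dict(graph)

# Hypothetical toy hypergraph with unit weights.
H = [(1, 2, 3), (2, 3), (3, 4, 5, 6)]
G2 = two_section(H)

print(G2[frozenset((2, 3))], sum(G2.values()), len(G2))
```

The pair {2, 3} appears in two hyperedges, so its weight is 1/3 + 1, while the total edge weight of the 2-section equals the total hyperedge weight.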

One of the approaches for finding communities in hypergraphs that practitioners use is to apply one of the algorithms that aim to maximize the original graph modularity function (such as Louvain, Leiden, or ECG) to the graph $H_{[2]}$. Although this procedure is simple, it has the drawback that the 2-section graph loses some potentially useful information. Therefore, it is desirable to define a modularity function that is tailored explicitly to hypergraphs and to aim to optimize it directly.

Hypergraph Modularity

For edges of size greater than 2, several definitions can be used to quantify the edge contribution for a given partition $\mathbf{A}$ of the set of nodes. As a result, the choice of hypergraph modularity function is not unique. It depends on how strongly one believes that a hyperedge is an indicator that some of its vertices fall into one community. The fraction of nodes of a given hyperedge that belong to one community is called its homogeneity (provided it is more than 50%). In one extreme case, all vertices of a hyperedge have to belong to one of the parts in order to contribute to the modularity function; this is the strict variant assuming that only homogeneous hyperedges provide information about underlying community structure. In the other natural extreme variant, the majority one, one assumes that edges are not necessarily homogeneous and so a hyperedge contributes to one of the parts if more than 50% of its vertices belong to it; in this case being over 50% is the only information that is considered relevant for community detection. All variants in between guarantee that hyperedges contribute to at most one part. This is an important difference from the modularity on $H_{[2]}$, where a single original hyperedge is split into multiple graph edges that could be considered as contributing to multiple different parts (communities). Once the variant is fixed, one needs to benchmark the corresponding edge contribution using the degree tax computed for the generalization of the Chung-Lu model to hypergraphs proposed in [27].

The hypergraph modularity function is controlled by hyper-parameters $\eta_{c,d} \in [0,1]$ ($d \geq 2$, $\lfloor d/2 \rfloor + 1 \leq c \leq d$). For a fixed set of hyper-parameters and a given partition $\mathbf{A} = \{A_1, A_2, \ldots, A_k\}$ of $V$, we define

$$q_H(\mathbf{A}) = \sum_{d \geq 2} \; \sum_{c = \lfloor d/2 \rfloor + 1}^{d} \eta_{c,d} \, q_H^{c,d}(\mathbf{A}), \tag{2}$$

where

$$q_H^{c,d}(\mathbf{A}) = \frac{1}{|E|} \sum_{A_i \in \mathbf{A}} \left( e_H^{c,d}(A_i) - |E_d| \cdot \Pr\left( \textrm{Bin}\left( d, \frac{\textrm{vol}(A_i)}{\textrm{vol}(V)} \right) = c \right) \right);$$

$e_H^{c,d}(A_i)$ is the number of hyperedges of size $d$ that have exactly $c$ members in $A_i$, and $\textrm{Bin}(d,p)$ is the binomial random variable, that is,

$$\Pr\left( \textrm{Bin}(d,p) = c \right) = \binom{d}{c} p^c (1-p)^{d-c}.$$
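To make definition (2) concrete, here is a minimal, unoptimized Python sketch that evaluates it for unweighted hypergraphs. It is not the implementation from the authors' repository; the names `hypergraph_modularity`, `binom_pmf` and `strict` are hypothetical, and the weights $\eta_{c,d}$ are passed in as a function `eta(c, d)`.

```python
from collections import Counter
from math import comb

def binom_pmf(d, p, c):
    """Pr(Bin(d, p) = c)."""
    return comb(d, c) * p**c * (1 - p) ** (d - c)

def hypergraph_modularity(hyperedges, partition, eta):
    """Formula (2); eta(c, d) returns the weight of a (c, d) hyperedge."""
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    m = len(hyperedges)
    deg = Counter(v for e in hyperedges for v in e)
    vol_total = sum(deg.values())
    vols = [sum(deg[v] for v in part) for part in partition]
    n_by_size = Counter(len(e) for e in hyperedges)  # |E_d|

    # Edge contribution: a hyperedge counts only for a part holding > d/2 of it.
    edge_contribution = 0.0
    for e in hyperedges:
        d = len(e)
        if d < 2:
            continue
        _, c = Counter(part_of[v] for v in e).most_common(1)[0]
        if c > d // 2:
            edge_contribution += eta(c, d)

    # Degree tax: expected weighted count under the binomial null model.
    degree_tax = 0.0
    for d, n_d in n_by_size.items():
        if d < 2:
            continue
        for c in range(d // 2 + 1, d + 1):
            for vol_i in vols:
                degree_tax += eta(c, d) * n_d * binom_pmf(d, vol_i / vol_total, c)

    return (edge_contribution - degree_tax) / m

def strict(c, d):
    """Strict variant: only fully homogeneous hyperedges count."""
    return 1.0 if c == d else 0.0

# Hypothetical toy hypergraph: two triples bridged by one pair.
H = [(1, 2, 3), (4, 5, 6), (3, 4)]
q_split = hypergraph_modularity(H, [{1, 2, 3}, {4, 5, 6}], strict)
q_trivial = hypergraph_modularity(H, [{1, 2, 3, 4, 5, 6}], strict)

print(q_split, q_trivial)
```

As for graphs, the trivial partition with a single part yields modularity 0, because every hyperedge is then fully contained in that part both in $H$ and in the null model.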

Hyper-parameters $\eta_{c,d}$ give us a lot of flexibility and allow us to value some hyperedges more than others depending on their size and homogeneity. However, there is a natural family of hyper-parameters that one might consider, namely, $\eta_{c,d} = (c/d)^{\tau}$ for some constant $\tau \in [0,\infty)$. We will refer to the corresponding modularity function as the $\tau$-modularity function. This family has only one parameter to tune, $\tau$, but it still covers a wide range of possible scenarios. For example, one might want to value all hyperedges equally ($\tau = 0$) or value more homogeneous hyperedges more ($\tau > 0$), including the extreme situation in which only fully homogeneous hyperedges are counted ($\tau \to \infty$). In particular, we get the following four natural parameterizations of the modularity function to optimize:

  • strict modularity ($\tau \to \infty$): $\eta_{d,d} = 1$ and $\eta_{c,d} = 0$ for $\lfloor d/2 \rfloor + 1 \leq c < d$,

  • quadratic modularity ($\tau = 2$): $\eta_{c,d} = (c/d)^2$ for $\lfloor d/2 \rfloor + 1 \leq c \leq d$,

  • linear modularity ($\tau = 1$): $\eta_{c,d} = c/d$ for $\lfloor d/2 \rfloor + 1 \leq c \leq d$,

  • majority modularity ($\tau = 0$): $\eta_{c,d} = 1$ for $\lfloor d/2 \rfloor + 1 \leq c \leq d$.

Note that regardless of the parameter $\tau$, the weights are normalized so that $\max_c \eta_{c,d} = 1$ for all $d$. This ensures that the edge contribution, and hence the modularity function, is at most 1; since the trivial partition consisting of a single part has modularity 0, the maximum modularity lies between 0 and 1.
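The four parameterizations listed above can be generated from a single function of $\tau$. The sketch below is illustrative only; the function name `eta` and the use of `math.inf` to encode the strict variant ($\tau \to \infty$) are assumptions of this example.

```python
import math

def eta(c, d, tau):
    """Weight (c/d)^tau for floor(d/2)+1 <= c <= d, and 0 otherwise.

    tau = math.inf encodes the strict variant (limit tau -> infinity).
    """
    if not (d // 2 + 1 <= c <= d):
        return 0.0  # no strict majority: the hyperedge does not contribute
    if math.isinf(tau):
        return 1.0 if c == d else 0.0
    return (c / d) ** tau

# Weights for hyperedges of size d = 5 under the four named variants.
variants = {"majority": 0, "linear": 1, "quadratic": 2, "strict": math.inf}
table = {name: [eta(c, 5, tau) for c in range(3, 6)]
         for name, tau in variants.items()}

for name, row in table.items():
    print(f"{name:>9}: {row}")
```

In every variant the fully homogeneous case $c = d$ receives weight 1, which is the normalization $\max_c \eta_{c,d} = 1$ mentioned above.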

As already mentioned above, the choice of the parameter $\tau$ should be made depending on how much more homogeneous hyperedges are valued compared to inhomogeneous ones. However, in the absence of any external intuition about the nature of the ground-truth communities, our suggestion is to use $\tau = 2$. This choice is justified based on the connection to $H_{[2]}$, the corresponding 2-section graph of $H$. Indeed, hyperedges of size $d$ in $H$ that have exactly $c$ members in one of the communities contribute a $\binom{c}{2}/\binom{d}{2} = \frac{c(c-1)}{d(d-1)} \approx (c/d)^2$ fraction of their original weight to the graph modularity function of $H_{[2]}$.
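The quality of the approximation $c(c-1)/(d(d-1)) \approx (c/d)^2$ can be checked numerically. The following short Python sketch (with hypothetical helper names) compares the exact 2-section fraction with the quadratic weight for a small and a large hyperedge with the same homogeneity of 80%.

```python
from math import comb

def two_section_fraction(c, d):
    """Fraction of the pairs of a size-d hyperedge that lie among c of its nodes."""
    return comb(c, 2) / comb(d, 2)

def quadratic_weight(c, d):
    """Weight used by the quadratic (tau = 2) hypergraph modularity."""
    return (c / d) ** 2

small = (two_section_fraction(4, 5), quadratic_weight(4, 5))
large = (two_section_fraction(80, 100), quadratic_weight(80, 100))

print("d = 5,   c = 4: ", small)
print("d = 100, c = 80:", large)
```

The two quantities coincide when $c = d$, and for a fixed ratio $c/d$ the gap between them shrinks as the hyperedge size grows.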

Having said that, let us stress the fact that optimizing the hypergraph 2-modularity function of $H$ is not equivalent to optimizing the graph modularity function of $H_{[2]}$, since hyperedges with $c \leq d/2$ members in one of the communities do not contribute to the hypergraph modularity whereas they still do in the graph counterpart.

Indeed, this observation highlights the key difference between our approach to extracting communities from the hypergraph and doing it via the corresponding 2-section graph $H_{[2]}$ that we already indicated when introducing hypergraph modularity. Our assumption is that there exists an underlying set of latent communities in the hypergraph (commonly referred to as the ground-truth). A given set of nodes appears as a hyperedge with a probability that depends on whether the majority of them are from one of the communities or not. As a result, hyperedges of size $d$ in $H$ that have at most $d/2$ members in each of the communities are considered as noise that unnecessarily influences the modularity function of $H_{[2]}$. Indeed, the hypergraph modularity is guaranteed to count a single hyperedge for at most one community (as we require $c > d/2$). On the other hand, the graph modularity of $H_{[2]}$ potentially treats a single hyperedge as a positive signal contributing to multiple communities.
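The difference between the two views can be seen on a single hyperedge. The sketch below is a hypothetical illustration: for a given hyperedge and community assignment it reports which communities receive a positive edge contribution in the 2-section graph (as a fraction of the hyperedge weight) and which community, if any, is credited by the hypergraph modularity.

```python
from collections import Counter
from math import comb

def contributions(hyperedge, part_of):
    """Return (graph_view, hyper_view) for one hyperedge of distinct nodes."""
    d = len(hyperedge)
    counts = Counter(part_of[v] for v in hyperedge)
    # 2-section: every community holding at least one pair gets some weight.
    graph_view = {p: comb(c, 2) / comb(d, 2) for p, c in counts.items() if c >= 2}
    # Hypergraph modularity: only a strict majority (c > d/2) is credited.
    hyper_view = {p: c for p, c in counts.items() if c > d / 2}
    return graph_view, hyper_view

# Hypothetical ground truth with two communities A and B.
part_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

graph_even, hyper_even = contributions((1, 2, 3, 4, 5, 6), part_of)  # 3 vs 3
graph_uneven, hyper_uneven = contributions((1, 2, 3, 4, 5), part_of)  # 3 vs 2

print(graph_even, hyper_even)
print(graph_uneven, hyper_uneven)
```

The evenly split hyperedge is treated as noise by the hypergraph modularity but still rewards both communities in the 2-section graph; the unevenly split one is credited to community A only, whereas the 2-section graph also rewards B.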

It is well known that optimizing the modularity function in large networks might fail to resolve small communities, even when they are well defined. This is a well-known potential problem of applying global null models and is often referred to as the resolution limit [20]. A standard approach that tries to address the resolution limit is to multiply the degree tax in the definition of the modularity function by a parameter $\gamma > 0$. This additional parameter controls the relative importance of the edge contribution and the degree tax. The hypergraph modularity function may be tuned the same way, if needed.
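The effect of the resolution parameter can be sketched on the graph modularity, where the degree tax is simply multiplied by $\gamma$. The function name `graph_modularity_gamma` and the toy graph are hypothetical; the same scaling could be applied to the degree tax of the hypergraph modularity.

```python
from collections import defaultdict

def graph_modularity_gamma(edges, partition, gamma=1.0):
    """Graph modularity with the degree tax multiplied by gamma."""
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    m = len(edges)
    inside = defaultdict(int)
    vol = defaultdict(int)
    for u, v in edges:
        vol[part_of[u]] += 1
        vol[part_of[v]] += 1
        if part_of[u] == part_of[v]:
            inside[part_of[u]] += 1
    edge_contribution = sum(inside.values()) / m
    degree_tax = sum((x / (2 * m)) ** 2 for x in vol.values())
    return edge_contribution - gamma * degree_tax

# Two triangles joined by a single edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = [{0, 1, 2}, {3, 4, 5}]
merged = [{0, 1, 2, 3, 4, 5}]

for gamma in (0.2, 1.0):
    q_split = graph_modularity_gamma(edges, split, gamma)
    q_merged = graph_modularity_gamma(edges, merged, gamma)
    print(gamma, q_split, q_merged)
```

In this example the two-triangle partition wins for $\gamma = 1$, while for $\gamma = 0.2$ the single merged part scores higher; larger values of $\gamma$ therefore favour smaller communities.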

Finally, let us mention that for a given partition $\mathbf{A}$, the values of different modularity functions should not be compared, as they are scaled differently; rather the same modularity function should be used to rank various partitions for a given graph.

3 Hypergraphs Used in Our Experiments

In this section, we introduce the hypergraphs we use in our experiments, both synthetic and real-world ones.

3.1 Synthetic Hypergraph Model: h–ABCD

There are very few hypergraph datasets with ground-truth identified and labelled. Synthetic networks are extremely useful to test various scenarios, such as the level of noise, via tuneable and interpretable parameters. As a result, there is a need for synthetic random graph models with community structure that resemble real-world networks in order to benchmark and tune clustering algorithms that are unsupervised by nature.

It is worth mentioning that the family of clustering algorithms we are interested in aims to find partitions that maximize a given modularity function, not to find the ground-truth partition. These are often (but not always) very similar partitions. Note that ground-truth partitions typically influence the creation of a hypergraph in a noisy way; as a consequence of this randomness, a good community in the generated hypergraph (after the randomness is resolved) does not have to match the ground-truth community exactly.

In particular, algorithm $A$ would be considered better than algorithm $B$ if it finds a partition yielding larger modularity. Selecting the right modularity function to optimize is crucial for making sure that the outcome of the algorithm is close to the ground-truth (or some specific requirements of the user), but once the function is selected the algorithm should aim to maximize it. We propose a simple, unsupervised method for making such a selection (see Section 5.1), but this paper focuses on the optimization algorithm.

The standard for the generation of synthetic graphs is rather clear. The LFR (Lancichinetti, Fortunato, Radicchi) model [39, 37] generates networks with communities and, at the same time, allows for heterogeneity in the distributions of both node degrees and community sizes. It became a standard and extensively used method for generating artificial networks. The Artificial Benchmark for Community Detection (ABCD) [29] was recently introduced and implemented (https://github.com/bkamins/ABCDGraphGenerator.jl/), including a fast implementation (https://github.com/tolcz/ABCDeGraphGenerator.jl/) that uses multiple threads (ABCDe) [33]. Undirected variants of LFR and ABCD produce graphs with comparable properties, but ABCD/ABCDe is faster than LFR and can be easily tuned to allow the user to make a smooth transition between the two extremes: pure (disjoint) communities and a random graph with no community structure. Moreover, it is easier to analyze theoretically; for example, in [26, 2] various theoretical asymptotic properties of the ABCD model are investigated, including the modularity function and self-similarities of the ground-truth communities.

The situation for hypergraphs is not as clear as for graphs. Not only are few real-world datasets with ground-truth available, but there are also few synthetic hypergraph models. Fortunately, the building blocks of the ABCD model are flexible and may be adjusted to satisfy different needs. For example, the model was adjusted to include potential outliers in [31], resulting in the ABCD+o model. Adjusting the model to hypergraphs is more complex, but this was also done recently [32], resulting in the h–ABCD model. We will use this model for our experiments.

The h–ABCD model generates a hypergraph on $n$ nodes. The degree distribution follows a power law with exponent $\gamma$ and minimum and maximum values equal to $\delta$ and $D$, respectively. Community sizes are between $s$ and $S$ and also follow a power-law distribution, but this time with exponent $\beta$. Parameter $\xi$ is responsible for the level of noise. If $\xi = 0$, then each hyperedge is a community hyperedge, meaning that the majority of its nodes belong to one community. At the other extreme, if $\xi = 1$, then communities do not play any role and hyperedges are simply “sprinkled” across the entire hypergraph, which we will refer to as the background hypergraph. The vector $(q_1, \ldots, q_L)$ determines the distribution of the number of hyperedges of a given size, where $L$ is the size of the largest hyperedges.

Finally, parameters $w_{c,d}$ specify how many nodes from its own community a given community hyperedge should have. We say that a community hyperedge is of type $(c,d)$ if it has size $d$ and exactly $c$ of its nodes belong to one of the communities. Note that, in light of the discussion at the end of the previous section, we require that a community hyperedge have more than half of its nodes from the community. Therefore, $w_{c,d}$ is defined for $d/2 < c \leq d$, where $d \in [L]$.

The model is flexible and may accept any family of parameters $w_{c,d}$ satisfying the specific needs of the user, but here is a list of three standard options implemented in the code:

  • majority model: $w_{c,d}$ is uniform for all admissible values of $c$, that is, for any $d/2 < c \leq d$, $w_{c,d} = \frac{1}{d - \lfloor d/2 \rfloor} = \frac{1}{\lceil d/2 \rceil}$,

  • linear model: $w_{c,d}$ is proportional to $c$ for all admissible values of $c$, that is, for any $d/2 < c \leq d$, $w_{c,d} = \frac{2c}{(d + \lfloor d/2 \rfloor + 1)(d - \lfloor d/2 \rfloor)} = \frac{2c}{(d + \lfloor d/2 \rfloor + 1)\lceil d/2 \rceil}$,

  • strict model: only “pure” hyperedges are allowed, that is, $w_{d,d} = 1$ and $w_{c,d} = 0$ for $d/2 < c < d$.
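The three families above can be tabulated directly from their formulas. The sketch below is a minimal illustration using exact rational arithmetic; the function name `community_edge_weights` is hypothetical and is not taken from the h–ABCD implementation. It also checks that, for every hyperedge size $d$, each family sums to one over the admissible values of $c$.

```python
from fractions import Fraction

def community_edge_weights(d, model):
    """Return {c: w_{c,d}} for all admissible c, i.e. d/2 < c <= d."""
    lowest = d // 2 + 1                   # smallest integer c with c > d/2
    admissible = range(lowest, d + 1)
    if model == "majority":
        # uniform over the d - floor(d/2) admissible values
        return {c: Fraction(1, d - d // 2) for c in admissible}
    if model == "linear":
        # proportional to c, normalised so the weights sum to one
        denom = (d + d // 2 + 1) * (d - d // 2)
        return {c: Fraction(2 * c, denom) for c in admissible}
    if model == "strict":
        # all mass on pure hyperedges (c == d)
        return {c: Fraction(int(c == d)) for c in admissible}
    raise ValueError(f"unknown model: {model}")

for model in ("majority", "linear", "strict"):
    for d in range(2, 9):
        weights = community_edge_weights(d, model)
        assert sum(weights.values()) == 1

print(community_edge_weights(5, "linear"))
```

For $d = 5$, the linear model gives $w_{3,5} = 1/4$, $w_{4,5} = 1/3$, and $w_{5,5} = 5/12$.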

Let us note that the parameterizations of $w_{c,d}$ in h–ABCD and $\eta_{c,d}$ in the definition of the hypergraph modularity function share the same names (strict, majority, linear) because their functional forms match, but they are not equivalent. Parameters $w_{c,d}$ determine the composition of hyperedges in the generated synthetic hypergraph, whereas parameters $\eta_{c,d}$ specify the objective function that the analyst decides to optimize while looking for communities in the hypergraph at hand.

Specifically, we used the following parameters for our experiments:

  • $n = 300$ nodes,

  • power-law degree exponent $\alpha = 2.5$, with degrees in the range $[5, 30]$,

  • power-law community size exponent $\beta = 1.5$, with community sizes in the range $[80, 120]$.

We generated 6 families of h–ABCD hypergraphs, namely:

  • linear_2to5: linear model for $w_{c,d}$, with edge sizes 2 to 5 (with respective probabilities $0.1, 0.4, 0.4, 0.1$),

  • majority_2to5: majority model for $w_{c,d}$, with edge sizes 2 to 5 (with respective probabilities $0.1, 0.4, 0.4, 0.1$),

  • strict_2to5: strict model for $w_{c,d}$, with edge sizes 2 to 5 (with respective probabilities $0.1, 0.4, 0.4, 0.1$),

  • linear_5: linear model for $w_{c,d}$, with all edges of size 5,

  • majority_5: majority model for $w_{c,d}$, with all edges of size 5, and

  • strict_5: strict model for $w_{c,d}$, with all edges of size 5.

3.2 Real-world Hypergraphs

To illustrate various aspects of hypergraph modularity-based clustering, we analyze a few real-world hypergraphs. The first is a contact hypergraph in which nodes correspond to primary school children or teachers and hyperedges represent close physical proximity between individuals within a prescribed time period (see [12, 49, 43] for more details). There are 242 nodes labelled with respect to their class (there are 10 classes), plus another label for the teachers. There are 12,704 hyperedges of size up to 5. In Table 1 (left), we show the distribution of edge composition for this dataset with respect to the ground-truth communities. We see that there are many edges between communities, in particular edges of size 2. Community edges (with $c > d/2$) are mostly “pure” edges (with $c = d$), but there is a significant number of edges of type $(c,d) = (2,3)$; thus it is unclear whether hypergraph $\tau$-modularity functions with a large parameter $\tau$ would do well, or whether a small value of $\tau$ should be used.

primary-school                   |  cora
d  c  purity  frequency          |  d  c  purity  frequency
2  1   50%    5202               |  2  2  100%    472  *
2  2  100%    2546  *            |  3  3  100%    307  *
3  3  100%    2434  *            |  4  4  100%    175  *
3  2   67%    1751  *            |  2  1   50%    151
3  1   33%     415               |  3  2   67%    118  *
4  4  100%     158  *            |  4  3   75%     91  *
4  2   50%      93               |  5  5  100%     83  *
4  3   75%      84  *            |  5  4   80%     55  *
4  1   25%      12               |  4  2   50%     42
5  3   60%       6  *            |  3  1   33%     39
Table 1: The number of hyperedges of type $(c,d)$ (the top-10 most frequent ones) for the primary-school dataset (left) and the cora co-citation dataset (right). (Combinations contributing to hypergraph modularity, that is, those with $c > d/2$, are marked with *.)
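A composition table of this kind can be computed from a list of hyperedges and a ground-truth labelling. The sketch below is a minimal illustration that takes $c$ to be the size of the largest same-label group in a hyperedge; the function names and the toy data are hypothetical and are not drawn from either dataset.

```python
from collections import Counter

def edge_type(hyperedge, label):
    """Return (c, d): d is the hyperedge size and c is the size of its
    largest group of nodes sharing one ground-truth label."""
    d = len(hyperedge)
    c = max(Counter(label[v] for v in hyperedge).values())
    return c, d

def composition_table(hyperedges, label):
    """Count hyperedges of each type (c, d), most frequent first."""
    counts = Counter(edge_type(e, label) for e in hyperedges)
    rows = []
    for (c, d), freq in counts.most_common():
        rows.append({
            "d": d,
            "c": c,
            "purity": c / d,
            "frequency": freq,
            "community_edge": 2 * c > d,   # strict majority in one community
        })
    return rows

# Toy data for illustration only.
label = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
hyperedges = [(1, 2), (1, 4), (1, 2, 3), (1, 2, 4), (2, 4, 5), (1, 2, 4, 5)]
for row in composition_table(hyperedges, label):
    print(row)
```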

Another dataset we use for our experiments is the co-citation dataset of scientific publications, each of which belongs to one of seven classes (cora); see [53] for more details. Hyperedges consist of co-cited scientific publications, and we only keep hyperedges of size 2 or more. There are 1,434 nodes appearing in at least one hyperedge (cited publications), and 1,579 hyperedges. In Table 1 (right), we show the top-10 distribution of edge composition with respect to the true communities. We see that there are many pure community edges (with $c = d$), so we can expect that hypergraph $\tau$-modularity functions with a large parameter $\tau$ would do well.

4 Hypergraph Modularity Optimization Algorithm: h–Louvain

Let us fix the hypergraph modularity function $q_H(\mathbf{A})$, either by restricting ourselves to the $\tau$-modularity function with some specific value of $\tau$ (such as $\tau = 2$, which is recommended as the default value) or by specifying the more general hyper-parameters $\eta_{c,d}$. The goal of this section is to highlight the challenges in designing a heuristic algorithm that optimizes $q_H(\mathbf{A})$ and to describe our solution that overcomes these challenges, producing an algorithm that we will refer to as h–Louvain.

4.1 Louvain Algorithm

Let us start by introducing one of the most popular algorithms for detecting communities in graphs, namely, the Louvain algorithm [9]. It is a hierarchical clustering algorithm that tries to optimize the modularity function we described in Section 2.

In the first pass of this algorithm, small communities are found by optimizing the graph modularity function locally on all nodes. Then, each small community is grouped together into a single node that we will refer to as super-node. This process is repeated recursively on those smaller graphs consisting of super-nodes (the subsequent passes) until no improvement on the modularity function can be further achieved.

One pass of the algorithm consists of two phases that are repeated iteratively. In the first phase, each node in the network is assigned to its own community. For each node $v$, we consider all neighbours $u$ of $v$ and compute the change in the modularity function if $v$ is removed from its own community and moved into the community of $u$. It is important to mention that this value can be easily and efficiently calculated without the need to recompute the modularity function from scratch. Once all the communities that $v$ could belong to are considered, $v$ is placed into the community that results in the largest increase of the modularity function. If no increase is possible, $v$ remains in its original community. The process is repeated for the remaining nodes following a given (typically random) permutation of nodes, possibly multiple times, until a local maximum is reached and the first phase ends.

During the second phase, the algorithm contracts all nodes that belong to one community into a single super-node. All edges within that community are replaced by a single weighted loop. Similarly, all edges between two communities are replaced by a single weighted edge. Once the new network is created, the second phase ends. The resulting graph is typically much smaller than the original graph. As a result, the first pass is typically the most time consuming part of the algorithm.
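The contraction step of the second phase can be sketched in a few lines. The code below is an illustrative Python version, not the reference implementation; it assumes that each undirected edge appears once in the input and that community labels can be sorted. How loop weights enter the degree computation differs between implementations and is left aside here.

```python
from collections import defaultdict

def contract(weighted_edges, community):
    """Collapse every community into a single super-node.

    weighted_edges : dict {(u, v): weight} for an undirected graph
    community      : dict mapping each node to a (sortable) community label
    Returns a dict {(a, b): weight} on community labels with a <= b;
    entries with a == b are the weighted loops.
    """
    super_edges = defaultdict(float)
    for (u, v), weight in weighted_edges.items():
        a, b = sorted((community[u], community[v]))
        super_edges[(a, b)] += weight    # parallel edges are summed
    return dict(super_edges)

# Toy graph for illustration: a triangle attached to a single edge.
weighted_edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(contract(weighted_edges, community))
```

Here the triangle becomes a loop of weight 3 on super-node 0, the edge between nodes 3 and 4 becomes a loop of weight 1 on super-node 1, and the single edge joining the two communities keeps weight 1.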

4.2 Challenges with Adjusting the Algorithm to Hypergraphs

One could try to directly apply the Louvain algorithm to optimize hypergraph modularity, since in both cases the goal is to find a partition of the set of nodes. However, as the algorithm moves only one node at a time, it creates a problem in the case of hypergraphs.

Consider, for example, a hypergraph in which all hyperedges have size at least four. In this case, regardless of which two nodes $u$ and $v$ are considered for possible merging into one community, the edge contribution would not change (that is, it would stay equal to zero), even if $u$ and $v$ are part of some hyperedge. (Recall that only hyperedges with a majority of nodes from the same community may affect the edge contribution.) On the other hand, the degree tax would increase after such a move and, as a result, the modularity function would decrease. Therefore, no move would be made and the algorithm would get stuck immediately. We will refer to this issue as the “lift off from the ground” problem.
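This effect is easy to reproduce on a toy example. The sketch below counts only the hyperedges that have a strict majority of their nodes in a single part, which are the only hyperedges that can enter the edge contribution; it does not compute the full hypergraph modularity. The helper name `majority_edges` and the toy hypergraph are hypothetical.

```python
from collections import Counter

def majority_edges(hyperedges, part_of):
    """Number of hyperedges with strictly more than half of their nodes
    in a single part of the partition."""
    count = 0
    for e in hyperedges:
        c = max(Counter(part_of[v] for v in e).values())
        if 2 * c > len(e):
            count += 1
    return count

# Toy hypergraph in which every hyperedge has size at least four.
hyperedges = [(0, 1, 2, 3), (2, 3, 4, 5), (0, 2, 4, 5, 6)]
nodes = sorted({v for e in hyperedges for v in e})
singletons = {v: v for v in nodes}       # every node in its own community

# Try every possible first move: put node v into the community of node u.
best = 0
for u in nodes:
    for v in nodes:
        if u != v:
            moved = dict(singletons)
            moved[v] = singletons[u]
            best = max(best, majority_edges(hyperedges, moved))

print(majority_edges(hyperedges, singletons), best)
```

Both numbers are zero: no single move creates a hyperedge with a majority in one part, so the edge contribution cannot increase and only the degree tax changes.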

The above, extreme, situation is not the only problem one should be aware of. This time consider a hypergraph that consists of a mixture of hyperedges of various sizes, including edges of size two. In this scenario there is no problem with lifting off from the ground but small hyperedges clearly play a much more important role than large ones during the initial merging in the first phase of the algorithm. On the other hand, very large hyperedges would be mostly ignored. This behaviour is not desirable either. In order to illustrate a potential danger, consider a hypergraph representing interactions between researchers at some institution. Nodes in this hypergraph correspond to researchers and hyperedges correspond to meetings of some groups of people. For simplicity, assume that there are two communities, say, faculty of science and faculty of engineering. Many hyperedges within the two communities are large (e.g. hyperedges associated with departmental meetings) whereas hyperedges between the two communities are mostly of size two (e.g. two members of different teams meet individually from time to time). In this scenario, the algorithm would start merging people from different communities during the first phase.

Finally, let us note that one could alternatively consider modifying the algorithm to allow merging not only two nodes but entire hyperedges into one community in a single move. Again, this does not seem to be desirable, as hyperedges might consist of members from different communities, and so such operations would generate many incorrect merges too quickly.

4.3 Our Approach to Hypergraph Modularity Optimization

In order to overcome the above-mentioned challenges, we want to design an algorithm that, as in the classical Louvain algorithm, merges single pairs of nodes while, at the same time, taking into account information stored in hyperedges of all sizes. To that end, we propose to optimize a linear combination of the hypergraph modularity $q_H(\mathbf{A})$ and the graph modularity of the corresponding 2-section graph $H_{[2]}$, that is, to optimize the function

$$q(\mathbf{A}, \alpha) := \alpha \cdot q_H(\mathbf{A}) + (1 - \alpha) \cdot q_{H_{[2]}}(\mathbf{A}), \qquad (3)$$

where $\alpha \in [0,1]$. For simplicity, we will refer to our algorithm as h–Louvain.

To understand the motivation behind this approach, let us observe the following. The hypergraph modularity, equation (2), is flexible and may approximate well the graph modularity of the corresponding 2-section graph $H_{[2]}$. Indeed, if $c$ vertices of a hyperedge $e$ of size $d$ and weight $w(e)$ fall into one part of the partition $\mathbf{A}$, then the contribution to the graph modularity is $w(e)\binom{c}{2}/\binom{|e|}{2} \approx w(e)(c/|e|)^2$ (in the variant of the 2-section where the total weight is preserved) or $w(e)\binom{c}{2}/(|e|-1)$ (if the degrees are preserved). Hence, the hyper-parameters of the hypergraph modularity can be adjusted to approximate the $H_{[2]}$ modularity. The only difference is that (2) does not allow contributions from parts that contain at most $d/2$ vertices of a hyperedge, whereas such parts still contribute to the graph modularity of $H_{[2]}$.
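The quality of the approximation $\binom{c}{2}/\binom{|e|}{2} \approx (c/|e|)^2$ can be checked numerically. The following sketch tabulates both quantities for a single hyperedge of size 8 and flags which values of $c$ would be counted by the hypergraph modularity (those with $c > d/2$). The function name `two_section_share` and its `preserve` argument are illustrative assumptions.

```python
from math import comb

def two_section_share(c, d, w=1.0, preserve="weight"):
    """Weight that a hyperedge of size d and weight w places inside a part
    containing c of its nodes, after expansion to the 2-section graph."""
    if preserve == "weight":
        # the comb(d, 2) pair weights sum to w
        return w * comb(c, 2) / comb(d, 2)
    if preserve == "degree":
        # each pair gets weight w / (d - 1), so each node keeps degree w
        return w * comb(c, 2) / (d - 1)
    raise ValueError(f"unknown variant: {preserve}")

d = 8
for c in range(1, d + 1):
    exact = two_section_share(c, d)
    approx = (c / d) ** 2
    status = "counted by (2)" if 2 * c > d else "ignored by (2)"
    print(c, round(exact, 3), round(approx, 3), status)
```

The rows with $c \leq d/2$ show a positive 2-section share even though such parts are ignored by (2), which is exactly the difference noted above.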

The observation justifies using $q(\mathbf{A}, \alpha)$ for optimizing the hypergraph modularity. It is a linear combination of the actual hypergraph modularity we want to optimize, $q_H(\mathbf{A})$, and an approximation of the hypergraph modularity for a special value of the hyper-parameter ($\tau = 2$) and without the restriction on hyperedge contribution, $q_{H_{[2]}}(\mathbf{A})$. The benefit of the second part is that it is sensitive to merging two nodes, and so it always gives some indication of how nodes should be merged (even if the first part, $q_H(\mathbf{A})$, does not give such an indication). In short, it resolves the “lift off from the ground” problem. If $\alpha$ is close to zero, then we concentrate mostly on the approximation part, while if $\alpha$ is close to one, then we concentrate mostly on the actual hypergraph modularity we aim to optimize.

The above discussion leads us to the conclusion that the parameter $\alpha \in [0,1]$ should be appropriately tuned during the algorithm. The main questions are: a) when should the change be made, and b) what values of this parameter should be used? In [25], we performed various experiments and made the following observations. The optimization process should start with low values of the parameter $\alpha$ (to let the process lift off from the ground), and then $\alpha$ should be gradually increased until it reaches one by the end of the process. The algorithm should start increasing $\alpha$ when the communities induce enough edges so that merging additional nodes makes a difference in the edge contribution of the $q_H$ function value. In particular, since the strict hypergraph modularity pays attention only to pure hyperedges (all members belong to one community), in that case the algorithm needs to start with lower values of $\alpha$ and increase it more slowly than for the majority or linear counterparts of the hypergraph modularity, for which it is enough that over 50% of the nodes in some hyperedge are captured in one community.

Based on these observations, we propose the following schema for setting the successive values of $\alpha$ used in the objective function (3), which leads to monotonic (non-decreasing) sequences $(\alpha_1, \alpha_2, \dots)$. The schema is guided by the following two parameters: $p_b \in [0,1]$ and $p_c \in (0,1)$. The parameter $p_b$ is used to determine the values of $\alpha_i$, while $p_c$ governs when the algorithm switches from $\alpha_{i-1}$ to $\alpha_i$ (for $i \geq 2$) as the optimization progresses.

For a given pair of parameters $(p_b, p_c)$, the values of $\alpha_i$ are determined as follows: for any $i \in \mathbb{N}$,

$$\alpha_i = 1 - (1 - p_b)^{i-1}.$$

(We use the convention that $0^0 = 1$.) Note that $\alpha_1 = 0$ and $\alpha_i \to 1$ as $i \to \infty$, unless $p_b = 0$. In the degenerate case $p_b = 0$, we have $\alpha_i = 0$ for all $i$. The algorithm switches from $\alpha_{i-1}$ to $\alpha_i$ (for $i \geq 2$) when the number of communities drops to $n p_c^{i-1}$ or below for the first time (note that the number of communities typically decreases, but this is not always the case; as usual, $n$ denotes the number of nodes). In summary, the two parameters have the following interpretation: $p_b$ controls the size of the successive increases of $\alpha$ (values close to zero make $\alpha_i$ converge to one slowly, values close to one make convergence fast); $p_c$ controls how often $\alpha$ is updated as the number of communities shrinks (values close to one trigger updates frequently, values close to zero trigger them rarely).
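The schedule can be written down in a few lines. The sketch below is one possible reading of the rule, in which the stage index depends on the smallest number of communities observed so far (matching "for the first time"); the function names `alpha` and `stage` are hypothetical and are not taken from the h–Louvain code.

```python
def alpha(i, pb):
    """alpha_i = 1 - (1 - pb)**(i - 1) for i = 1, 2, ...
    Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1."""
    return 1.0 - (1.0 - pb) ** (i - 1)

def stage(fewest_communities, n, pc):
    """Largest i such that the smallest number of communities seen so far
    is at most n * pc**(i - 1); stage 1 is the starting stage."""
    if fewest_communities < 1 or not 0.0 < pc < 1.0:
        raise ValueError("need at least one community and pc in (0, 1)")
    i = 1
    while fewest_communities <= n * pc ** i:
        i += 1
    return i

n, pb, pc = 1000, 0.5, 0.5
for k in (1000, 600, 500, 260, 250, 100):
    i = stage(k, n, pc)
    print(k, i, alpha(i, pb))
```

With $n = 1000$ and $p_b = p_c = 0.5$, the value of $\alpha$ stays at 0 until the number of communities reaches 500, then becomes 0.5, and moves to 0.75 once the number of communities reaches 250.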

There are two possible endings once the algorithm reaches a partition in which no improvement of the modularity function is possible via local changes. (Note that this might happen when the value of $\alpha_i$ is still far from one.) By default, we fix $\alpha_i = 1$ and continue optimizing the hypergraph modularity function on the small graph consisting of super-nodes until no further improvement can be achieved. Alternatively, the local optimization can be performed on the original graph consisting of nodes. The pseudo-code of h–Louvain can be found in the Appendix (see Section A). As discussed in Section 4.5, we use the default ending when doing Bayesian optimization to select a good pair $(p_b, p_c)$ of parameters because it is faster. Once the final pair is chosen, we do an additional tuning process with local optimization enabled, which typically yields better values of the objective function.

4.4 Parameters of the h–Louvain Algorithm

In this subsection, we aim to investigate the quality of the h–Louvain algorithm for different pairs of parameters $(p_b, p_c)$. To that end, we analyzed the performance of the algorithm using 9 different h–ABCD graphs on 1,000 nodes. For each of the three options for community hyperedges in the h–ABCD model (namely, strict, linear, and majority), we used the following three settings with respect to different levels of noise and sizes of hyperedges:

  1. small level of noise ($\xi = 0.15$, $\xi_{emp} = 0.29$), hyperedges of size between 2 and 5, the degree distribution following a power law with exponent $\gamma = 2.5$, minimum and maximum degree 5 and 20,

  2. large level of noise ($\xi = 0.6$, $\xi_{emp} = 0.62$), hyperedges of size between 2 and 5, the degree distribution following a power law with exponent $\gamma = 2.5$, minimum and maximum degree 5 and 20,

  3. large hyperedges (sizes between 5 and 8), large level of noise ($\xi = 0.3$, $\xi_{emp} = 0.63$), the degree distribution following a power law with exponent $\gamma = 2.5$, minimum and maximum degree 5 and 60.

(In the above description, $\xi_{emp}$ refers to the actual level of noise in the produced hypergraph. The model ensures that $\xi_{emp} \approx \xi$ for graphs without small communities. In our scenario this is not the case, but the generated hypergraphs still have drastically different levels of noise.) In all three settings, the distribution of community sizes follows a power law with exponent $\beta = 1.5$, with minimum and maximum size 10 and 30. The distribution of hyperedges of different sizes is $(0.4, 0.3, 0.2, 0.1)$, that is, there are slightly more hyperedges of smaller size.

Figure 1 presents the performance of the algorithm for three selected hypergraphs out of the 9 we experimented with. (Results for the remaining six hypergraphs can be found in the associated GitHub repository.) For each hypergraph, we present the quality of the algorithm optimizing the corresponding modularity function (for example, for the strict hypergraph we optimize the strict modularity function) as a function of $(p_b, p_c)$. The parameters were tested on the 2-dimensional grid $(p_b, p_c)$, where $p_b \in \{0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95\}$ and $p_c \in \{0.1, 0.2, \ldots, 0.9\}$. For each pair of parameters, the average modularity function is reported over 10 independent runs with different random seeds. (Recall that h–Louvain is a randomized algorithm.)

The general conclusion is that the optimal choice of parameters depends on many factors: properties of the hypergraph (such as the composition of community hyperedges, the level of noise, and the sizes of hyperedges) as well as the modularity function that one aims to maximize. However, not surprisingly, it is not recommended to set both parameters close to zero (the case of slow and small increases of the $\alpha$ parameter, so that one optimizes mainly the graph modularity of the corresponding 2-section graph $H_{[2]}$), or close to one (the case of fast and significant changes of the $\alpha$ parameter, so that one optimizes the hypergraph modularity $q_H$ almost from the beginning of the algorithm's execution). The optimal values are often obtained for settings with balanced values of both parameters, namely, with $p_b + p_c \approx 1$. In order to find a “sweet spot” in an unsupervised way, we use Bayesian optimization, which we discuss next.

Figure 1: Quality of h–Louvain on h–ABCD as a function of parameters $p_b$, $p_c$. Optimal combinations of the two parameters depend on the choice of h–ABCD variant and hypergraph modularity function: strict, large level of noise (left), majority, large level of noise (middle), or linear, small level of noise (right).

4.5 Bayesian Optimization: Selecting the Parameters

In order to find a good pair of the two parameters $(p_b, p_c)$ guiding the h–Louvain algorithm, that is, a pair that yields a large value of the hypergraph modularity function, we use the Bayesian optimization approach [21]. We chose this tool because it is best suited for optimizing objective functions that take a long time to evaluate over continuous domains of fewer than 20 dimensions, and because it tolerates non-negligible local variability in the evaluations of the function. It builds a surrogate for the objective function and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample the domain in an on-line fashion.
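To make the surrogate and acquisition steps concrete, the following is a minimal sketch of such a loop in plain Python. It is not the implementation used in our experiments: the squared-exponential kernel, its length scale, the upper-confidence-bound acquisition rule, the finite candidate grid, and the toy objective are all assumptions made for illustration.

```python
import math
import random
import statistics


def rbf(x, y, length=0.3):
    """Squared-exponential kernel between two points (tuples of floats)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length ** 2))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_upper(low, y):
    """Back substitution: solve L^T x = y."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    mean = statistics.mean(ys)
    std = statistics.pstdev(ys) or 1.0  # fall back to 1 if all values coincide
    zs = [(y - mean) / std for y in ys]  # standardized observations
    k = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = solve_upper(low, solve_lower(low, zs))
    k_star = [rbf(a, x_new) for a in xs]
    mu = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)
    return mean + std * mu, std * math.sqrt(var)


def bayes_opt(target, candidates, n_init=5, n_total=10, kappa=2.0, seed=0):
    """Evaluate n_init random points, then choose points by upper confidence bound."""
    rng = random.Random(seed)
    xs = rng.sample(candidates, n_init)
    ys = [target(x) for x in xs]
    while len(xs) < n_total:
        def ucb(c):
            mu, sd = gp_posterior(xs, ys, c)
            return mu + kappa * sd  # favour high mean or high uncertainty
        nxt = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(nxt)
        ys.append(target(nxt))
    return xs, ys


# Toy usage: a smooth stand-in objective on a coarse grid of (p_b, p_c) pairs.
grid = [(i / 10, j / 10) for i in range(11) for j in range(1, 10)]
toy = lambda p: -((p[0] - 0.6) ** 2 + (p[1] - 0.4) ** 2)
xs, ys = bayes_opt(toy, grid)
best_params = xs[ys.index(max(ys))]
```

The sketch refits the surrogate from scratch for every candidate, which is wasteful but keeps the code short. A practical implementation would more likely rely on an established Bayesian optimization library and a continuous search domain.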

Specifically, in our case the Bayesian optimization aims to explore the two-dimensional space $(p_b, p_c)$ with $p_b \in [0,1]$ and $p_c \in (0,1)$. The target function is defined as the average modularity function of the outcome partition over 10 independent executions of the h–Louvain algorithm with different (but fixed across runs) random number generator seeds. Note that in this setting we maximize a deterministic function (since the seeds are fixed). We take the average over 10 different seeds because we aim to identify the region of the $(p_b, p_c)$ domain that leads to good modularity values, and taking the average reduces the noise that is present in modularity values observed in single runs of the algorithm.

Because tuning hyper-parameters is computationally intensive, we initially use the default ending of the algorithm, that is, without the local-optimization approach for the last phase (see Section 4.3 for an explanation of how the optional ending works). The reason for this choice is that in this phase of the process we are mostly interested in capturing the shape of the response surface (recall that we take the average of 10 runs of the algorithm for the very same reason, namely, to smooth out the results and to better capture the shape of the response surface). The default ending is sufficient for this purpose and it is substantially faster. After finding an approximation of the optimal $(p_b, p_c)$ combination, the algorithm comes back to the partition obtained with these parameters, but this time the local-optimization approach is used during the last phase. It is more computationally expensive, but at this stage of the procedure we are interested in finding the maximum value of the hypergraph modularity, and so this additional computational cost is justified.

We configured the Bayesian optimization procedure so that it starts with the evaluation of 5 initial pairs of parameters selected randomly from the domain, and at least 10 pairs are tested in total. Once the Bayesian optimization converges, the algorithm returns the partition of the largest modularity from all partitions generated during the entire process. Note that this partition might not be one of the 10 partitions that contributed to the largest value of the target function; these partitions only have the best average modularity.
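The distinction between the best average and the best single partition can be illustrated with a small bookkeeping sketch. The function name `track_best` and the data layout are hypothetical, and the modularity values below are made up solely to show the two quantities diverging.

```python
def track_best(evaluations):
    """Return the best-average parameters and the best single partition.

    `evaluations` is an iterable of (params, runs) pairs, where `runs` is a
    list of (modularity, partition) tuples, one per random seed.
    """
    best_avg, best_params = float("-inf"), None
    best_q, best_partition = float("-inf"), None
    for params, runs in evaluations:
        avg = sum(q for q, _ in runs) / len(runs)
        if avg > best_avg:
            best_avg, best_params = avg, params
        for q, partition in runs:
            if q > best_q:
                best_q, best_partition = q, partition
    return best_params, best_avg, best_q, best_partition


# Made-up example: the first pair wins on average, but the single best
# partition comes from the second pair.
evaluations = [
    ((0.5, 0.5), [(0.50, "partition A1"), (0.50, "partition A2")]),
    ((0.9, 0.2), [(0.30, "partition B1"), (0.60, "partition B2")]),
]
params, avg, q, partition = track_best(evaluations)
```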

In order to visualize the Bayesian optimization procedure, we performed the following experiment. We selected one of the nine h–ABCD hypergraphs we experimented with (namely, the linear hypergraph with small level of noise, but this time with only $n=300$ nodes) and one of the three modularity functions (namely, the linear one) to be our target function. For cleaner visualization, we fixed $p_b = 0.9$ and used the procedure to find the value of $p_c$ that maximizes the selected modularity function. Figure 2 presents the situation at step $k$ of the algorithm for $k \in \{8, 9, 10, 11\}$. The blue curve is the target function, which we computed independently and which is not available to the Bayesian optimization. The orange curve is a surrogate for the target function based on $k-1$ sampled points that are marked on this curve. The level of uncertainty is represented by the shaded area around this curve. Based on this information, the Bayesian optimization selects the next point to sample at this step, which is depicted as a green point that lies outside of the orange curve. Note that the blue curve is deterministic (as we use fixed seeding of the random number generator), as discussed above. It still has visible local variability, although it is possible to identify the region of good values of the parameter $p_c$ (in this case around $0.2$). This variability is expected and is the reason why, when computing the target function, we take the average of 10 independent evaluations of the algorithm (this approach significantly reduces the level of noise).

Figure 2: Visualization of the Bayesian optimization approach optimizing the modularity function returned by the h–Louvain algorithm. Panels: (a) Step 8, (b) Step 9, (c) Step 10, (d) Step 11.

5 Experiments

In this section, we present several experiments aimed at testing our h–Louvain clustering algorithm as well as comparing outcomes of selecting various hypergraph modularity functions. One general observation is that the choice of the objective function to optimize, here the hypergraph $\tau$-modularity, typically has an enormous impact on the quality of the results (see Subsection 5.1). Fortunately, one should be able to make a reasonable selection of a good value of $\tau$ in an unsupervised way.

In most of our experiments, we compare results obtained with our h–Louvain algorithm with results obtained by the classical Louvain algorithm on the corresponding (weighted) 2-section graphs, as well as with results obtained using Kumar’s algorithm; see [35]. Kumar’s algorithm is a modification of the Louvain algorithm on 2-section graphs in which edges are re-weighted by taking into account the underlying hypergraph structure and the hyperedge composition (with respect to the communities).

We consider synthetic hypergraphs with community structures, obtained via the h–ABCD benchmark, as well as four real-life hypergraphs, all of them described earlier in Section 3. Synthetic hypergraphs allow us to investigate the performance of algorithms in various scenarios (see Subsection 5.2), from hypergraphs with a low level of noise ($\xi$ close to 0), which are easy to deal with, to noisy hypergraphs ($\xi$ close to 1), in which communities are challenging to find. We also investigate a challenging case in which there are many hyperedges between two small communities (see Subsection 5.3). It is known that many networks exhibit self-similar, “fractal-type” structure (see [2, 3] and references therein), so this example aims to reproduce typical scenarios. This example highlights the power of our h–Louvain algorithm, which in this particular setting clearly outperforms its competitors.

For the real hypergraphs, we additionally consider the “all or nothing” (AON) variant of the Louvain algorithm. Specifically, we consider the version aiming to optimize the strict modularity, referred to as AON, as described in the associated GitHub repository (https://github.com/nveldt/HyperModularity.jl). Note that, unless we start from some non-trivial partition such as the one obtained from the 2-section graph with Louvain, this algorithm requires 2-edges to be present. This is the case for the real hypergraphs we considered (but not for several h–ABCD benchmark hypergraphs).

In general, the experiments on real-world hypergraphs (see Subsection 5.4) show that for an appropriate value of $\tau$, which determines the objective function to optimize, one can improve the quality of the clusters measured by the AMI score with respect to the ground truth.

5.1 Selecting the Modularity Function to Optimize

Selecting an appropriate hypergraph modularity function to optimize is an important part of the process. The choice depends on how strongly one believes that a hyperedge is an indicator that at least some fraction of its nodes fall into one of the communities. In some situations, a reasonable assumption could be that not all members of a hyperedge must be in a single community but a majority should (in such situations, the quadratic modularity function might work well). On the other hand, some situations might have underlying physical constraints that make one believe that all members should belong to one community unless the hyperedge is simply noise (this time, strict modularity might be the one to optimize). If an analyst has some reasonable assumptions about the underlying process that created a hypergraph, then the decision on which modularity function to use should be made based on this expert knowledge. In this section, we provide a general strategy for selecting a modularity function if such expert knowledge is not available, based on the structure of the hypergraph that can be detected in an unsupervised way.

Let us start by highlighting important implications of the choice of the modularity function one decides to optimize. Recall that a community hyperedge (a hyperedge with more than 50% of its members from one of the communities) of size $d$ that has exactly $c$ members from one of the communities is said to be of type $(c,d)$. In the absence of the ground truth, one way to compare partitions returned by the algorithm aiming to optimize different modularity functions is to look at the distribution of edges of certain types. In our first experiment, we consider synthetic h–ABCD hypergraphs with only edges of size 5, generated with the strict or the majority model (strict_5 and majority_5), with noise parameters $\xi=0.3$ and $\xi=0.2$. We run this experiment to show how the $(c,d)$ hyperedge composition changes depending on the modularity function being optimized. We expect a visible difference in this distribution between the strict_5 and majority_5 hypergraphs, independent of the choice of the objective function for community detection.
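Computing the distribution of edge types for a given partition is straightforward. The sketch below is an illustration under stated assumptions: hyperedges are represented as tuples of node identifiers, the partition as a dictionary `community_of` mapping each node to its community label, and $c$ is taken to be the size of the largest group of nodes in the edge that share a community. These names and representations are hypothetical and are not taken from our code base.

```python
from collections import Counter


def edge_type(edge, community_of):
    """Return (c, d): d is the edge size, c the largest same-community count."""
    counts = Counter(community_of[v] for v in edge)
    return max(counts.values()), len(edge)


def is_community_edge(c, d):
    """A hyperedge is a community hyperedge if more than half its nodes agree."""
    return 2 * c > d


def composition(edges, community_of):
    """Count how many hyperedges fall into each (c, d) type."""
    return Counter(edge_type(e, community_of) for e in edges)


# Toy example with three communities 'a', 'b', 'c'.
community_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
edges = [(0, 1, 2), (0, 1, 3), (0, 3, 5), (3, 4)]
types = composition(edges, community_of)
```

In this toy example the edge `(0, 3, 5)` has type $(1,3)$ and is therefore not a community hyperedge, while `(0, 1, 3)` has type $(2,3)$ and is.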

In Table 2, we compare the distribution of edge types for 5 different partitions, namely, (i) the ground-truth partition, and the partitions returned by (ii) 2-section Louvain, (iii) h–Louvain with $\tau=2$ (quadratic modularity), (iv) h–Louvain with $\tau=3$ (cubic modularity), and (v) h–Louvain with $\tau\to\infty$ (strict modularity). We count the number of edges with 5, 4 or, respectively, 3 nodes in the most frequent community; the remaining edges are considered to be noise.

| Majority class size | Ground-truth communities | Louvain 2-section | h–Louvain $\tau=2$ | h–Louvain $\tau=3$ | h–Louvain $\tau\to\infty$ (strict) |
|---|---|---|---|---|---|
| Strict with noise parameter $\xi=0.2$ | | | | | |
| 5 | 352 | 352 | 349 | 352 | 352 |
| 4 | 36 | 36 | 41 | 36 | 36 |
| 3 | 92 | 92 | 91 | 92 | 92 |
| $\leq 2$ | 42 | 42 | 41 | 42 | 42 |
| Strict with noise parameter $\xi=0.3$ | | | | | |
| 5 | 314 | 314 | 311 | 314 | 314 |
| 4 | 30 | 30 | 35 | 30 | 30 |
| 3 | 123 | 123 | 122 | 123 | 123 |
| $\leq 2$ | 55 | 55 | 54 | 55 | 55 |
| Majority with noise parameter $\xi=0.2$ | | | | | |
| 5 | 169 | 170 | 137 | 169 | 264 |
| 4 | 175 | 165 | 171 | 174 | 154 |
| 3 | 137 | 138 | 161 | 137 | 104 |
| $\leq 2$ | 41 | 49 | 53 | 42 | 0 |
| Majority with noise parameter $\xi=0.3$ | | | | | |
| 5 | 158 | 129 | 88 | 158 | 206 |
| 4 | 145 | 140 | 148 | 144 | 147 |
| 3 | 158 | 151 | 196 | 161 | 169 |
| $\leq 2$ | 61 | 102 | 90 | 59 | 0 |

Table 2: The number of hyperedges of each type for h–ABCD hypergraphs with 5-edges generated with strict (strict_5) or majority (majority_5) assignment rule.

In the second column of Table 2 we show the hyperedge composition of the ground truth. As expected, for the hypergraph generated with the strict model the $(5,5)$ hyperedges are the most common, and for the hypergraph generated with the majority model the $(5,5)$, $(4,5)$, and $(3,5)$ hyperedges have similar frequencies (this holds both for $\xi=0.2$ and $\xi=0.3$). The crucial observation is that regardless of which modularity function is optimized (Louvain 2-section or h–Louvain with varying $\tau$), the recovered hyperedge composition is similar to the ground truth. This observation suggests using the following approach in cases where the user does not have a prior preference for the $\tau$ parameter in the h–Louvain algorithm.

As a rule of thumb, running a quick clustering (for example with 2-section Louvain) as a part of Exploratory Data Analysis (EDA) and looking at the composition of edge types is a recommended first step that can be used to decide on the value(s) of $\tau$ one wants to use in the objective $\tau$-modularity function for h–Louvain. In general, there are two major scenarios that the user could consider. Seeing mostly “pure” edges suggests using a large value of $\tau$ (or strict modularity), while the opposite suggests using smaller values of $\tau$ such as $\tau=2$ or $\tau=3$.
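As a small worked example of this EDA step, the snippet below computes the fraction of pure edges from the 2-section Louvain column of Table 2 for $\xi=0.2$. The helper `pure_fraction` is a hypothetical name, and no numerical threshold for choosing $\tau$ is implied; the fraction is only meant to support the qualitative judgement described above.

```python
def pure_fraction(counts, size):
    """Fraction of hyperedges of the given size whose nodes all share a community.

    `counts` maps the majority class size c to the number of such edges.
    """
    return counts.get(size, 0) / sum(counts.values())


# 2-section Louvain column of Table 2, xi = 0.2 (key 2 stands for "at most 2").
strict_5 = {5: 352, 4: 36, 3: 92, 2: 42}
majority_5 = {5: 170, 4: 165, 3: 138, 2: 49}

strict_share = pure_fraction(strict_5, 5)      # about two thirds of the edges
majority_share = pure_fraction(majority_5, 5)  # about one third of the edges
```

The clear gap between the two shares is the kind of signal that would point towards a large $\tau$ for the first hypergraph and a smaller $\tau$ for the second.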

5.2 Synthetic h–ABCD Hypergraphs

We ran a series of experiments using the synthetic h–ABCD benchmark hypergraphs. For each family of hypergraphs, we considered a wide range of values for the noise parameter $\xi$, and for each $\xi$, we generated 30 independent copies of h–ABCD hypergraphs. For each hypergraph, we obtained clusterings in various ways:

  • taking the 2-section (weighted) graph and applying the Louvain algorithm several times, keeping the partition with the largest (graph) modularity;

  • running Kumar’s algorithm;

  • running our h–Louvain algorithm with Bayesian optimization for $\tau=2$ and $\tau=3$, and

  • running our h–Louvain algorithm with Bayesian optimization using the strict modularity ($\tau\to\infty$) as the objective function.

In the analysis of the results, we computed the AMI of each partition with respect to the ground-truth communities. The plots in Figures 3–5 show the difference between the AMI of a given algorithm and the AMI of the 2-section result. In other words, we measure how much gain or loss is obtained by switching from a standard 2-section approach for finding communities in a hypergraph to an algorithm designed specifically for hypergraphs.
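For readers who wish to reproduce this kind of comparison, the following self-contained sketch computes the AMI between two labelings and the resulting gain over a baseline. It assumes the common variant in which the expected mutual information is taken under the permutation model and the normalization uses the arithmetic mean of the two entropies; the exact variant used to produce our plots is not specified here, and the toy labelings are made up.

```python
import math
from collections import Counter


def entropy(sizes, n):
    """Shannon entropy (natural log) of a partition with the given part sizes."""
    return -sum(s / n * math.log(s / n) for s in sizes)


def expected_mi(a_sizes, b_sizes, n):
    """Expected mutual information under the permutation (hypergeometric) model."""
    lg = math.lgamma
    emi = 0.0
    for a in a_sizes:
        for b in b_sizes:
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                log_prob = (lg(a + 1) + lg(b + 1) + lg(n - a + 1) + lg(n - b + 1)
                            - lg(n + 1) - lg(nij + 1) - lg(a - nij + 1)
                            - lg(b - nij + 1) - lg(n - a - b + nij + 1))
                emi += nij / n * math.log(n * nij / (a * b)) * math.exp(log_prob)
    return emi


def ami(labels_u, labels_v):
    """Adjusted mutual information with arithmetic-mean normalization."""
    n = len(labels_u)
    a, b = Counter(labels_u), Counter(labels_v)
    joint = Counter(zip(labels_u, labels_v))
    mi = sum(nij / n * math.log(n * nij / (a[u] * b[v]))
             for (u, v), nij in joint.items())
    emi = expected_mi(list(a.values()), list(b.values()), n)
    denom = (entropy(a.values(), n) + entropy(b.values(), n)) / 2 - emi
    return 1.0 if abs(denom) < 1e-12 else (mi - emi) / denom


# Made-up labelings: the second candidate recovers the truth up to relabeling.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_section = [0, 0, 1, 1, 1, 1, 2, 2, 2]
candidate = [5, 5, 5, 7, 7, 7, 9, 9, 9]
gain = ami(truth, candidate) - ami(truth, two_section)  # positive means improvement
```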

Figure 3: Results with h–ABCD hypergraphs with the strict model for the community edge composition (strict_5 and strict_2to5), showing the AMI difference between 2-section communities and the considered algorithms. Positive values indicate an increase of AMI for a given algorithm.
Figure 4: Results with h–ABCD hypergraphs with the linear model for the community edge composition (linear_5 and linear_2to5), showing the AMI difference between 2-section communities and the considered algorithms. Positive values indicate an increase of AMI for a given algorithm.
Figure 5: Results with h–ABCD hypergraphs with the majority model for the community edge composition (majority_5 and majority_2to5), showing the AMI difference between 2-section communities and the considered algorithms. Positive values indicate an increase of AMI for a given algorithm.

Here are some general remarks from those experiments:

  • The hypergraph-specialized algorithms (h–Louvain and Kumar’s) give the most substantial benefits for moderately noisy hypergraphs. The reason is that for hypergraphs with a very low level of noise (values of $\xi$ close to zero) all algorithms produce similar results, as the community-finding problem is simple, and for very noisy hypergraphs (values of $\xi$ close to one) the noise itself creates spurious communities that the algorithms start to recover (this effect has been previously studied and analytically analyzed for the ABCD graphs in [26]).

  • Our h–Louvain algorithm with $\tau=2$ and $\tau=3$ outperforms Kumar’s algorithm most of the time. The exceptions are a few cases with a large amount of noise in the hypergraph (values of $\xi$ close to one).

  • The strict h–Louvain modularity function ($\tau\to\infty$) may work poorly for hypergraphs that have many non-pure community hyperedges. For this reason, in the absence of a prior preference, users should follow the initial verification procedure described in Section 5.1 before using this parameterization. The reason is that for $\tau\to\infty$, all hyperedges of type $(c,d)$ with $c<d$ are not counted as community hyperedges, which loses potentially valuable information in cases when they are indeed informative.
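The last remark can be illustrated numerically. The sketch below assumes that the $\tau$-modularity weights a hyperedge of type $(c,d)$ by $(c/d)^\tau$ when $c > d/2$ and by zero otherwise, with the strict modularity as the limit $\tau\to\infty$. This functional form and the function name `tau_weight` are assumptions made for illustration; the formal definition given earlier in the paper is the authoritative one.

```python
import math


def tau_weight(c, d, tau):
    """Assumed weight of a type-(c, d) hyperedge: (c/d)**tau if c > d/2, else 0."""
    if 2 * c <= d:
        return 0.0  # not a community hyperedge
    if math.isinf(tau):
        return 1.0 if c == d else 0.0  # strict limit: only pure edges count
    return (c / d) ** tau


# Weights of 5-edges with 3, 4, or 5 nodes in the majority community.
for tau in (2, 3, math.inf):
    print(tau, [round(tau_weight(c, 5, tau), 3) for c in (3, 4, 5)])
```

Under this assumption, a $(4,5)$ hyperedge still contributes a sizeable weight for $\tau=2$ but nothing in the strict case, which is why the strict variant discards information when many community hyperedges are not pure.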

5.3 More Challenging Case—Synthetic Hypergraphs with Localized Noise

In this experiment, we simulate an example in which the difficulty in recovering communities is due to the fact that several “noise” edges touch a small number of communities instead of being sprinkled over several communities. To that end, we generated the h–ABCD hypergraph with $n=300$ nodes, degree exponent $\alpha=2.5$ in the range $[5,30]$, community size exponent $\beta=1.5$ in the range $[40,60]$, and edges of size 5 with the purity distribution for community hyperedges set to (0.7, 0.2, 0.1), i.e. 70% of them have 3 community nodes (type $(3,5)$), 20% have 4 (type $(4,5)$), and 10% have 5 (pure hyperedges, type $(5,5)$). Overall noise is set to $\xi=0.2$, but we also add 35 additional 5-edges whose nodes are randomly sampled from the two smallest communities. This simulates “localized” noise, which should make the community detection more challenging.
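The construction of the localized noise can be sketched as follows. The sketch assumes that each extra edge consists of 5 distinct nodes drawn uniformly at random from the union of the two smallest communities; the function name, the data layout, and the toy community sizes are hypothetical, and the base hypergraph would come from the h–ABCD generator, which is not reproduced here.

```python
import random


def add_localized_noise(edges, communities, n_extra=35, size=5, seed=0):
    """Append n_extra hyperedges sampled from the two smallest communities."""
    rng = random.Random(seed)
    two_smallest = sorted(communities, key=len)[:2]
    pool = [v for community in two_smallest for v in community]
    # Each extra edge uses `size` distinct nodes drawn uniformly from the pool.
    extra = [tuple(rng.sample(pool, size)) for _ in range(n_extra)]
    return list(edges) + extra


# Toy communities of sizes 60, 40, 45, and 55 (node identifiers are integers).
communities = [list(range(0, 60)), list(range(60, 100)),
               list(range(100, 145)), list(range(145, 200))]
noisy_edges = add_localized_noise([], communities)
```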

First, simulating a real-life application of the procedure we proposed in Subsection 5.1, we look at the edge composition when running a 2-section clustering, which is reported in Table 3. This quick experiment indicates that smaller values of $\tau$ are likely to be a better choice than the strict modularity version, since there are not that many “pure” edges.

| $d$ | $c$ | frequency |
|---|---|---|
| 5 | 5 | 58 |
| 5 | 4 | 158 |
| 5 | 3 | 247 |
| 5 | 2 | 120 |
| 5 | 1 | 7 |

Table 3: Number of edges of each type for h–ABCD hypergraphs with 5-edges and localized noise added. The partition was obtained by running Louvain on the weighted 2-section graph.

We did 100 runs for each choice of the $\tau$-modularity for our h–Louvain: strict ($\tau\to\infty$) and $\tau \in \{0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4\}$. We also did 100 runs using the 2-section modularity with Louvain, and Kumar’s algorithm. The results are presented in Figure 6. From those results, we see that clustering the 2-section graph or using Kumar’s algorithm yields good results, but we can improve on them by optimizing the hypergraph $\tau$-modularity with $\tau \approx 2$. As expected from the preliminary EDA we performed earlier, using small values of $\tau$ (close to zero) or large values of $\tau$ (including strict modularity, $\tau\to\infty$) is a bad choice in this case.

Figure 6: Results of 100 runs for several choices of hypergraph $\tau$-modularity for h–Louvain (strict, and with $0 \leq \tau \leq 4$) as well as using Louvain on 2-section modularity and Kumar.

5.4 Real-world Hypergraphs

We consider two real-world hypergraphs: the primary-school contact hypergraph and the cora hypergraph. We first run the EDA procedure suggested in Subsection 5.1 on both hypergraphs by looking at the hyperedge composition associated with the corresponding 2-section clusterings. The results are presented in Table 4. We can see that the primary-school hypergraph (left) has relatively more non-pure hyperedges than the cora hypergraph. This indicates that one should expect the optimal $\tau$ for the primary-school hypergraph to be smaller than the one for the cora hypergraph.

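primary-school

| $d$ | $c$ | purity | frequency |
|---|---|---|---|
| 2 | 1 | 50% | 4051 |
| 2 | 2 | 100% | 3697 |
| 3 | 3 | 100% | 3385 |
| 3 | 2 | 67% | 1054 |
| 3 | 1 | 33% | 161 |
| 4 | 4 | 100% | 240 |
| 4 | 3 | 75% | 58 |
| 4 | 2 | 50% | 47 |
| 5 | 4 | 80% | 3 |
| 5 | 3 | 60% | 3 |

cora

| $d$ | $c$ | purity | frequency |
|---|---|---|---|
| 2 | 2 | 100% | 512 |
| 3 | 3 | 100% | 354 |
| 4 | 4 | 100% | 212 |
| 5 | 5 | 100% | 108 |
| 3 | 2 | 67% | 85 |
| 4 | 3 | 75% | 62 |
| 2 | 1 | 50% | 60 |
| 5 | 4 | 80% | 41 |
| 4 | 2 | 50% | 32 |
| 5 | 3 | 60% | 21 |

Table 4: The number of edges of each type (top-10 most frequent) for the primary-school contact (left) and cora (right) hypergraphs. The corresponding partitions were obtained by running Louvain on the weighted 2-section graph.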

5.4.1 Contact Hypergraph

Let us first consider the primary-school contact hypergraph described in Section 3.2. The results are shown in Figure 7, where we compare 2-section (graph) clustering with Louvain, Kumar’s algorithm, and AON clustering, as well as our h–Louvain using different values of $\tau$ for the modularity function. The AMI scores are averaged over 30 runs. The variance is negligible and is not shown. From this experiment, we see that one can get some improvement over 2-section clustering with Louvain or Kumar’s algorithm when using small values of $\tau$ in our h–Louvain.

Figure 7: Results for several choices of hypergraph $\tau$-modularity for h–Louvain (strict and with $0 \leq \tau \leq 4$) as well as using 2-section modularity and Louvain, Kumar’s, and AON algorithms.

5.4.2 Co-citation Hypergraphs

Next, we consider the cora co-citation hypergraph described in Subsection 3.2 in which nodes are publications which belong to 7 categories, and hyperedges represent co-citations. Since the graph has several small disconnected components, we restrict ourselves to the giant component which has 1,330 nodes and 1,503 hyperedges.
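Restricting a hypergraph to its giant component can be done with a standard union-find pass over the hyperedges. The sketch below is a generic illustration, assuming non-empty hyperedges given as tuples of node identifiers; it is not the preprocessing code used for the cora data set, and the function name is hypothetical.

```python
from collections import Counter


def giant_component(nodes, hyperedges):
    """Return the node set and hyperedges of the largest connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving keeps the trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # All nodes of a hyperedge belong to the same component.
    for edge in hyperedges:
        first, *rest = edge
        for v in rest:
            parent[find(v)] = find(first)

    sizes = Counter(find(v) for v in nodes)
    root = max(sizes, key=sizes.get)
    keep = {v for v in nodes if find(v) == root}
    return keep, [e for e in hyperedges if e[0] in keep]


# Toy example: one component of four nodes, one of two, and an isolated node.
nodes = list(range(7))
hyperedges = [(0, 1, 2), (2, 3), (4, 5)]
giant_nodes, giant_edges = giant_component(nodes, hyperedges)
```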

We ran each clustering algorithm 50 times, with the results reported in Figure 8. The results with AON were worse in this case (with AMI around 0.21, not reported in Figure 8). Instead, we report the results obtained when starting from a partition returned by the 2-section graph clustering before running AON, which gives better results. As expected from the EDA comparing the primary-school and cora hypergraphs, for the cora hypergraph we get good results running h–Louvain with values of $\tau$ larger than for the primary-school hypergraph, slightly improving on the results of 2-section graph clustering with Louvain, Kumar’s algorithm, or AON.

Figure 8: Clustering the cora co-citation hypergraph.

6 Conclusions

In this paper, we proposed h–Louvain, a modification of the classical Louvain algorithm that allows us to optimize the hypergraph modularity. Our approach is to optimize a weighted average of the 2-section graph modularity and the hypergraph modularity, with an increasing weight of the hypergraph modularity component as the optimization process progresses. We presented both theoretical arguments and empirical evidence that the approach of increasing the weight of the hypergraph modularity component is efficient. Since there are several ways to update this weight, we developed a method allowing for automatic selection of the hyperparameters of this process using Bayesian optimization. We have shown that the h–Louvain algorithm is competitive and, in particular, that it can outperform both Louvain on the 2-section graph and Kumar’s algorithm in terms of recovering ground-truth communities, both for synthetic and real networks.

Additionally, let us mention another important and interesting aspect. Since the optimization process in h–Louvain is stochastic by nature, the results of a single optimization pass can easily be improved by running many such optimizations in parallel. Therefore, an important extension of the algorithm would be to allow it to learn how to dynamically set the tuneable parameters when multiple optimization processes are executed.

References

  • [1] Kwangjun Ahn, Kangwook Lee, and Changho Suh. Hypergraph spectral clustering in the weighted stochastic block model. IEEE Journal of Selected Topics in Signal Processing, 12(5):959–974, 2018.
  • [2] Jordan Barrett, Bogumił Kamiński, Paweł Prałat, and François Théberge. Self-similarity of communities of the abcd model. preprint, arXiv, 2023.
  • [3] Jordan Barrett, Bogumił Kamiński, Paweł Prałat, and François Théberge. Self-similarity of communities of the abcd model. In International Workshop on Algorithms and Models for the Web-Graph, pages 17–31. Springer, 2024.
  • [4] Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 874:1–92, 2020.
  • [5] Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences, 115(48):E11221–E11230, 2018.
  • [6] Austin R Benson, David F Gleich, and Desmond J Higham. Higher-order network analysis takes off, fueled by classical ideas and new data. arXiv preprint arXiv:2103.05031, 2021.
  • [7] Austin R Benson, David F Gleich, and Jure Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 118–126. SIAM, 2015.
  • [8] Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
  • [9] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
  • [10] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Gorke, Martin Hoefer, Zoran Nikoloski, and Dorothea Wagner. On modularity clustering. IEEE transactions on knowledge and data engineering, 20(2):172–188, 2007.
  • [11] I Chien, Chung-Yi Lin, and I-Hsiang Wang. Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. In International Conference on Artificial Intelligence and Statistics, pages 871–879. PMLR, 2018.
  • [12] Philip S Chodrow, Nate Veldt, and Austin R Benson. Generative hypergraph clustering: From blockmodels to modularity. Science Advances, 7(28):eabh1303, 2021.
  • [13] Fan Chung and Linyuan Lu. Complex Graphs and Networks. Number 107 in Conference Board of the mathematical science. American Mathematical Society, 2006.
  • [14] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004.
  • [15] Gonzalo Contreras-Aso, Regino Criado, Guillermo Vera de Salas, and Jinling Yang. Detecting communities in higher-order networks by using their derivative graphs. Chaos, Solitons & Fractals, 177:114200, 2023.
  • [16] Kaize Ding, Jianling Wang, Jundong Li, Dingcheng Li, and Huan Liu. Be more with less: Hypergraph attention networks for inductive text classification. arXiv preprint arXiv:2011.00387, 2020.
  • [17] David Easley and Jon Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge university press, 2010.
  • [18] Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D Mitchell, Brenda Praggastis, Amie J Eisfeld, Amy C Sims, Larissa B Thackray, et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC bioinformatics, 22(1):287, 2021.
  • [19] Santo Fortunato. Community detection in graphs. Physics reports, 486(3-5):75–174, 2010.
  • [20] Santo Fortunato and Marc Barthelemy. Resolution limit in community detection. Proceedings of the national academy of sciences, 104(1):36–41, 2007.
  • [21] Peter I Frazier. A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • [22] Katarzyna Grzesiak-Kopeć, Piotr Oramus, and Maciej Ogorzałek. Hypergraphs and extremal optimization in 3d integrated circuit design automation. Advanced Engineering Informatics, 33:491–501, 2017.
  • [23] Matthew O Jackson. Social and economic networks. Princeton university press, 2010.
  • [24] Jonas L Juul, Austin R Benson, and Jon Kleinberg. Hypergraph patterns and collaboration structure. arXiv preprint arXiv:2210.02163, 2022.
  • [25] Bogumił Kamiński, Paweł Misiorek, Paweł Prałat, and François Théberge. Modularity based community detection in hypergraphs. In International Workshop on Algorithms and Models for the Web-Graph, pages 52–67. Springer, 2023.
  • [26] Bogumił Kamiński, Bartosz Pankratz, Paweł Prałat, and François Théberge. Modularity of the abcd random graph model with community structure. Journal of Complex Networks, 10(6):cnac050, 2022.
  • [27] Bogumił Kamiński, Valérie Poulin, Paweł Prałat, Przemysław Szufel, and François Théberge. Clustering via hypergraph modularity. PloS one, 14(11):e0224307, 2019.
  • [28] Bogumił Kamiński, Paweł Prałat, and François Théberge. Community detection algorithm using hypergraph modularity. In International Conference on Complex Networks and Their Applications, pages 152–163. Springer, 2020.
  • [29] Bogumił Kamiński, Paweł Prałat, and François Théberge. Artificial benchmark for community detection (abcd)—fast random graph model with community structure. Network Science, pages 1–26, 2021.
  • [30] Bogumił Kamiński, Paweł Prałat, and François Théberge. Mining Complex Networks. Chapman and Hall/CRC, 2021.
  • [31] Bogumił Kamiński, Paweł Prałat, and François Théberge. Artificial benchmark for community detection with outliers (abcd+o). Applied Network Science, 8(1):25, 2023.
  • [32] Bogumił Kamiński, Paweł Prałat, and François Théberge. Hypergraph artificial benchmark for community detection (h–abcd). Journal of Complex Networks, 11(4):cnad028, 2023.
  • [33] Bogumił Kamiński, Tomasz Olczak, Bartosz Pankratz, Paweł Prałat, and François Théberge. Properties and performance of the abcde random graph model with community structure. Big Data Research, 30:100348, 2022.
  • [34] Tarun Kumar, Sankaran Vaidyanathan, Harini Ananthapadmanabhan, Srinivasan Parthasarathy, and Balaraman Ravindran. Hypergraph clustering by iteratively reweighted modularity maximization. Applied Network Science, 5(52), 2020.
  • [35] Tarun Kumar, Sankaran Vaidyanathan, Harini Ananthapadmanabhan, Srinivasan Parthasarathy, and Balaraman Ravindran. A new measure of modularity in hypergraphs: Theoretical insights and implications for effective clustering. In Hocine Cherifi, Sabrina Gaito, José Fernando Mendes, Esteban Moro, and Luis Mateus Rocha, editors, Complex Networks and Their Applications VIII, pages 286–297, Cham, 2020. Springer International Publishing.
  • [36] Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes. Understanding complex systems: From networks to optimal higher-order models. arXiv preprint arXiv:1806.05977, 2018.
  • [37] Andrea Lancichinetti and Santo Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 80(1):016118, 2009.
  • [38] Andrea Lancichinetti and Santo Fortunato. Limits of modularity maximization in community detection. Physical review E, 84(6):066122, 2011.
  • [39] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Physical review E, 78(4):046110, 2008.
  • [40] Geon Lee, Fanchen Bu, Tina Eliassi-Rad, and Kijung Shin. A survey on hypergraph mining: Patterns, tools, and generators. arXiv preprint arXiv:2401.08878, 2024.
  • [41] Geon Lee, Minyoung Choe, and Kijung Shin. How do hyperedges overlap in real-world hypergraphs? Patterns, measures, and generators. In Proceedings of the Web Conference 2021, pages 3396–3407, 2021.
  • [42] Xiaowei Liao, Yong Xu, and Haibin Ling. Hypergraph neural networks for hypergraph matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1266–1275, 2021.
  • [43] Rossana Mastrandrea, Julie Fournet, and Alain Barrat. Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE, 2015.
  • [44] Stan Matwin, Aristides Milios, Paweł Prałat, Amilcar Soares, and François Théberge. Generative Methods for Social Media Analysis. Springer Nature, 2023.
  • [45] Mark Newman. Networks. Oxford university press, 2018.
  • [46] Mark EJ Newman. Fast algorithm for detecting community structure in networks. Physical review E, 69(6):066133, 2004.
  • [47] Mark EJ Newman. Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582, 2006.
  • [48] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical review E, 69(2):026113, 2004.
  • [49] Juliette Stehlé et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE, 2011.
  • [50] Hao Tian and Reza Zafarani. Higher-order networks representation and learning: A survey. arXiv preprint arXiv:2402.19414, 2024.
  • [51] Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From louvain to leiden: guaranteeing well-connected communities. Scientific reports, 9(1):5233, 2019.
  • [52] Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Lizhen Cui, and Xiangliang Zhang. Self-supervised hypergraph convolutional networks for session-based recommendation. In Proceedings of the AAAI conference on artificial intelligence, volume 35 (5), pages 4503–4511, 2021.
  • [53] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha Talukdar. Hypergcn: A new method for training graph convolutional networks on hypergraphs. In Advances in Neural Information Processing Systems (NeurIPS) 32, pages 1509–1520. Curran Associates, Inc., 2019.
  • [54] Sudo Yi and Deok-Sun Lee. Structure of international trade hypergraphs. Journal of Statistical Mechanics: Theory and Experiment, 2022(10):103402, 2022.
  • [55] Hao Yin, Austin R Benson, and Jure Leskovec. Higher-order clustering in networks. Physical Review E, 97(5):052306, 2018.
  • [56] Hao Yin, Austin R Benson, Jure Leskovec, and David F Gleich. Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 555–564, 2017.

Appendix A Pseudo-code of the h-Louvain Algorithm

Algorithm 1 h-Louvain($H$, $\Gamma$)
Input: $H=(V,E)$ – input hypergraph; $\Gamma$ – policy to control $\alpha \in [0,1]$, defined using parameters $p_b$ and $p_c$
Output: $\mathbf{A}$ – partition of $V$; $q_H(\mathbf{A})$ – hypergraph modularity
Initialize: build $G=(V,E_G)$ (2-section), and set partition $\mathbf{A}$ with all vertices $v \in V$ in their own cluster
modified $\leftarrow$ True
$\alpha \leftarrow 0$
$n \leftarrow |V|$
while modified do
    modified $\leftarrow$ False
    improved $\leftarrow$ True
    while improved do
        improved $\leftarrow$ False
        randomize the order of vertices in $V$
        for $v \in V$ do
            bestCommunity $\leftarrow A_i$ (current community of $v$)
            bestDelta $\leftarrow 0$
            for all neighbouring communities $A_j$ of $v$ do
                deltaModularity $\leftarrow (1-\alpha)\,\Delta q_G(A') + \alpha\,\Delta q_H(A')$ ($A'$: $v$ moved from $A_i$ to $A_j$)
                if deltaModularity $>$ bestDelta then
                    bestDelta $\leftarrow$ deltaModularity
                    bestCommunity $\leftarrow A_j$
                    improved $\leftarrow$ True
                end if
            end for
            if improved then
                change $\mathbf{A}$ by moving $v$ to bestCommunity
                modified $\leftarrow$ True
                $\alpha \leftarrow$ UpdateAlpha($\mathbf{A}$, $n$, $\alpha$, $\Gamma$)
            end if
        end for
    end while
    if modified then
        update $H=(V,E)$ (merge current communities into supernodes and update edges)
    else if $\alpha < 1$ then
        $\alpha \leftarrow 1$
        revert the last merging communities step
        modified $\leftarrow$ True
    end if
end while
return $\mathbf{A}$, $q_H(\mathbf{A})$
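
The innermost loop of Algorithm 1 scores every candidate move by the convex combination $(1-\alpha)\,\Delta q_G + \alpha\,\Delta q_H$ and keeps the best strictly positive one. The following Python fragment is a minimal sketch of that selection step only. The names `blended_delta` and `best_move`, and the dictionaries of precomputed modularity gains, are illustrative assumptions; computing the gains themselves is outside the scope of the sketch.

```python
from typing import Dict, Hashable, Tuple


def blended_delta(alpha: float, delta_q_g: float, delta_q_h: float) -> float:
    """Return (1 - alpha) * delta_q_G + alpha * delta_q_H."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * delta_q_g + alpha * delta_q_h


def best_move(
    current: Hashable,
    gains_g: Dict[Hashable, float],
    gains_h: Dict[Hashable, float],
    alpha: float,
) -> Tuple[Hashable, float]:
    """Pick the neighbouring community with the largest positive blended gain.

    gains_g[c] and gains_h[c] are assumed to hold the (hypothetical,
    precomputed) changes in graph and hypergraph modularity obtained by
    moving the vertex from `current` to community c.
    """
    best_community, best_delta = current, 0.0
    for community, dq_g in gains_g.items():
        delta = blended_delta(alpha, dq_g, gains_h[community])
        if delta > best_delta:  # strict improvement, as in Algorithm 1
            best_community, best_delta = community, delta
    return best_community, best_delta


# Toy gains for two neighbouring communities (made-up numbers).
toy_gains_g = {"A1": 0.3, "A2": 0.1}
toy_gains_h = {"A1": -0.1, "A2": 0.2}
```

With these toy gains, $\alpha = 0$ selects the community favoured by the 2-section graph modularity, while $\alpha = 1$ selects the one favoured by the hypergraph modularity, so the choice can change as $\alpha$ grows.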
Algorithm 2 UpdateAlpha($\mathbf{A}$, $n$, $\alpha$, $\Gamma$)
Input: current partition $\mathbf{A}$, total number of nodes $n$, previous value of $\alpha$, policy $\Gamma$ to control $\alpha \in [0,1]$, defined using parameters $p_b$ and $p_c$
Output: new value of $\alpha$
if $\alpha < 1$ then
    $|\mathbf{A}| \leftarrow$ number of communities in current partition $\mathbf{A}$
    $j \leftarrow \arg\max_{k \in \mathbb{N}} \left(k : \frac{|\mathbf{A}|}{n} \le p_c^{k}\right)$
    $\alpha \leftarrow 1 - (1-p_b)^{j}$
    return $\alpha$
else
    return 1
end if
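
Algorithm 2 can be sketched in a few lines of Python. This version makes two assumptions that the pseudo-code does not state: $\mathbb{N}$ is taken to include $0$ (so that $j = 0$ and $\alpha = 0$ when every vertex is in its own cluster, matching the initialization in Algorithm 1), and $0 < p_c < 1$ so that the maximum exists. The function name and argument layout are illustrative.

```python
def update_alpha(num_communities: int, n: int, alpha: float,
                 p_b: float, p_c: float) -> float:
    """Sketch of UpdateAlpha: return the new value of alpha.

    num_communities plays the role of |A|; p_b and p_c are the two
    parameters of the policy Gamma.
    """
    if not (0.0 < p_c < 1.0):
        raise ValueError("assumed: 0 < p_c < 1")
    if not (0.0 <= p_b <= 1.0):
        raise ValueError("assumed: 0 <= p_b <= 1")
    if not (1 <= num_communities <= n):
        raise ValueError("need 1 <= |A| <= n")

    if alpha >= 1.0:
        return 1.0  # once alpha reaches 1 it stays there

    ratio = num_communities / n
    # j = largest k >= 0 with |A| / n <= p_c ** k; k = 0 always qualifies.
    j = 0
    threshold = p_c
    while ratio <= threshold:
        j += 1
        threshold *= p_c
    return 1.0 - (1.0 - p_b) ** j
```

For example, with $p_b = p_c = 0.5$ and $n = 100$, the value of $\alpha$ is $0$ at $100$ communities, $0.5$ at $50$ communities and $0.75$ at $25$ communities, so $\alpha$ never decreases as communities merge.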