Dataless Quadratic Neural Networks for the Maximum Independent Set Problem

Ismail Alkhouri1,2,  Cedric Le Denmat3,  Yingjie Li4,  Cunxi Yu4,  Jia Liu3,
 Rongrong Wang1,  Alvaro Velasquez5
1Michigan State University
2University of Michigan Ann Arbor,
3Ohio State University,
4University of Maryland College Park,
5University of Colorado Boulder
Abstract

Combinatorial Optimization (CO) plays a crucial role in addressing various significant problems, among them the challenging Maximum Independent Set (MIS) problem. In light of recent advancements in deep learning methods, efforts have been directed towards leveraging data-driven learning approaches, typically rooted in supervised learning and reinforcement learning, to tackle the NP-hard MIS problem. However, these approaches rely on labeled datasets, exhibit weak generalization, and often depend on problem-specific heuristics. Recently, ReLU-based dataless neural networks were introduced to address combinatorial optimization problems. This paper introduces a novel dataless quadratic neural network formulation, featuring a continuous quadratic relaxation for the MIS problem. Notably, our method eliminates the need for training data by treating the given MIS instance as a trainable entity. More specifically, the graph structure and constraints of the MIS instance are used to define the structure and parameters of the neural network such that training it on a fixed input provides a solution to the problem, thereby setting it apart from traditional supervised or reinforcement learning approaches. By employing a gradient-based optimization algorithm like Adam and leveraging an efficient off-the-shelf GPU parallel implementation, our straightforward yet effective approach demonstrates competitive or superior performance compared to state-of-the-art learning-based methods. Another significant advantage of our approach is that, unlike exact and heuristic solvers, the running time of our method scales only with the number of nodes in the graph, not the number of edges.

1 Introduction

In his landmark paper [1], Richard Karp introduced the concept of reducibility among combinatorial problems that are complete for the complexity class $\mathsf{NP}$. This pivotal work established a connection between combinatorial optimization problems and the $\mathsf{NP}$-$\mathsf{hard}$ complexity class, implying their inherent computational challenges. Although these problems are notorious for their intractability, they have proven to be foundational in various sectors [2], demonstrating their widespread applicability. While a polynomial-time solver remains elusive for solving $\mathsf{NP}$-$\mathsf{hard}$ problems with respect to (w.r.t.) the input size, various efficient solvers have been developed [3]. Such solvers can be broadly classified into heuristic algorithms [4], branch-and-bound-based global optimization methods [5], and approximation algorithms [6].

In the $\mathsf{NP}$-$\mathsf{hard}$ complexity class, one of the most fundamental problems is the ‘Maximum Independent Set’ (MIS) problem, which is concerned with determining a subset of vertices in a graph $G=(V,E)$ with maximum cardinality, such that no two vertices in this subset are connected by an edge [7]. In the past few decades, in addition to commercial Integer Programming (IP) solvers (e.g., CPLEX [8], Gurobi [9], and most recently CP-SAT [10]), powerful heuristic methods (e.g., ReduMIS in [3]) have been introduced to tackle the complexities inherent in the MIS problem. Notably, a plethora of data-driven machine learning approaches were proposed for solving the MIS problem [11, 12, 13]. These methods fall into two categories: Supervised Learning (SL) approaches and Reinforcement Learning (RL) approaches.

However, both data-driven SL and RL approaches are known for their unsatisfactory generalization performance when faced with graph instances exhibiting structural characteristics different from those in the training dataset [12] (see Section 2.2 for further discussion). Additionally, the number of trainable parameters in learning-based methods is significantly larger than in our approach. For instance, the network used in the most recent state-of-the-art (SOTA) method, DIFUSCO [14], consists of 12 layers, each with five trainable weight matrices of dimensions $256 \times 256$, resulting in nearly four million trainable parameters for the SATLIB dataset, which has graphs of at most $1347$ nodes. By comparison, our approach would require $n=1347$ trainable parameters. Moreover, many of these existing methods achieve SOTA results only when employing various MIS-specific subroutines, as thoroughly analyzed and elucidated in the recent work by [12]. The limitations of these data-dependent methods lead to an open question:

(Q): Can we develop a new dataless neural-network-based approach to retain the use of gradient-based optimization algorithms (e.g., Adam [15]), so that it still rivals the competitive results achieved by learning-based methods while avoiding their pitfalls?

In this paper, we answer this question affirmatively by proposing a dataless quadratic neural network approach (dQNN). Our proposed dataless neural network offers a novel neural-network-based method for addressing combinatorial optimization problems without the need for any training data, hence resolving the out-of-distribution generalization challenges in existing learning-based methods. More concretely, given the MIS instance $G=(V,E)$, we encode $G$ into the parameters of a neural network such that those parameters yield the solution to the given MIS problem after training. Such neural architecture designs can further benefit from GPU implementations with massive parallelism when compared to classical non-learning methods. The main contributions of our work are summarized as follows:

  • We first propose dataless quadratic networks ($\mathsf{Quant}$-$\mathsf{Net}$) that encode the input graph and its complement. The neural architecture of $\mathsf{Quant}$-$\mathsf{Net}$ implicitly defines a continuous and differentiable relaxation of the MIS problem, thereby enabling an efficient optimization process and paving the way for enhanced performance for solving the MIS problem.

  • To improve the exploration of $\mathsf{Quant}$-$\mathsf{Net}$ for solving the MIS problem, we propose three initialization schemes: (i) a sampling from the uniform distribution when the degrees of all nodes are similar, (ii) a sampling scheme based on a continuous semidefinite programming (SDP) relaxation of the MIS problem for sparse graphs, and (iii) a degree-based initialization scheme for dense graphs.

  • We provide a theoretical analysis on sufficient and necessary conditions of the edges-penalty parameter for $\mathsf{Quant}$-$\mathsf{Net}$. Furthermore, we provide theoretical insights on the local minimizers. These derivations shed light on the underlying dynamics of the optimization process associated with our proposed dataless neural network. The theoretical foundation strengthens the understanding of how these parameters influence the behavior of the optimization algorithm.

  • Our experiments on known challenging graph datasets, utilizing standard tools such as the Adam optimizer and GPU libraries, establish the efficacy of our proposed $\mathsf{Quant}$-$\mathsf{Net}$ approach, which shows competitive or superior performance compared to SOTA data-driven learning approaches.

2 Preliminaries and related work

2.1 The MIS problem formulations

Notations:

Consider an undirected graph represented as $G=(V,E)$, where $V$ is the vertex set and $E \subseteq V \times V$ is the edge set. The cardinality of a set is denoted by $|\cdot|$. The number of nodes (resp. edges) is denoted by $|V|=n$ (resp. $|E|=m$). Unless otherwise stated, for a node $v \in V$, we use $\mathcal{N}(v)=\{u \in V \mid (u,v) \in E\}$ to denote the set of its neighbors. The degree of a node $v \in V$ is denoted by $\textrm{d}(v)=|\mathcal{N}(v)|$, and the maximum degree of the graph by $\Delta(G)$. For a subset of nodes $U \subseteq V$, we use $G[U]=(U,E[U])$ to represent the subgraph induced by the nodes in $U$, where $E[U]=\{(u,v) \in E \mid u,v \in U\}$. Given a graph $G$, its complement is denoted by $G'=(V,E')$, where $E'=V \times V \setminus E$ is the set of all the edges between nodes that are not connected in $G$. Consequently, if $|E'|=m'$, then $m+m'=n(n-1)/2$ represents the number of edges in the complete graph on $V$. The graph adjacency matrix of graph $G$ is denoted by $\mathbf{A}_G \in \{0,1\}^{n \times n}$. We use $\mathbf{I}$ to denote the identity matrix. The element-wise product of two matrices $\mathbf{A}$ and $\mathbf{B}$ is denoted by $\mathbf{A} \circ \mathbf{B}$. The trace of a matrix $\mathbf{A}$ is denoted by $\mathrm{tr}(\mathbf{A})$. We use $\mathrm{diag}(\mathbf{A})$ to denote the diagonal of $\mathbf{A}$. For any positive integer $n$, $[n] := \{1,\ldots,n\}$. The vector (resp. matrix) of all ones and size $n$ (resp. $n \times n$) is denoted by $\mathbf{e}_n$ (resp. $\mathbf{J}_n$). Furthermore, we use $\mathds{1}(\cdot)$ to denote the indicator function that returns $1$ (resp. $0$) when its argument is True (resp. False).
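As a concrete illustration of this notation, the following minimal sketch builds the adjacency matrix, the complement edge set, and the degrees for a small graph, and checks the identity $m+m'=n(n-1)/2$. The 5-node graph and the names `edges`, `comp_edges`, and `degrees` are hypothetical choices made for this example; edges are treated as unordered pairs of distinct nodes.

```python
from itertools import combinations

# Hypothetical 5-node example graph (0-indexed nodes), chosen for illustration only.
n = 5
edges = {(0, 1), (0, 2), (1, 3), (1, 4)}

# Adjacency matrix A_G as a list of lists with 0/1 entries.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

# Complement edge set E': every unordered pair of distinct nodes that is not in E.
comp_edges = {(u, v) for u, v in combinations(range(n), 2) if (u, v) not in edges}

degrees = [sum(row) for row in A]  # d(v) = |N(v)|
max_degree = max(degrees)          # Delta(G)
m, m_comp = len(edges), len(comp_edges)

print(degrees, max_degree)              # [2, 3, 1, 1, 1] 3
print(m + m_comp == n * (n - 1) // 2)   # True
```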

Problem Statement:

In this paper, we consider the $\mathsf{NP}$-$\mathsf{hard}$ problem of obtaining a maximum independent set (MIS). Next, we formally define the MIS problem and the complementary Maximum Clique (MC) problem.

Definition 1 (MIS Problem).

Given an undirected graph $G=(V,E)$, the goal of MIS is to find a subset of vertices $\mathcal{I} \subseteq V$ such that $E[\mathcal{I}]=\emptyset$, and $|\mathcal{I}|$ is maximized.

Definition 2 (MC Problem).

Given an undirected graph $G=(V,E)$, the goal of MC is to find a subset of vertices $C \subseteq V$ such that $G[C]$ is a complete graph, and $|C|$ is maximized.

The two problems are complementary: an MIS of a graph is an MC of its complement graph [1]. Let each entry of the binary vector $\mathbf{z} \in \{0,1\}^n$ correspond to a node $v \in V$ and be denoted by $\mathbf{z}_v \in \{0,1\}$. An integer linear program (ILP) for MIS can be formulated as follows [16]:

$$\textbf{ILP:} \quad \max_{\mathbf{z} \in \{0,1\}^n} \sum_{v \in V} \mathbf{z}_v \quad \text{s.t.} \quad \mathbf{z}_v + \mathbf{z}_u \leq 1, \ \forall (v,u) \in E. \tag{1}$$

The following quadratic integer program (QIP) in (2) (with an optimal solution that is equivalent to the optimal solution of the above ILP) can also be used to formulate the MIS problem [17]:

$$\textbf{QIP:} \quad \max_{\mathbf{z} \in \{0,1\}^n} \mathbf{z}^T (\mathbf{I} - \mathbf{A}_G) \mathbf{z}. \tag{2}$$
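The agreement between the optimal values of (1) and (2) can be checked by exhaustive enumeration on a tiny instance. The sketch below is only an illustration under stated assumptions: the 5-node graph is a hypothetical example, and brute force is used purely for verification, not as a practical solver.

```python
from itertools import product

n = 5
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]  # hypothetical example graph

def ilp_value(z):
    """Objective of (1), or None if z violates an edge constraint."""
    if any(z[u] + z[v] > 1 for u, v in edges):
        return None
    return sum(z)

def qip_value(z):
    """Objective of (2): z^T (I - A_G) z for a binary vector z."""
    # z^T I z = sum of z_v^2, and z^T A_G z counts every edge twice (A_G is symmetric).
    return sum(zv * zv for zv in z) - 2 * sum(z[u] * z[v] for u, v in edges)

candidates = list(product((0, 1), repeat=n))
best_ilp = max(val for val in map(ilp_value, candidates) if val is not None)
best_qip = max(qip_value(z) for z in candidates)
print(best_ilp, best_qip)  # 3 3
```

On this example both formulations attain the value 3, the size of a maximum independent set of the example graph.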

Furthermore, the work in [18] introduced the following semidefinite programming (SDP) relaxation of the MIS problem.

$$\textbf{SDP:} \quad \max_{\mathbf{X} \in \mathbb{R}^{n \times n}} \mathrm{tr}(\mathbf{J}_n \mathbf{X}) \quad \text{s.t.} \quad \mathrm{tr}(\mathbf{X}) = 1, \quad \mathbf{X}_{u,v} = 0, \ \forall (u,v) \in E, \quad \mathbf{X} \succeq 0, \tag{3}$$

where the third constraint enforces positive semi-definiteness of the optimization matrix $\mathbf{X}$. We denote the optimal solution of (3) by $\mathbf{X}^*$. The diagonal of $\mathbf{X}^*$ represents the ‘weight’ of each vertex in being part of the MIS, whereas the off-diagonal non-zero elements, i.e., for indices $(u,v) \notin E$, indicate how likely nodes $u$ and $v$ are to be in the MIS. While Problem (3) is convex (so that every local optimum is a global optimum) and can be solved in polynomial time, obtaining the MIS from $\mathbf{X}^*$ requires rounding techniques (such as spectral clustering [19]) that, in most cases, do not result in an optimal MIS in the graph.

2.2 Related work

1) Exact and Heuristic Solvers: Exact approaches for $\mathsf{NP}$-$\mathsf{hard}$ problems typically rely on branch-and-bound global optimization techniques. However, exact approaches suffer from poor scalability, which limits their use on large MIS problems [20]. This limitation has spurred the development of efficient approximation algorithms and heuristics. For instance, the well-known NetworkX library [21] implements a heuristic procedure for solving the MIS problem [6]. These polynomial-time heuristics often incorporate a mix of sub-procedures, including greedy algorithms, local search sub-routines, and genetic algorithms [22]. However, such heuristics generally cannot theoretically guarantee that the resulting solution is within a small factor of optimality. In fact, inapproximability results have been established for the MIS problem [23].

Among existing MIS heuristics, ReduMIS [3] has emerged as the leading approach. The ReduMIS framework contains two primary components: (i) an iterative application of various graph reduction techniques (e.g., the linear programming (LP) reduction method in [16]) with a stopping rule based on the non-applicability of these techniques; and (ii) an evolutionary algorithm. The ReduMIS algorithm initiates with a pool of independent sets and evolves them through multiple rounds. In each round, a selection procedure identifies favorable nodes by executing graph partitioning, which clusters the graph nodes into disjoint clusters and separators to enhance the solution. In contrast, our $\mathsf{Quant}$-$\mathsf{Net}$ approach does not require such complex algorithmic operations (e.g., solution combination operation, community detection, and local search algorithms for solution improvement) as used in ReduMIS. More importantly, ReduMIS and ILP solvers scale with the number of nodes and the number of edges (which constrains their application on highly dense graphs), whereas $\mathsf{Quant}$-$\mathsf{Net}$ only scales w.r.t. the number of nodes, as will be demonstrated in our experimental results.

2) Data-Driven Learning-Based Solvers: As mentioned in Section 1, data-driven approaches for MIS problems can be classified into SL and RL methods. A notable SL method is proposed in [24], which combines several components including graph reductions [3], Graph Convolutional Networks (GCN) [25], guided tree search, and a solution improvement local search algorithm [26]. The GCN is trained on benchmark graphs using their solutions as ground truth labels, enabling the learning of probability maps for the inclusion of each vertex in the optimal solution. Then, a subset of ReduMIS subroutines is used to improve their solution. While the work in [24] reported on-par results to ReduMIS, it was later shown by [12] that replacing the GCN output with random values performs similarly to using the trained GCN network. Recently, DIFUSCO was introduced in [14], an approach that integrates Graph Neural Networks (GNNs) with diffusion models [27] to create a graph-based diffusion denoiser. DIFUSCO formulates the MIS problem in the discrete domain and trains a diffusion model to improve a single or a pool of solutions.

On the other hand, RL-based methods have achieved more success in solving the MIS problem when compared to SL methods. In the work of [28], a Deep Q-Network (DQN) is combined with graph embeddings, facilitating the discrimination of vertices based on their influence on the solution and ensuring scalability to larger instances. Meanwhile, the study presented in [29] introduces the Learning What to Defer (LwD) method, an unsupervised deep RL solver resembling tree search, where vertices are iteratively assigned to the independent set. Their model is trained using Proximal Policy Optimization (PPO) [30]. The work in [31] introduces DIMES, which combines a compact continuous space to parameterize the distribution of potential solutions and a meta-learning framework to facilitate the effective initialization of model parameters during the fine-tuning stage that is required for each graph.

It is worth noting that the majority of SL and RL methods are data-dependent in the sense that they require the training of a separate network for each dataset of graphs. These data-dependent methods exhibit limited generalization performance when applied to out-of-distribution graph data. This weak generalization stems from the need to train a different network for each graph dataset. In contrast, our dQNN approach differs from SL- and RL-based methods in that it does not rely on any training datasets. Instead, our dQNN approach utilizes a simple yet effective graph-encoded continuous objective function, which is defined solely in terms of the connectivity of a given graph.

3) Dataless Differentiable Methods: The most related work to ours is [32], which introduces dataless neural networks (dNNs) tailored for the MIS problem. Notably, their method operates without the need for training data and relies on $n$ trainable parameters. Their proposed methodology advocates using a ReLU-based continuous objective to solve the MIS problem. However, to scale up and improve their method, graph partitioning and local search algorithms were employed.

4) Discrete Sampling Solvers: In recent studies, researchers have explored the integration of energy-based models with parallel implementations of simulated annealing to address combinatorial optimization problems [33] without relying on any training data. For example, in tackling the Maximum Independent Set (MIS) problem, Sun et al. [34] proposed a solver that combines (i) Path Auxiliary Sampling [35] and (ii) the binary quadratic integer program in (2). However, unlike $\mathsf{Quant}$-$\mathsf{Net}$, these approaches entail prolonged sequential runtime and require fine-tuning of multiple hyperparameters. Moreover, the energy models utilized in this method for addressing the MIS problem may generate binary vectors that violate the “no edges” constraint inherent to the MIS problem. Consequently, a post-processing procedure becomes necessary.

3 $\mathsf{Quant}$-$\mathsf{Net}$: Dataless quadratic neural networks for the MIS problem

In this section, we introduce our dataless neural network, $\mathsf{Quant}$-$\mathsf{Net}$, designed to solve the MIS problem through neural training without the need for any training data.

3.1 $\mathsf{Quant}$-$\mathsf{Net}$: The model

We will first present the model of our proposed $\mathsf{Quant}$-$\mathsf{Net}$ that is (i) differentiable everywhere w.r.t. $\mathbf{x}$, and (ii) free of hyper-parameter scheduling. A continuous relaxation of QIP (2) is

$$\min_{\mathbf{x} \in [0,1]^n} -\sum_{v \in V} \mathbf{x}_v + \sum_{(u,v) \in E} \mathbf{x}_v \mathbf{x}_u, \quad \text{or} \quad \min_{\mathbf{x} \in [0,1]^n} -\mathbf{e}_n^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A}_G \mathbf{x}. \tag{4}$$
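To make the objective of Problem (4) concrete, the sketch below evaluates it on every binary vector of a small graph and lists the binary minimizers. The 5-node graph and the helper names `objective4` and `is_independent` are hypothetical, introduced only for this illustration.

```python
from itertools import product

n = 5
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]  # hypothetical example graph

def objective4(x):
    """-sum_v x_v + sum_{(u,v) in E} x_u x_v, the objective of Problem (4)."""
    return -sum(x) + sum(x[u] * x[v] for u, v in edges)

def is_independent(x):
    """True if no edge has both endpoints selected."""
    return all(x[u] * x[v] == 0 for u, v in edges)

values = {z: objective4(z) for z in product((0, 1), repeat=n)}
best = min(values.values())
minimizers = [z for z, val in values.items() if val == best]

print(best)  # -3
print([z for z in minimizers if is_independent(z)])
print([z for z in minimizers if not is_independent(z)])
```

On this example the minimum value is $-3$, the negative of the maximum independent set size. It is attained by the two independent sets of size three and also by one binary vector that contains an edge, which gives some intuition for why weighting the edge term by a penalty larger than one can be helpful.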

Let $\mathbf{z}^*$ denote a binary minimizer of Problem (4). Then, it was shown in [36] that it corresponds to an MIS of size $\sum_{v \in V} \mathds{1}(\mathbf{z}^*_v = 1)$. Based on the quadratic MIS formulation in Problem (4), our proposed $\mathsf{Quant}$-$\mathsf{Net}$ introduces several improvements and modifications to efficiently solve the MIS problem. In particular, $\mathsf{Quant}$-$\mathsf{Net}$ incorporates an edges-penalty parameter $\gamma$, which scales the influence of the edges of the graph $G$ on the optimization objective. Furthermore, $\mathsf{Quant}$-$\mathsf{Net}$ uses the adjacency matrices of $G$ and $G'$. To see how $\mathsf{Quant}$-$\mathsf{Net}$ is designed, we first consider the following $\gamma$-parameterized augmented quadratic formulation for the MIS problem:

$$\min_{\mathbf{x} \in [0,1]^n} f(\mathbf{x}) := -\mathbf{e}_n^T \mathbf{x} + \frac{\gamma}{2} \mathbf{x}^T \mathbf{A}_G \mathbf{x} - \frac{1}{2} \mathbf{x}^T \mathbf{A}_{G'} \mathbf{x}, \tag{5}$$

where $\gamma > 1$ is the edges-penalty parameter. The rationale behind the third augmented term $-\frac{1}{2} \mathbf{x}^T \mathbf{A}_{G'} \mathbf{x}$ in Problem (5) (corresponding to the edges of the complement graph $G'$) is to encourage the optimizer to select two nodes with no edge connecting them in $G$ (implying an edge in $G'$). We will theoretically show later in Theorem 4 that any MIS minimizer is a local minimizer of Problem (5) with an appropriately chosen $\gamma$-value. Interestingly, the above continuous quadratic formulation of the MIS problem in Problem (5) admits a dataless implementation of the quadratic neural network (QNN), which was recently introduced in [37].
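As a small worked example of Problem (5), the sketch below evaluates $f$ at a few binary vectors. The 5-node graph, the function name `f`, and the value $\gamma = 5$ are assumptions made for illustration; they are not values prescribed by the text.

```python
from itertools import combinations

n = 5
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]  # hypothetical example graph G
comp_edges = [p for p in combinations(range(n), 2) if p not in edges]  # edges of G'
gamma = 5.0  # illustrative choice of the edges-penalty parameter (assumption)

def f(x):
    """Objective of Problem (5): -e^T x + (gamma/2) x^T A_G x - (1/2) x^T A_G' x."""
    # For a symmetric 0/1 adjacency matrix, (1/2) x^T A x equals the sum over edges of x_u * x_v.
    return (-sum(x)
            + gamma * sum(x[u] * x[v] for u, v in edges)
            - sum(x[u] * x[v] for u, v in comp_edges))

print(f([1, 0, 0, 1, 1]))  # -6.0  (independent set of size 3)
print(f([0, 1, 1, 0, 0]))  # -3.0  (independent set of size 2)
print(f([1, 0, 1, 1, 1]))  # -4.0  (contains the edge (0, 2))
```

With this illustrative $\gamma$, the vector that includes an edge scores worse than the independent set of size three, and the complement term rewards the larger independent set over the smaller one.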

To see how Problem (5) corresponds to a $\mathsf{Quant}$-$\mathsf{Net}$, consider the graph example in Figure 1 (left), for which the $\mathsf{Quant}$-$\mathsf{Net}$ is illustrated in Figure 1 (right). Here, the $\mathsf{Quant}$-$\mathsf{Net}$ comprises two fully connected layers. The initial activation-free layer encodes information about the nodes (top $n=5$ connections), edges of $G$ (middle $m=4$ connections), and edges of $G'$ (bottom $m'=6$ connections), all without a bias vector. The subsequent fully connected layer is an activation-free layer performing a vector dot-product between the fixed weight vector (with $-1$ corresponding to the nodes and to the edges of $G'$, and the edges-penalty parameter $\gamma$ corresponding to the edges of $G$) and the output of the first layer.
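The two-layer structure described above can be sketched in a few lines. This is a minimal illustration, assuming that each first-layer unit outputs either a node value $\mathbf{x}_v$ or the product $\mathbf{x}_u \mathbf{x}_v$ of the endpoints of an edge, which is one reading of the description. The edge list is a hypothetical labeling with $n=5$, $m=4$, and $m'=6$, and $\gamma = 5$ is an arbitrary illustrative value.

```python
from itertools import combinations

n = 5
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]  # m = 4 edges of G (hypothetical labeling)
comp_edges = [p for p in combinations(range(n), 2) if p not in edges]  # m' = 6 edges of G'
gamma = 5.0  # illustrative edges-penalty value (assumption)

def first_layer(x):
    """n node units, then m units for edges of G, then m' units for edges of G'."""
    node_units = list(x)
    edge_units = [x[u] * x[v] for u, v in edges]
    comp_units = [x[u] * x[v] for u, v in comp_edges]
    return node_units + edge_units + comp_units

# Fixed second-layer weights: -1 for nodes, gamma for edges of G, -1 for edges of G'.
w = [-1.0] * n + [gamma] * len(edges) + [-1.0] * len(comp_edges)

def network_output(x):
    """Dot product of the fixed weights with the first-layer output; equals f(x) in (5)."""
    h = first_layer(x)
    return sum(wi * hi for wi, hi in zip(w, h))

print(len(w))                           # 15
print(network_output([0, 0, 1, 1, 1]))  # -6.0
print(network_output([0.5] * n))        # 1.0
```

Only the input vector $\mathbf{x}$ would be trained in such a network; the weights `w` and the wiring are fixed by the graph.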

Utilizing the SDP solution $\mathbf{X}^*$ of Problem (3), along with its interpretation discussed in Section 2.1, $\mathsf{Quant}$-$\mathsf{Net}$ has the flexibility of incorporating $\mathbf{X}^*$ into the objective in Problem (5) as follows:

$$\min_{\mathbf{x} \in [0,1]^n} -\mathbf{s}^T \mathbf{x} + \frac{\gamma}{2} \mathbf{x}^T \mathbf{A}_G \mathbf{x} - \frac{1}{2} \mathbf{x}^T (\mathbf{A}_{G'} \circ \mathbf{N}) \mathbf{x}, \quad \mathbf{s} = \frac{\mathrm{diag}(\mathbf{X}^*)}{\max_v \mathbf{X}^*_{v,v}}, \quad \mathbf{N} = \frac{\mathbf{X}^*}{\max_{(u,v) \notin E} \mathbf{X}^*_{u,v}}. \tag{6}$$

Here, $\mathbf{s}$ and $\mathbf{N}$ represent the likelihood of nodes in the graph $G$ and complement graph $G'$, respectively, to be included in the MIS. This serves as our rationale behind the objective function in Problem (6).

Remark 1.

Despite the polynomial-time complexity of the SDP formulation in (3) and the availability of efficient solvers such as MOSEK [38], efficiently obtaining the optimal solution in (3) is predominantly achievable for sparse graphs. This constraint emerges from the fact that the computational complexity of SDP grows proportionally with both the number of nodes and the number of edges in the graph. Consequently, the practical applicability of utilizing the objective function in Problem (6) is limited to sparse graphs.

Figure 1: $\mathsf{Quant}$-$\mathsf{Net}$ (right) for graph $G$ (left). Sets $\textrm{MIS}_{1}=\{v_{1},v_{4},v_{5}\}$ and $\textrm{MIS}_{2}=\{v_{3},v_{4},v_{5}\}$ correspond to an MIS in $G$ and an MC in $G'$. Set $\textrm{MIS}_{3}=\{v_{2},v_{3}\}$ corresponds to a maximal independent set but not a maximum independent set in $G$.

3.2 $\mathsf{Quant}$-$\mathsf{Net}$: The training algorithm

Drawing from the objective function of Problem (5) and the network structure of $\mathsf{Quant}$-$\mathsf{Net}$, we introduce an MIS training algorithm. Notably, in contrast to the ReLU-based dNN proposed in [32], the $\mathsf{Quant}$-$\mathsf{Net}$ formulation in Problem (5) is fully differentiable with respect to $\mathbf{x}$, enabling more numerically stable optimization [39], as demonstrated in our experimental findings discussed in Section 4.

Algorithm 1 The $\mathsf{Quant}$-$\mathsf{Net}$ MIS Training Algorithm.

Input: Graph $G$, number of iterations $T$, edge-penalty parameter $\gamma$, set of initializations $S$, and learning rate $\alpha$ of Adam.
Output: The best obtained MIS $\mathcal{I}^{*}$ in $G$.
1: Initialize $S_{Q}=\{\cdot\}$ (an empty set to collect MISs).
2: For $\mathbf{x}[0]\in S$ (This runs in parallel)
3:    For $t\in[T]$
4:      Run an Adam iteration (with $\alpha$) to get $\mathbf{x}[t]$ from $\mathbf{x}[t-1]$ on Problem (5) (or Problem (6)) with $\gamma$.
5:      Obtain $\mathbf{x}[t]\leftarrow\mathrm{Proj}_{[0,1]^{n}}(\mathbf{x}[t])$ (box constraints).
6:      Obtain $\mathcal{I}(\mathbf{x}[t]):=\{v\in V:\mathbf{x}_{v}[t]>0\}$.
7:      If $\mathcal{I}(\mathbf{x}[t])$ is a MIS in $G$: Then $S_{Q}\leftarrow S_{Q}\cup\mathcal{I}(\mathbf{x}[t])$. Break the inner for loop.
8: Return $\mathcal{I}^{*}=\operatorname*{argmax}_{\mathcal{I}\in S_{Q}}|\mathcal{I}|$

Our objective functions in Problems (5) and (6) are highly non-convex, which makes finding the global minimizer(s) a challenging task. The work in [40] details the complexity of box-constrained continuous quadratic optimization problems. Gradient-based optimizers like Adam [15] are effective for finding a local minimizer given an initialization in $[0,1]^{n}$. Due to the full differentiability of our objective (Problems (5) and (6)), Adam empirically proves to be computationally highly efficient. Consequently, for a single graph, we can initiate multiple optimizations from various points in $[0,1]^{n}$ and execute Adam in parallel for each. Specifically, with a specified number of batches (parallel processes) $B$, we define set $S$ to denote all the initializations, where $|S|=B$, and consider the following three approaches:

  • Random Initialization: Here, each vector in set $S$ is obtained by sampling each entry independently from the uniform distribution. This strategy is effective when all vertices have similar degrees, as in Erdos-Renyi (ER) [41] graphs.

  • SDP-based Initialization: Given that the diagonal of $\mathbf{X}^{*}$ represents the ‘weight’ of each vertex to be in the MIS, we propose using the SDP solution by which set $S$ consists of vector $\mathbf{s}$ and $B-1$ samples drawn from a Gaussian distribution with mean vector $\mathbf{s}$ and covariance matrix $\eta\mathbf{I}$. Here, $\eta$ serves as a hyper-parameter that governs the exploration around $\mathbf{s}$. In this case, we use Problem (6). This strategy is effective particularly in scenarios where solving the SDP is computationally tractable, such as in the case of sparse graphs.

  • Degree-based Initialization: Following the intuition that vertices with higher degrees are less likely to belong to an MIS compared to those with lower degrees, we propose using set $S$ with $B$ samples drawn from a Gaussian distribution with mean vector $\mathbf{g}$ obtained as $\mathbf{g}_{v}=1-\frac{\mathrm{d}(v)}{\Delta(G)},\ \forall v\in V,\ \mathbf{g}\leftarrow\frac{\mathbf{g}}{\max_{v}\mathbf{g}_{v}}$, and covariance matrix $\eta\mathbf{I}$. This will be our choice when computing the SDP solution is computationally expensive, i.e., for dense graphs.
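The random and degree-based strategies above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the adjacency-dictionary representation and the names `degree_mean`, `degree_based_inits`, and `random_inits` are hypothetical, the uniform distribution is assumed to be over $[0,1)$, and a covariance of $\eta\mathbf{I}$ is read as independent entries with variance $\eta$. Whether the Gaussian samples are clipped to $[0,1]^{n}$ before the first iteration is not stated in this section, so the sketch leaves them as drawn.

```python
import random


def degree_mean(adj):
    """Mean vector g with g_v = 1 - d(v)/Delta(G), rescaled so max_v g_v = 1.

    adj -- dict mapping each vertex to the set of its neighbours
    """
    max_deg = max(len(nbrs) for nbrs in adj.values())
    if max_deg == 0:
        raise ValueError("g is undefined for a graph without edges")
    g = {v: 1.0 - len(nbrs) / max_deg for v, nbrs in adj.items()}
    g_max = max(g.values())
    if g_max == 0.0:
        # Every vertex has maximum degree (regular graph): the rescaling
        # divides by zero, so this sketch refuses rather than guessing.
        raise ValueError("g is identically zero for a regular graph")
    return {v: g[v] / g_max for v in adj}


def degree_based_inits(adj, batch_size, eta, seed=0):
    """Draw batch_size Gaussian samples with mean g and covariance eta * I."""
    rng = random.Random(seed)
    g = degree_mean(adj)
    std = eta ** 0.5  # per-entry standard deviation for covariance eta * I
    return [{v: rng.gauss(g[v], std) for v in adj} for _ in range(batch_size)]


def random_inits(adj, batch_size, seed=0):
    """Draw batch_size vectors with i.i.d. uniform entries (assumed on [0, 1))."""
    rng = random.Random(seed)
    return [{v: rng.random() for v in adj} for _ in range(batch_size)]


# Path graph 0 - 1 - 2: the end vertices have degree 1, the middle one degree 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
g = degree_mean(path)
S = degree_based_inits(path, batch_size=4, eta=2.25)
```

For the path graph, the low-degree end vertices get mean 1 and the middle vertex gets mean 0, which matches the intuition that high-degree vertices are less likely to be selected.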

We outline the MIS training procedure for $\mathsf{Quant}$-$\mathsf{Net}$ in Algorithm 1. As shown, the algorithm takes a graph $G$, a set of initializations $S$, the maximum number of iterations per batch $T$ (with iteration index $t$), an edge-penalty parameter $\gamma$, and Adam learning rate $\alpha$ as inputs. For each batch and iteration $t$, the Adam optimizer updates $\mathbf{x}$ (Line 4). In Line 5, a projection onto $[0,1]^{n}$ is employed. In Lines 6 and 7, the algorithm checks whether the updated $\mathbf{x}$ corresponds to an MIS in the graph. If yes, the algorithm stops for this batch. Finally, the best-found MIS, determined by its cardinality, is returned in Line 8. The blue font is used to indicate the case when we use the SDP initialization.
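The loop structure of Algorithm 1 can be sketched in plain Python as follows. This is not the authors' PyTorch implementation: it runs the initializations sequentially instead of in parallel, writes out a textbook Adam update by hand, and uses the objective as restated in Appendix A, $f(\mathbf{x})=-\mathbf{e}_{n}^{T}\mathbf{x}+\frac{\gamma}{2}\mathbf{x}^{T}\mathbf{A}_{G}\mathbf{x}-\frac{1}{2}\mathbf{x}^{T}\mathbf{A}_{G'}\mathbf{x}$, with $\mathbf{e}_{n}$ taken to be the all-ones vector. The names `grad`, `is_maximal_independent`, and `quant_net_sketch` are hypothetical, and the check in Line 7 is interpreted here as "independent and maximal", which is an assumption about the intended test.

```python
def grad(x, adj, gamma):
    """Gradient of f(x) = -sum(x) + gamma * sum_E x_u x_v - sum_E' x_u x_v."""
    total = sum(x)
    g = []
    for v in range(len(x)):
        nbr_sum = sum(x[u] for u in adj[v])   # neighbours of v in G
        comp_sum = total - x[v] - nbr_sum     # neighbours of v in G'
        g.append(-1.0 + gamma * nbr_sum - comp_sum)
    return g


def is_maximal_independent(nodes, adj):
    """Assumed Line 7 check: no two chosen nodes adjacent, none can be added."""
    chosen = set(nodes)
    independent = all(adj[v].isdisjoint(chosen) for v in chosen)
    maximal = all(v in chosen or not adj[v].isdisjoint(chosen)
                  for v in range(len(adj)))
    return independent and maximal


def quant_net_sketch(adj, inits, T, gamma, alpha,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Sequential sketch of Algorithm 1; adj is a list of neighbour sets."""
    n = len(adj)
    found = []                                 # S_Q in Algorithm 1
    for x0 in inits:                           # Line 2 (parallel in the paper)
        x = list(x0)
        m = [0.0] * n                          # Adam first-moment estimate
        s = [0.0] * n                          # Adam second-moment estimate
        for t in range(1, T + 1):              # Line 3
            g = grad(x, adj, gamma)
            for i in range(n):                 # Line 4: one Adam step
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                s[i] = beta2 * s[i] + (1 - beta2) * g[i] * g[i]
                m_hat = m[i] / (1 - beta1 ** t)
                s_hat = s[i] / (1 - beta2 ** t)
                x[i] -= alpha * m_hat / (s_hat ** 0.5 + eps)
                x[i] = min(1.0, max(0.0, x[i]))   # Line 5: project to [0, 1]
            support = [i for i in range(n) if x[i] > 0]   # Line 6
            if is_maximal_independent(support, adj):      # Line 7
                found.append(support)
                break
    return max(found, key=len) if found else None         # Line 8


# Single edge 0 - 1, one initialization that already favours node 0.
edge_graph = [{1}, {0}]
best = quant_net_sketch(edge_graph, [[0.9, 0.1]], T=5, gamma=2.0, alpha=0.5)
```

Returning `None` when no initialization yields a valid set is a choice made for this sketch; Algorithm 1 does not specify that case.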

3.3 $\mathsf{Quant}$-$\mathsf{Net}$: Theoretical foundation

Here, we provide the necessary and sufficient condition on the $\gamma$-value for any MIS to correspond to local minimizers of Problem (5). Moreover, we also provide a sufficient condition for all local minimizers of Problem (5) to be associated with a MIS. We relegate the proofs to Appendix A.

Definition 3 (MIS vector).

Given a graph $G=(V,E)$, a binary vector $\mathbf{x}\in\{0,1\}^{n}$ is called a MIS vector if there exists a MIS $S$ of $G$ such that $\mathbf{x}_{i}=1$ for all $i\in S$, and $\mathbf{x}_{i}=0$ for all $i\notin S$.

Theorem 4 (Necessary and Sufficient Condition on $\gamma$ for MIS vectors to be local minimizers of Problem (5)).

Given an arbitrary graph $G=(V,E)$ and its corresponding $\mathsf{Quant}$-$\mathsf{Net}$ formulation in Problem (5), suppose the size of the largest MIS of $G$ is $k$. Then, $\gamma\geq k+1$ is necessary and sufficient for all MIS vectors to be local minimizers of Problem (5) for arbitrary graphs.
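To build intuition for the role of $\gamma$, the following sketch evaluates the objective on a small hand-made graph and perturbs a MIS vector in one feasible direction. It is a numerical probe of a single direction, not a proof (the proofs are in Appendix A). It uses the objective as restated in Appendix A with $\mathbf{e}_{n}$ taken as the all-ones vector, expanded as $f(\mathbf{x})=-\sum_{v}\mathbf{x}_{v}+\gamma\sum_{(u,v)\in E}\mathbf{x}_{u}\mathbf{x}_{v}-\sum_{(u,v)\in E'}\mathbf{x}_{u}\mathbf{x}_{v}$, which holds because both adjacency matrices are symmetric with zero diagonal. The function name `f` and the example graph are illustrative assumptions.

```python
def f(x, edges, gamma):
    """Objective of Problem (5) written as sums over edges of G and of G'."""
    n = len(x)
    edge_set = {frozenset(e) for e in edges}
    value = -sum(x)
    for u in range(n):
        for v in range(u + 1, n):
            if frozenset((u, v)) in edge_set:
                value += gamma * x[u] * x[v]   # penalty for an edge of G
            else:
                value -= x[u] * x[v]           # reward for an edge of G'
    return value


# Four nodes, one edge (0, 3). The set {0, 1, 2} is independent and maximal,
# and no independent set is larger, so k = 3 here.
edges = [(0, 3)]
x_mis = [1.0, 1.0, 1.0, 0.0]       # MIS vector of {0, 1, 2}
x_pert = [1.0, 1.0, 1.0, 0.01]     # small feasible step on the excluded node 3

below = f(x_pert, edges, gamma=2.0) - f(x_mis, edges, gamma=2.0)   # gamma < k + 1
above = f(x_pert, edges, gamma=4.0) - f(x_mis, edges, gamma=4.0)   # gamma = k + 1
```

With $\gamma=2$ the perturbation lowers the objective (`below` is negative), so the MIS vector cannot be a local minimizer, whereas with $\gamma=k+1=4$ the same step raises it (`above` is positive). Node 3 has exactly one neighbour inside the set, which is the situation where the penalty $\gamma$ must outweigh the reward collected from the other $k-1$ selected nodes.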

Remark 2.

Theorem 4 provides a guideline for choosing $\gamma$ in $\mathsf{Quant}$-$\mathsf{Net}$. Note that the MIS set size $k$ is usually unknown a priori. But we may employ any classical estimate of the MIS size $k$ to guide the choice of $\gamma$ (e.g., we know from [42] that $k\geq\sum_{v\in V}\frac{1}{1+\mathrm{d}(v)}$).
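The degree-based bound from [42] is cheap to evaluate. A minimal sketch follows; the function name `wei_lower_bound` and the adjacency-dictionary input are illustrative assumptions, and exact fractions are used only to avoid rounding in the example.

```python
from fractions import Fraction


def wei_lower_bound(adj):
    """Sum over vertices of 1 / (1 + d(v)), a lower bound on k from [42].

    adj -- dict mapping each vertex to the set of its neighbours
    """
    return sum(Fraction(1, 1 + len(nbrs)) for nbrs in adj.values())


# Path graph 0 - 1 - 2 with degrees 1, 2, 1.
path = {0: {1}, 1: {0, 2}, 2: {1}}
bound = wei_lower_bound(path)   # 1/2 + 1/3 + 1/2
```

Because this is a lower bound on $k$, it only indicates a value that $\gamma$ should not fall below when combined with Theorem 4; it does not by itself certify that a particular $\gamma$ is large enough.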

Next, we provide a stronger condition on $\gamma$ that ensures all local minimizers of Problem (5) correspond to a MIS.

Theorem 5 ($\mathsf{Quant}$-$\mathsf{Net}$ Local Minimizers).

Given graph $G=(V,E)$ and set $\gamma\geq n$, all local minimizers of (5) are MIS vectors of $G$.

Remark 3.

The assumption $\gamma\geq n$ in Theorem 5 is stronger than that in Theorem 4 (as for any graph $G$ with $E\neq\emptyset$ (non-empty graph), we have $n>k$). The trade-off of choosing a large $\gamma$-value is that while larger values of $\gamma$ ($\gamma\geq n$) ensure that only MIS are local minimizers, they also increase the non-convexity of the optimization problem, thereby making it more difficult to solve.

Remark 4.

Although the proposed constrained quadratic Problem (5) is still NP-hard to solve for the global minimizer, it is a relaxation of the original integer programming problem. It can leverage gradient information, allowing the use of high-performance computational resources and parallel processing to enhance the efficiency and scalability of our approach.

4 Experimental results

1) Settings, Baselines, & Benchmarks: Graphs are processed using the NetworkX library [21]. For baselines, we utilize Gurobi [9] and the recent Google solver CP-SAT [10] for the ILP in (1), ReduMIS [3], iSCO [34] (https://github.com/google-research/discs), and four learning-based methods: DIMES [31], DIFUSCO [14], LwD [29], and the GCN method in [24] (commonly referred to as ‘Intel’). We note that, following the analysis in [12], GCN’s code cloning to ReduMIS is disabled, which was also done in [14]. Aligned with recent SOTA methods (DIMES, DIFUSCO, and iSCO), we employ the Erdos-Renyi (ER) [41] graphs from [31] and the SATLIB graphs [43] as benchmarks. The ER dataset (https://github.com/DIMESTeam/DIMES) consists of 128 graphs with 700 to 800 nodes and $p=0.15$, where $p$ is the probability of edge creation. The SATLIB dataset consists of 500 graphs (with at most 1,347 nodes and 5,978 edges). Additionally, the GNM random graph generator function of NetworkX is utilized for our scalability experiment. For $\mathsf{Quant}$-$\mathsf{Net}$, the edge-penalty parameter $\gamma$ is selected as 775. The initial learning rate is 0.6 for ER, 0.9 for SATLIB, and 0.5 for GNM. The number of iterations per initialization, $T$, is set to 150 for ER, 50 for SATLIB, and 350 for GNM. The exploration parameter for SDP-based and Degree-based initialization is set to $\eta=2.25$. Our code (https://anonymous.4open.science/r/Quant-Net/README.md) uses PyTorch [44] to construct the objective function in $\mathsf{Quant}$-$\mathsf{Net}$, and PyTorch’s implementation of Adam to optimize. Further implementation details and results are provided in Appendix B and Appendix D, respectively.

2) ER and SATLIB Benchmark Results:

Here, we present the results of $\mathsf{Quant}$-$\mathsf{Net}$, along with the other considered baselines using the SATLIB (Table 1(a)) and ER (Table 1(b)) benchmarks in terms of average MIS size over the graphs in the dataset and the total sequential run-time required to obtain the results for all the graphs. We note that the results of the learning-based methods are sourced from [14]. In what follows, we provide observations on these results.

  • All learning-based methods, except for GCN, require training a separate network for each graph dataset, as indicated in the third column of Table 1(a) and Table 1(b). This illustrates the generalization limitations of these methods. In contrast, our method is more generalizable, as it only requires tuning a few hyper-parameters for each set of graphs.

  • When compared to learning-based approaches, on ER (resp. SATLIB), our method outperforms all (resp. most) baseline methods, all without requiring any training data. Run-time comparison with these methods is not considered, as the reported numbers exclude training time, which may vary depending on multiple factors such as graph size, available compute, number of data points, and the used neural network architecture. Furthermore, our approach does not rely on additional techniques such as Greedy Decoding [45] and Monte Carlo Tree Search [46].

  • When compared to iSCO, our method reports a nearly identical average MIS size for SATLIB, while falling short by a little over one node on ER (43.52 vs. 44.8). Nevertheless, our method requires a significantly shorter sequential run time. It is important to note that the iSCO paper [34] reports a lower run time as compared to other methods. This reported run time is achieved by evaluating the test graphs in parallel, in contrast to all other methods that evaluated them sequentially. To fairly compare methods in our experiments, we opted to report sequential test run time only. The extended sequential run-time of iSCO, compared to its parallel run-time, is due to its use of simulated annealing. Because simulated annealing depends on knowing the energy of the previous step when determining the next step, it is inherently more efficient for iSCO to solve many graphs in parallel than in series.

  • For SATLIB, which consists of highly sparse graphs, on average, $\mathsf{Quant}$-$\mathsf{Net}$ falls short by a few nodes when compared to ReduMIS, Gurobi, and CP-SAT. The reason ReduMIS achieves SOTA results here is that a large set of MIS-specific graph reductions can be applied. However, for denser graphs, most of these graph reductions are not applicable. Gurobi (and CP-SAT) solves the ILP in (1), in which the number of constraints is equal to the number of edges in the graph. This means that Gurobi and CP-SAT are expected to perform much better on sparse graphs such as those in SATLIB.

  • On ER, when compared to Gurobi and CP-SAT, our method not only reports a larger average MIS size but also requires less than half the run-time. This is because ER is relatively denser compared to SATLIB. As a result, when run for 64 minutes on ER, Gurobi and CP-SAT fall short compared to our method and ReduMIS, while reporting the same average MIS as ReduMIS for SATLIB.

Table 1: Benchmark dataset results in terms of average MIS size and total sequential run-time (minutes). RL, SL, G, S, and TS represent Reinforcement Learning, Supervised Learning, Greedy decoding, Sampling, and Tree Search, respectively. We note that the run time reported in iSCO (Table 1 in [34]) is for running multiple graphs in parallel, not a sequential total run time. Therefore, we ran a few graphs sequentially and obtained the extrapolated run-time in column 5. SDP, Degree-based Initialization (DI), and random initialization (RI) represent the initializations used with $\mathsf{Quant}$-$\mathsf{Net}$. ReduMIS employs the local search procedure from [26], which no other method in the table uses, following the study in [12]. For more details about the requirements of each method, see Appendix C.
(a) SATLIB Dataset
| Method | Type | Training Data | MIS Size | Total Run-time (m) | Run-time Comment |
| --- | --- | --- | --- | --- | --- |
| ReduMIS | Heuristics | Not Required | 425.96 | 37.58 | Run until completion |
| CP-SAT | Exact | Not Required | 425.96 | 0.78 | Run until completion |
| Gurobi | Exact | Not Required | 425.96 | 8.16 | Run until completion |
| GCN | SL+G | SATLIB | 420.66 | 23.05 | This excludes training time |
| LwD | RL+S | SATLIB | 422.22 | 18.83 | This excludes training time |
| DIMES | RL+TS | SATLIB | 422.22 | 18.83 | This excludes training time |
| DIMES | RL+S | SATLIB | 423.28 | 20.26 | This excludes training time |
| DIFUSCO | SL+G | SATLIB | 424.5 | 8.76 | This excludes training time |
| DIFUSCO | SL+S | SATLIB | 425.13 | 23.74 | This excludes training time |
| iSCO | Sampling | Not Required | 423.7 | $\sim$7500 | Sequential runtime; original paper shows parallelized runtime |
| $\mathsf{Quant}$-$\mathsf{Net}$ + SDP (Ours) | dQNN | Not Required | 423.22 | 89.69 | This excludes SDP time (30 seconds per graph) |
| $\mathsf{Quant}$-$\mathsf{Net}$ + DI (Ours) | dQNN | Not Required | 423.03 | 64.9 | Run until completion |
(b) ER Dataset
| Method | Type | Training Data | MIS Size | Total Run-time (m) | Run-time Comment |
| --- | --- | --- | --- | --- | --- |
| ReduMIS | Heuristics | Not Required | 44.87 | 52.13 | Run until completion |
| CP-SAT | Exact | Not Required | 41.09 | 64.00 | Run with 30 second time limit per graph |
| Gurobi | Exact | Not Required | 39.19 | 64.00 | Run with 30 second time limit per graph |
| GCN | SL+G | SATLIB | 34.86 | 6.06 | This excludes training time |
| GCN | SL+TS | SATLIB | 38.8 | 20 | This excludes training time |
| LwD | RL+S | ER | 41.17 | 6.33 | This excludes training time |
| DIMES | RL+TS | ER | 38.24 | 6.12 | This excludes training time |
| DIMES | RL+S | ER | 42.06 | 12.01 | This excludes training time |
| DIFUSCO | SL+G | ER | 38.83 | 8.8 | This excludes training time |
| DIFUSCO | SL+S | ER | 41.12 | 26.27 | This excludes training time |
| iSCO | Sampling | Not Required | 44.8 | $\sim$384 | Sequential runtime; original paper shows parallelized runtime |
| $\mathsf{Quant}$-$\mathsf{Net}$ + RI (Ours) | dQNN | Not Required | 43.52 | 21 | Run until completion |

3) Scalability Results:

It is well-established that relatively denser graphs pose greater computational challenges compared to sparse graphs. This observation diverges from the trends exhibited by other baselines, which predominantly excel on sparse graphs. We argue that this is due to the applicability of graph reduction techniques such as the LP reduction method in [16], and the unconfined vertices rule [47] (see [3] for a complete list of the graph reduction rules that apply only on sparse graphs). For instance, by simply applying the LP graph reduction technique, the large-scale highly sparse graphs (with several hundred thousand nodes), considered in Table 5 of [24], reduce to graphs of a few thousand nodes, often with disconnected sub-graphs that can be treated independently.

Therefore, the scalability and performance of ReduMIS depend significantly on the sparsity of the graph. This dependence emerges from the iterative application of various graph reduction techniques in ReduMIS, specifically tailored for sparse graphs. For instance, the ReduMIS results presented in Table 2 of [29] are exclusively based on large and highly sparse graphs. This conclusion is substantiated by both the sizes of the considered graphs and the corresponding sizes of the obtained MIS solutions. As such, in this experiment, we investigate the scalability of $\mathsf{Quant}$-$\mathsf{Net}$ for the MIS problem against the SOTA data-independent methods: ReduMIS, Gurobi, and CP-SAT. Here, we use randomly generated graphs with the GNM generator, in which the number of edges is set to $m=\lceil\frac{n(n-1)}{4}\rceil$. It is important to note that the density of these graphs is significantly higher than those considered in previous works. This choice of the number of edges in the GNM function means that half of the total possible edges (w.r.t. the complete graph) exist.
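As a quick check of the edge-count rule, the sketch below computes $m=\lceil\frac{n(n-1)}{4}\rceil$ in integer arithmetic and draws a graph with exactly that many distinct edges. The paper reports using the GNM generator of NetworkX; to stay dependency-free, this sketch instead samples edges with the standard library, so `gnm_edge_count` and `gnm_edges` are hypothetical stand-ins rather than the authors' code.

```python
import itertools
import random


def gnm_edge_count(n):
    """ceil(n * (n - 1) / 4), computed with integers to avoid float rounding."""
    return -(-n * (n - 1) // 4)


def gnm_edges(n, seed=0):
    """Sample gnm_edge_count(n) distinct edges uniformly among all node pairs."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, gnm_edge_count(n))


# Edge counts for the node counts used in the scalability experiment.
counts = {n: gnm_edge_count(n) for n in (50, 500, 1000, 1500, 2000)}
small_graph = gnm_edges(50)
```

Since a complete graph on $n$ nodes has $n(n-1)/2$ edges, this choice of $m$ corresponds to half of all possible edges, up to rounding.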

Results are provided in Figure 2. As observed, for dense graphs, as the graph size increases, our method requires significantly less run-time (Figure 2(a)) compared to all baselines, while reporting almost the same average MIS size (Figure 2(b)). For instance, when $n$ is 500, on average, our method requires around 1 minute to solve the 5 graphs, whereas other baselines require approximately 45 minutes or more to achieve the same MIS size. These results indicate that, unlike ReduMIS and ILP solvers, the run-time of our method scales only with the number of nodes in the graph, which is a significant improvement.

(a) Total Run-time (minutes).
| $(n,m)$ | ReduMIS | Gurobi | CP-SAT | $\mathsf{Quant}$-$\mathsf{Net}$ (Ours) |
| --- | --- | --- | --- | --- |
| $(50,613)$ | 7.6 | 7.6 | 7.6 | 7.6 |
| $(500,62375)$ | 13.4 | 13.4 | 13.4 | 13.4 |
| $(1000,249750)$ | 15.0 | N/A | N/A | 14.8 |
| $(1500,562125)$ | 16.0 | N/A | N/A | 15.2 |
| $(2000,999500)$ | 16.4 | N/A | N/A | 16 |
(b) Average MIS size.
Figure 2: Scalability results for the GNM graphs with $m=\lceil\frac{n(n-1)}{4}\rceil$. Each entry in the table corresponds to an average of 5 graphs. This choice of the number of edges indicates that half of the total possible edges (w.r.t. the complete graph) exist. The ‘N/A’ entries are due to excessive run-times. Degree-based initializations are used here for $\mathsf{Quant}$-$\mathsf{Net}$.

5 Conclusion

This study addressed the challenging Maximum Independent Set (MIS) Problem within the domain of Combinatorial Optimization by introducing an innovative continuous formulation employing dataless quadratic neural networks. By eliminating the need for any training data, $\mathsf{Quant}$-$\mathsf{Net}$ sets itself apart from conventional learning approaches. Through the utilization of gradient-based optimization using ADAM and a GPU implementation, our straightforward yet effective approach demonstrates competitive performance compared to state-of-the-art learning-based and sampling-based methods. This research offers a distinctive perspective on approaching discrete optimization problems through parameter-efficient neural networks that are trained from the problem structure, not from datasets.

References

  • Karp [1972] Richard M Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85–103. Springer, 1972.
  • Bengio et al. [2021] Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405–421, 2021.
  • Lamm et al. [2016] Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Renato F Werneck. Finding near-optimal independent sets at scale. In 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 138–150. SIAM, 2016.
  • Akiba and Iwata [2016] Takuya Akiba and Yoichi Iwata. Branch-and-reduce exponential/fpt algorithms in practice: A case study of vertex cover. Theoretical Computer Science, 609:211–225, 2016.
  • San Segundo et al. [2011] Pablo San Segundo, Diego Rodríguez-Losada, and Agustín Jiménez. An exact bit-parallel algorithm for the maximum clique problem. Computers & Operations Research, 38(2):571–581, 2011.
  • Boppana and Halldórsson [1992] Ravi Boppana and Magnús M Halldórsson. Approximating maximum independent sets by excluding subgraphs. BIT Numerical Mathematics, 32(2):180–196, 1992.
  • Tarjan and Trojanowski [1977] Robert Endre Tarjan and Anthony E Trojanowski. Finding a maximum independent set. SIAM Journal on Computing, 6(3):537–546, 1977.
  • [8] IBM. IBM ILOG CPLEX Optimization Studio. URL https://www.ibm.com/products/ilog-cplex-optimization-studio.
  • [9] Gurobi. Gurobi Optimization. URL https://www.gurobi.com.
  • Google, Inc. [2022] Google, Inc. Google or-tools. 2022. URL https://developers.google.com/optimization.
  • He et al. [2014] He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. Advances in neural information processing systems, 27:3293–3301, 2014.
  • Böther et al. [2022] Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, and Tobias Friedrich. What’s wrong with deep learning in tree search for combinatorial optimization. arXiv preprint arXiv:2201.10494, 2022.
  • Dong et al. [2021] Yuanyuan Dong, Andrew V Goldberg, Alexander Noe, Nikos Parotsidis, Mauricio GC Resende, and Quico Spaen. New instances for maximum weight independent set from a vehicle routing application. In Operations Research Forum, volume 2, pages 1–6. Springer, 2021.
  • Sun and Yang [2023] Zhiqing Sun and Yiming Yang. Difusco: Graph-based diffusion solvers for combinatorial optimization. arXiv preprint arXiv:2302.08224, 2023.
  • Kingma and Ba [2015] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR (Poster), 2015.
  • Nemhauser and Trotter [1975] George L Nemhauser and Leslie Earl Trotter. Vertex packings: Structural properties and algorithms. Mathematical Programming, 8(1):232–248, 1975.
  • Pardalos and Rodgers [1992] Panos M Pardalos and Gregory P Rodgers. A branch and bound algorithm for the maximum clique problem. Computers & operations research, 19(5):363–375, 1992.
  • Lovász [1979] László Lovász. On the shannon capacity of a graph. IEEE Transactions on Information theory, 25(1):1–7, 1979.
  • Von Luxburg [2007] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
  • Dai et al. [2016] Hanjun Dai, Bo Dai, and Le Song. Discriminative embeddings of latent variable models for structured data. In International conference on machine learning, pages 2702–2711. PMLR, 2016.
  • Hagberg et al. [2008] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11 – 15, Pasadena, CA USA, 2008.
  • Williamson and Shmoys [2011] David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge university press, 2011.
  • Berman and Schnitger [1992] Piotr Berman and Georg Schnitger. On the complexity of approximating the independent set problem. Information and Computation, 96(1):77–94, 1992.
  • Li et al. [2018] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In NeurIPS, 2018.
  • Defferrard et al. [2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems, 29:3844–3852, 2016.
  • Andrade et al. [2012] Diogo V Andrade, Mauricio GC Resende, and Renato F Werneck. Fast local search for the maximum independent set problem. Journal of Heuristics, 18(4):525–547, 2012.
  • Ho et al. [2020] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  • Dai et al. [2017] Hanjun Dai, Elias B Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6351–6361, 2017.
  • Ahn et al. [2020] Sungsoo Ahn, Younggyo Seo, and Jinwoo Shin. Learning what to defer for maximum independent sets. In International Conference on Machine Learning, pages 134–144. PMLR, 2020.
  • Schulman et al. [2017] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Qiu et al. [2022] Ruizhong Qiu, Zhiqing Sun, and Yiming Yang. Dimes: A differentiable meta solver for combinatorial optimization problems. Advances in Neural Information Processing Systems, 35:25531–25546, 2022.
  • Alkhouri et al. [2022] Ismail R Alkhouri, George K Atia, and Alvaro Velasquez. A differentiable approach to the maximum independent set problem using dataless neural networks. Neural Networks, 155:168–176, 2022.
  • Goshvadi et al. [2024] Katayoon Goshvadi, Haoran Sun, Xingchao Liu, Azade Nova, Ruqi Zhang, Will Grathwohl, Dale Schuurmans, and Hanjun Dai. Discs: A benchmark for discrete sampling. Advances in Neural Information Processing Systems, 36, 2024.
  • Sun et al. [2023] Haoran Sun, Katayoon Goshvadi, Azade Nova, Dale Schuurmans, and Hanjun Dai. Revisiting sampling for combinatorial optimization. In International Conference on Machine Learning, pages 32859–32874. PMLR, 2023.
  • Sun et al. [2021] Haoran Sun, Hanjun Dai, Wei Xia, and Arun Ramamurthy. Path auxiliary proposal for mcmc in discrete space. In International Conference on Learning Representations, 2021.
  • Mahdavi Pajouh et al. [2013] Foad Mahdavi Pajouh, Balabhaskar Balasundaram, and Oleg A Prokopyev. On characterization of maximal independent sets via quadratic optimization. Journal of Heuristics, 19:629–644, 2013.
  • Fan et al. [2020] Fenglei Fan, Jinjun Xiong, and Ge Wang. Universal approximation with quadratic deep networks. Neural Networks, 124:383–392, 2020.
  • [38] MOSEK ApS. MOSEK: Optimization software. https://www.mosek.com.
  • Liu et al. [2021] Bo Liu, Zhaoying Liu, Ting Zhang, and Tongtong Yuan. Non-differentiable saddle points and sub-optimal local minima exist for deep relu networks. Neural Networks, 144:75–89, 2021.
  • Burer and Letchford [2009] Samuel Burer and Adam N Letchford. On nonconvex quadratic programming with box constraints. SIAM Journal on Optimization, 20(2):1073–1089, 2009.
  • Erdos et al. [1960] Paul Erdos, Alfréd Rényi, et al. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
  • Wei [1981] Victor K Wei. A lower bound on the stability number of a simple graph. Technical report, Bell Laboratories Technical Memorandum Murray Hill, NJ, USA, 1981.
  • Hoos and Stützle [2000] Holger H Hoos and Thomas Stützle. Satlib: An online resource for research on sat. Sat, 2000:283–292, 2000.
  • [44] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch. https://pytorch.org/.
  • Graikos et al. [2022] Alexandros Graikos, Nikolay Malkin, Nebojsa Jojic, and Dimitris Samaras. Diffusion models as plug-and-play priors. Advances in Neural Information Processing Systems, 35:14715–14728, 2022.
  • Fu et al. [2021] Zhang-Hua Fu, Kai-Bin Qiu, and Hongyuan Zha. Generalize a small pre-trained model to arbitrarily large tsp instances. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 7474–7482, 2021.
  • Xiao and Nagamochi [2013] Mingyu Xiao and Hiroshi Nagamochi. Confining sets and avoiding bottleneck cases: A simple maximum independent set algorithm in degree-3 graphs. Theoretical Computer Science, 469:92–104, 2013.

Appendix A Proofs

Let’s begin by re-stating our main optimization problem:

$$\min_{\mathbf{x}\in[0,1]^{n}} f(\mathbf{x}) := -\mathbf{e}_{n}^{T}\mathbf{x} + \frac{\gamma}{2}\mathbf{x}^{T}\mathbf{A}_{G}\mathbf{x} - \frac{1}{2}\mathbf{x}^{T}\mathbf{A}_{G'}\mathbf{x}. \tag{7}$$

The gradient of (7) is:

$$\nabla_{\mathbf{x}} f(\mathbf{x}) = -\mathbf{e}_{n} + (\gamma\mathbf{A}_{G} - \mathbf{A}_{G'})\mathbf{x}. \tag{8}$$

For some $v \in V$, we have

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{v}} = -1 + \gamma\sum_{u\in\mathcal{N}(v)}\mathbf{x}_{u} - \sum_{u\in\mathcal{N}'(v)}\mathbf{x}_{u}. \tag{9}$$
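As a sanity check on (7)–(9), the following minimal Python sketch evaluates $f$ and its gradient on a small graph. The adjacency-set representation, the function names (`complement_neighbors`, `objective`, `gradient`), and the example graph are illustrative assumptions; they are not taken from the authors' implementation.

```python
def complement_neighbors(adj):
    """Neighbour sets N'(v) in the complement graph G' (no self-loops)."""
    n = len(adj)
    return [set(range(n)) - adj[v] - {v} for v in range(n)]


def objective(x, adj, gamma):
    """f(x) = -e^T x + (gamma/2) x^T A_G x - (1/2) x^T A_G' x, as in (7)."""
    n = len(adj)
    comp = complement_neighbors(adj)
    # Each unordered pair is visited twice, which matches x^T A x.
    edge_term = sum(x[v] * x[u] for v in range(n) for u in adj[v])
    comp_term = sum(x[v] * x[u] for v in range(n) for u in comp[v])
    return -sum(x) + 0.5 * gamma * edge_term - 0.5 * comp_term


def gradient(x, adj, gamma):
    """Entry v is -1 + gamma * sum_{u in N(v)} x_u - sum_{u in N'(v)} x_u, as in (9)."""
    comp = complement_neighbors(adj)
    return [
        -1.0 + gamma * sum(x[u] for u in adj[v]) - sum(x[u] for u in comp[v])
        for v in range(len(adj))
    ]


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
path = [{1}, {0, 2}, {1, 3}, {2}]
x = [0.2, 0.7, 0.1, 0.9]
print(gradient(x, path, gamma=4.0))
```

Because the diagonals of both adjacency matrices are zero, $f$ is linear in each individual coordinate, so a finite-difference quotient along one coordinate should agree with (9) up to rounding.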

A.1 Proof of Theorem 4

Proof.

Let $S$ be an MIS. Define the vector $\mathbf{x}^{S}$ such that it contains $1$'s at the positions corresponding to the nodes in $S$ and $0$'s at all other positions. For any MIS to be a local minimizer of Problem (6), it is necessary and sufficient to require that, at $\mathbf{x} = \mathbf{x}^{S}$,

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{v}} \geq 0, \quad \forall v \notin S, \text{ and} \tag{10}$$

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{v}} \leq 0, \quad \forall v \in S. \tag{11}$$

Here, $\mathbf{x}_{v}$ is the element of $\mathbf{x}$ at the position corresponding to node $v$. Condition (10) arises because if $v \notin S$, then $\mathbf{x}^{S}_{v} = 0$ (by the definition of $\mathbf{x}^{S}$), so this coordinate sits at the left boundary of the interval $[0,1]$. For a left boundary point to be a local minimizer, the derivative must be non-negative (i.e., moving to the right can only increase the objective). Similarly, for (11), when $v \in S$, $\mathbf{x}^{S}_{v} = 1$ sits at the right boundary, where the derivative must be non-positive.

The derivative of $f$ computed in (9) can be rewritten as

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{v}} = -1 + \gamma m_{v} - \ell_{v}, \quad \forall v \notin S, \tag{12}$$

where $m_{v} := |\{u \in \mathcal{N}(v) \cap S\}|$ is the number of neighbours of $v$ in $S$, and $\ell_{v} := |\{u \in \mathcal{N}'(v) \cap S\}|$ is the number of non-neighbours of $v$ in $S$, with $\mathcal{N}'(v) = \{u : (u,v) \in E'\}$. By this definition, we immediately have $1 \leq m_{v} \leq |S|$ and $0 \leq \ell_{v} \leq |S|$, where the upper and lower bounds for $m_{v}$ and $\ell_{v}$ are all attainable by some special graphs. Note that the lower bound of $m_{v}$ is $1$ because $S$ is an MIS, so any node $v$ outside $S$ has at least one edge connecting it to a node in $S$.

Plugging (12) into (10), we obtain

$$\gamma \geq \frac{1 + \ell_{v}}{m_{v}}. \tag{13}$$

Since we seek a universal $\gamma$ for all graphs, (13) must still hold when $m_{v}$ takes its lowest possible value, $1$, and $\ell_{v}$ takes its highest possible value, $k$ (both are attainable by some graphs). It is therefore necessary and sufficient to require $\gamma \geq k + 1$. In addition, (11) is satisfied unconditionally: for $v \in S$, no neighbour of $v$ lies in $S$, so by (9) the derivative is at most $-1$. Hence (11) does not impose any extra condition on $\gamma$. ∎
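To make the roles of $m_{v}$ and $\ell_{v}$ concrete, the sketch below computes, for one given maximal independent set $S$ of one given graph, the smallest $\gamma$ for which (13) holds at every $v \notin S$. The helper name `gamma_lower_bound` and the example graph are hypothetical; this only illustrates (12) and (13) on a single instance and says nothing about the universal bound over all graphs.

```python
from fractions import Fraction


def gamma_lower_bound(adj, S):
    """Max over v not in S of (1 + l_v) / m_v, the per-node bound in (13)."""
    n = len(adj)
    bounds = []
    for v in range(n):
        if v in S:
            continue
        m_v = len(adj[v] & S)  # neighbours of v that lie in S
        l_v = len((set(range(n)) - adj[v] - {v}) & S)  # non-neighbours of v in S
        if m_v == 0:
            raise ValueError(f"S is not maximal: node {v} could be added")
        bounds.append(Fraction(1 + l_v, m_v))
    return max(bounds, default=Fraction(0))


# Hypothetical example: the path graph 0 - 1 - 2 - 3 - 4.
path = [{1}, {0, 2}, {1, 3}, {2, 4}, {3}]
print(gamma_lower_bound(path, {1, 3}))     # end nodes have m_v = 1 and l_v = 1
print(gamma_lower_bound(path, {0, 2, 4}))  # nodes 1 and 3 have m_v = 2 and l_v = 1
```

Exact rational arithmetic is used so that the bound can be compared with integers without rounding concerns.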

A.2 Proof of Theorem 5

Lemma 6.

All local minimizers of Problem (7) are binary vectors.

Proof.

Let $\mathbf{x}^{*}$ be any local minimizer of (7). If all coordinates of $\mathbf{x}^{*}$ are either 0 or 1, then $\mathbf{x}^{*}$ is binary and the proof is complete. Otherwise, at least one coordinate of $\mathbf{x}^{*}$ lies in the interior $(0,1)$, and we prove by contradiction that this is not possible (i.e., such a non-binary $\mathbf{x}^{*}$ cannot be a minimizer). Assume a non-binary $\mathbf{x}^{*}$ exists, and denote its set of non-binary coordinates by

$$J := \{j : \mathbf{x}^{*}_{j} \in (0,1)\}. \tag{14}$$

Since $\mathbf{x}^{*}$ is non-binary, $J \neq \emptyset$. Since the objective function $f(\mathbf{x})$ of (7) is twice differentiable with respect to all $\mathbf{x}_{j}$ with $\mathbf{x}_{j} \in (0,1)$, a necessary condition for $\mathbf{x}^{*}$ to be a local minimizer is that

$$\nabla f(\mathbf{x}^{*})\big|_{J} = 0, \quad \nabla^{2} f(\mathbf{x}^{*})\big|_{J} \succeq 0,$$

where $\nabla f(\mathbf{x}^{*})\big|_{J}$ is the vector $\nabla f(\mathbf{x}^{*})$ restricted to the index set $J$, and $\nabla^{2} f(\mathbf{x}^{*})\big|_{J}$ is the matrix $\nabla^{2} f(\mathbf{x}^{*})$ whose row and column indices are both restricted to the set $J$.

However, the second necessary condition $\nabla^{2} f(\mathbf{x}^{*})\big|_{J} \succeq 0$ cannot hold. If it did, we would have $\mathrm{tr}(\nabla^{2} f(\mathbf{x}^{*})\big|_{J}) > 0$ (the trace cannot equal 0, since $\nabla^{2} f(\mathbf{x}^{*})\big|_{J}$ is the restriction of $\gamma\mathbf{A}_{G} - \mathbf{A}_{G'} \neq 0$ to $J$). On the other hand, we have

$$\mathrm{tr}\big(\nabla^{2} f(\mathbf{x}^{*})\big|_{J}\big) = \mathrm{tr}\big(\mathbf{I}_{J}(\gamma\mathbf{A}_{G} - \mathbf{A}_{G'})\mathbf{I}_{J}^{T}\big) = 0,$$

as the diagonal entries of $\mathbf{A}_{G}$ and $\mathbf{A}_{G'}$ are all 0, which leads to a contradiction. Here $\mathbf{I}_{J}$ denotes the identity matrix with row indices restricted to the index set $J$. ∎

Theorem 7 (Re-statement of Theorem 5).

Given graph $G = (V, E)$ and set $\gamma \geq n$, all local minimizers of (5) correspond to an MIS in $G$.

Proof.

By Lemma 6, we need only consider binary vectors as local minimizers. With this, we first prove that all local minimizers are Independent Sets (ISs). Then, we show that any IS that is not a maximal IS is not a local minimizer.

  • Here, we show that any local minimizer is an IS. By contradiction, assume that a binary vector $\mathbf{x}$ with $\mathbf{x}_{v} = \mathbf{x}_{w} = 1$ for some $(v, w) \in E$ (i.e., a binary vector that selects both endpoints of an edge in $G$) is a local minimizer. Since $\mathbf{x}_{v} = 1$ is at the right boundary of the interval $[0,1]$, for it to be a local minimizer we must have $\frac{\partial f}{\partial \mathbf{x}_{v}} \leq 0$. Together with (9), this implies

    $$-1 + \gamma\sum_{u\in\mathcal{N}(v)}\mathbf{x}_{u} - \sum_{u\in\mathcal{N}'(v)}\mathbf{x}_{u} \leq 0. \tag{15}$$

    Re-arranging (15) and using $\gamma \geq n$ yields

    $$n\sum_{u\in\mathcal{N}(v)}\mathbf{x}_{u} \leq 1 + \sum_{u\in\mathcal{N}'(v)}\mathbf{x}_{u}. \tag{16}$$

    Given that $n > \Delta(G')$, the condition in (16) cannot be satisfied even if the LHS attains its minimum value (which is $n$, since $w \in \mathcal{N}(v)$ and $\mathbf{x}_{w} = 1$) and the RHS attains its maximum value. The maximum possible value of the RHS is $1 + \mathrm{d}'(v) = n - \mathrm{d}(v)$, where $\mathrm{d}'(v)$ is the degree of node $v$ in $G'$, and the maximum possible value of $\mathrm{d}'(v)$ is $\Delta(G')$. Because $\mathrm{d}(v) \geq 1$, the RHS is at most $n - 1 < n$. This means that when $\mathbf{x}$ contains an edge, it cannot be a fixed point. Thus, only ISs are local minimizers.

  • Here, we show that Independent Sets that are not maximal are not local minimizers. Let $\mathbf{x} \in \{0,1\}^{n}$ be a vector that corresponds to an IS $\mathcal{I}(\mathbf{x})$ that is not maximal. This means there exists a node $u \in V$ that is not in the IS and is not a neighbor of any node in the IS. Formally, if there exists $u \notin \mathcal{I}(\mathbf{x})$ such that $u \notin \mathcal{N}(w)$ for all $w \in \mathcal{I}(\mathbf{x})$, then $\mathcal{I}(\mathbf{x})$ is an IS but not a maximal IS. Note that such an $\mathbf{x}$ satisfies $\mathbf{x}_{u} = 0$ and

    $$\frac{\partial f}{\partial \mathbf{x}_{u}} = -1 + \gamma\sum_{w\in\mathcal{N}(u)}\mathbf{x}_{w} - \sum_{w\in\mathcal{N}'(u)}\mathbf{x}_{w} = -\Big(1 + \sum_{w\in\mathcal{N}'(u)}\mathbf{x}_{w}\Big) < 0, \tag{17}$$

    which implies that increasing $\mathbf{x}_{u}$ can further decrease the function value, contradicting $\mathbf{x}$ being a local minimizer. In (17), the sum over $\mathcal{N}(u)$ is $0$ because $\mathcal{N}(u) \cap \mathcal{I}(\mathbf{x}) = \emptyset$, which leaves $-(1 + \sum_{w\in\mathcal{N}'(u)}\mathbf{x}_{w})$, a quantity that is always negative. Thus, any binary vector that corresponds to an IS that is not maximal is not a local minimizer.
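The two bullet points above can be checked by brute force on a small instance. The sketch below enumerates all binary vectors for a 5-cycle with $\gamma = n$ and compares those that pass the boundary derivative conditions with the maximal independent sets. It tests first-order boundary conditions only; the function names and the example graph are assumptions made for illustration.

```python
from itertools import product


def gradient(x, adj, gamma):
    """Partial derivatives from (9) evaluated at the point x."""
    n = len(adj)
    grad = []
    for v in range(n):
        non_neighbors = set(range(n)) - adj[v] - {v}
        grad.append(-1 + gamma * sum(x[u] for u in adj[v])
                    - sum(x[u] for u in non_neighbors))
    return grad


def passes_boundary_test(x, adj, gamma):
    """Derivative >= 0 wherever x_v = 0 and <= 0 wherever x_v = 1."""
    return all((g >= 0) if xv == 0 else (g <= 0)
               for xv, g in zip(x, gradient(x, adj, gamma)))


def is_maximal_independent_set(x, adj):
    """Direct combinatorial check, used as the reference answer."""
    chosen = {v for v, xv in enumerate(x) if xv == 1}
    independent = all(not (adj[v] & chosen) for v in chosen)
    maximal = all(adj[v] & chosen for v in range(len(adj)) if v not in chosen)
    return independent and maximal


# Hypothetical example: the 5-cycle, with gamma = n = 5.
cycle = [{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}]
n = len(cycle)
candidates = {x for x in product((0, 1), repeat=n)
              if passes_boundary_test(x, cycle, gamma=n)}
maximal_sets = {x for x in product((0, 1), repeat=n)
                if is_maximal_independent_set(x, cycle)}
print(candidates == maximal_sets, len(candidates))
```

Exhaustive enumeration is exponential in $n$ and is meant only as a check of the argument on toy graphs.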

Remark 5.

The above theorem implies that although non-binary stationary points still exist, they are saddle points rather than local minimizers. Adaptive and momentum-based optimizers such as AdaGrad and Adam can usually escape saddle points and land on local minimizers.

Appendix B Further Implementation Details

In Algorithm 1, we use S𝑆Sitalic_S as the set of initializations we would like to solve. To solve this set of initializations, we choose batch size K𝐾Kitalic_K and number of batches B𝐵Bitalic_B, such that |S|=KB𝑆𝐾𝐵|S|=KB| italic_S | = italic_K italic_B. If we choose a large batch size K𝐾Kitalic_K, then we increase the time to first solution. Conversely, if we choose a large number of batches B𝐵Bitalic_B, then we decrease the number of initializations explored in a given batch, and potentially delay better solutions. Because of this relationship, K𝐾Kitalic_K and B𝐵Bitalic_B need to be chosen carefully. Our K𝐾Kitalic_K is 128 for SATLIB, 256 for ER, and 1024 for GNM. Our B𝐵Bitalic_B is 40 for SATLIB, 28 for ER, 10 for the GNM convergence results, and 5 for the GNM scalability results.
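For reference, the relation $|S| = KB$ gives the following totals for the reported settings. The snippet is a small arithmetic sketch; the dictionary keys are labels chosen here for illustration.

```python
# (K, B) pairs reported above; |S| = K * B initializations in total.
settings = {
    "SATLIB": (128, 40),
    "ER": (256, 28),
    "GNM (convergence)": (1024, 10),
    "GNM (scalability)": (1024, 5),
}

total_initializations = {name: k * b for name, (k, b) in settings.items()}
for name, total in total_initializations.items():
    print(f"{name}: {total} initializations")
```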

The results of Quant-Net and the baselines were obtained across three different machine configurations, with the runtime of the fastest configuration being reported. The NVIDIA A100 80GB PCIe machine utilizes an AMD EPYC 9554 CPU (30 vCores) with 236 GB of DDR5-4800. The NVIDIA RTX4090 24GB machine utilizes an AMD EPYC 75F3 CPU (16 vCores) with 64 GB of DDR4-2800. The NVIDIA RTX3070 8GB machine utilizes an Intel i9 12900K with 64 GB of DDR5-6000. Note: the i9 has Intel Hyper-Threading and E-Cores disabled to maximize single-core performance.

B.1 Efficient Implementation of MIS Checking

Based on the characteristics of the local minimizers of Problem (5), discussed in Lemma 6 and Theorem 5, we propose an efficient implementation to check whether a vector $\mathbf{x} \in [0,1]^{n}$ corresponds to an MIS. This check corresponds to Line 7 in Algorithm 1.

We note that two successive conditions need to be checked. The first is whether a binary vector corresponds to an IS (no two nodes in the set share an edge), and the second is whether this IS is maximal.

Given a vector $\mathbf{x} \in [0,1]^{n}$, we obtain a binary representation of $\mathbf{x}$, denoted by $\mathbf{z} \in \{0,1\}^{n}$, such that for all $v \in V$, $\mathbf{z}_{v} = 1$ if $\mathbf{x}_{v} > 0$, and $\mathbf{z}_{v} = 0$ otherwise.

Based on the results of Appendix A, for some $\alpha > 0$, our MIS checking involves verifying whether the following equality holds:

$$\mathbf{z} = \mathrm{Proj}_{[0,1]^{n}}\Big(\mathbf{z} - \alpha\nabla_{\mathbf{x}} f(\mathbf{z})\Big) = \mathrm{Proj}_{[0,1]^{n}}\Big(\mathbf{z} + \alpha\mathbf{e}_{n} - \alpha(\gamma\mathbf{A}_{G} - \mathbf{A}_{G'})\mathbf{z}\Big). \tag{18}$$

Equation (18) applies a single projected gradient descent step to check whether $\mathbf{z}$ is a fixed point at the boundary. Computationally, this only requires a matrix-vector multiplication. Compared to the traditional method, which iterates over all the nodes in the MIS to check their neighbours, using (18) is 8× faster.
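The check in (18) can be sketched in a few lines. The version below is a plain-Python illustration in which nested loops stand in for the matrix-vector product; the function names, the default $\alpha = 1$, and the example graph are assumptions and do not reproduce the authors' GPU implementation.

```python
def gradient(z, adj, gamma):
    """-e + (gamma * A_G - A_G') z, computed node by node as in (9)."""
    n = len(adj)
    grad = []
    for v in range(n):
        non_neighbors = set(range(n)) - adj[v] - {v}
        grad.append(-1 + gamma * sum(z[u] for u in adj[v])
                    - sum(z[u] for u in non_neighbors))
    return grad


def is_mis_fixed_point(x, adj, gamma, alpha=1.0):
    """Binarize x, take one projected gradient step as in (18), and compare."""
    z = [1 if xv > 0 else 0 for xv in x]
    grad = gradient(z, adj, gamma)
    # Projection onto [0, 1]^n is a coordinate-wise clip.
    stepped = [min(1, max(0, zv - alpha * g)) for zv, g in zip(z, grad)]
    return stepped == z


# Hypothetical example: the path graph 0 - 1 - 2 - 3 - 4 with gamma = n = 5.
path = [{1}, {0, 2}, {1, 3}, {2, 4}, {3}]
print(is_mis_fixed_point([0, 1, 0, 1, 0], path, gamma=5))  # maximal IS {1, 3}
print(is_mis_fixed_point([1, 0, 0, 0, 1], path, gamma=5))  # IS {0, 4}, not maximal
print(is_mis_fixed_point([1, 1, 0, 0, 0], path, gamma=5))  # contains edge (0, 1)
```

With $\gamma \geq n$, a coordinate that violates independence or maximality has a gradient of the wrong sign, so the clipped step moves it away from its current value and the equality fails.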

Appendix C Requirements Comparison with Baselines

In Table 2, we provide an overview comparison of the number of trainable parameters, hyper-parameters, and additional techniques needed for each baseline.

ReduMIS depends on a large set of graph reductions (see Section 3.1 in [3]) and graph clustering, which is used for solution improvement.

For learning-based methods, the parameters of a neural network architecture are optimized during training. This architecture is typically much larger than the number of input coordinates ($\gg n$). For instance, the network used in DIFUSCO consists of 12 layers, each with 5 trainable weight matrices. Each weight matrix is $256 \times 256$, resulting in 3,932,160 trainable parameters for the SATLIB dataset (which has at most 1347 nodes).
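The parameter count quoted above follows directly from the stated layer sizes. The short sketch below reproduces the arithmetic; it counts only the weight matrices described in the text, and the variable names are chosen here for illustration.

```python
layers = 12
matrices_per_layer = 5
rows, cols = 256, 256

# 12 layers x 5 matrices x (256 x 256) entries each.
trainable_parameters = layers * matrices_per_layer * rows * cols
max_nodes_satlib = 1347  # largest SATLIB graph size quoted above

print(trainable_parameters)
print(trainable_parameters / max_nodes_satlib)  # parameters per node, versus 1 per node with n variables
```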

Moreover, this dependence on training a NN introduces several hyper-parameters such as the number of layers, size of layers, choice of activation functions, etc.

It’s important to note that the choice of the sampler in iSCO introduces additional hyper-parameters. For instance, the PAS sampler [35] used in iSCO depends on the choice of the neighborhood function, a prior on the path length, and the choice of the probability of acceptance.

In terms of the number of optimization variables, Quant-Net only requires $n$ variables and a much-reduced number of hyper-parameters compared to iSCO.

Appendix D Additional Results

D.1 Convergence Plots with Fixed Run-time

In this section, we conduct an additional experiment to highlight the effectiveness of our proposed approach. Specifically, we use four ER (resp. GNM) graphs with $n = 700$ (resp. $n = 300$) and run our method and each baseline for a fixed run time of 14 (resp. 12) seconds. We then show the progress of the best obtained MIS over time. Figure 3 and Figure 4 present the results.

As observed, our method finds very good solutions early in the optimization process. For ER, within the 14-second time budget, we outperform the ILP commercial solvers in almost all cases. Additionally, ReduMIS takes 4 to 7 seconds to generate the first solution, whereas Quant-Net produces a good solution within the first second. For GNM, most methods reach the 12-node mark. However, our method reaches the 12-node solution within the first two seconds.

These convergence plots provide additional evidence of the scalability of Quant-Net.

Figure 3: Convergence plots for 4 ER graph instances using Quant-Net, the SOTA heuristic solver ReduMIS, and two commercial ILP solvers.
Figure 4: Convergence plots for 4 GNM (with $n = 300$ and $m = 22425$) graph instances using Quant-Net, the SOTA heuristic solver ReduMIS, and two commercial ILP solvers.
Table 2: Requirements comparison with baselines. For the ILPs (Gurobi and CP-SAT), trainable parameters correspond to $n$ binary decision variables. ReduMIS is not an optimization method; however, it uses $n$ binary variables, one for each node.

| Method | Size | Hyper-Parameters | Additional Techniques/Procedures |
|---|---|---|---|
| ReduMIS | $n$ variables | N/A | Many graph reductions, and graph clustering |
| Gurobi | $n$ variables | N/A | N/A |
| CP-SAT | $n$ variables | N/A | N/A |
| GCN | $\gg n$ trainable parameters | Many as it is learning-based | Tree Search |
| LwD | $\gg n$ trainable parameters | Many as it is learning-based | Entropy Regularization |
| DIMES | $\gg n$ trainable parameters | Many as it is learning-based | Tree Search or Sampling Decoding |
| DIFUSCO | $\gg n$ trainable parameters | Many as it is learning-based | Greedy Decoding or Sampling Decoding |
| iSCO | $n$ variables | Temperature, Sampler, Chain length | Post Processing for Correction |
| Quant-Net | $n$ trainable parameters | Learning rate, exploration parameter $\eta$, number of steps $T$ | Optional SDP initialization |

D.2 Impact of the Complement Graph term in Quant-Net

In this subsection, we demonstrate the impact of incorporating the proposed complement graph term. Specifically, we use three GNM graphs with $(n, m) = (100, 2475)$ and run Algorithm 1 for 10,000 iterations, both with ($\gamma = n$) and without ($\gamma = 1.0001$, similar to iSCO [34]) the complement graph term. Each time a solution is found, we sample from the uniform distribution and optimize using Adam until the 10,000 iterations are complete. For both cases, we used an initial learning rate of 0.5. The results are presented in Figure 5.

As shown, when the third term is included, our algorithm finds larger MISs while requiring fewer iterations (the first three plots). The fourth plot illustrates the number of MISs found (which may not be unique) with and without the third term across the three graph instances (x-axis). It is evident that including the third term results in finding more than 100 solutions, whereas disabling the third term yields fewer than 5 solutions within the 10,000 iterations. This indicates that, given one initialization, utilizing the third term significantly accelerates the optimizer’s convergence to a local minimizer. Furthermore, faster convergence means that more initializations in the search space are explored, which improves exploration.

Figure 5: The first three plots show the convergence results for 3 GNM graph instances with and without the proposed complement graph term w.r.t. iteration $t \in [10000]$. The fourth plot shows the number of MISs found for each graph instance and case.