-
Computing mixed Schatten norm of completely positive maps
Authors:
Mohammad ShahverdiKondori,
Sio On Chan
Abstract:
Computing the $p \rightarrow q$ norm of a matrix is a classical problem in computational mathematics, and power iteration is a well-known method for computing the $p \rightarrow q$ norm of a matrix with nonnegative entries. Here we define an equivalent iteration method for computing the $S_p \rightarrow S_q$ norm of completely positive maps, where $S_p$ is the Schatten $p$-norm. We generalize almost all of the definitions, properties, lemmas, etc. in the matrix setting to completely positive maps and prove an important theorem in this setting.
Submitted 15 September, 2022;
originally announced September 2022.
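To make the matrix-side starting point of the abstract above concrete, here is a minimal sketch of a nonlinear power iteration for the $p \rightarrow q$ norm of a matrix with nonnegative entries. It is not taken from the paper: the function names `p_norm` and `p_to_q_norm`, the all-ones starting vector, and the fixed iteration count are illustrative assumptions, and the sketch assumes $1 < p, q < \infty$. Whether the iteration reaches the global optimum depends on conditions on $p$ and $q$ that the abstract does not state.

```python
def p_norm(v, p):
    """Return the l_p norm of a vector given as a list of floats."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def p_to_q_norm(A, p, q, iters=500):
    """Estimate max ||A x||_q / ||x||_p for a matrix A with nonnegative entries.

    Illustrative sketch only: assumes 1 < p < infinity and 1 < q < infinity.
    """
    rows, cols = len(A), len(A[0])
    x = [1.0] * cols                      # positive starting vector (an assumption)
    scale = p_norm(x, p)
    x = [t / scale for t in x]
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        # Gradient direction of ||y||_q^q; entries stay nonnegative for nonnegative A.
        g = [t ** (q - 1) for t in y]
        z = [sum(A[i][j] * g[i] for i in range(rows)) for j in range(cols)]
        # Undo the map x -> x^(p-1) that appears in the stationarity condition.
        x = [t ** (1.0 / (p - 1)) for t in z]
        scale = p_norm(x, p)
        if scale == 0.0:                  # A maps everything to zero
            return 0.0
        x = [t / scale for t in x]
    y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return p_norm(y, q)
```

For $p = q = 2$ the update reduces to ordinary power iteration on $A^{\top}A$, so the returned value is the largest singular value of $A$.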
-
Rethinking Graph Neural Networks for the Graph Coloring Problem
Authors:
Wei Li,
Ruxuan Li,
Yuzhe Ma,
Siu On Chan,
David Pan,
Bei Yu
Abstract:
Graph coloring, a classical and critical NP-hard problem, is the problem of assigning colors to nodes so that connected nodes receive different colors as far as possible. However, we observe that state-of-the-art GNNs are less successful in the graph coloring problem. We analyze the reasons from two perspectives. First, most GNNs fail to generalize the task under homophily to heterophily, i.e., graphs where connected nodes are assigned different colors. Second, GNNs are bounded by the network depth, which makes them local methods, and local methods have been demonstrated to be non-optimal in the Maximum Independent Set (MIS) problem. In this paper, we focus on the aggregation-combine GNNs (AC-GNNs), a popular class of GNNs. We first define the power of AC-GNNs in the coloring problem as the capability to assign nodes different colors. This definition differs from the previous one, which is based on the assumption of homophily. We identify node pairs that AC-GNNs fail to discriminate. Furthermore, we show that any AC-GNN is a local coloring method, and that any local coloring method is non-optimal, by exploring the limits of local methods over sparse random graphs, thereby demonstrating the non-optimality of AC-GNNs due to their local property. We then prove the positive correlation between model depth and coloring power. Moreover, we discuss the color equivariance of graphs to tackle some practical constraints such as the pre-fixing constraints. Following the discussions above, we summarize a series of rules that make a GNN color equivariant and powerful in the coloring problem. Then, we propose a simple AC-GNN variation satisfying these rules. We empirically validate our theoretical findings and demonstrate that our simple model substantially outperforms state-of-the-art heuristic algorithms in both quality and runtime.
Submitted 19 August, 2022; v1 submitted 14 August, 2022;
originally announced August 2022.
-
The Gambler's Problem and Beyond
Authors:
Baoxiang Wang,
Shuai Li,
Jia** Li,
Siu On Chan
Abstract:
We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose the bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating non-smooth points. However, this pattern has not been investigated further. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Simple as it might seem, the value function is pathological: it is fractal and self-similar, its derivative takes only the values zero or infinity, and it cannot be written in terms of elementary functions. It is in fact one of the generalized Cantor functions, and it holds a complexity that has been uncharted thus far. Our analyses could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.
Submitted 12 July, 2020; v1 submitted 31 December, 2019;
originally announced January 2020.
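The discrete Gambler's problem described in the abstract above can be explored numerically with plain value iteration. The sketch below is not the paper's exact formula; it is the standard dynamic-programming computation, and the function name `gambler_values`, the win probability 0.4, the target of 100, and the stopping tolerance are all illustrative assumptions.

```python
def gambler_values(p_win=0.4, target=100, tol=1e-12, max_sweeps=10000):
    """Approximate the optimal probability of reaching `target` from each capital.

    values[s] is the estimated chance of eventually reaching `target` when
    starting with capital s and betting optimally; a bet of a coins is won
    with probability p_win and lost otherwise.
    """
    values = [0.0] * (target + 1)
    values[target] = 1.0                       # reaching the target counts as success
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, target):
            # A stake may not exceed the capital or overshoot the target.
            best = max(
                p_win * values[s + a] + (1.0 - p_win) * values[s - a]
                for a in range(1, min(s, target - s) + 1)
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best                   # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    return values
```

Plotting `gambler_values()` against the capital shows the repeating non-smooth points the abstract refers to; with these assumed parameters the value at capital 50 is 0.4, since staking everything wins immediately with probability 0.4.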
-
On the Worst-Case Approximability of Sparse PCA
Authors:
Siu On Chan,
Dimitris Papailiopoulos,
Aviad Rubinstein
Abstract:
It is well known that Sparse PCA (Sparse Principal Component Analysis) is NP-hard to solve exactly on worst-case instances. What is the complexity of solving Sparse PCA approximately? Our contributions include: 1) a simple and efficient algorithm that achieves an $n^{-1/3}$-approximation; 2) NP-hardness of approximation to within $(1-\varepsilon)$, for some small constant $\varepsilon > 0$; 3) SSE-hardness of approximation to within any constant factor; and 4) an $\exp\exp\left(Ω\left(\sqrt{\log \log n}\right)\right)$ ("quasi-quasi-polynomial") gap for the standard semidefinite program.
Submitted 21 July, 2015;
originally announced July 2015.
-
Random Walks and Evolving Sets: Faster Convergences and Limitations
Authors:
Siu On Chan,
Tsz Chiu Kwok,
Lap Chi Lau
Abstract:
Analyzing the mixing time of random walks is a well-studied problem with applications in random sampling and more recently in graph partitioning. In this work, we present new analysis of random walks and evolving sets using more combinatorial graph structures, and show some implications in approximating small-set expansion. On the other hand, we provide examples showing the limitations of using random walks and evolving sets in disproving the small-set expansion hypothesis.
- We define a combinatorial analog of the spectral gap, and use it to prove the convergence of non-lazy random walks. A corollary is a tight lower bound on the small-set expansion of graph powers for any graph.
- We prove that random walks converge faster when the robust vertex expansion of the graph is larger. This provides an improved analysis of the local graph partitioning algorithm using the evolving set process.
- We give an example showing that the evolving set process fails to disprove the small-set expansion hypothesis. This refutes a conjecture of Oveis Gharan and shows the limitations of local graph partitioning algorithms in approximating small-set expansion.
Submitted 8 July, 2015;
originally announced July 2015.
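As background for why convergence of non-lazy random walks needs a separate argument, the following sketch compares a non-lazy and a lazy walk on a cycle. It does not implement the paper's combinatorial analog of the spectral gap, which the abstract does not define; the helper names `step` and `walk` and the choice of a 4-cycle are illustrative assumptions.

```python
def step(dist, lazy):
    """Advance a probability distribution by one step of a walk on a cycle.

    A non-lazy walker moves to a uniformly random neighbour; a lazy walker
    first stays put with probability 1/2.
    """
    n = len(dist)
    new = [0.0] * n
    for v, mass in enumerate(dist):
        moving = 0.5 * mass if lazy else mass
        new[v] += mass - moving                # mass that stays (zero when non-lazy)
        new[(v - 1) % n] += 0.5 * moving
        new[(v + 1) % n] += 0.5 * moving
    return new


def walk(n, steps, lazy):
    """Distribution after `steps` steps on an n-cycle, starting at vertex 0."""
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        dist = step(dist, lazy)
    return dist
```

On the 4-cycle, which is bipartite, the non-lazy walk keeps alternating between the two sides and never approaches the uniform distribution, whereas the lazy walk does; any convergence statement for non-lazy walks therefore has to rule out this kind of structure.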
-
Sum of Squares Lower Bounds from Pairwise Independence
Authors:
Boaz Barak,
Siu On Chan,
Pravesh Kothari
Abstract:
We prove that for every $ε>0$ and predicate $P:\{0,1\}^k\rightarrow \{0,1\}$ that supports a pairwise independent distribution, there exists an instance $\mathcal{I}$ of the $\mathsf{Max}P$ constraint satisfaction problem on $n$ variables such that no assignment can satisfy more than a $\tfrac{|P^{-1}(1)|}{2^k}+ε$ fraction of $\mathcal{I}$'s constraints but the degree $Ω(n)$ Sum of Squares semidefinite programming hierarchy cannot certify that $\mathcal{I}$ is unsatisfiable. Similar results were previously only known for weaker hierarchies.
Submitted 26 March, 2015; v1 submitted 4 January, 2015;
originally announced January 2015.
-
Approximate Constraint Satisfaction Requires Large LP Relaxations
Authors:
Siu On Chan,
James R. Lee,
Prasad Raghavendra,
David Steurer
Abstract:
We prove super-polynomial lower bounds on the size of linear programming relaxations for approximation versions of constraint satisfaction problems. We show that for these problems, polynomial-sized linear programs are exactly as powerful as programs arising from a constant number of rounds of the Sherali-Adams hierarchy.
In particular, any polynomial-sized linear program for Max Cut has an integrality gap of 1/2 and any such linear program for Max 3-Sat has an integrality gap of 7/8.
Submitted 8 February, 2016; v1 submitted 2 September, 2013;
originally announced September 2013.
-
On extracting common random bits from correlated sources on large alphabets
Authors:
Siu On Chan,
Elchanan Mossel,
Joe Neeman
Abstract:
Suppose Alice and Bob receive strings $X=(X_1,\ldots,X_n)$ and $Y=(Y_1,\ldots,Y_n)$, each uniformly random in $[s]^n$ but such that $X$ and $Y$ are correlated. For each symbol $i$, we have that $Y_i = X_i$ with probability $1-\varepsilon$ and otherwise $Y_i$ is chosen independently and uniformly from $[s]$.
Alice and Bob wish to use their respective strings to extract a uniformly chosen common sequence from $[s]^k$ but without communicating. How well can they do? The trivial strategy of outputting the first $k$ symbols yields an agreement probability of $(1 - \varepsilon + \varepsilon/s)^k$. In a recent work by Bogdanov and Mossel it was shown that in the binary case where $s=2$ and $k = k(\varepsilon)$ is large enough, it is possible to extract $k$ bits with a better agreement probability rate. In particular, it is possible to achieve agreement probability $(k\varepsilon)^{-1/2} \cdot 2^{-k\varepsilon/(2(1 - \varepsilon/2))}$ using a random construction based on Hamming balls, and this is optimal up to lower order terms.
In the current paper we consider the same problem over larger alphabet sizes $s$ and we show that the agreement probability rate changes dramatically as the alphabet grows. In particular we show no strategy can achieve agreement probability better than $(1-\varepsilon)^k (1+\delta(s))^k$ where $\delta(s) \to 0$ as $s \to \infty$. We also show that Hamming ball based constructions have much lower agreement probability rate than the trivial algorithm as $s \to \infty$. Our proofs and results are intimately related to subtle properties of hypercontractive inequalities.
Submitted 29 August, 2012;
originally announced August 2012.
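The agreement probability of the trivial strategy quoted in the abstract above can be checked by direct enumeration. The sketch below uses exact rational arithmetic; the function names `per_symbol_agreement` and `trivial_agreement` are illustrative assumptions, and the model is the one stated in the abstract: each symbol of Alice's string is uniform on an alphabet of size $s$, and Bob's symbol is a copy with probability $1-\varepsilon$ and an independent uniform symbol otherwise.

```python
from fractions import Fraction


def per_symbol_agreement(eps, s):
    """Probability that one symbol of Alice's string equals Bob's.

    Sums the joint probability P(X_i = x, Y_i = x) over the alphabet:
    X_i = x with probability 1/s, and then Y_i = x either because it was
    copied (probability 1 - eps) or because the fresh uniform symbol
    happened to coincide (probability eps / s).
    """
    eps = Fraction(eps)
    total = Fraction(0)
    for _x in range(s):
        total += Fraction(1, s) * ((1 - eps) + eps * Fraction(1, s))
    return total


def trivial_agreement(eps, s, k):
    """Agreement probability when both parties output their first k symbols."""
    # Symbols are independent across positions, so the probabilities multiply.
    return per_symbol_agreement(eps, s) ** k
```

For example, `trivial_agreement(Fraction(1, 2), 2, 2)` evaluates to `Fraction(9, 16)`, matching $(1 - \varepsilon + \varepsilon/s)^k$ with $\varepsilon = 1/2$, $s = 2$, $k = 2$.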