-
Generalized Hypergraph Matching via Iterated Packing and Local Ratio
Authors:
Ojas Parekh,
David Pritchard
Abstract:
In $k$-hypergraph matching, we are given a collection of sets of size at most $k$, each with an associated weight, and we seek a maximum-weight subcollection whose sets are pairwise disjoint. More generally, in $k$-hypergraph $b$-matching, instead of disjointness we require that every element appears in at most $b$ sets of the subcollection. Our main result is a linear-programming based $(k-1+\tfrac{1}{k})$-approximation algorithm for $k$-hypergraph $b$-matching. This settles the integrality gap when $k$ is one more than a prime power, since it matches a previously-known lower bound. When the hypergraph is bipartite, we are able to improve the approximation ratio to $k-1$, which is also best possible relative to the natural LP. These results are obtained using a more careful application of the \emph{iterated packing} method.
Using the bipartite algorithmic integrality gap upper bound, we show that for the family of combinatorial auctions in which anyone can win at most $t$ items, there is a truthful-in-expectation polynomial-time auction that $t$-approximately maximizes social welfare. We also show that our results directly imply new approximations for a generalization of the recently introduced bounded-color matching problem.
We also consider the generalization of $b$-matching to \emph{demand matching}, where edges have nonuniform demand values. The best known approximation algorithm for this problem has ratio $2k$ on $k$-hypergraphs. We give a new algorithm, based on local ratio, that obtains the same approximation ratio in a much simpler way.
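To make the problem definition concrete, here is a minimal Python sketch of the $b$-matching feasibility condition, with an exhaustive search over a tiny instance. It is not the paper's LP-based algorithm; the function names and the example instance are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_b_matching(chosen, b=1):
    """Return True if every element appears in at most b of the chosen sets."""
    load = {}
    for s in chosen:
        for element in s:
            load[element] = load.get(element, 0) + 1
    return all(count <= b for count in load.values())

def max_weight_b_matching(sets, weights, b=1):
    """Exhaustive search over all subcollections (exponential; tiny inputs only)."""
    best_weight, best_choice = 0, ()
    indices = range(len(sets))
    for size in range(1, len(sets) + 1):
        for choice in combinations(indices, size):
            if is_b_matching([sets[i] for i in choice], b):
                weight = sum(weights[i] for i in choice)
                if weight > best_weight:
                    best_weight, best_choice = weight, choice
    return best_weight, best_choice

# Hypothetical instance with sets of size at most k = 3.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
weights = [5, 4, 5, 3]

matching = max_weight_b_matching(sets, weights, b=1)    # sets must be pairwise disjoint
two_matching = max_weight_b_matching(sets, weights, b=2)  # each element may be used twice
```

With $b=1$ only disjoint sets may be combined; raising $b$ to 2 lets all four sets of this instance be chosen at once.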
Submitted 1 April, 2016;
originally announced April 2016.
-
Frequency Distribution of Error Messages
Authors:
David Pritchard
Abstract:
Which programming error messages are the most common? We investigate this question, motivated by writing error explanations for novices. We consider large data sets in Python and Java that include both syntax and run-time errors. In both data sets, after grouping essentially identical messages, the error message frequencies empirically resemble Zipf-Mandelbrot distributions. We use a maximum-likelihood approach to fit the distribution parameters. This gives one possible way to contrast languages or compilers quantitatively.
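As an illustration of what a maximum-likelihood fit of a Zipf-Mandelbrot distribution can look like, the sketch below assumes the usual parameterization $p(r) \propto (r+q)^{-s}$ over ranks $r = 1, \dots, N$ and uses a coarse grid search. The abstract does not say which optimizer was used, so the grid, the function names, and the synthetic counts are all assumptions.

```python
import math

def log_likelihood(counts, s, q):
    """Log-likelihood of rank-ordered counts under p(r) proportional to (r + q)^(-s)."""
    ranks = range(1, len(counts) + 1)
    log_norm = math.log(sum((r + q) ** -s for r in ranks))
    total = sum(counts)
    weighted_logs = sum(c * math.log(r + q) for r, c in zip(ranks, counts))
    return -s * weighted_logs - total * log_norm

def fit_zipf_mandelbrot(counts):
    """Coarse grid search for the (s, q) pair with the highest likelihood."""
    candidates = [(s10 / 10, q2 / 2)
                  for s10 in range(5, 31)    # s in 0.5, 0.6, ..., 3.0
                  for q2 in range(0, 11)]    # q in 0.0, 0.5, ..., 5.0
    return max(candidates, key=lambda p: log_likelihood(counts, p[0], p[1]))

# Synthetic counts, roughly proportional to a known distribution (s = 1.5, q = 2.0).
true_s, true_q = 1.5, 2.0
counts = [round(1_000_000 * (r + true_q) ** -true_s) for r in range(1, 51)]
s_hat, q_hat = fit_zipf_mandelbrot(counts)
```

A real fit would replace the grid with a numerical optimizer and would use observed message counts sorted by decreasing frequency.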
Submitted 24 September, 2015;
originally announced September 2015.
-
CS Circles: An In-Browser Python Course for Beginners
Authors:
David Pritchard,
Troy Vasiga
Abstract:
Computer Science Circles is a free programming website for beginners that is designed to be fun, easy to use, and accessible to the broadest possible audience. We teach Python since it is simple yet powerful, and the course content is well-structured but written in plain language. The website has over one hundred exercises in thirty lesson pages, plus special features to help teachers support their students. It is available in both English and French. We discuss the philosophy behind the course and its design, we describe how it was implemented, and we give statistics on its use.
Submitted 10 December, 2012; v1 submitted 10 September, 2012;
originally announced September 2012.
-
On Approximating String Selection Problems with Outliers
Authors:
Christina Boucher,
Gad M. Landau,
Avivit Levy,
David Pritchard,
Oren Weimann
Abstract:
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
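The Close to Most Strings objective can be written down in a few lines of Python. The sketch below evaluates it by brute force over all candidate strings, which is only feasible for very short strings and small alphabets; the function names and the DNA-style example are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two same-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def count_close(x, strings, d):
    """Number of input strings within Hamming distance d of x (the non-outliers)."""
    return sum(hamming(x, s) <= d for s in strings)

def close_to_most_strings(strings, d, alphabet):
    """Try every string over the alphabet; exponential in the string length."""
    length = len(strings[0])
    candidates = ("".join(t) for t in product(alphabet, repeat=length))
    best = max(candidates, key=lambda x: count_close(x, strings, d))
    return best, count_close(best, strings, d)

# Hypothetical input: three similar strings and one outlier.
strings = ["ACGT", "ACGA", "ACCT", "TTTT"]
center, covered = close_to_most_strings(strings, 1, "ACGT")
```

Here "TTTT" is at distance at least 3 from every other input, so no center with $d = 1$ can cover it together with another string.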
Submitted 13 February, 2012;
originally announced February 2012.
-
Counting large distances in convex polygons
Authors:
Filip Morić,
David Pritchard
Abstract:
In a convex n-gon, let d[1] > d[2] > ... denote the set of all distances between pairs of vertices, and let m[i] be the number of pairs of vertices at distance d[i] from one another. Erdos, Lovasz, and Vesztergombi conjectured that m[1] + ... + m[k] <= k*n. Using a new computational approach, we prove their conjecture when k <= 4 and n is large; we also make some progress for arbitrary k by proving that m[1] + ... + m[k] <= (2k-1)n. Our main approach revolves around a few known facts about distances, together with a computer program that searches all distance configurations of two disjoint convex hull intervals up to some finite size. We thereby obtain other new bounds such as m[3] <= 3n/2 for large n.
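The quantities m[i] are easy to compute for a concrete polygon. The following Python sketch does so for a regular n-gon and checks the conjectured bound m[1] + ... + m[k] <= k*n on that single example. It is unrelated to the paper's computer search; grouping distances by rounding to 9 decimals is an assumption made to cope with floating-point error.

```python
import math
from collections import Counter
from itertools import combinations

def distance_multiplicities(points, digits=9):
    """Return [m[1], m[2], ...]: pair counts for the distances d[1] > d[2] > ..."""
    counts = Counter(round(math.dist(p, q), digits)
                     for p, q in combinations(points, 2))
    return [counts[d] for d in sorted(counts, reverse=True)]

def regular_polygon(n):
    """Vertices of a regular n-gon on the unit circle."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

n = 7
m = distance_multiplicities(regular_polygon(n))

# Check the conjectured inequality for every k on this one polygon.
for k in range(1, len(m) + 1):
    assert sum(m[:k]) <= k * n
```

For the regular heptagon each of the three distinct distances occurs exactly n times, so the bound holds with equality for every k.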
Submitted 29 July, 2011; v1 submitted 2 March, 2011;
originally announced March 2011.
-
Cover-Decomposition and Polychromatic Numbers
Authors:
Béla Bollobás,
David Pritchard,
Thomas Rothvoß,
Alex Scott
Abstract:
A colouring of a hypergraph's vertices is polychromatic if every hyperedge contains at least one vertex of each colour; the polychromatic number is the maximum number of colours in such a colouring. Its dual, the cover-decomposition number, is the maximum number of disjoint hyperedge-covers. In geometric hypergraphs, there is extensive work on lower-bounding these numbers in terms of their trivial upper bounds (minimum hyperedge size and degree); our goal here is to broaden the study beyond geometric settings. We obtain algorithms yielding near-tight bounds for three families of hypergraphs: bounded hyperedge size, paths in trees, and bounded VC-dimension. This reveals that discrepancy theory and iterated linear program relaxation are useful for cover-decomposition. Finally, we discuss the generalization of cover-decomposition to sensor cover.
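The two definitions above can be checked mechanically. This Python sketch verifies a polychromatic colouring and a cover-decomposition on a small example; the function names and the 4-cycle instance are hypothetical, and the code only checks given solutions rather than computing optimal ones.

```python
def is_polychromatic(colouring, hyperedges, num_colours):
    """True if every hyperedge contains at least one vertex of each colour."""
    wanted = set(range(num_colours))
    return all(wanted <= {colouring[v] for v in edge} for edge in hyperedges)

def is_cover_decomposition(covers, vertices):
    """True if the families share no hyperedge and each family covers every vertex."""
    used = set()
    for family in covers:
        for edge in family:
            if edge in used:          # a hyperedge may belong to one cover only
                return False
            used.add(edge)
        if not set(vertices) <= set().union(*family):
            return False
    return True

# Hypothetical example: the 4-cycle viewed as a hypergraph with edges of size 2.
vertices = range(4)
hyperedges = [frozenset({0, 1}), frozenset({1, 2}),
              frozenset({2, 3}), frozenset({3, 0})]
colouring = {0: 0, 1: 1, 2: 0, 3: 1}
covers = [[hyperedges[0], hyperedges[2]], [hyperedges[1], hyperedges[3]]]
```

In this example both numbers meet their trivial upper bounds: the minimum hyperedge size and the minimum degree are both 2.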
Submitted 29 May, 2012; v1 submitted 30 September, 2010;
originally announced September 2010.
-
Integrality Gap of the Hypergraphic Relaxation of Steiner Trees: a short proof of a 1.55 upper bound
Authors:
Deeparnab Chakrabarty,
Jochen Koenemann,
David Pritchard
Abstract:
Recently Byrka, Grandoni, Rothvoss and Sanita (at STOC 2010) gave a 1.39-approximation for the Steiner tree problem, using a hypergraph-based linear programming relaxation. They also upper-bounded its integrality gap by 1.55. We describe a shorter proof of the same integrality gap bound, by applying some of their techniques to a randomized loss-contracting algorithm.
Submitted 11 June, 2010;
originally announced June 2010.
-
An LP with Integrality Gap 1+epsilon for Multidimensional Knapsack
Authors:
David Pritchard
Abstract:
In this note we study packing or covering integer programs with at most k constraints, which are also known as k-dimensional knapsack problems. For any integer k > 0 and real epsilon > 0, we observe there is a polynomial-sized LP for the k-dimensional knapsack problem with integrality gap at most 1+epsilon. The variables may be unbounded or have arbitrary upper bounds. In the packing case, we can also remove the dependence of the LP on the cost-function, yielding a polyhedral approximation of the integer hull. This generalizes a recent result of Bienstock on the classical knapsack problem.
Submitted 2 February, 2011; v1 submitted 18 May, 2010;
originally announced May 2010.
-
k-Edge-Connectivity: Approximation and LP Relaxation
Authors:
David Pritchard
Abstract:
In the k-edge-connected spanning subgraph problem we are given a graph (V, E) and costs for each edge, and want to find a minimum-cost subset F of E such that (V, F) is k-edge-connected. We show there is a constant eps > 0 so that for all k > 1, finding a (1 + eps)-approximation for k-ECSS is NP-hard, establishing a gap between the unit-cost and general-cost versions. Next, we consider the multi-subgraph cousin of k-ECSS, in which we purchase a multi-subset F of E, with unlimited parallel copies available at the same cost as the original edge. We conjecture that a (1 + Theta(1/k))-approximation algorithm exists, and we describe an approach based on graph decompositions applied to its natural linear programming (LP) relaxation. The LP is essentially equivalent to the Held-Karp LP for TSP and the undirected LP for Steiner tree. We give a family of extreme points for the LP which are more complex than those previously known.
Submitted 4 October, 2010; v1 submitted 12 April, 2010;
originally announced April 2010.
-
Limits of Approximation Algorithms: PCPs and Unique Games (DIMACS Tutorial Lecture Notes)
Authors:
Prahladh Harsha,
Moses Charikar,
Matthew Andrews,
Sanjeev Arora,
Subhash Khot,
Dana Moshkovitz,
Lisa Zhang,
Ashkan Aazami,
Dev Desai,
Igor Gorodezky,
Geetha Jagannathan,
Alexander S. Kulikov,
Darakhshan J. Mir,
Alantha Newman,
Aleksandar Nikolov,
David Pritchard,
Gwen Spencer
Abstract:
These are the lecture notes for the DIMACS Tutorial "Limits of Approximation Algorithms: PCPs and Unique Games" held at the DIMACS Center, CoRE Building, Rutgers University on 20-21 July, 2009. This tutorial was jointly sponsored by the DIMACS Special Focus on Hardness of Approximation, the DIMACS Special Focus on Algorithmic Foundations of the Internet, and the Center for Computational Intractability with support from the National Security Agency and the National Science Foundation.
The speakers at the tutorial were Matthew Andrews, Sanjeev Arora, Moses Charikar, Prahladh Harsha, Subhash Khot, Dana Moshkovitz and Lisa Zhang. The scribes were Ashkan Aazami, Dev Desai, Igor Gorodezky, Geetha Jagannathan, Alexander S. Kulikov, Darakhshan J. Mir, Alantha Newman, Aleksandar Nikolov, David Pritchard and Gwen Spencer.
Submitted 20 February, 2010;
originally announced February 2010.
-
Hypergraphic LP Relaxations for Steiner Trees
Authors:
Deeparnab Chakrabarty,
Jochen Koenemann,
David Pritchard
Abstract:
We investigate hypergraphic LP relaxations for the Steiner tree problem, primarily the partition LP relaxation introduced by Koenemann et al. [Math. Programming, 2009]. Specifically, we are interested in proving upper bounds on the integrality gap of this LP, and studying its relation to other linear relaxations. Our results are the following. Structural results: We extend the technique of uncrossing, usually applied to families of sets, to families of partitions. As a consequence we show that any basic feasible solution to the partition LP formulation has sparse support. Although the number of variables could be exponential, the number of positive variables is at most the number of terminals. Relations with other relaxations: We show the equivalence of the partition LP relaxation with other known hypergraphic relaxations. We also show that these hypergraphic relaxations are equivalent to the well studied bidirected cut relaxation, if the instance is quasibipartite. Integrality gap upper bounds: We show an upper bound of sqrt(3) ~ 1.732 on the integrality gap of these hypergraph relaxations in general graphs. In the special case of uniformly quasibipartite instances, we show an improved upper bound of 73/60 ~ 1.216. By our equivalence theorem, the latter result implies an improved upper bound for the bidirected cut relaxation as well.
Submitted 6 March, 2010; v1 submitted 1 October, 2009;
originally announced October 2009.
-
Approximability of Sparse Integer Programs
Authors:
David Pritchard,
Deeparnab Chakrabarty
Abstract:
The main focus of this paper is a pair of new approximation algorithms for certain integer programs. First, for covering integer programs {min cx: Ax >= b, 0 <= x <= d} where A has at most k nonzeroes per row, we give a k-approximation algorithm. (We assume A, b, c, d are nonnegative.) For any k >= 2 and eps>0, if P != NP this ratio cannot be improved to k-1-eps, and under the unique games conjecture this ratio cannot be improved to k-eps. One key idea is to replace individual constraints by others that have better rounding properties but the same nonnegative integral solutions; another critical ingredient is knapsack-cover inequalities. Second, for packing integer programs {max cx: Ax <= b, 0 <= x <= d} where A has at most k nonzeroes per column, we give a (2k^2+2)-approximation algorithm. Our approach builds on the iterated LP relaxation framework. In addition, we obtain improved approximations for the second problem when k=2, and for both problems when every A_{ij} is small compared to b_i. Finally, we demonstrate a 17/16-inapproximability for covering integer programs with at most two nonzeroes per column.
Submitted 9 February, 2010; v1 submitted 6 April, 2009;
originally announced April 2009.
-
A Partition-Based Relaxation For Steiner Trees
Authors:
Jochen Konemann,
David Pritchard,
Kunlun Tan
Abstract:
The Steiner tree problem is a classical NP-hard optimization problem with a wide range of practical applications. In an instance of this problem, we are given an undirected graph G=(V,E), a set of terminals R, and non-negative costs c_e for all edges e in E. Any tree that contains all terminals is called a Steiner tree; the goal is to find a minimum-cost Steiner tree. The nodes V \ R are called Steiner nodes.
The best approximation algorithm known for the Steiner tree problem is due to Robins and Zelikovsky (SIAM J. Discrete Math, 2005); their greedy algorithm achieves a performance guarantee of 1+(ln 3)/2 ~ 1.55. The best known linear programming (LP)-based algorithm, on the other hand, is due to Goemans and Bertsimas (Math. Programming, 1993) and achieves an approximation ratio of 2-2/|R|. In this paper we establish a link between greedy and LP-based approaches by showing that Robins and Zelikovsky's algorithm has a natural primal-dual interpretation with respect to a novel partition-based linear programming relaxation. We also exhibit surprising connections between the new formulation and existing LPs and we show that the new LP is stronger than the bidirected cut formulation.
An instance is b-quasi-bipartite if each connected component of G \ R has at most b vertices. We show that Robins and Zelikovsky's algorithm has an approximation ratio better than 1+(ln 3)/2 for such instances, and we prove that the integrality gap of our LP is between 8/7 and (2b+1)/(b+1).
Submitted 20 December, 2007;
originally announced December 2007.
-
Efficient Divide-and-Conquer Implementations Of Symmetric FSAs
Authors:
David Pritchard
Abstract:
A deterministic finite-state automaton (FSA) is an abstract sequential machine that reads the symbols comprising an input word one at a time. An FSA is symmetric if its output is independent of the order in which the input symbols are read, i.e., if the output is invariant under permutations of the input. We show how to convert a symmetric FSA A into an automaton-like divide-and-conquer process whose intermediate results are no larger than the size of A's memory. In comparison, a similar result for general FSAs has long been known via functional composition, but entails an exponential increase in memory size. The new result has applications to parallel processing and symmetric FSA networks.
Submitted 5 August, 2010; v1 submitted 3 August, 2007;
originally announced August 2007.
-
Nearest Neighbor Network Traversal
Authors:
David Pritchard
Abstract:
A mobile agent in a network wants to visit every node of an n-node network, using a small number of steps. We investigate the performance of the following ``nearest neighbor'' heuristic: always go to the nearest unvisited node. If the network graph never changes, then from (Rosenkrantz, Stearns and Lewis, 1977) and (Hurkens and Woeginger, 2004) it follows that Theta(n log n) steps are necessary and sufficient in the worst case. We give a simpler proof of the upper bound and an example that improves the best known lower bound.
We investigate how the performance of this heuristic changes when it is distributively implemented in a network. Even if network edges are allowed to fail over time, we show that the nearest neighbor strategy never runs for more than O(n^2) iterations. We also show that any strategy can be forced to take at least n(n-1)/2 steps before all nodes are visited, if the edges of the network are deleted in an adversarial way.
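For the static case, the nearest neighbor heuristic can be simulated directly. The Python sketch below counts the steps an agent takes on an unweighted graph that never changes, using breadth-first search to find the nearest unvisited node. The adjacency-list format, the function names, and the tie-breaking rule (BFS order) are assumptions; edge failures and the distributed implementation are not modeled.

```python
from collections import deque

def nearest_unvisited(graph, start, visited):
    """BFS from start; return (node, distance) for the closest unvisited node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u not in visited:
            return u, dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None, 0  # every reachable node has been visited

def nearest_neighbor_steps(graph, start):
    """Total number of edge traversals until all reachable nodes are visited."""
    visited = {start}
    current, steps = start, 0
    while len(visited) < len(graph):
        target, d = nearest_unvisited(graph, current, visited)
        if target is None:
            break
        steps += d
        visited.add(target)
        current = target
    return steps

# Hypothetical example: a path on five nodes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
from_end = nearest_neighbor_steps(path, 0)
from_middle = nearest_neighbor_steps(path, 2)
```

Starting in the middle of the path forces the agent to walk back across nodes it has already visited, which is the kind of wasted movement the worst-case analysis accounts for.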
Submitted 19 February, 2007;
originally announced February 2007.
-
Fast Computation of Small Cuts via Cycle Space Sampling
Authors:
David Pritchard,
Ramakrishna Thurimella
Abstract:
We describe a new sampling-based method to determine cuts in an undirected graph. For a graph (V, E), its cycle space is the family of all subsets of E that have even degree at each vertex. We prove that with high probability, sampling the cycle space identifies the cuts of a graph. This leads to simple new linear-time sequential algorithms for finding all cut edges and cut pairs (a set of 2 edges that form a cut) of a graph.
In the model of distributed computing in a graph G=(V, E) with O(log V)-bit messages, our approach yields faster algorithms for several problems. The diameter of G is denoted by Diam, and the maximum degree by Delta. We obtain simple O(Diam)-time distributed algorithms to find all cut edges, 2-edge-connected components, and cut pairs, matching or improving upon previous time bounds. Under natural conditions these new algorithms are universally optimal, i.e., an Omega(Diam)-time lower bound holds on every graph. We obtain an O(Diam+Delta/log V)-time distributed algorithm for finding cut vertices; this is faster than the best previous algorithm when Delta, Diam = O(sqrt(V)). A simple extension of our work yields the first distributed algorithm with sub-linear time for 3-edge-connected components. The basic distributed algorithms are Monte Carlo, but they can be made Las Vegas without increasing the asymptotic complexity.
In the model of parallel computing on the EREW PRAM our approach yields a simple algorithm with optimal time complexity O(log V) for finding cut pairs and 3-edge-connected components.
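One way to picture cycle space sampling in the sequential setting is the Python sketch below. It relies on a standard fact: an edge is a cut edge exactly when it is absent from every element of the cycle space, and two non-cut edges form a cut pair exactly when every element of the cycle space contains both or neither. The code samples 64 random elements at once by giving each non-tree edge a random 64-bit label and propagating XORs up a spanning tree. All names, the 64-bit width, the fixed seed, and the input format (a connected simple graph on vertices 0..n-1) are assumptions, and this is not claimed to match the paper's implementation.

```python
import random

def sample_cycle_space_labels(n, edges, bits=64, seed=0):
    """Give each edge a bit-vector; bit i records membership in the i-th sampled cycle-space element."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    # Build a rooted spanning tree from vertex 0 (parents are recorded before children are expanded).
    parent, parent_edge = [None] * n, [None] * n
    seen, order, stack = [False] * n, [], [0]
    seen[0] = True
    while stack:
        u = stack.pop()
        order.append(u)
        for v, idx in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v], parent_edge[v] = u, idx
                stack.append(v)

    # Non-tree edges get independent random labels.
    tree_edges = {idx for idx in parent_edge if idx is not None}
    label, acc = [0] * len(edges), [0] * n
    for idx, (u, v) in enumerate(edges):
        if idx not in tree_edges:
            label[idx] = rng.getrandbits(bits)
            acc[u] ^= label[idx]
            acc[v] ^= label[idx]

    # Bottom-up pass: a tree edge gets the XOR of the non-tree edges leaving its subtree.
    for v in reversed(order):
        if parent[v] is not None:
            label[parent_edge[v]] = acc[v]
            acc[parent[v]] ^= acc[v]
    return label

def cut_edges_and_pair_classes(edges, label):
    """Zero label: cut edge. Equal nonzero labels: any two edges in the class form a cut pair (w.h.p.)."""
    cut_edges = [e for e, x in zip(edges, label) if x == 0]
    groups = {}
    for e, x in zip(edges, label):
        if x != 0:
            groups.setdefault(x, []).append(e)
    return cut_edges, [g for g in groups.values() if len(g) >= 2]

# Hypothetical example: a triangle and a 4-cycle joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 6), (6, 3)]
labels = sample_cycle_space_labels(7, edges)
cut_edges, pair_classes = cut_edges_and_pair_classes(edges, labels)
```

Cut edges always receive the label 0, so the only possible errors are one-sided: a non-cut edge could receive 0, or two unrelated edges could receive equal labels, each with probability on the order of 2^-64 per comparison under these assumptions.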
Submitted 21 July, 2010; v1 submitted 19 February, 2007;
originally announced February 2007.
-
An Optimal Distributed Edge-Biconnectivity Algorithm
Authors:
David Pritchard
Abstract:
We describe a synchronous distributed algorithm which identifies the edge-biconnected components of a connected network. It requires a leader, and uses messages of size O(log |V|). The main idea is to preorder a BFS spanning tree, and then to efficiently compute least common ancestors so as to mark cycle edges. This algorithm takes O(Diam) time and uses O(|E|) messages. Furthermore, we show that no correct singly-initiated edge-biconnectivity algorithm can beat either bound on any graph by more than a constant factor. We also describe a near-optimal local algorithm for edge-biconnectivity.
Submitted 5 February, 2006;
originally announced February 2006.