-
New Tools for Peak Memory Scheduling
Authors:
Ce **,
Manish Purohit,
Zoya Svitkina,
Erik Vee,
Joshua R. Wang
Abstract:
We study scheduling of computation graphs to minimize peak memory consumption, an increasingly critical task due to the surge in popularity of large deep-learning models. This problem corresponds to the weighted version of the classical one-shot black pebbling game. We propose the notion of a dominant schedule to capture the idea of finding the ``best'' schedule for a subgraph and introduce new tools to compute and utilize dominant schedules. Surprisingly, we show that despite the strong requirements, a dominant schedule exists for any computation graph; and, moreover, that it is possible to compute the dominant schedule efficiently whenever we can find optimal schedules efficiently for a particular class of graphs (under mild technical conditions).
We apply these new tools to analyze trees and series-parallel graphs. We show that the weighted one-shot black pebbling game is strongly NP-complete even when the graph is an out-tree -- or simpler still, a pumpkin, one of the simplest series-parallel graphs. On the positive side, we design a fixed-parameter tractable algorithm to find a dominant schedule (hence also a peak memory minimizing schedule) for series-parallel graphs when parameterized by the out-degree. This algorithm runs in time $2^{O(d \log d)} \cdot poly(n)$ for series-parallel graphs with $n$ nodes and maximum out-degree $d$; for pumpkins, we can improve the dependence on $d$ to $O(2^d \cdot poly(n))$.
Submitted 20 December, 2023;
originally announced December 2023.
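To make the objective in the abstract above concrete, here is a minimal Python sketch that evaluates the peak memory of one given schedule. It is not the paper's algorithm. It assumes a simple memory model that the abstract does not spell out: each node is computed exactly once, its output occupies `weight[v]` units from the moment it is computed, and it is freed as soon as all of its successors have been computed (sink outputs are freed immediately). The names `peak_memory`, `succ`, `weight`, and `order` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

Node = Hashable

def peak_memory(succ: Dict[Node, List[Node]],
                weight: Dict[Node, int],
                order: Sequence[Node]) -> int:
    """Peak memory of a one-shot schedule under the assumed model.

    succ maps every node to its list of successors (no duplicate edges).
    order must list every node exactly once, predecessors first.
    """
    if len(order) != len(succ) or set(order) != set(succ):
        raise ValueError("order must contain every node exactly once")

    # Build predecessor lists and count successors still waiting per node.
    pred: Dict[Node, List[Node]] = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    remaining = {v: len(succ[v]) for v in succ}

    done = set()
    live = 0   # total weight currently held in memory
    peak = 0
    for v in order:
        if any(u not in done for u in pred[v]):
            raise ValueError(f"{v!r} scheduled before one of its inputs")
        # Inputs and the new output are held at the same time.
        live += weight[v]
        peak = max(peak, live)
        done.add(v)
        # Free every input whose last consumer was v.
        for u in pred[v]:
            remaining[u] -= 1
            if remaining[u] == 0:
                live -= weight[u]
        # Assumption: a sink's output is not retained.
        if remaining[v] == 0:
            live -= weight[v]
    return peak

# A small out-tree: r feeds a and b; a feeds a2; b feeds b2.
succ = {"r": ["a", "b"], "a": ["a2"], "b": ["b2"], "a2": [], "b2": []}
weight = {"r": 1, "a": 5, "a2": 1, "b": 1, "b2": 1}
heavy_first = peak_memory(succ, weight, ["r", "a", "a2", "b", "b2"])
light_first = peak_memory(succ, weight, ["r", "b", "b2", "a", "a2"])
```

In this example, finishing the light branch first releases `r` earlier, so the two orders give different peaks. The scheduling problem studied in the paper is the search for an order that minimizes this quantity.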
-
Efficient Caching with Reserves via Marking
Authors:
Sharat Ibrahimpur,
Manish Purohit,
Zoya Svitkina,
Erik Vee,
Joshua R. Wang
Abstract:
Online caching is among the most fundamental and well-studied problems in the area of online algorithms. Innovative algorithmic ideas and analysis -- including potential functions and primal-dual techniques -- give insight into this still-growing area. Here, we introduce a new analysis technique that first uses a potential function to upper bound the cost of an online algorithm and then pairs that with a new dual-fitting strategy to lower bound the cost of an offline optimal algorithm. We apply these techniques to the Caching with Reserves problem recently introduced by Ibrahimpur et al. [10] and give an O(log k)-competitive fractional online algorithm via a marking strategy, where k denotes the size of the cache. We also design a new online rounding algorithm that runs in polynomial time to obtain an O(log k)-competitive randomized integral algorithm. Additionally, we provide a new, simple proof for randomized marking for the classical unweighted paging problem.
Submitted 3 May, 2023;
originally announced May 2023.
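For readers unfamiliar with the randomized marking strategy for classical unweighted paging that the abstract above refers to, the following Python sketch shows the textbook version. It is not the Caching with Reserves algorithm from the paper. The function name `marking_misses` and its parameters are hypothetical, and pages are assumed to be orderable so that a fixed seed gives reproducible evictions.

```python
import random
from typing import Hashable, Iterable, Optional

def marking_misses(requests: Iterable[Hashable], k: int,
                   seed: Optional[int] = None) -> int:
    """Count cache misses of randomized marking with cache size k >= 1."""
    rng = random.Random(seed)
    cache = set()    # pages currently in the cache
    marked = set()   # marked pages; always a subset of cache
    misses = 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                # A full cache with every page marked starts a new phase.
                if len(marked) == len(cache):
                    marked.clear()
                # Evict an unmarked page chosen uniformly at random.
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(page)
        # Every requested page is marked, whether it hit or missed.
        marked.add(page)
    return misses
```

Marked pages are never evicted within a phase, so only the choice among unmarked pages is random. When the cache holds every distinct page, the only misses are the first requests for each page.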
-
Caching with Reserves
Authors:
Sharat Ibrahimpur,
Manish Purohit,
Zoya Svitkina,
Erik Vee,
Joshua Wang
Abstract:
Caching is a crucial component of many computer systems, so naturally it is a well-studied topic in algorithm design. Much of traditional caching research studies cache management for a single-user or single-processor environment. In this paper, we propose two related generalizations of the classical caching problem that capture issues that arise in a multi-user or multi-processor environment. In the caching with reserves problem, a caching algorithm is required to maintain at least $k_i$ pages belonging to user $i$ in the cache at any time, for some given reserve capacities $k_i$. In the public-private caching problem, the cache of total size $k$ is partitioned into subcaches, a private cache of size $k_i$ for each user $i$ and a shared public cache usable by any user. In both of these models, as in the classical caching framework, the objective of the algorithm is to dynamically maintain the cache so as to minimize the total number of cache misses.
We show that caching with reserves and public-private caching models are equivalent up to constant factors, and thus focus on the former. Unlike classical caching, both of these models turn out to be NP-hard even in the offline setting, where the page sequence is known in advance. For the offline setting, we design a 2-approximation algorithm, whose analysis carefully keeps track of a potential function to bound the cost. In the online setting, we first design an $O(\ln k)$-competitive fractional algorithm using the primal-dual framework, and then show how to convert it online to a randomized integral algorithm with the same guarantee.
Submitted 13 July, 2022;
originally announced July 2022.
-
Scheduling with Communication Delay in Near-Linear Time
Authors:
Quanquan C. Liu,
Manish Purohit,
Zoya Svitkina,
Erik Vee,
Joshua R. Wang
Abstract:
We consider the problem of efficiently scheduling jobs with precedence constraints on a set of identical machines in the presence of a uniform communication delay. Such precedence-constrained jobs can be modeled as a directed acyclic graph, $G = (V, E)$. In this setting, if two precedence-constrained jobs $u$ and $v$, with $v$ dependent on $u$ ($u \prec v$), are scheduled on different machines, then $v$ must start at least $ρ$ time units after $u$ completes. The scheduling objective is to minimize makespan, i.e. the total time from when the first job starts to when the last job finishes. The focus of this paper is to provide an efficient approximation algorithm with near-linear running time. We build on the algorithm of Lepere and Rapine [STACS 2002] for this problem to give an $O\left(\frac{\ln ρ}{\ln \ln ρ} \right)$-approximation algorithm that runs in $\tilde{O}(|V| + |E|)$ time.
Submitted 29 January, 2022; v1 submitted 5 August, 2021;
originally announced August 2021.
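The communication-delay constraint described in the abstract above can be illustrated with a small feasibility checker. This Python sketch only validates a given schedule and reports its makespan. It is not the approximation algorithm from the paper. The function name, the explicit per-job durations in `dur`, and the rule that one machine runs one job at a time are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Job = Hashable

def makespan_if_feasible(edges: List[Tuple[Job, Job]],
                         start: Dict[Job, float],
                         dur: Dict[Job, float],
                         machine: Dict[Job, int],
                         rho: float) -> Optional[float]:
    """Return the makespan of a schedule, or None if it is infeasible."""
    finish = {j: start[j] + dur[j] for j in start}

    # Precedence: v waits for u, plus rho if they run on different machines.
    for u, v in edges:
        delay = rho if machine[u] != machine[v] else 0
        if start[v] < finish[u] + delay:
            return None

    # Assumption: a machine processes at most one job at a time.
    by_machine: Dict[int, List[Job]] = {}
    for j in start:
        by_machine.setdefault(machine[j], []).append(j)
    for jobs in by_machine.values():
        jobs.sort(key=lambda j: start[j])
        for a, b in zip(jobs, jobs[1:]):
            if start[b] < finish[a]:
                return None

    # Makespan: from the first job's start to the last job's finish.
    return max(finish.values()) - min(start.values())

edges = [("u", "v")]
dur = {"u": 1, "v": 1}
same = makespan_if_feasible(edges, {"u": 0, "v": 1}, dur, {"u": 0, "v": 0}, rho=3)
split_early = makespan_if_feasible(edges, {"u": 0, "v": 1}, dur, {"u": 0, "v": 1}, rho=3)
split_late = makespan_if_feasible(edges, {"u": 0, "v": 4}, dur, {"u": 0, "v": 1}, rho=3)
```

Here the same start times are feasible when both jobs share a machine but infeasible when they are split, unless `v` is pushed back by `rho`.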
-
Scheduling Precedence-Constrained Jobs on Related Machines with Communication Delay
Authors:
Biswaroop Maiti,
Rajmohan Rajaraman,
David Stalfa,
Zoya Svitkina,
Aravindan Vijayaraghavan
Abstract:
We consider the problem of scheduling $n$ precedence-constrained jobs on $m$ uniformly-related machines in the presence of an arbitrary, fixed communication delay $ρ$. We consider a model that allows job duplication, i.e. processing of the same job on multiple machines, which, as we show, can reduce the length of a schedule (i.e., its makespan) by a logarithmic factor. Our main result is an $O(\log m \log ρ/ \log \log ρ)$-approximation algorithm for minimizing makespan, assuming the minimum makespan is at least $ρ$. Our algorithm is based on rounding a linear programming relaxation for the problem, which includes carefully designed constraints capturing the interaction among communication delay, precedence requirements, varying speeds, and job duplication. Our result builds on two previous lines of work, one with communication delay but identical machines (Lepere, Rapine 2002) and the other with uniformly-related machines but no communication delay (Chudak, Shmoys 1999). We next show that the integrality gap of our mathematical program is $Ω(\sqrt{\log ρ})$. Our gap construction employs expander graphs and exploits a property of robust expansion and its generalization to paths of longer length. Finally, we quantify the advantage of duplication in scheduling with communication delay. We show that the best schedule without duplication can have makespan $Ω(ρ/\log ρ)$ or $Ω(\log m/\log\log m)$ or $Ω(\log n/\log \log n)$ times that of an optimal schedule allowing duplication. Nevertheless, we present a polynomial time algorithm to transform any schedule to a schedule without duplication at the cost of a $O(\log^2 n \log m)$ factor increase in makespan. Together with our makespan approximation algorithm for schedules allowing duplication, this also yields a polylogarithmic-approximation algorithm for the setting where duplication is not allowed.
Submitted 22 April, 2020;
originally announced April 2020.
-
Semi-Online Bipartite Matching
Authors:
Ravi Kumar,
Manish Purohit,
Aaron Schild,
Zoya Svitkina,
Erik Vee
Abstract:
In this paper we introduce the \emph{semi-online} model that generalizes the classical online computational model. The semi-online model postulates that the unknown future has a predictable part and an adversarial part; these parts can be arbitrarily interleaved. An algorithm in this model operates as in the standard online model, i.e., makes an irrevocable decision at each step.
We consider bipartite matching in the semi-online model, for both integral and fractional cases. Our main contributions are competitive algorithms for this problem that are close to or match a hardness bound. The competitive ratio of the algorithms nicely interpolates between the truly offline setting (no adversarial part) and the truly online setting (no predictable part).
Submitted 4 September, 2019; v1 submitted 30 November, 2018;
originally announced December 2018.
-
Asymmetric Traveling Salesman Path and Directed Latency Problems
Authors:
Zachary Friggstad,
Mohammad R. Salavatipour,
Zoya Svitkina
Abstract:
We study integrality gaps and approximability of two closely related problems on directed graphs. Given a set V of n nodes in an underlying asymmetric metric and two specified nodes s and t, both problems ask to find an s-t path visiting all other nodes. In the asymmetric traveling salesman path problem (ATSPP), the objective is to minimize the total cost of this path. In the directed latency problem, the objective is to minimize the sum of distances on this path from s to each node. Both of these problems are NP-hard. The best known approximation algorithms for ATSPP had ratio O(log n) until the very recent result that improves it to O(log n/ log log n). However, only a bound of O(sqrt(n)) for the integrality gap of its linear programming relaxation has been known. For directed latency, the best previously known approximation algorithm has a guarantee of O(n^(1/2+eps)), for any constant eps > 0. We present a new algorithm for the ATSPP problem that has an approximation ratio of O(log n), but whose analysis also bounds the integrality gap of the standard LP relaxation of ATSPP by the same factor. This solves an open problem posed by Chekuri and Pal [2007]. We then pursue a deeper study of this linear program and its variations, which leads to an algorithm for the k-person ATSPP (where k s-t paths of minimum total length are sought) and an O(log n)-approximation for the directed latency problem.
Submitted 1 June, 2010; v1 submitted 3 July, 2009;
originally announced July 2009.
-
Submodular approximation: sampling-based algorithms and lower bounds
Authors:
Zoya Svitkina,
Lisa Fleischer
Abstract:
We introduce several generalizations of classical computer science problems obtained by replacing simpler objective functions with general submodular functions. The new problems include submodular load balancing, which generalizes load balancing or minimum-makespan scheduling, submodular sparsest cut and submodular balanced cut, which generalize their respective graph cut problems, as well as submodular function minimization with a cardinality lower bound. We establish upper and lower bounds for the approximability of these problems with a polynomial number of queries to a function-value oracle. The approximation guarantees for most of our algorithms are of the order of sqrt(n/ln n). We show that this is the inherent difficulty of the problems by proving matching lower bounds. We also give an improved lower bound for the problem of approximately learning a monotone submodular function. In addition, we present an algorithm for approximately learning submodular functions with special structure, whose guarantee is close to the lower bound. Although quite restrictive, the class of functions with this structure includes the ones that are used for lower bounds both by us and in previous work. This demonstrates that if there are significantly stronger lower bounds for this problem, they rely on more general submodular functions.
Submitted 31 May, 2010; v1 submitted 7 May, 2008;
originally announced May 2008.
-
Stochastic Models for Budget Optimization in Search-Based Advertising
Authors:
S. Muthukrishnan,
Martin Pal,
Zoya Svitkina
Abstract:
Internet search companies sell advertisement slots based on users' search queries via an auction. Advertisers have to determine how to place bids on the keywords of their interest in order to maximize their return for a given budget: this is the budget optimization problem. The solution depends on the distribution of future queries.
In this paper, we formulate stochastic versions of the budget optimization problem based on natural probabilistic models of distribution over future queries, and address two questions that arise.
[Evaluation] Given a solution, can we evaluate the expected value of the objective function?
[Optimization] Can we find a solution that maximizes the objective function in expectation?
Our main results are approximation and complexity results for these two problems in our three stochastic models. In particular, our algorithmic results show that simple prefix strategies that bid on all cheap keywords up to some level are either optimal or good approximations for many cases; we show other cases to be NP-hard.
Submitted 24 September, 2007; v1 submitted 14 December, 2006;
originally announced December 2006.
-
On the Complexity of Processing Massive, Unordered, Distributed Data
Authors:
Jon Feldman,
S. Muthukrishnan,
Anastasios Sidiropoulos,
Cliff Stein,
Zoya Svitkina
Abstract:
An existing approach for dealing with massive data sets is to stream over the input in few passes and perform computations with sublinear resources. This method does not work for truly massive data where even making a single pass over the data with a processor is prohibitive. Successful log processing systems in practice such as Google's MapReduce and Apache's Hadoop use multiple machines. They efficiently perform a certain class of highly distributable computations defined by local computations that can be applied in any order to the input.
Motivated by the success of these systems, we introduce a simple algorithmic model for massive, unordered, distributed (mud) computation. We initiate the study of understanding its computational complexity. Our main result is a positive one: any unordered function that can be computed by a streaming algorithm can also be computed with a mud algorithm, with comparable space and communication complexity. We extend this result to some useful classes of approximate and randomized streaming algorithms. We also give negative results, using communication complexity arguments to prove that extensions to private randomness, promise problems and indeterminate functions are impossible.
We believe that the line of research we introduce in this paper has the potential for tremendous impact. The distributed systems that motivate our work successfully process data at an unprecedented scale, distributed over hundreds or even thousands of machines, and perform hundreds of such analyses each day. The mud model (and its generalizations) inspire a set of complexity-theoretic questions that lie at their heart.
Submitted 22 May, 2007; v1 submitted 21 November, 2006;
originally announced November 2006.
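One way to read "local computations that can be applied in any order to the input" from the abstract above is as a three-part computation: a per-item map, an order-insensitive pairwise aggregator, and a final post-processing step. The Python sketch below illustrates that reading by computing a mean. The decomposition and the names `phi`, `combine`, `eta`, and `run_mud` are assumptions chosen for illustration and are not taken from the abstract.

```python
import random
from typing import List, Tuple

Message = Tuple[int, int]  # (running sum, running count)

def phi(x: int) -> Message:
    """Local map: turn one input item into a message."""
    return (x, 1)

def combine(a: Message, b: Message) -> Message:
    """Aggregator: merge two messages; commutative and associative."""
    return (a[0] + b[0], a[1] + b[1])

def eta(q: Message) -> float:
    """Post-processing: produce the final answer from the last message."""
    return q[0] / q[1]

def run_mud(items: List[int], rng: random.Random) -> float:
    """Aggregate a non-empty input in an arbitrary order and tree shape."""
    msgs = [phi(x) for x in items]
    rng.shuffle(msgs)                      # arbitrary input order
    while len(msgs) > 1:
        i = rng.randrange(len(msgs) - 1)   # arbitrary adjacent pair to merge
        msgs[i:i + 2] = [combine(msgs[i], msgs[i + 1])]
    return eta(msgs[0])
```

Because `combine` is commutative and associative on integer pairs, every shuffle and every merge order yields the same result, which is the property that lets such computations be spread across many machines.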