Showing 1–25 of 25 results for author: Harchol-Balter, M

  1. arXiv:2406.15560  [pdf, other]

    cs.DC cs.PF

    How to Rent GPUs on a Budget

    Authors: Zhouzi Li, Benjamin Berg, Arpan Mukhopadhyay, Mor Harchol-Balter

    Abstract: The explosion in Machine Learning (ML) over the past ten years has led to a dramatic increase in demand for GPUs to train ML models. Because it is prohibitively expensive for most users to build and maintain a large GPU cluster, large cloud providers (Microsoft Azure, Amazon AWS, Google Cloud) have seen explosive growth in demand for renting cloud-based GPUs. In this cloud-computing paradigm, a us…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2405.04102  [pdf, ps, other]

    cs.PF math.PR

    Analysis of Markovian Arrivals and Service with Applications to Intermittent Overload

    Authors: Isaac Grosof, Yige Hong, Mor Harchol-Balter

    Abstract: Almost all queueing analysis assumes i.i.d. arrivals and service. In reality, arrival and service rates fluctuate over time. In particular, it is common for real systems to intermittently experience overload, where the arrival rate temporarily exceeds the service rate, which an i.i.d. model cannot capture. We consider the MAMS system, where the arrival and service rates each vary according to an a…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 27 pages

  3. arXiv:2404.16219  [pdf, other]

    cs.PF

    Can Increasing the Hit Ratio Hurt Cache Throughput?

    Authors: Ziyue Qiu, Juncheng Yang, Mor Harchol-Balter

    Abstract: Software caches are an intrinsic component of almost every computer system. Consequently, caching algorithms, particularly eviction policies, are the topic of many papers. Almost all these prior papers evaluate the caching algorithm based on its hit ratio, namely the fraction of requests that are found in the cache, as opposed to disk. The hit ratio is viewed as a proxy for traditional performance…

    Submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2404.00346  [pdf, other]

    cs.PF cs.DC

    Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes

    Authors: Benjamin Berg, Benjamin Moseley, Weina Wang, Mor Harchol-Balter

    Abstract: Many modern computing workloads are composed of parallelizable jobs. A single parallelizable job can be completed more quickly if it is run on additional servers, however each job is typically limited in the number of servers it can run on (its parallelizability level). A job's parallelizability level is determined by the type of computation the job performs and how it was implemented. As a result…

    Submitted 30 March, 2024; originally announced April 2024.

  5. arXiv:2310.01621  [pdf, ps, other]

    cs.PF

    The RESET and MARC Techniques, with Application to Multiserver-Job Analysis

    Authors: Isaac Grosof, Yige Hong, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: Multiserver-job (MSJ) systems, where jobs need to run concurrently across many servers, are increasingly common in practice. The default service ordering in many settings is First-Come First-Served (FCFS) service. Virtually all theoretical work on MSJ FCFS models focuses on characterizing the stability region, with almost nothing known about mean response time. We derive the first explicit chara…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 39 pages, IFIP Performance 2023

  6. Optimal Scheduling in the Multiserver-job Model under Heavy Traffic

    Authors: Isaac Grosof, Ziv Scully, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 32 pages, to appear in ACM SIGMETRICS 2023

  7. arXiv:2111.10703  [pdf, other]

    math.PR cs.PF

    The Gittins Policy in the M/G/1 Queue

    Authors: Ziv Scully, Mor Harchol-Balter

    Abstract: The Gittins policy is a highly general scheduling policy that minimizes a wide variety of mean holding cost metrics in the M/G/1 queue. Perhaps most famously, Gittins minimizes mean response time in the M/G/1 when jobs' service times are unknown to the scheduler. Gittins also minimizes weighted versions of mean response time. For example, the well-known "$cμ$ rule", which minimizes class-weighted…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: Originally published at WiOpt 2021; this extended and revised version includes additional discussion throughout and fixes a minor error in Section VII

  8. arXiv:2110.11579  [pdf, other]

    cs.PF

    How to Schedule Near-Optimally under Real-World Constraints

    Authors: Ziv Scully, Mor Harchol-Balter

    Abstract: Scheduling is a critical part of practical computer systems, and scheduling has also been extensively studied from a theoretical perspective. Unfortunately, there is a gap between theory and practice, as the optimal scheduling policies presented by theory can be difficult or impossible to perfectly implement in practice. In this work, we use recent breakthroughs in queueing theory to begin to brid…

    Submitted 22 October, 2021; originally announced October 2021.

  9. Computing the Death Rate of COVID-19

    Authors: Naveen Pai, Sean Zhang, Mor Harchol-Balter

    Abstract: The Infection Fatality Rate (IFR) of COVID-19 is difficult to estimate because the number of infections is unknown and there is a lag between each infection and the potentially subsequent death. We introduce a new approach for estimating the IFR by first estimating the entire sequence of daily infections. Unlike prior approaches, we incorporate existing data on the number of daily COVID-19 tests i…

    Submitted 9 September, 2021; originally announced September 2021.

    Journal ref: Computer Science Protecting Human Society Against Epidemics. ANTICOVID 2021. IFIP Advances in Information and Communication Technology, vol 616. Springer, Cham

  10. arXiv:2109.12663  [pdf, other]

    cs.PF

    WCFS: A new framework for analyzing multiserver systems

    Authors: Isaac Grosof, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: Multiserver queueing systems are found at the core of a wide variety of practical systems. Many important multiserver models have a previously-unexplained similarity: identical mean response time behavior is empirically observed in the heavy traffic limit. We explain this similarity for the first time. We do so by introducing the work-conserving finite-skip (WCFS) framework, which encompasses a…

    Submitted 12 June, 2022; v1 submitted 26 September, 2021; originally announced September 2021.

    Comments: 29 pages. Under submission

  11. Nudge: Stochastically Improving upon FCFS

    Authors: Isaac Grosof, Kunhe Yang, Ziv Scully, Mor Harchol-Balter

    Abstract: The First-Come First-Served (FCFS) scheduling policy is the most popular scheduling algorithm used in practice. Furthermore, its usage is theoretically validated: for light-tailed job size distributions, FCFS has weakly optimal asymptotic tail of response time. But what if we don't just care about the asymptotic tail? What if we also care about the 99th percentile of response time, or the fraction…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 29 pages, 4 figures. To appear in SIGMETRICS 2021

  12. arXiv:2011.10521  [pdf, other]

    cs.PF math.PR

    Zero Queueing for Multi-Server Jobs

    Authors: Weina Wang, Qiaomin Xie, Mor Harchol-Balter

    Abstract: Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-job-per-server model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. Fro…

    Submitted 4 February, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  13. heSRPT: Parallel Scheduling to Minimize Mean Slowdown

    Authors: Benjamin Berg, Rein Vesilo, Mor Harchol-Balter

    Abstract: Modern data centers serve workloads which are capable of exploiting parallelism. When a job parallelizes across multiple servers it will complete more quickly, but jobs receive diminishing returns from being allocated additional servers. Because allocating multiple servers to a single job is inefficient, it is unclear how best to allocate a fixed number of servers between many parallelizable jobs.…

    Submitted 18 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1903.09346

    Journal ref: Performance Evaluation (2020) 102147

  14. arXiv:2010.00631  [pdf, other]

    cs.PF

    Stability for Two-class Multiserver-job Systems

    Authors: Isaac Grosof, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Much is known in the dropping setting, where jobs are immediately discarded if they require more servers than are currently available. However, very little is known in the more practical setting where jobs queue instead. In this paper, we derive a closed-form analytical expression for the st…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 29 pages, 8 figures

  15. arXiv:2005.09745  [pdf, ps, other]

    cs.PF

    Optimal Resource Allocation for Elastic and Inelastic Jobs

    Authors: Benjamin Berg, Mor Harchol-Balter, Benjamin Moseley, Weina Wang, Justin Whitehouse

    Abstract: Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to parallelizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although…

    Submitted 19 May, 2020; originally announced May 2020.

  16. arXiv:2003.13232  [pdf, ps, other]

    cs.PF math.PR

    Optimal Multiserver Scheduling with Unknown Job Sizes in Heavy Traffic

    Authors: Ziv Scully, Isaac Grosof, Mor Harchol-Balter

    Abstract: We consider scheduling to minimize mean response time of the M/G/k queue with unknown job sizes. In the single-server case, the optimal policy is the Gittins policy, but it is not known whether Gittins or any other policy is optimal in the multiserver case. Exactly analyzing the M/G/k under any scheduling policy is intractable, and Gittins is a particularly complicated policy that is hard to analy…

    Submitted 26 October, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

  17. Simple Near-Optimal Scheduling for the M/G/1

    Authors: Ziv Scully, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: We consider the problem of preemptively scheduling jobs to minimize mean response time of an M/G/1 queue. When we know each job's size, the shortest remaining processing time (SRPT) policy is optimal. Unfortunately, in many settings we do not have access to each job's size. Instead, we know only the job size distribution. In this setting the Gittins policy is known to minimize mean response time,…

    Submitted 22 January, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: POMACS, 2020 (SIGMETRICS 2020 issue)

  18. arXiv:1905.03439  [pdf, other]

    cs.PF

    Load Balancing Guardrails: Keeping Your Heavy Traffic on the Road to Low Response Times

    Authors: Isaac Grosof, Ziv Scully, Mor Harchol-Balter

    Abstract: Load balancing systems, comprising a central dispatcher and a scheduling policy at each server, are widely used in practice, and their response time has been extensively studied in the theoretical literature. While much is known about the scenario where the scheduling at the servers is First-Come-First-Served (FCFS), to minimize mean response time we must use Shortest-Remaining-Processing-Time (SR…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: 31 pages. To appear in ACM SIGMETRICS 2019

  19. arXiv:1903.09346  [pdf, other]

    cs.PF

    heSRPT: Optimal Parallel Scheduling of Jobs With Known Sizes

    Authors: Benjamin Berg, Rein Vesilo, Mor Harchol-Balter

    Abstract: When parallelizing a set of jobs across many servers, one must balance a trade-off between granting priority to short jobs and maintaining the overall efficiency of the system. When the goal is to minimize the mean flow time of a set of jobs, it is usually the case that one wants to complete short jobs before long jobs. However, since jobs usually cannot be parallelized with perfect efficiency, gr…

    Submitted 20 November, 2020; v1 submitted 21 March, 2019; originally announced March 2019.

  20. arXiv:1805.07686  [pdf, ps, other]

    cs.PF

    SRPT for Multiserver Systems

    Authors: Isaac Grosof, Ziv Scully, Mor Harchol-Balter

    Abstract: The Shortest Remaining Processing Time (SRPT) scheduling policy and its variants have been extensively studied in both theoretical and practical settings. While beautiful results are known for single-server SRPT, much less is known for multiserver SRPT. In particular, stochastic analysis of the M/G/k under multiserver SRPT is entirely open. Intuition suggests that multiserver SRPT should be optima…

    Submitted 19 May, 2018; originally announced May 2018.

    Comments: 15 pages. Submitted to IFIP Performance 2018

  21. arXiv:1805.06865  [pdf, other]

    cs.PF math.OC

    Optimal Scheduling and Exact Response Time Analysis for Multistage Jobs

    Authors: Ziv Scully, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: Scheduling to minimize mean response time in an M/G/1 queue is a classic problem. The problem is usually addressed in one of two scenarios. In the perfect-information scenario, the scheduler knows each job's exact size, or service requirement. In the zero-information scenario, the scheduler knows only each job's size distribution. The well-known shortest remaining processing time (SRPT) policy is…

    Submitted 12 November, 2018; v1 submitted 17 May, 2018; originally announced May 2018.

  22. arXiv:1712.00790  [pdf, ps, other]

    cs.PF

    SOAP: One Clean Analysis of All Age-Based Scheduling Policies

    Authors: Ziv Scully, Mor Harchol-Balter, Alan Scheller-Wolf

    Abstract: We consider an extremely broad class of M/G/1 scheduling policies called SOAP: Schedule Ordered by Age-based Priority. The SOAP policies include almost all scheduling policies in the literature as well as an infinite number of variants which have never been analyzed, or maybe not even conceived. SOAP policies range from classic policies, like first-come, first-serve (FCFS), foreground-background (…

    Submitted 17 February, 2018; v1 submitted 3 December, 2017; originally announced December 2017.

  23. Practical Bounds on Optimal Caching with Variable Object Sizes

    Authors: Daniel S. Berger, Nathan Beckmann, Mor Harchol-Balter

    Abstract: Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners of how much further miss ratios can be improved. In other words, should the systems community continue working on this problem? Currently, there is no principled answer to this question. In practice, object sizes often vary by several orders of magnitude, where computing the optimal miss ratio (O…

    Submitted 5 July, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

    Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems, Article 32, Volume 2, Issue 2, June 2018

  24. arXiv:1710.00296  [pdf, other]

    cs.PF

    Delay Asymptotics and Bounds for Multi-Task Parallel Jobs

    Authors: Weina Wang, Mor Harchol-Balter, Haotian Jiang, Alan Scheller-Wolf, R. Srikant

    Abstract: We study delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem ha…

    Submitted 15 September, 2018; v1 submitted 1 October, 2017; originally announced October 2017.

  25. Towards Optimality in Parallel Scheduling

    Authors: Benjamin Berg, Jan-Pieter Dorsman, Mor Harchol-Balter

    Abstract: To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-core chips, one must address the question of how many cores to assign to each job. Given that jobs receive sublinear speedups from additional cor…

    Submitted 31 October, 2017; v1 submitted 21 July, 2017; originally announced July 2017.

    Journal ref: Proc. ACM Meas. Anal. Comput. Syst. 1, 2, Article 40 (December 2017), 30 pages