Showing 1–22 of 22 results for author: Tarnawski, J

  1. arXiv:2403.08917  [pdf, other]

    cs.CR cs.DS cs.LG

    Efficiently Computing Similarities to Private Datasets

    Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski

    Abstract: Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) da…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: To appear at ICLR 2024

  2. arXiv:2403.01876  [pdf, other]

    cs.DC

    DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

    Authors: Foteini Strati, Sara Mcallister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic

    Abstract: Distributed LLM serving is costly and often underutilizes hardware accelerators due to three key challenges: bubbles in pipeline-parallel deployments caused by the bimodal latency of prompt and token processing, GPU memory overprovisioning, and long recovery times in case of failures. In this paper, we propose DéjàVu, a system to address all these challenges using a versatile and efficient KV cach…

    Submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2312.14299  [pdf, ps, other]

    cs.LG cs.CY cs.DM cs.DS math.CO math.OC

    Fairness in Submodular Maximization over a Matroid Constraint

    Authors: Marwa El Halabi, Jakub Tarnawski, Ashkan Norouzi-Fard, Thuy-Duong Vuong

    Abstract: Submodular maximization over a matroid constraint is a fundamental problem with various applications in machine learning. Some of these applications involve decision-making over datapoints with sensitive attributes such as gender or race. In such settings, it is crucial to guarantee that the selected solution is fairly distributed with respect to this attribute. Recently, fairness has been investi…

    Submitted 21 December, 2023; originally announced December 2023.

  4. arXiv:2305.15118  [pdf, other]

    cs.LG cs.CY cs.DS

    Fairness in Streaming Submodular Maximization over a Matroid Constraint

    Authors: Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

    Abstract: Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been develope…

    Submitted 19 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to ICML 23

  5. arXiv:2203.01440  [pdf, ps, other]

    cs.LG cs.CR cs.DS

    Near-Optimal Correlation Clustering with Privacy

    Authors: Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

    Abstract: Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes'…

    Submitted 2 March, 2022; originally announced March 2022.

  6. Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers

    Authors: Youjie Li, Amar Phanishayee, Derek Murray, Jakub Tarnawski, Nam Sung Kim

    Abstract: Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only those who have massive datacenter-based resources with the ability to develop and train such models. One of the main challenges for the long tail of researchers who might have only limited resources (e.g., a single multi-GPU server) is limited GPU memory capacity compared to model size. The problem is s…

    Submitted 1 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted at VLDB 2022

  7. arXiv:2111.00721  [pdf, ps, other]

    cs.DS

    Online Edge Coloring via Tree Recurrences and Correlation Decay

    Authors: Janardhan Kulkarni, Yang P. Liu, Ashwin Sah, Mehtaab Sawhney, Jakub Tarnawski

    Abstract: We give an online algorithm that with high probability computes a $\left(\frac{e}{e-1} + o(1)\right)Δ$ edge coloring on a graph $G$ with maximum degree $Δ= ω(\log n)$ under online edge arrivals against oblivious adversaries, making first progress on the conjecture of Bar-Noy, Motwani, and Naor in this general setting. Our algorithm is based on reducing to a matching problem on locally treelike gra…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: 22 pages, 1 figure

  8. arXiv:2106.08448  [pdf, other]

    cs.DS cs.DC cs.LG

    Correlation Clustering in Constant Many Parallel Rounds

    Authors: Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

    Abstract: Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular,…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ICML 2021 (long talk)

  9. arXiv:2105.00111  [pdf, ps, other]

    cs.DS

    On the Hardness of Scheduling With Non-Uniform Communication Delays

    Authors: Sami Davies, Janardhan Kulkarni, Thomas Rothvoss, Sai Sandeep, Jakub Tarnawski, Yihao Zhang

    Abstract: In the scheduling with non-uniform communication delay problem, the input is a set of jobs with precedence constraints. Associated with every precedence constraint between a pair of jobs is a communication delay, the time duration the scheduler has to wait between the two jobs if they are scheduled on different machines. The objective is to assign the jobs to machines to minimize the makespan of t…

    Submitted 30 April, 2021; originally announced May 2021.

  10. arXiv:2010.07431  [pdf, other]

    cs.LG cs.DS

    Fairness in Streaming Submodular Maximization: Algorithms and Hardness

    Authors: Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

    Abstract: Submodular maximization has become established as the method of choice for the task of selecting representative and diverse summaries of data. However, if datapoints have sensitive attributes such as gender or age, such machine learning algorithms, left unchecked, are known to exhibit bias: under- or over-representation of particular groups. This has made the design of fair machine learning algori…

    Submitted 18 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted to NeurIPS 2020

  11. arXiv:2006.16423  [pdf, other]

    cs.LG cs.DC stat.ML

    Efficient Algorithms for Device Placement of DNN Graph Operators

    Authors: Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

    Abstract: Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific accelerators being offered as hardware accelerators in addition to CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has…

    Submitted 29 October, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020

  12. Fully Dynamic Algorithm for Constrained Submodular Optimization

    Authors: Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakub Tarnawski, Morteza Zadimoghaddam

    Abstract: The task of maximizing a monotone submodular function under a cardinality constraint is at the core of many machine learning and data mining applications, including data summarization, sparse regression and coverage problems. We study this classic problem in the fully dynamic setting, where elements can be both inserted and removed. Our main result is a randomized algorithm that maintains an effic…

    Submitted 24 May, 2023; v1 submitted 8 June, 2020; originally announced June 2020.

    Journal ref: NeurIPS 2020

  13. Hierarchy-Based Algorithms for Minimizing Makespan under Precedence and Communication Constraints

    Authors: Janardhan Kulkarni, Shi Li, Jakub Tarnawski, Minwei Ye

    Abstract: We consider the classic problem of scheduling jobs with precedence constraints on a set of identical machines to minimize the makespan objective function. Understanding the exact approximability of the problem when the number of machines is a constant is a well-known question in scheduling theory. Indeed, an outstanding open problem from the classic book of Garey and Johnson asks whether this prob…

    Submitted 28 April, 2020; originally announced April 2020.

    Journal ref: Proc. of ACM-SIAM Symposium on Discrete Algorithms (SODA), 2020, pages 2770-2789

  14. arXiv:2004.09682  [pdf, ps, other]

    cs.DS

    Scheduling with Communication Delays via LP Hierarchies and Clustering

    Authors: Sami Davies, Janardhan Kulkarni, Thomas Rothvoss, Jakub Tarnawski, Yihao Zhang

    Abstract: We consider the classic problem of scheduling jobs with precedence constraints on identical machines to minimize makespan, in the presence of communication delays. In this setting, denoted by $\mathsf{P} \mid \mathsf{prec}, c \mid C_{\mathsf{max}}$, if two dependent jobs are scheduled on different machines, then at least $c$ units of time must pass between their executions. Despite its relevance t…

    Submitted 20 April, 2020; originally announced April 2020.

  15. arXiv:1808.01842  [pdf, other]

    cs.LG stat.ML

    Beyond $1/2$-Approximation for Submodular Maximization on Massive Data Streams

    Authors: Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrović, Amir Zandieh, Aida Mousavifar, Ola Svensson

    Abstract: Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive…

    Submitted 6 August, 2018; originally announced August 2018.

    Journal ref: Proc. of 35th International Conference on Machine Learning (ICML), 2018, pages 3829-3838

  16. arXiv:1711.02598  [pdf, other]

    cs.DS stat.ML

    Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach

    Authors: Slobodan Mitrović, Ilija Bogunovic, Ashkan Norouzi-Fard, Jakub Tarnawski, Volkan Cevher

    Abstract: We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm's memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing…

    Submitted 7 November, 2017; originally announced November 2017.

    Comments: To appear in NIPS 2017

    Journal ref: Proc. of 30th Advances in Neural Information Processing Systems (NIPS) 2017, pages 4558-4567

  17. arXiv:1708.04215  [pdf, ps, other]

    cs.DS

    A Constant-Factor Approximation Algorithm for the Asymmetric Traveling Salesman Problem

    Authors: Ola Svensson, Jakub Tarnawski, László A. Végh

    Abstract: We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem (ATSP). Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation. The main idea of our approach is a reduction to Subtour Partition Cover, an easier problem obtained by significantly rel…

    Submitted 15 September, 2020; v1 submitted 14 August, 2017; originally announced August 2017.

    Comments: This is an extended version of the paper also incorporating the results of the paper arxiv:1502.02051

  18. The Matching Problem in General Graphs is in Quasi-NC

    Authors: Ola Svensson, Jakub Tarnawski

    Abstract: We show that the perfect matching problem in general graphs is in Quasi-NC. That is, we give a deterministic parallel algorithm which runs in $O(\log^3 n)$ time on $n^{O(\log^2 n)}$ processors. The result is obtained by a derandomization of the Isolation Lemma for perfect matchings, which was introduced in the classic paper by Mulmuley, Vazirani and Vazirani [1987] to obtain a Randomized NC algori…

    Submitted 4 September, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

    Comments: Accepted to FOCS 2017 (58th Annual IEEE Symposium on Foundations of Computer Science)

    Journal ref: Proc. of 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2017, pages 696-707

  19. Active Learning and Proofreading for Delineation of Curvilinear Structures

    Authors: Agata Mosinska, Jakub Tarnawski, Pascal Fua

    Abstract: Many state-of-the-art delineation methods rely on supervised machine learning algorithms. As a result, they require manually annotated training data, which is tedious to obtain. Furthermore, even minor classification errors may significantly affect the topology of the final result. In this paper we propose a generic approach to addressing both of these problems by taking into account the influence…

    Submitted 13 March, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

    Comments: extended version with fast reconstruction

    Journal ref: Proc. of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017, pages 165-173

  20. Unrelated Machine Scheduling of Jobs with Uniform Smith Ratios

    Authors: Christos Kalaitzis, Ola Svensson, Jakub Tarnawski

    Abstract: We consider the classic problem of scheduling jobs on unrelated machines so as to minimize the weighted sum of completion times. Recently, for a small constant $\varepsilon >0 $, Bansal et al. gave a $(3/2-\varepsilon)$-approximation algorithm improving upon the natural barrier of $3/2$ which follows from independent randomized rounding. In simplified terms, their result is obtained by an enhancem…

    Submitted 3 November, 2016; v1 submitted 26 July, 2016; originally announced July 2016.

    Comments: Accepted to ACM-SIAM Symposium on Discrete Algorithms (SODA) 2017

    Journal ref: Proc. of 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2017, pages 2654-2669

  21. Constant Factor Approximation for ATSP with Two Edge Weights

    Authors: Ola Svensson, Jakub Tarnawski, László A. Végh

    Abstract: We give a constant factor approximation algorithm for the Asymmetric Traveling Salesman Problem on shortest path metrics of directed graphs with two different edge weights. For the case of unit edge weights, the first constant factor approximation was given recently by Svensson. This was accomplished by introducing an easier problem called Local-Connectivity ATSP and showing that a good solution t…

    Submitted 4 September, 2017; v1 submitted 22 November, 2015; originally announced November 2015.

    Journal ref: Proc. of Integer Programming and Combinatorial Optimization: 18th International Conference, IPCO 2016, pages 226-237

  22. Fast Generation of Random Spanning Trees and the Effective Resistance Metric

    Authors: Aleksander Madry, Damian Straszak, Jakub Tarnawski

    Abstract: We present a new algorithm for generating a uniformly random spanning tree in an undirected graph. Our algorithm samples such a tree in expected $\tilde{O}(m^{4/3})$ time. This improves over the best previously known bound of $\min(\tilde{O}(m\sqrt{n}),O(n^ω))$ -- that follows from the work of Kelner and Mądry [FOCS'09] and of Colbourn et al. [J. Algorithms'96] -- whenever the input graph is suffi…

    Submitted 1 January, 2015; originally announced January 2015.

    Journal ref: Proc. of 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015, pages 2019-2036