-
Rightsizing Clusters for Time-Limited Tasks
Authors:
Venkatesan T. Chakaravarthy,
Padmanabha V. Seshadri,
Pooja Aggarwal,
Anamitra R. Choudhury,
Ashok Pon Kumar,
Yogish Sabharwal,
Amith Singhee
Abstract:
In conventional public clouds, designing a suitable initial cluster for a given application workload is important in reducing the computational foot-print during run-time. In edge or on-premise clouds, cold-start rightsizing the cluster at the time of installation is crucial in avoiding the recurrent capital expenditure. In both these cases, rightsizing has to balance cost-performance trade-off fo…
▽ More
In conventional public clouds, designing a suitable initial cluster for a given application workload is important in reducing the computational foot-print during run-time. In edge or on-premise clouds, cold-start rightsizing the cluster at the time of installation is crucial in avoiding the recurrent capital expenditure. In both these cases, rightsizing has to balance cost-performance trade-off for a given application with multiple tasks, where each task can demand multiple resources, and the cloud offers nodes with different capacity and cost. Multidimensional bin-packing can address this cold-start rightsizing problem, but assumes that every task is always active. In contrast, real-world tasks (e.g. load bursts, batch and dead-lined tasks with time-limits) may be active only during specific time-periods or may have dynamic load profiles. The cluster cost can be reduced by reusing resources via time sharing and optimal packing. This motivates our generalized problem of cold-start rightsizing for time-limited tasks: given a timeline, time-periods and resource demands for tasks, the objective is to place the tasks on a minimum cost cluster of nodes without violating node capacities at any time instance. We design a baseline two-phase algorithm that performs penalty-based map** of task to node-type and then, solves each node-type independently. We prove that the algorithm has an approximation ratio of O(D min(m, T)), where D, m and T are the number of resources, node-types and timeslots, respectively. We then present an improved linear programming based map** strategy, enhanced further with a cross-node-type filling mechanism. Our experiments on synthetic and real-world cluster traces show significant cost reduction by LP-based map** compared to the baseline, and the filling mechanism improves further to produce solutions within 20% of (a lower-bound to) the optimal solution.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Authors:
Saurabh Goyal,
Anamitra R. Choudhury,
Saurabh M. Raje,
Venkatesan T. Chakaravarthy,
Yogish Sabharwal,
Ashish Verma
Abstract:
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors. b) determining which word-vectors to eliminate by develo** a strategy for measuring their significance, based on the self-att…
▽ More
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors. b) determining which word-vectors to eliminate by develo** a strategy for measuring their significance, based on the self-attention mechanism. c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark shows that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT.
△ Less
Submitted 8 September, 2020; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Efficient Inferencing of Compressed Deep Neural Networks
Authors:
Dharma Teja Vooturi,
Saurabh Goyal,
Anamitra R. Choudhury,
Yogish Sabharwal,
Ashish Verma
Abstract:
Large number of weights in deep neural networks makes the models difficult to be deployed in low memory environments such as, mobile phones, IOT edge devices as well as "inferencing as a service" environments on cloud. Prior work has considered reduction in the size of the models, through compression techniques like pruning, quantization, Huffman encoding etc. However, efficient inferencing using…
▽ More
Large number of weights in deep neural networks makes the models difficult to be deployed in low memory environments such as, mobile phones, IOT edge devices as well as "inferencing as a service" environments on cloud. Prior work has considered reduction in the size of the models, through compression techniques like pruning, quantization, Huffman encoding etc. However, efficient inferencing using the compressed models has received little attention, specially with the Huffman encoding in place. In this paper, we propose efficient parallel algorithms for inferencing of single image and batches, under various memory constraints. Our experimental results show that our approach of using variable batch size for inferencing achieves 15-25\% performance improvement in the inference throughput for AlexNet, while maintaining memory and latency constraints.
△ Less
Submitted 1 November, 2017;
originally announced November 2017.
-
Rejecting Jobs to Minimize Load and Maximum Flow-time
Authors:
Anamitra Roy Choudhury,
Syamantak Das,
Naveen Garg,
Amit Kumar
Abstract:
Online algorithms are usually analyzed using the notion of competitive ratio which compares the solution obtained by the algorithm to that obtained by an online adversary for the worst possible input sequence. Often this measure turns out to be too pessimistic, and one popular approach especially for scheduling problems has been that of "resource augmentation" which was first proposed by Kalyanasu…
▽ More
Online algorithms are usually analyzed using the notion of competitive ratio which compares the solution obtained by the algorithm to that obtained by an online adversary for the worst possible input sequence. Often this measure turns out to be too pessimistic, and one popular approach especially for scheduling problems has been that of "resource augmentation" which was first proposed by Kalyanasundaram and Pruhs. Although resource augmentation has been very successful in dealing with a variety of objective functions, there are problems for which even a (arbitrary) constant speedup cannot lead to a constant competitive algorithm. In this paper we propose a "rejection model" which requires no resource augmentation but which permits the online algorithm to not serve an epsilon-fraction of the requests.
The problems considered in this paper are in the restricted assignment setting where each job can be assigned only to a subset of machines. For the load balancing problem where the objective is to minimize the maximum load on any machine, we give $O(\log^2 1/\eps)$-competitive algorithm which rejects at most an $\eps$-fraction of the jobs. For the problem of minimizing the maximum weighted flow-time, we give an $O(1/\eps^4)$-competitive algorithm which can reject at most an $\eps$-fraction of the jobs by weight. We also extend this result to a more general setting where the weights of a job for measuring its weighted flow-time and its contribution towards total allowed rejection weight are different. This is useful, for instance, when we consider the objective of minimizing the maximum stretch. We obtain an $O(1/\eps^6)$-competitive algorithm in this case.
Our algorithms are immediate dispatch, though they may not be immediate reject. All these problems have very strong lower bounds in the speed augmentation model.
△ Less
Submitted 7 October, 2014;
originally announced October 2014.
-
Distributed and Parallel Algorithms for Set Cover Problems with Small Neighborhood Covers
Authors:
Archita Agarwal,
Venkatesan T. Chakaravarthy,
Anamitra R. Choudhury,
Sambuddha Roy,
Yogish Sabharwal
Abstract:
In this paper, we study a class of set cover problems that satisfy a special property which we call the {\em small neighborhood cover} property. This class encompasses several well-studied problems including vertex cover, interval cover, bag interval cover and tree cover. We design unified distributed and parallel algorithms that can handle any set cover problem falling under the above framework a…
▽ More
In this paper, we study a class of set cover problems that satisfy a special property which we call the {\em small neighborhood cover} property. This class encompasses several well-studied problems including vertex cover, interval cover, bag interval cover and tree cover. We design unified distributed and parallel algorithms that can handle any set cover problem falling under the above framework and yield constant factor approximations. These algorithms run in polylogarithmic communication rounds in the distributed setting and are in NC, in the parallel setting.
△ Less
Submitted 27 December, 2013;
originally announced December 2013.
-
Fenchel Duals for Drifting Adversaries
Authors:
Suman K Bera,
Anamitra R Choudhury,
Syamantak Das,
Sambuddha Roy,
Jayram S. Thatchachar
Abstract:
We describe a primal-dual framework for the design and analysis of online convex optimization algorithms for {\em drifting regret}. Existing literature shows (nearly) optimal drifting regret bounds only for the $\ell_2$ and the $\ell_1$-norms. Our work provides a connection between these algorithms and the Online Mirror Descent ($\omd$) updates; one key insight that results from our work is that i…
▽ More
We describe a primal-dual framework for the design and analysis of online convex optimization algorithms for {\em drifting regret}. Existing literature shows (nearly) optimal drifting regret bounds only for the $\ell_2$ and the $\ell_1$-norms. Our work provides a connection between these algorithms and the Online Mirror Descent ($\omd$) updates; one key insight that results from our work is that in order for these algorithms to succeed, it suffices to have the gradient of the regularizer to be bounded (in an appropriate norm). For situations (like for the $\ell_1$ norm) where the vanilla regularizer does not have this property, we have to {\em shift} the regularizer to ensure this. Thus, this helps explain the various updates presented in \cite{bansal10, buchbinder12}. We also consider the online variant of the problem with 1-lookahead, and with movement costs in the $\ell_2$-norm. Our primal dual approach yields nearly optimal competitive ratios for this problem.
△ Less
Submitted 23 September, 2013;
originally announced September 2013.