-
Resilience of the Electric Grid through Trustable IoT-Coordinated Assets
Authors:
Vineet J. Nair,
Venkatesh Venkataramanan,
Priyank Srivastava,
Partha S. Sarker,
Anurag Srivastava,
Laurentiu D. Marinovici,
Jun Zha,
Christopher Irwin,
Prateek Mittal,
John Williams,
H. Vincent Poor,
Anuradha M. Annaswamy
Abstract:
The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) that include renewable generation, flexible loads, and storage provides extraordinary opportunities for improvements in efficiency and sustainability. However, DERs can introduce new vulnerabilities in the form of cyberattacks, which can cause significant challenges in ensuring grid resilience. We propose a framework in this paper for achieving grid resilience through suitably coordinated assets including a network of Internet of Things (IoT) devices. A local electricity market is proposed to identify trustable assets and carry out this coordination. Situational Awareness (SA) of locally available DERs with the ability to inject power or reduce consumption is enabled by the market, together with a monitoring procedure for their trustability and commitment. With this SA, we show that a variety of cyberattacks can be mitigated using local trustable resources without stressing the bulk grid. The demonstrations are carried out on a variety of platforms, including a high-fidelity co-simulation platform, real-time hardware-in-the-loop validation, and a utility-friendly simulator.
Submitted 21 June, 2024;
originally announced June 2024.
-
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Authors:
Muhammad Adnan,
Amar Phanishayee,
Janardhan Kulkarni,
Prashant J. Nair,
Divya Mahajan
Abstract:
In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor model parallel scenarios, the latter being addressed for the first time. The search optimizes accelerators for training-relevant metrics such as throughput/TDP under fixed area and power constraints. However, with the proliferation of specialized architectures and complex distributed training mechanisms, the design space of hardware accelerators is very large. Prior work has tried to tackle this by reducing the search space either to single-accelerator execution, and then only for inference, or to tuning the architecture for specific layers (e.g., convolution). Instead, we take a heuristic, critical-path-based approach to determine the best use of available resources (power and area) either for a set of DNN workloads or for each workload individually. First, we perform a local search to determine the architecture for each pipeline and tensor model stage. Specifically, the system iteratively generates architectural configurations and tunes the design using a novel heuristic-based approach that prioritizes accelerator resources and scheduling for critical operators in a machine learning workload. Second, to address the complexities of distributed training, the local search selects multiple (k) designs per stage. A global search then identifies an accelerator from the top-k sets to optimize training throughput across the stages. We evaluate this work on 11 different DNN models. Compared to Spotlight, a recent inference-only work, our method converges to a design in, on average, 31x less time and offers 12x higher throughput. Moreover, designs generated using our method achieve a 12% throughput improvement over the TPU architecture.
Submitted 22 April, 2024;
originally announced April 2024.
-
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings
Authors:
Yassaman Ebrahimzadeh Maboud,
Muhammad Adnan,
Divya Mahajan,
Prashant J. Nair
Abstract:
Training recommendation models poses significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recommendation models. We observe that, even among the popular embeddings, certain embeddings undergo rapid training and exhibit minimal subsequent variation, resulting in saturation. Consequently, updates to these embeddings do not contribute to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. This capability enables Slipstream to achieve substantial speedup, optimize CPU-GPU bandwidth usage, and eliminate unnecessary memory access. Slipstream showcases training time reductions of 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, compared to Baseline XDL, Intel-optimized DLRM, FAE, and Hotline, respectively.
Submitted 21 March, 2024;
originally announced April 2024.
-
Capacity Provisioning Motivated Online Non-Convex Optimization Problem with Memory and Switching Cost
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
An online non-convex optimization problem is considered where the goal is to minimize the flow time (total delay) of a set of jobs by modulating the number of active servers, but with a switching cost associated with changing the number of active servers over time. Each job can be processed by at most one fixed speed server at any time. Compared to the usual online convex optimization (OCO) problem with switching cost, the objective function considered is non-convex and more importantly, at each time, it depends on all past decisions and not just the present one. Both worst-case and stochastic inputs are considered; for both cases, competitive algorithms are derived.
Submitted 1 July, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Authors:
Muhammad Adnan,
Akhil Arunkumar,
Gaurav Jain,
Prashant J. Nair,
Ilya Soloveychik,
Purushotham Kamath
Abstract:
Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which constitutes the majority of the computational workload, primarily entails vector-matrix multiplications and interactions with the Key-Value (KV) Cache. This phase is constrained by memory bandwidth due to the overhead of transferring weights and KV cache values from the memory system to the computing units. This memory bottleneck becomes particularly pronounced in applications that require long-context and extensive text generation, both of which are increasingly crucial for LLMs.
This paper introduces "Keyformer", an innovative inference-time approach, to mitigate the challenges associated with KV cache size and memory bandwidth utilization. Keyformer leverages the observation that approximately 90% of the attention weight in generative inference focuses on a specific subset of tokens, referred to as "key" tokens. Keyformer retains only the key tokens in the KV cache by identifying these crucial tokens using a novel score function. This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy. We evaluate Keyformer's performance across three foundational models: GPT-J, Cerebras-GPT, and MPT, which employ various positional embedding algorithms. Our assessment encompasses a variety of tasks, with a particular emphasis on summarization and conversation tasks involving extended contexts. Keyformer's reduction of KV cache reduces inference latency by 2.1x and improves token generation throughput by 2.4x, while preserving the model's accuracy.
Submitted 5 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
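As a rough illustration of the key-token idea described in the Keyformer abstract above, the sketch below keeps only the tokens that received the most attention and prunes the KV cache to those positions. The abstract does not specify Keyformer's score function, so this sketch substitutes plain accumulated attention weight as a stand-in; the function names `select_key_tokens` and `prune_kv_cache` and the list-of-lists data layout are assumptions made for illustration, not the paper's API.

```python
def select_key_tokens(attn_rows, keep):
    """Return sorted positions of the `keep` tokens with the largest
    accumulated attention weight.

    attn_rows: one attention distribution per generation step, each a list
    of weights over the cached token positions.
    NOTE: accumulated attention is a stand-in score, not Keyformer's own.
    """
    num_tokens = len(attn_rows[0])
    scores = [0.0] * num_tokens
    for row in attn_rows:
        for pos, weight in enumerate(row):
            scores[pos] += weight
    # Rank positions by score, highest first, then restore positional order.
    ranked = sorted(range(num_tokens), key=lambda pos: scores[pos], reverse=True)
    return sorted(ranked[:keep])


def prune_kv_cache(keys, values, kept_positions):
    """Keep only the cache entries at the selected positions."""
    pruned_keys = [keys[pos] for pos in kept_positions]
    pruned_values = [values[pos] for pos in kept_positions]
    return pruned_keys, pruned_values


if __name__ == "__main__":
    attention = [[0.5, 0.1, 0.3, 0.1], [0.6, 0.05, 0.3, 0.05]]
    kept = select_key_tokens(attention, keep=2)
    keys, values = prune_kv_cache(["k0", "k1", "k2", "k3"],
                                  ["v0", "v1", "v2", "v3"], kept)
    print(kept, keys, values)
```

Halving the number of retained positions halves the cache entries that must be read per generated token, which is the memory-bandwidth saving the abstract refers to.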
-
Fixed confidence community mode estimation
Authors:
Meera Pai,
Nikhil Karamchandani,
Jayakrishnan Nair
Abstract:
Our aim is to estimate the largest community (a.k.a., mode) in a population composed of multiple disjoint communities. This estimation is performed in a fixed confidence setting via sequential sampling of individuals with replacement. We consider two sampling models: (i) an identityless model, wherein only the community of each sampled individual is revealed, and (ii) an identity-based model, wherein the learner is able to discern whether or not each sampled individual has been sampled before, in addition to the community of that individual. The former model corresponds to the classical problem of identifying the mode of a discrete distribution, whereas the latter seeks to capture the utility of identity information in mode estimation. For each of these models, we establish information theoretic lower bounds on the expected number of samples needed to meet the prescribed confidence level, and propose sound algorithms with a sample complexity that is provably asymptotically optimal. Our analysis highlights that identity information can indeed be utilized to improve the efficiency of community mode estimation.
Submitted 22 September, 2023;
originally announced September 2023.
-
Ad-Rec: Advanced Feature Interactions to Address Covariate-Shifts in Recommendation Networks
Authors:
Muhammad Adnan,
Yassaman Ebrahimzadeh Maboud,
Divya Mahajan,
Prashant J. Nair
Abstract:
Recommendation models are vital in delivering personalized user experiences by leveraging the correlation between multiple input features. However, deep learning-based recommendation models often face challenges due to evolving user behaviour and item features, leading to covariate shifts. Effective cross-feature learning is crucial for handling data distribution drift and adapting to changing user behaviour. Traditional feature interaction techniques have limitations in achieving optimal performance in this context.
This work introduces Ad-Rec, an advanced network that leverages feature interaction techniques to address covariate shifts. This helps eliminate irrelevant interactions in recommendation tasks. Ad-Rec leverages masked transformers to enable the learning of higher-order cross-features while mitigating the impact of data distribution drift. Our approach improves model quality, as measured by the Area Under Curve (AUC) metric, accelerates convergence, and reduces training time. We demonstrate the scalability of Ad-Rec and its ability to achieve superior model quality through comprehensive ablation studies.
Submitted 28 August, 2023;
originally announced August 2023.
-
FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout
Authors:
Irene Wang,
Prashant J. Nair,
Divya Mahajan
Abstract:
Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
Submitted 26 September, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
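The FLuID abstract above describes extracting a sub-model "based on the weight update threshold". One possible reading, sketched below, treats a neuron as invariant when none of its weights changed by more than a threshold between two rounds, and drops such neurons from the sub-model sent to a straggler. The per-neuron list layout, the max-absolute-change criterion, and the names `invariant_neurons` and `extract_sub_model` are all assumptions for illustration; the abstract does not give these details.

```python
def invariant_neurons(prev_weights, new_weights, threshold):
    """Return indices of neurons whose largest absolute weight change
    between two rounds is below `threshold`.

    prev_weights / new_weights: one list of weights per neuron.
    NOTE: the max-absolute-change criterion is an assumed stand-in.
    """
    invariant = []
    for idx, (old, new) in enumerate(zip(prev_weights, new_weights)):
        largest_change = max(abs(n - o) for o, n in zip(old, new))
        if largest_change < threshold:
            invariant.append(idx)
    return invariant


def extract_sub_model(new_weights, dropped):
    """Keep only the neurons that are not in the dropped set."""
    dropped_set = set(dropped)
    return {idx: w for idx, w in enumerate(new_weights) if idx not in dropped_set}


if __name__ == "__main__":
    before = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
    after = [[1.001, 2.0], [0.9, 0.4], [3.0, 1.0005]]
    dropped = invariant_neurons(before, after, threshold=0.01)
    print(dropped, extract_sub_model(after, dropped))
```

The intuition is that neurons that have stopped changing contribute little to the next update, so leaving them out reduces the straggler's work with limited effect on accuracy.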
-
Best Arm Identification in Bandits with Limited Precision Sampling
Authors:
Kota Srinivas Reddy,
P. N. Karthik,
Nikhil Karamchandani,
Jayakrishnan Nair
Abstract:
We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled arm and its instantaneous reward are revealed to the learner, whose goal is to find the best arm by minimising the expected stopping time, subject to an upper bound on the error probability. We present an asymptotic lower bound on the expected stopping time, which holds as the error probability vanishes. We show that the optimal allocation suggested by the lower bound is, in general, non-unique and therefore challenging to track. We propose a modified tracking-based algorithm to handle non-unique optimal allocations, and demonstrate that it is asymptotically optimal. We also present non-asymptotic lower and upper bounds on the stopping time in the simpler setting when the arms accessible from one box do not overlap with those of others.
Submitted 10 May, 2023;
originally announced May 2023.
-
On the ubiquity of duopolies in constant sum congestion games
Authors:
Shiksha Singhal,
Veeraruna Kavitha,
Jayakrishnan Nair
Abstract:
We analyse a coalition formation game between strategic service providers of a congestible service. The key novelty of our formulation is that it is a constant sum game, i.e., the total payoff across all service providers (or coalitions of providers) is fixed, and dictated by the size of the market. The game thus captures the tension between resource pooling (to benefit from the resulting statistical economies of scale) and competition between coalitions over market share. In a departure from the prior literature on resource pooling for congestible services, we show that the grand coalition is in general not stable, once we allow for competition over market share. In fact, under classical notions of stability (defined via blocking by any coalition), we show that no partition is stable. This motivates us to introduce more restricted (and relevant) notions of blocking; interestingly, we find that the stable configurations under these novel notions of stability are duopolies, where the dominant coalition exploits its economies of scale to corner a disproportionate market share. Furthermore, we completely characterise the stable duopolies in heavy and light traffic regimes.
Submitted 25 April, 2023;
originally announced April 2023.
-
Local retail electricity markets for distribution grid services
Authors:
Vineet Jagadeesan Nair,
Anuradha Annaswamy
Abstract:
We propose a hierarchical local electricity market (LEM) at the primary and secondary feeder levels in a distribution grid, to optimally coordinate and schedule distributed energy resources (DERs) and provide valuable grid services like voltage control. At the primary level, we use a current injection-based model that is valid for both radial and meshed, balanced and unbalanced, multi-phase systems. The primary and secondary markets leverage the flexibility offered by DERs to optimize grid operation and maximize social welfare. Numerical simulations on an IEEE 123-bus system modified to include DERs show that the LEM successfully achieves voltage control and reduces overall network costs, while also allowing us to decompose the price and value associated with different grid services so as to accurately compensate DERs.
Submitted 11 July, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems
Authors:
Jeonghyun Woo,
Gururaj Saileshwar,
Prashant J. Nair
Abstract:
As Dynamic Random Access Memories (DRAM) scale, they are becoming increasingly susceptible to Row Hammer. By rapidly activating rows of DRAM cells (aggressor rows), attackers can exploit inter-cell interference through Row Hammer to flip bits in neighboring rows (victim rows). A recent work, called Randomized Row-Swap (RRS), proposed proactively swapping aggressor rows with randomly selected rows before an aggressor row can cause Row Hammer.
Our paper observes that RRS is neither secure nor scalable. We first propose the 'Juggernaut attack pattern' that breaks RRS in under 1 day. Juggernaut exploits the fact that the mitigative action of RRS, a swap operation, can itself induce additional target row activations, defeating such a defense. Second, this paper proposes a new defense, the Secure Row-Swap mechanism, which avoids the additional activations from swap (and unswap) operations and protects against Juggernaut. Furthermore, this paper extends Secure Row-Swap with attack detection to defend against even future attacks. While this provides better security, it also allows for securely reducing the frequency of swaps, thereby enabling Scalable and Secure Row-Swap. The Scalable and Secure Row-Swap mechanism provides years of Row Hammer protection with 3.3X lower storage overheads as compared to the RRS design. It incurs only a 0.7% slowdown as compared to a not-secure baseline for a Row Hammer threshold of 1200.
Submitted 23 December, 2022;
originally announced December 2022.
-
Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget
Authors:
Fathima Zarin Faizal,
Jayakrishnan Nair
Abstract:
We consider a constrained, pure exploration, stochastic multi-armed bandit formulation under a fixed budget. Each arm is associated with an unknown, possibly multi-dimensional distribution and is described by multiple attributes that are a function of this distribution. The aim is to optimize a particular attribute subject to user-defined constraints on the other attributes. This framework models applications such as financial portfolio optimization, where it is natural to perform risk-constrained maximization of mean return. We assume that the attributes can be estimated using samples from the arms' distributions and that these estimators satisfy suitable concentration inequalities. We propose an algorithm called Constrained-SR based on the Successive Rejects framework, which recommends an optimal arm and flags the instance as being feasible or infeasible. A key feature of this algorithm is that it is designed on the basis of an information theoretic lower bound for two-armed instances. We characterize an instance-dependent upper bound on the probability of error under Constrained-SR that decays exponentially with respect to the budget. We further show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
Submitted 27 November, 2022;
originally announced November 2022.
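For readers unfamiliar with the Successive Rejects framework that Constrained-SR builds on, the sketch below shows the standard unconstrained version (due to Audibert and Bubeck): the budget is split into K-1 phases, every surviving arm is pulled equally within a phase, and the arm with the lowest empirical mean is rejected at the end of each phase. This is not Constrained-SR itself; the constraint handling and feasibility flag from the abstract are omitted, and the function name and the `pull` callback interface are assumptions for illustration.

```python
import math


def successive_rejects(pull, num_arms, budget):
    """Standard (unconstrained) Successive Rejects.

    pull(arm) returns one reward sample for that arm.
    Requires budget > num_arms so every arm is pulled at least once.
    Returns (recommended_arm, total_pulls_used).
    """
    # log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i sets the phase lengths.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    prev_target = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls each surviving arm should have by now.
        target = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(target - prev_target):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_target = target
        # Reject the arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
    return active[0], sum(counts)


if __name__ == "__main__":
    means = [0.2, 0.9, 0.5]  # deterministic rewards, for demonstration only
    print(successive_rejects(lambda arm: means[arm], num_arms=3, budget=30))
```

The phase lengths are chosen so that the total number of pulls never exceeds the budget, which is what makes the scheme suitable for the fixed-budget setting.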
-
Non-asymptotic near optimal algorithms for two sided matchings
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
A two-sided matching system is considered, where servers are assumed to arrive at a fixed rate, while the arrival rate of customers is modulated via a price-control mechanism. We analyse a loss model, wherein customers who are not served immediately upon arrival get blocked, as well as a queueing model, wherein customers wait in a queue until they receive service. The objective is to maximize the platform profit generated from matching servers and customers, subject to quality of service constraints, such as the expected wait time of servers in the loss system model, and the stability of the customer queue in the queueing model. For the loss system, subject to a certain relaxation, we show that the optimal policy has a bang-bang structure. We also derive approximation guarantees for simple pricing policies. For the queueing system, we propose a simple bi-modal matching strategy and show that it achieves near optimal profit.
Submitted 24 July, 2022;
originally announced July 2022.
-
The Dirty Secret of SSDs: Embodied Carbon
Authors:
Swamit Tannu,
Prashant J. Nair
Abstract:
Scalable Solid-State Drives (SSDs) have ushered in a transformative era in data storage and accessibility, spanning both data centers and portable devices. However, the strides made in scaling this technology can bear significant environmental consequences. On a global scale, a notable portion of semiconductor manufacturing relies on electricity derived from coal and natural gas sources. A striking example of this is the manufacturing process for a single gigabyte of Flash memory, which emits approximately 0.16 kg of CO2 - a considerable fraction of the total carbon emissions attributed to the system. Remarkably, the manufacturing of storage devices alone contributed to an estimated 20 million metric tonnes of CO2 emissions in the year 2021.
In light of these environmental concerns, this paper delves into an analysis of the sustainability trade-offs inherent in Solid-State Drives (SSDs) when compared to traditional Hard Disk Drives (HDDs). Moreover, this study proposes methodologies to gauge the embodied carbon costs associated with storage systems effectively. The research encompasses four key strategies to enhance the sustainability of storage systems. In summation, this paper critically addresses the embodied carbon issues associated with SSDs, comparing them with HDDs, and proposes a comprehensive framework of strategies to enhance the sustainability of storage systems.
Submitted 28 September, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
On Decentralizing Federated Reinforcement Learning in Multi-Robot Scenarios
Authors:
Jayprakash S. Nair,
Divya D. Kulkarni,
Ajitem Joshi,
Sruthy Suresh
Abstract:
Federated Learning (FL) allows for collaboratively aggregating learned information across several computing devices and sharing the same amongst them, thereby tackling issues of privacy and the need for huge bandwidth. FL techniques generally use a central server or cloud for aggregating the models received from the devices. Such centralized FL techniques suffer from inherent problems such as failure of the central node and bottlenecks in channel bandwidth. When FL is used in conjunction with connected robots serving as devices, a failure of the central controlling entity can lead to a chaotic situation. This paper describes a mobile agent based paradigm to decentralize FL in multi-robot scenarios. Using Webots, a popular free open-source robot simulator, and Tartarus, a mobile agent platform, we present a methodology to decentralize federated learning in a set of connected robots. With Webots running on different connected computing systems, we show how mobile agents can perform the task of Decentralized Federated Reinforcement Learning (dFRL). Results obtained from experiments carried out using Q-learning and SARSA by aggregating their corresponding Q-tables show the viability of using decentralized FL in the domain of robotics. Since the proposed work can be used in conjunction with other learning algorithms and also real robots, it can act as a vital tool for the study of decentralized FL using heterogeneous learning algorithms concurrently in multi-robot scenarios.
Submitted 7 September, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
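The dFRL abstract above mentions aggregating the Q-tables learned by different robots but does not say how. A minimal sketch, assuming the simplest possible rule (an element-wise average over whichever robots have an entry for a given state-action pair), could look like the following. The dictionary layout keyed by `(state, action)` and the name `aggregate_q_tables` are assumptions for illustration, not the paper's method.

```python
def aggregate_q_tables(q_tables):
    """Average Q-values entry by entry across several robots' Q-tables.

    q_tables: list of dicts mapping (state, action) -> Q-value.
    An entry missing from a robot's table is simply left out of that
    entry's average (an assumed convention, not taken from the paper).
    """
    totals = {}
    counts = {}
    for table in q_tables:
        for key, value in table.items():
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    robot_a = {("s0", "left"): 1.0, ("s0", "right"): 0.0}
    robot_b = {("s0", "left"): 3.0}
    print(aggregate_q_tables([robot_a, robot_b]))
```

In a decentralized setting such as the one the abstract describes, a mobile agent visiting each robot in turn could carry the running totals and counts, so no central server is needed to compute the average.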
-
Unsupervised Crowdsourcing with Accuracy and Cost Guarantees
Authors:
Yashvardhan Didwania,
Jayakrishnan Nair,
N. Hemachandra
Abstract:
We consider the problem of cost-optimal utilization of a crowdsourcing platform for binary, unsupervised classification of a collection of items, given a prescribed error threshold. Workers on the crowdsourcing platform are assumed to be divided into multiple classes, based on their skill, experience, and/or past performance. We model each worker class via an unknown confusion matrix, and a (known) price to be paid per label prediction. For this setting, we propose algorithms for acquiring label predictions from workers, and for inferring the true labels of items. We prove that if the number of (unlabeled) items available is large enough, our algorithms satisfy the prescribed error thresholds, incurring a cost that is near-optimal. Finally, we validate our algorithms, and some heuristics inspired by them, through an extensive case study.
Submitted 5 July, 2022;
originally announced July 2022.
-
Heterogeneous Acceleration Pipeline for Recommendation System Training
Authors:
Muhammad Adnan,
Yassaman Ebrahimzadeh Maboud,
Divya Mahajan,
Prashant J. Nair
Abstract:
Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns.
This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, the Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to an Intel-optimized CPU-GPU DLRM baseline.
Submitted 28 April, 2024; v1 submitted 11 April, 2022;
originally announced April 2022.
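The split of a mini-batch into popular and non-popular micro-batches can be sketched in software as follows. This is only an illustration: Hotline does this in a hardware accelerator, and both the popularity rule (top fraction of entries by access count) and the classification rule (a sample is popular only if every embedding index it touches is popular) are assumptions, as are the names `popular_ids` and `split_mini_batch`.

```python
from collections import Counter

def popular_ids(access_log, top_fraction=0.1):
    """Return the most frequently accessed embedding indices.

    Assumption: "popular" means the top `top_fraction` of distinct
    indices by access count (at least one index is always returned).
    """
    counts = Counter(access_log)
    k = max(1, int(len(counts) * top_fraction))
    return {idx for idx, _ in counts.most_common(k)}

def split_mini_batch(mini_batch, hot):
    """Split samples into popular and non-popular micro-batches.

    Each sample is a tuple of embedding indices. Assumption: a sample
    is popular only if all of its indices are in the hot set.
    """
    popular, non_popular = [], []
    for sample in mini_batch:
        target = popular if all(i in hot for i in sample) else non_popular
        target.append(sample)
    return popular, non_popular
```

Under this rule the popular micro-batch needs only embeddings already resident in GPU memory, while the non-popular one requires parameters gathered from CPU main memory.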
-
Sequential Community Mode Estimation
Authors:
Shubham Anand Jain,
Shreyas Goenka,
Divyam Bapna,
Nikhil Karamchandani,
Jayakrishnan Nair
Abstract:
We consider a population, partitioned into a set of communities, and study the problem of identifying the largest community within the population via sequential, random sampling of individuals. There are multiple sampling domains, referred to as \emph{boxes}, which also partition the population. Each box may consist of individuals of different communities, and each community may in turn be spread across multiple boxes. The learning agent can, at any time, sample (with replacement) a random individual from any chosen box; when this is done, the agent learns the community the sampled individual belongs to, and also whether or not this individual has been sampled before. The goal of the agent is to minimize the probability of mis-identifying the largest community in a \emph{fixed budget} setting, by optimizing both the sampling strategy as well as the decision rule. We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm. In several cases of interest, the exponential decay rates of the probability of error under our algorithms are shown to be optimal up to constant factors. The proposed algorithms are further validated via simulations on real-world datasets.
Submitted 16 November, 2021;
originally announced November 2021.
-
Coalition Formation in Constant Sum Queueing Games
Authors:
Shiksha Singhal,
Veeraruna Kavitha,
Jayakrishnan Nair
Abstract:
We analyse a coalition formation game between strategic service providers of a congestible service. The key novelty of our formulation is that it is a constant sum game, i.e., the total payoff across all service providers (or coalitions of providers) is fixed, and dictated by the total size of the market. The game thus captures the tension between resource pooling (to benefit from the resulting statistical economies of scale) and competition between coalitions over market share. In a departure from the prior literature on resource pooling for congestible services, we show that the grand coalition is in general not stable, once we allow for competition over market share. Instead, the stable configurations are duopolies, where the dominant coalition exploits its economies of scale to corner a disproportionate market share. We analyse the stable duopolies that emerge from this interaction, and also study a dynamic variant of this game.
Submitted 27 September, 2021;
originally announced September 2021.
-
Optimal Cycling of a Heterogenous Battery Bank via Reinforcement Learning
Authors:
Vivek Deulkar,
Jayakrishnan Nair
Abstract:
We consider the problem of optimal charging/discharging of a bank of heterogenous battery units, driven by stochastic electricity generation and demand processes. The batteries in the battery bank may differ with respect to their capacities, ramp constraints, losses, as well as cycling costs. The goal is to minimize the degradation costs associated with battery cycling in the long run; this is posed formally as a Markov decision process. We propose a linear function approximation based Q-learning algorithm for learning the optimal solution, using a specially designed class of kernel functions that approximate the structure of the value functions associated with the MDP. The proposed algorithm is validated via an extensive case study.
Submitted 15 September, 2021;
originally announced September 2021.
-
Speed Scaling with Multiple Servers Under A Sum Power Constraint
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
The problem of scheduling jobs and choosing their respective speeds with multiple servers under a sum power constraint to minimize the flow time + energy is considered. This problem is a generalization of the flow time minimization problem with multiple unit-speed servers, where jobs can be parallelized, albeit with a sub-linear, concave speedup function $k^{1/α}, α>1$ when allocated $k$ servers, i.e., jobs experience diminishing returns from being allocated additional servers. When all jobs are available at time $0$, we show that a very simple algorithm, EQUI, that processes all available jobs at the same speed, is $\left(2-\frac{1}{α}\right) \frac{2}{1-\frac{1}{α}}$-competitive, while in the general case, when jobs arrive over time, an LCFS based algorithm is shown to have a constant (dependent only on $α$) competitive ratio.
Submitted 18 August, 2021; v1 submitted 16 August, 2021;
originally announced August 2021.
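To get a feel for the EQUI bound stated in the abstract, it can be simplified and evaluated at a few values of $α$. The shorthand $c(α)$ is introduced here for illustration only and does not appear in the source.

```latex
c(\alpha) \;=\; \left(2-\frac{1}{\alpha}\right)\frac{2}{1-\frac{1}{\alpha}}
        \;=\; \frac{2\alpha-1}{\alpha}\cdot\frac{2\alpha}{\alpha-1}
        \;=\; \frac{2(2\alpha-1)}{\alpha-1},
\qquad
c(2) = 6,\quad c(3) = 5,\quad \lim_{\alpha\to\infty} c(\alpha) = 4 .
```

The bound grows without limit as $α \to 1^{+}$, that is, as the speedup function $k^{1/α}$ approaches linear.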
-
Speed Scaling On Parallel Servers with MapReduce Type Precedence Constraints
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
A multiple server setting is considered, where each server has tunable speed, and increasing the speed incurs an energy cost. Jobs arrive to a single queue, and each job has two types of sub-tasks, map and reduce, and a {\bf precedence} constraint among them: any reduce task of a job can only be processed once all the map tasks of the job have been completed. In addition to the scheduling problem, i.e., which task to execute on which server, with tunable speed, an additional decision variable is the choice of speed for each server, so as to minimize a linear combination of the sum of the flow times of jobs/tasks and the total energy cost. The precedence constraints present new challenges for the speed scaling problem with multiple servers, namely that the number of tasks that can be executed at any time may be small but the total number of outstanding tasks might be quite large. We present simple speed scaling algorithms that are shown to have competitive ratios, that depend on the power cost function, and/or the ratio of the size of the largest task and the shortest reduce task, but not on the number of jobs, or the number of servers.
Submitted 19 May, 2021;
originally announced May 2021.
-
Accelerating Recommendation System Training by Leveraging Popular Choices
Authors:
Muhammad Adnan,
Yassaman Ebrahimzadeh Maboud,
Divya Mahajan,
Prashant J. Nair
Abstract:
Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representations of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models is evolving to require increasing data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper deep dives into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed, with a few embedding entries being accessed up to 10000x more. This paper leverages this asymmetrical access pattern to offer a framework, called FAE, and proposes a hot-embedding aware data layout for training recommender models. This layout utilizes the scarce GPU memory for storing the highly accessed embeddings, thus reducing the data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the executions of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3x and 1.52x in comparison to XDL CPU-only and XDL CPU-GPU execution while maintaining baseline accuracy.
Submitted 28 September, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits
Authors:
Anmol Kagrecha,
Jayakrishnan Nair,
Krishna Jagannathan
Abstract:
Traditional multi-armed bandit (MAB) formulations usually make certain assumptions about the underlying arms' distributions, such as bounds on the support or their tail behaviour. Moreover, such parametric information is usually 'baked' into the algorithms. In this paper, we show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified. Our key contributions are twofold: (i) We establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) We propose two classes of algorithms that are asymptotically near-optimal. Additionally, we consider a risk-aware criterion for best arm identification, where the objective associated with each arm is a linear combination of the mean and the conditional value at risk (CVaR). Throughout, we make a very mild 'bounded moment' assumption, which lets us work with both light-tailed and heavy-tailed distributions within a unified framework.
Submitted 27 March, 2022; v1 submitted 28 August, 2020;
originally announced August 2020.
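The risk-aware objective mentioned above, a linear combination of the mean and the CVaR, can be sketched with a plain plug-in estimator. This is not the authors' estimator or algorithm; the loss convention (larger samples are worse), the weights `xi1` and `xi2`, and the function names are assumptions made for illustration.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Plug-in CVaR: mean of the worst (1 - alpha) fraction of samples.

    Assumption: samples are losses, so the worst outcomes are the largest.
    """
    x = np.sort(np.asarray(losses, dtype=float))
    # Number of tail samples; rounding guards against float noise in (1 - alpha) * n.
    k = max(1, int(np.ceil(round((1 - alpha) * len(x), 9))))
    return x[-k:].mean()

def risk_aware_score(losses, xi1=1.0, xi2=1.0, alpha=0.95):
    """Linear combination xi1 * mean + xi2 * CVaR (lower is better)."""
    return xi1 * np.mean(losses) + xi2 * empirical_cvar(losses, alpha)
```

A best-arm identification routine would compare such scores across arms; for heavy-tailed data a naive plug-in estimate like this one can concentrate poorly, which is the kind of issue the abstract's 'bounded moment' setting is concerned with.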
-
Bandit algorithms: Letting go of logarithmic regret for statistical robustness
Authors:
Kumar Ashutosh,
Jayakrishnan Nair,
Anmol Kagrecha,
Krishna Jagannathan
Abstract:
We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms' distributions, we show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret. This result highlights the inevitable statistical fragility of all `logarithmic regret' bandit algorithms available in the literature---for instance, if a UCB algorithm designed for $σ$-subGaussian distributions is used in a subGaussian setting with a mismatched variance parameter, the learning performance could be inconsistent. Next, we show a positive result: statistically robust and consistent learning performance is attainable if we allow the regret to be slightly worse than logarithmic. Specifically, we propose three classes of distribution oblivious algorithms that achieve an asymptotic regret that is arbitrarily close to logarithmic.
Submitted 22 June, 2020;
originally announced June 2020.
-
Constrained regret minimization for multi-criterion multi-armed bandits
Authors:
Anmol Kagrecha,
Jayakrishnan Nair,
Krishna Jagannathan
Abstract:
We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm is determined by several, possibly conflicting attributes. The aim is to optimize a 'primary' attribute subject to user-provided constraints on other 'secondary' attributes. We assume that the attributes can be estimated using samples from the arms' distributions, and that the estimators enjoy suitable concentration properties. We propose an algorithm called Con-LCB that guarantees a logarithmic regret, i.e., the average number of plays of all non-optimal arms is at most logarithmic in the horizon. The algorithm also outputs a Boolean flag that correctly identifies, with high probability, whether the given instance is feasible/infeasible with respect to the constraints. We also show that Con-LCB is optimal within a universal constant, i.e., that more sophisticated algorithms cannot do much better universally. Finally, we establish a fundamental trade-off between regret minimization and feasibility identification. Our framework finds natural applications, for instance, in financial portfolio optimization, where risk constrained maximization of expected return is meaningful.
Submitted 3 January, 2023; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Adaptive flow-level scheduling for the IoT MAC
Authors:
Pragya Sharma,
Jayakrishnan Nair,
Raman Singh
Abstract:
Over the past decade, distributed CSMA, which forms the basis for WiFi, has been deployed ubiquitously to provide seamless and high-speed mobile internet access. However, distributed CSMA might not be ideal for future IoT/M2M applications, where the density of connected devices/sensors/controllers is expected to be orders of magnitude higher than that in present wireless networks. In such high-density networks, the overhead associated with completely distributed MAC protocols will become a bottleneck. Moreover, IoT communications are likely to have strict QoS requirements, for which the `best-effort' scheduling by present WiFi networks may be unsuitable. This calls for a clean-slate redesign of the wireless MAC taking into account the requirements for future IoT/M2M networks. In this paper, we propose a reservation-based (for minimal overhead) wireless MAC designed specifically with IoT/M2M applications in mind. The key features include: (i) flow-level, rather than packet level contention to minimize overhead, (ii) deadline aware, reservation based scheduling, and (iii) the ability to dynamically adapt the MAC parameters with changing workload.
Submitted 23 December, 2019;
originally announced December 2019.
-
Please come back later: Benefiting from deferrals in service systems
Authors:
Anmol Kagrecha,
Jayakrishnan Nair
Abstract:
The performance evaluation of loss service systems, where customers who cannot be served upon arrival get dropped, has a long history going back to the classical Erlang B model. In this paper, we consider the performance benefits arising from the possibility of deferring customers who cannot be served upon arrival. Specifically, we consider an Erlang B type loss system where the system operator can, subject to certain constraints, ask a customer arriving when all servers are busy, to come back at a specified time in the future. If the system is still fully loaded when the deferred customer returns, she gets dropped for good. For such a system, we ask: How should the system operator determine the rearrival times of the deferred customers based on the state of the system (which includes those customers already deferred and yet to arrive)? How does one quantify the performance benefit of such a deferral policy? Our contributions are as follows. We propose a simple state-dependent policy for determining the rearrival times of deferred customers. For this policy, we characterize the long run fraction of customers dropped. We also analyse a relaxation where the deferral times are bounded in expectation. Via extensive numerical evaluations, we demonstrate the superiority of the proposed state-dependent policies over naive state-independent deferral policies.
Submitted 28 October, 2019;
originally announced October 2019.
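For reference, the classical Erlang B baseline that the deferral policies are measured against can be computed with the standard recursion. The sketch below covers only the no-deferral loss system, not the paper's state-dependent policy; the function name `erlang_b` is a hypothetical choice.

```python
def erlang_b(offered_load, servers):
    """Blocking probability of an M/M/c/c loss system (Erlang B).

    Uses the standard recursion B(0) = 1,
    B(n) = a * B(n-1) / (n + a * B(n-1)), with a the offered load in Erlangs.
    """
    blocking = 1.0
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking
```

For example, with an offered load of 2 Erlangs and 2 servers this gives a blocking probability of 0.4, which is the fraction of customers dropped when no deferral is allowed.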
-
Touché: Towards Ideal and Efficient Cache Compression By Mitigating Tag Area Overheads
Authors:
Seokin Hong,
Bulent Abali,
Alper Buyuktosunoglu,
Michael B. Healy,
Prashant J. Nair
Abstract:
Compression is seen as a simple technique to increase the effective cache capacity. Unfortunately, compression techniques either incur tag area overheads or restrict data placement to only include neighboring compressed cache blocks to mitigate tag area overheads. Ideally, we should be able to place arbitrary compressed cache blocks without any placement restrictions and tag area overheads.
This paper proposes Touché, a framework that enables storing multiple arbitrary compressed cache blocks within a physical cacheline without any tag area overheads. The Touché framework consists of three components. The first component, called the ``Signature'' (SIGN) engine, creates shortened signatures from the tag addresses of compressed blocks. Due to this, the SIGN engine can store multiple signatures in each tag entry. On a cache access, the physical cacheline is accessed only if there is a signature match (which has a negligible probability of false positive). The second component, called the ``Tag Appended Data'' (TADA) mechanism, stores the full tag addresses with data. TADA enables Touché to detect false positive signature matches by ensuring that the actual tag address is available for comparison. The third component, called the ``Superblock Marker'' (SMARK) mechanism, uses a unique marker in the tag entry to indicate the occurrence of compressed cache blocks from neighboring physical addresses in the same cacheline. Touché is completely hardware-based and achieves an average speedup of 12\% (ideal 13\%) when compared to an uncompressed baseline.
Submitted 2 September, 2019;
originally announced September 2019.
-
Revenue Sharing in the Internet: A Moral Hazard Approach and a Net-neutrality Perspective
Authors:
Fehmina Malik,
Manjesh K. Hanawal,
Yezekael Hayel,
Jayakrishnan Nair
Abstract:
Revenue sharing contracts between Content Providers (CPs) and Internet Service Providers (ISPs) can act as leverage for enhancing the infrastructure of the Internet. ISPs can be incentivized to make investments in network infrastructure that improve Quality of Service (QoS) for users if attractive contracts are negotiated between them and CPs. The idea here is that part of the net profit gained by CPs is given to ISPs to invest in the network. The Moral Hazard economic framework is used to model such an interaction, in which a principal determines a contract, and an agent reacts by adapting her effort. In our setting, several competitive CPs interact through one common ISP. Two cases are studied: (i) the ISP differentiates between the CPs and makes a (potentially) different investment to improve the QoS of each CP, and (ii) the ISP does not differentiate between CPs and makes a common investment for both. The last scenario can be viewed as \emph{network neutral behavior} on the part of the ISP. We analyse the optimal contracts and show that the CP that can better monetize its demand always prefers the non-neutral regime. Interestingly, ISP revenue, as well as social utility, are also found to be higher under the non-neutral regime.
Submitted 26 August, 2019;
originally announced August 2019.
-
Multiple Server SRPT with speed scaling is competitive
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
Can the popular shortest remaining processing time (SRPT) algorithm achieve a constant competitive ratio on multiple servers when server speeds are adjustable (speed scaling) with respect to the flow time plus energy consumption metric? This question has remained open for a while, where a negative result in the absence of speed scaling is well known. The main result of this paper is to show that multi-server SRPT can be constant competitive, with a competitive ratio that only depends on the power-usage function of the servers, but not on the number of jobs/servers or the job sizes (unlike when speed scaling is not allowed). When all job sizes are unity, we show that round-robin routing is optimal and can achieve the same competitive ratio as the best known algorithm for the single server problem. Finally, we show that a class of greedy dispatch policies, including policies that route to the least loaded or the shortest queue, do not admit a constant competitive ratio. When job arrivals are stochastic, with Poisson arrivals and i.i.d. job sizes, we show that random routing and a simple gated-static speed scaling algorithm achieve a constant competitive ratio.
Submitted 5 May, 2020; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Speed Scaling with Tandem Servers
Authors:
Rahul Vaze,
Jayakrishnan Nair
Abstract:
Speed scaling for a tandem server setting is considered, where there is a series of servers, and each job has to be processed by each of the servers in sequence. Servers have a variable speed, their power consumption being a convex increasing function of the speed. We consider the worst case setting as well as the stochastic setting. In the worst case setting, the jobs are assumed to be of unit size with arbitrary (possibly adversarially determined) arrival instants. For this problem, we devise an online speed scaling algorithm that is constant competitive with respect to the optimal offline algorithm that has non-causal information. The proposed algorithm, at all times, uses the same speed on all active servers, such that the total power consumption equals the number of outstanding jobs. In the stochastic setting, we consider a more general tandem network, with a parallel bank of servers at each stage. In this setting, we show that random routing with a simple gated static speed selection is constant competitive. In both cases, the competitive ratio depends only on the power functions, and is independent of the workload and the number of servers.
Submitted 9 July, 2019;
originally announced July 2019.
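The speed rule described for the worst-case setting (the same speed on all active servers, chosen so that total power equals the number of outstanding jobs) can be made concrete under an assumed power function. The abstract only requires power to be convex and increasing in speed; the polynomial form P(s) = s^alpha and the function name `common_speed` below are assumptions for illustration.

```python
def common_speed(outstanding_jobs, active_servers, alpha=2.0):
    """Speed s shared by all active servers so that total power equals
    the number of outstanding jobs.

    Assumption: each server running at speed s draws power s ** alpha,
    so active_servers * s ** alpha = outstanding_jobs.
    """
    if active_servers == 0:
        return 0.0
    return (outstanding_jobs / active_servers) ** (1.0 / alpha)
```

With 8 outstanding jobs, 2 active servers and alpha = 2, each server runs at speed 2, for a total power draw of 2 * 2^2 = 8.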
-
Partial Server Pooling in Redundancy Systems
Authors:
Akshay Mete,
D. Manjunath,
Jayakrishnan Nair,
Balakrishna Prabhu
Abstract:
Partial sharing allows providers to possibly pool a fraction of their resources when full pooling is not beneficial to them. Recent work in systems without sharing has shown that redundancy can improve performance considerably. In this paper, we combine partial sharing and redundancy by developing partial sharing models for providers operating multi-server systems with redundancy. Two M/M/N queues with redundant service models are considered. Copies of an arriving job are placed in the queues of servers that can serve the job. Partial sharing models for cancel-on-complete and cancel-on-start redundancy models are developed. For cancel-on-complete, it is shown that the Pareto efficient region is the full pooling configuration. For a cancel-on-start policy, we conjecture that the Pareto frontier is always non-empty and is such that at least one of the two providers is sharing all of its resources. For this system, using bargaining theory the sharing configuration that the providers may use is determined. Mean response time and probability of waiting are the performance metrics considered.
Submitted 9 June, 2019;
originally announced June 2019.
-
Sponsored data with ISP competition
Authors:
Pooja Vyavahare,
D. Manjunath,
Jayakrishnan Nair
Abstract:
We analyze the effect of sponsored data platforms when Internet service providers (ISPs) compete for subscribers and content providers (CPs) compete for a share of the bandwidth usage by the customers. Our analytical model is of a full information, leader-follower game. ISPs lead and set prices for sponsorship. CPs then make the binary decision of sponsoring or not sponsoring their content on the ISPs. Lastly, based on both of these, users make a two-part decision of choosing the ISP to which they subscribe, and the amount of data to consume from each of the CPs through the chosen ISP. User consumption is determined by a utility maximization framework, the sponsorship decision is determined by a non-cooperative game between the CPs, and the ISPs set their prices to maximize their profit in response to the prices set by the competing ISP. We analyze the pricing dynamics of the prices set by the ISPs, the sponsorship decisions that the CPs make and the market structure therein, and the surpluses of the ISPs, CPs, and users.
This is the first analysis of the effect of sponsored data platforms in the presence of ISP competition. We show that inter-ISP competition does not inhibit ISPs from extracting a significant fraction of the CP surplus. Moreover, the ISPs often have an incentive to significantly skew the CP marketplace in favor of the most profitable CP.
Submitted 3 June, 2019;
originally announced June 2019.
-
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
Authors:
Anmol Kagrecha,
Jayakrishnan Nair,
Krishna Jagannathan
Abstract:
Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework. We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, our goal is to devise algorithms that are entirely distribution oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms.
In this paper, we provide a class of such algorithms with provable upper bounds on the probability of incorrect identification. In the process, we develop a novel estimator for the CVaR of unbounded (including heavy-tailed) random variables and prove a concentration inequality for the same, which could be of independent interest. We also compare the error bounds for our distribution oblivious algorithms with those corresponding to standard non-oblivious algorithms. Finally, numerical experiments reveal that our algorithms perform competitively when compared with non-oblivious algorithms, suggesting that distribution obliviousness can be realised in practice without incurring a significant loss of performance.
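As a rough illustration of the objective described above, the sketch below computes a plain empirical CVaR and a weighted combination of mean and CVaR for one arm. It is not the novel estimator the abstract refers to; the function names, the weights `w_mean` and `w_cvar`, and the convention that samples are losses (larger is worse) are all assumptions made for this example.

```python
import math
from typing import Sequence


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Plain empirical CVaR at level alpha: the mean of the worst
    (1 - alpha) fraction of samples, treating larger values as worse.
    Assumption: this is a textbook estimator, not the paper's."""
    if not losses:
        raise ValueError("need at least one sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Number of tail samples to average; always keep at least one.
    k = max(1, math.ceil((1.0 - alpha) * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def risk_aware_score(losses: Sequence[float], alpha: float,
                     w_mean: float, w_cvar: float) -> float:
    """Linear combination of the sample mean and the empirical CVaR.
    The weights are hypothetical; a lower score is better for losses."""
    mean = sum(losses) / len(losses)
    return w_mean * mean + w_cvar * empirical_cvar(losses, alpha)
```

With samples per arm in hand, one would pick the arm with the smallest score. Note that this estimator needs no knowledge of the distribution, but it makes no attempt at robustness to heavy tails.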
Submitted 3 June, 2019;
originally announced June 2019.
-
Dynamic scheduling in a partially fluid, partially lossy queueing system
Authors:
Kiran Chaudhary,
Veeraruna Kavitha,
Jayakrishnan Nair
Abstract:
We consider a single server queueing system with two classes of jobs: eager jobs with small sizes that require service to begin almost immediately upon arrival, and tolerant jobs with larger sizes that can wait for service. While blocking probability is the relevant performance metric for the eager class, the tolerant class seeks to minimize its mean sojourn time. In this paper, we discuss the performance of each class under dynamic scheduling policies, where the scheduling of both classes depends on the instantaneous state of the system. This analysis is carried out under a certain fluid limit, where the arrival rate and service rate of the eager class are scaled to infinity, holding the offered load constant. Our performance characterizations reveal a (dynamic) pseudo-conservation law that ties the performance of both the classes to the standalone blocking probabilities of the eager class. Further, the performance is robust to other specifics of the scheduling policies. We also characterize the Pareto frontier of the achievable region of performance vectors under the same fluid limit, and identify a (two-parameter) class of Pareto-complete scheduling policies.
Submitted 23 December, 2021; v1 submitted 13 April, 2019;
originally announced April 2019.
-
Sharing within limits: Partial resource pooling in loss systems
Authors:
Anvitha Nandigam,
Suraj Jog,
D. Manjunath,
Jayakrishnan Nair,
B. J. Prabhu
Abstract:
Fragmentation of expensive resources, e.g., spectrum for wireless services, between providers can introduce inefficiencies in resource utilisation and worsen overall system performance. In such cases, resource pooling between independent service providers can be used to improve performance. However, for providers to agree to pool their resources, the arrangement has to be mutually beneficial. The traditional notion of resource pooling, which implies complete sharing, need not have this property. For example, under full pooling, one of the providers may be worse off and hence have no incentive to participate. In this paper, we propose partial resource sharing models as a generalization of full pooling, which can be configured to be beneficial to all participants.
We formally define and analyze two partial sharing models between two service providers, each of which is an Erlang-B loss system with the blocking probabilities as the performance measure. We show that there always exist partial sharing configurations that are beneficial to both providers, irrespective of the load and the number of circuits of each of the providers. A key result is that the Pareto frontier has at least one of the providers sharing all its resources with the other. Furthermore, full pooling may not lie inside this Pareto set. The choice of the sharing configurations within the Pareto set is formalized based on bargaining theory. Finally, large system approximations of the blocking probabilities in the quality-efficiency-driven regime are presented.
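To make the claim about full pooling concrete, the sketch below evaluates the standard Erlang-B blocking probability via its usual recursion and compares each provider's standalone blocking with the blocking under full pooling. The loads and circuit counts in the usage note are made-up values for illustration, and the partial sharing models of the paper are not implemented here.

```python
def erlang_b(offered_load: float, circuits: int) -> float:
    """Erlang-B blocking probability for `offered_load` Erlangs offered
    to `circuits` circuits, using the recursion
    B(0) = 1, B(n) = a * B(n-1) / (n + a * B(n-1))."""
    if offered_load < 0 or circuits < 0:
        raise ValueError("load and circuit count must be non-negative")
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def full_pooling_blocking(load_1: float, circuits_1: int,
                          load_2: float, circuits_2: int) -> float:
    """Under full pooling both providers see one merged loss system,
    so they share a single blocking probability."""
    return erlang_b(load_1 + load_2, circuits_1 + circuits_2)
```

For example, with a lightly loaded provider (`erlang_b(1.0, 5)`) and a heavily loaded one (`erlang_b(10.0, 5)`), `full_pooling_blocking(1.0, 5, 10.0, 5)` lies between the two standalone values: the heavily loaded provider gains, while the lightly loaded one is worse off and has no incentive to join.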
Submitted 19 August, 2018;
originally announced August 2018.
-
On QoS-Compliant Telehaptic Communication over Shared Networks
Authors:
Vineet Gokhale,
Jayakrishnan Nair,
Subhasis Chaudhuri,
Jan Fesl
Abstract:
The development of communication protocols for teleoperation with force feedback (generally known as telehaptics) has gained widespread interest over the past decade. Several protocols have been proposed for performing telehaptic interaction over shared networks. However, a comprehensive analysis of the impact of network cross-traffic on telehaptic streams, and the feasibility of Quality of Service (QoS) compliance is lacking in the literature. In this paper, we seek to fill this gap. Specifically, we explore the QoS experienced by two classes of telehaptic protocols on shared networks - Constant Bitrate (CBR) protocols and adaptive sampling based protocols, accounting for CBR as well as TCP cross-traffic. Our treatment of CBR-based telehaptic protocols is based on a micro-analysis of the interplay between TCP and CBR flows on a shared bottleneck link, which is broadly applicable for performance evaluation of CBR-based media streaming applications. Based on our analytical characterization of telehaptic QoS, and via extensive simulations and real network experiments, we formulate a set of sufficient conditions for telehaptic QoS-compliance. These conditions provide guidelines for designers of telehaptic protocols, and for network administrators to configure their networks for guaranteeing QoS-compliant telehaptic communication.
Submitted 28 May, 2018;
originally announced May 2018.
-
LISA: Increasing Internal Connectivity in DRAM for Fast Data Movement and Low Latency
Authors:
Kevin K. Chang,
Prashant J. Nair,
Saugata Ghose,
Donghyuk Lee,
Moinuddin K. Qureshi,
Onur Mutlu
Abstract:
This paper summarizes the idea of Low-Cost Interlinked Subarrays (LISA), which was published in HPCA 2016, and examines the work's significance and future potential. Contemporary systems perform bulk data movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel. The use of this narrow channel results in high latency and energy consumption. Prior work proposes to avoid these high costs by exploiting the existing wide internal DRAM bandwidth for bulk data movement, but the limited connectivity of wires within DRAM allows fast data movement within only a single DRAM subarray. Each subarray is only a few megabytes in size, greatly restricting the range over which fast bulk data movement can happen within DRAM.
Our HPCA 2016 paper proposes a new DRAM substrate, Low-Cost Inter-Linked Subarrays (LISA), whose goal is to enable fast and efficient data movement across a large range of memory at low cost. LISA adds low-cost connections between adjacent subarrays. By using these connections to interconnect the existing internal wires (bitlines) of adjacent subarrays, LISA enables wide-bandwidth data transfer across multiple subarrays with little (only 0.8%) DRAM area overhead. As a DRAM substrate, LISA is versatile, enabling a variety of new applications. We describe and evaluate three such applications in detail: (1) fast inter-subarray bulk data copy, (2) in-DRAM caching using a DRAM architecture whose rows have heterogeneous access latencies, and (3) accelerated bitline precharging by linking multiple precharge units together. Our extensive evaluations show that each of LISA's three applications significantly improves performance and memory energy efficiency on a variety of workloads and system configurations.
Submitted 8 May, 2018;
originally announced May 2018.
-
Architectural Techniques to Enable Reliable and Scalable Memory Systems
Authors:
Prashant J. Nair
Abstract:
High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads.
Submitted 13 April, 2017;
originally announced April 2017.
-
Congestion Control for Network-Aware Telehaptic Communication
Authors:
Vineet Gokhale,
Jayakrishnan Nair,
Subhasis Chaudhuri
Abstract:
Telehaptic applications involve delay-sensitive multimedia communication between remote locations with distinct Quality of Service (QoS) requirements for different media components. These QoS constraints pose a variety of challenges, especially when the communication occurs over a shared network, with unknown and time-varying cross-traffic. In this work, we propose a transport layer congestion control protocol for telehaptic applications operating over shared networks, termed the dynamic packetization module (DPM). DPM is a lossless, network-aware protocol that tunes the telehaptic packetization rate based on the level of congestion in the network. To monitor the network congestion, we devise a novel network feedback module, which communicates the end-to-end delays encountered by the telehaptic packets to the respective transmitters with negligible overhead. Via extensive simulations, we show that DPM meets the QoS requirements of telehaptic applications over a wide range of network cross-traffic conditions. We also report qualitative results of a real-time telepottery experiment with several human subjects, which reveal that DPM preserves the quality of telehaptic activity even under heavily congested network scenarios. Finally, we compare the performance of DPM with several previously proposed telehaptic communication protocols and demonstrate that DPM outperforms these protocols.
Submitted 11 January, 2017; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Toward Refactoring of DMARF and GIPSY Case Studies -- A Team XI SOEN6471-S14 Project Report
Authors:
Zinia Das,
Mohammad Iftekharul Hoque,
Renuka Milkoori,
Jithin Nair,
Rohan Nayak,
Swamy Yogya Reddy,
Dhana Shree Sankini,
Arslan Zaffar
Abstract:
This report focuses on improving the internal structure of the Distributed Modular Audio Recognition Framework (DMARF) and the General Intensional Programming System (GIPSY) case studies without affecting their original behavior. First, the general principles and workings of DMARF and GIPSY are studied, with emphasis on the architecture of the systems, by examining their frameworks and running them in the Eclipse environment. Improving the structural quality of the code requires a deeper understanding of the architecture of the case studies, which is achieved by analyzing the design patterns present in the code. The improvement is made by identifying and removing code smells in the code of the case studies. Code smells are identified by analyzing the source code with Logiscope and JDeodorant. Several refactoring techniques are suggested, of which the best-suited ones are implemented to improve the code. Finally, test cases are implemented to check whether the behavior of the code has changed.
Submitted 23 December, 2014;
originally announced December 2014.