-
Towards a Cost vs. Quality Sweet Spot for Monitoring Networks
Authors:
Nofel Yaseen,
Behnaz Arzani,
Krishna Chintalapudi,
Vaishnavi Ranganathan,
Felipe Frujeri,
Kevin Hsieh,
Daniel Berger,
Vincent Liu,
Srikanth Kandula
Abstract:
Continuously monitoring a wide variety of performance and fault metrics has become a crucial part of operating large-scale datacenter networks. In this work, we ask whether we can reduce the costs to monitor -- in terms of collection, storage and analysis -- by judiciously controlling how much and which measurements we collect. By positing that we can treat almost all measured signals as sampled time-series, we show that we can use signal processing techniques such as the Nyquist-Shannon theorem to avoid wasteful data collection. We show that large savings appear possible by analyzing tens of popular measurements from a production datacenter network. We also discuss the technical challenges that must be solved when applying these techniques in practice.
Submitted 11 October, 2021;
originally announced October 2021.
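To illustrate the Nyquist-Shannon idea mentioned in the abstract above, here is a minimal sketch, not the authors' method. It assumes a metric is available as evenly spaced samples, estimates the frequency below which most of the spectral energy lies, and reports twice that frequency as a lower bound on the sampling rate. The function names, the 99% energy threshold, and the naive DFT are all illustrative assumptions.

```python
import cmath
import math


def energy_bandwidth_hz(samples, sample_rate_hz, energy_fraction=0.99):
    """Lowest frequency (Hz) below which `energy_fraction` of the non-DC
    spectral energy lies. Uses a naive O(n^2) DFT, fine for short traces."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # drop the DC component

    # Power in each positive-frequency bin k = 1 .. n//2.
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)

    total = sum(power)
    if total == 0:
        return 0.0  # constant signal: no frequency content to preserve

    cumulative = 0.0
    for k, p in enumerate(power, start=1):
        cumulative += p
        if cumulative >= energy_fraction * total:
            return k * sample_rate_hz / n
    return sample_rate_hz / 2


def min_sampling_rate_hz(samples, sample_rate_hz, energy_fraction=0.99):
    """Nyquist lower bound: the sampling rate must exceed twice the
    highest frequency we want to preserve."""
    return 2.0 * energy_bandwidth_hz(samples, sample_rate_hz, energy_fraction)


if __name__ == "__main__":
    # Hypothetical metric polled once per second that oscillates every 30 s.
    trace = [math.sin(2 * math.pi * t / 30) for t in range(120)]
    rate = min_sampling_rate_hz(trace, sample_rate_hz=1.0)
    print(f"Nyquist lower bound: {rate:.4f} Hz (about one sample per {1 / rate:.0f} s)")
```

In this toy case the metric is polled far more often than the bound requires, which is the kind of slack the abstract suggests could be reclaimed. Real traces are noisy and non-stationary, so any threshold would need validation.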
-
MCAL: Minimum Cost Human-Machine Active Labeling
Authors:
Hang Qiu,
Krishna Chintalapudi,
Ramesh Govindan
Abstract:
Today, ground-truth generation uses data sets annotated by cloud-based annotation services. These services rely on human annotation, which can be prohibitively expensive. In this paper, we consider the problem of hybrid human-machine labeling, which trains a classifier to accurately auto-label part of the data set. However, training the classifier can be expensive too. We propose an iterative approach that minimizes total overall cost by, at each step, jointly determining which samples to label using humans and which to label using the trained classifier. We validate our approach on well known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. In some cases, our approach has 6x lower overall cost relative to human labeling the entire data set, and is always cheaper than the cheapest competing strategy.
Submitted 26 February, 2023; v1 submitted 24 June, 2020;
originally announced June 2020.
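The cost trade-off described in the abstract above can be made concrete with a toy calculation. This is only a sketch under stated assumptions, not MCAL's algorithm: the `Plan` fields, the fixed "auto-label fraction" per plan, and all numbers are hypothetical. It compares total cost (human labels plus classifier training) across a few candidate plans and picks the cheapest.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A hypothetical labeling plan (all fields are illustrative)."""
    human_train_labels: int      # samples humans label to train the classifier
    auto_label_fraction: float   # share of the remaining samples the classifier
                                 # is assumed to label accurately enough
    training_cost: float         # cost of training the classifier


def total_cost(plan, dataset_size, price_per_human_label):
    """Human labeling cost for everything not auto-labeled, plus training."""
    remaining = dataset_size - plan.human_train_labels
    auto_labeled = int(remaining * plan.auto_label_fraction)
    human_labeled = dataset_size - auto_labeled
    return human_labeled * price_per_human_label + plan.training_cost


def cheapest_plan(plans, dataset_size, price_per_human_label):
    """Pick the plan with the lowest total cost."""
    return min(
        plans,
        key=lambda p: total_cost(p, dataset_size, price_per_human_label),
    )


if __name__ == "__main__":
    # Costs in cents; 10,000 samples at 5 cents per human label.
    plans = [
        Plan(0, 0.0, 0),          # humans label everything
        Plan(1000, 0.5, 2000),    # small training set, weaker classifier
        Plan(2000, 0.75, 4000),   # larger training set, stronger classifier
    ]
    for p in plans:
        print(p, total_cost(p, 10_000, 5))
    print("cheapest:", cheapest_plan(plans, 10_000, 5))
```

The sketch shows why the choice is joint: labeling more training data costs more up front and raises training cost, but can shrink the residual set that humans must label. An iterative approach like the one the abstract describes would re-estimate these quantities at each step rather than fix them in advance.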
-
Satyam: Democratizing Groundtruth for Machine Vision
Authors:
Hang Qiu,
Krishna Chintalapudi,
Ramesh Govindan
Abstract:
The democratization of machine learning (ML) has led to ML-based machine vision systems for autonomous driving, traffic monitoring, and video surveillance. However, true democratization cannot be achieved without greatly simplifying the process of collecting groundtruth for training and testing these systems. This groundtruth collection is necessary to ensure good performance under varying conditions. In this paper, we present the design and evaluation of Satyam, a first-of-its-kind system that enables a layperson to launch groundtruth collection tasks for machine vision with minimal effort. Satyam leverages a crowdtasking platform, Amazon Mechanical Turk, and automates several challenging aspects of groundtruth collection: creating and launching of custom web-UI tasks for obtaining the desired groundtruth, controlling result quality in the face of spammers and untrained workers, adapting prices to match task complexity, filtering spammers and workers with poor performance, and processing worker payments. We validate Satyam using several popular benchmark vision datasets, and demonstrate that groundtruth obtained by Satyam is comparable to that obtained from trained experts and provides matching ML performance when used for training.
Submitted 7 November, 2018;
originally announced November 2018.