Clipper: A Low-Latency Online Prediction Serving System

Crankshaw, Daniel; Wang, Xin; Zhou, Giulio; Franklin, Michael J.; Gonzalez, Joseph E.; Stoica, Ion

arXiv:1612.03079 (cs)

[Submitted on 9 Dec 2016 (v1), last revised 28 Feb 2017 (this version, v2)]

Abstract: Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.
In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.
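
The abstract names caching and batching among the techniques layered in front of unmodified frameworks. The following is a minimal Python sketch of that general idea, not Clipper's actual implementation: the class name `CachedBatchingFrontend`, the `max_batch_size` parameter, and the `toy_model` stand-in are all hypothetical, and the model is treated as a black-box function from a list of inputs to a list of outputs.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence


class CachedBatchingFrontend:
    """Hypothetical serving layer: caches predictions and batches cache misses."""

    def __init__(self, model_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 4) -> None:
        self.model_fn = model_fn              # black-box model call (assumed interface)
        self.max_batch_size = max_batch_size  # upper bound on inputs per model call
        self.cache: Dict[Hashable, Any] = {}  # query -> cached prediction
        self.model_calls = 0                  # number of batched calls issued so far

    def predict(self, queries: Sequence[Hashable]) -> List[Any]:
        # Collect distinct queries that are not cached yet, preserving order.
        misses = list(dict.fromkeys(q for q in queries if q not in self.cache))
        # Send the misses to the model in batches of at most max_batch_size.
        for start in range(0, len(misses), self.max_batch_size):
            batch = misses[start:start + self.max_batch_size]
            outputs = self.model_fn(batch)
            self.model_calls += 1
            self.cache.update(zip(batch, outputs))
        # Every query is now in the cache; answer in the original order.
        return [self.cache[q] for q in queries]


def toy_model(batch: List[Any]) -> List[Any]:
    # Stand-in for a real framework call; it simply doubles each input.
    return [2.0 * x for x in batch]


frontend = CachedBatchingFrontend(toy_model, max_batch_size=2)
first = frontend.predict([1, 2, 3, 1])   # three distinct misses, sent as two batches
second = frontend.predict([2, 3, 4])     # only the input 4 reaches the model
```

In this sketch the model function is never modified; the frontend only decides which inputs reach it and how they are grouped, which is the sense in which such a layer can sit between applications and a framework.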

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:1612.03079 [cs.DC]
	(or arXiv:1612.03079v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1612.03079

Submission history

From: Daniel Crankshaw
[v1] Fri, 9 Dec 2016 16:29:16 UTC (577 KB)
[v2] Tue, 28 Feb 2017 17:21:33 UTC (6,482 KB)
