Showing 1–1 of 1 results for author: Kunjal, N

Searching in archive cs.
  1. arXiv:2403.04311  [pdf, other]

    cs.AI cs.CL cs.DC cs.IR

    ALTO: An Efficient Network Orchestrator for Compound AI Systems

    Authors: Keshav Santhanam, Deepti Raghavan, Muhammad Shahir Rahman, Thejas Venkatesh, Neha Kunjal, Pratiksha Thaker, Philip Levis, Matei Zaharia

    Abstract: We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO achieves high throughput and low latency by taking advantage of an optimization opportunity specific to generative language models: streaming intermediate outputs. As language models produce outputs token by token, ALTO exposes opportunities to stream intermediate outputs…

    Submitted 7 March, 2024; originally announced March 2024.