-
HawkVision: Low-Latency Modeless Edge AI Serving
Authors:
ChonLam Lao,
Jiaqi Gao,
Ganesh Ananthanarayanan,
Aditya Akella,
Minlan Yu
Abstract:
Modeless ML inference is growing in popularity because it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, this paper promotes modeless inference at the edge. The edge environment introduces additional challenges related to low power consumption, limited device memory, and volatile network environments.
To address these challenges, we propose HawkVision, which provides low-latency modeless serving of vision DNNs. HawkVision leverages a two-layer edge-DC architecture that employs confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. It also supports lossy inference under volatile network environments. Our experimental results show that HawkVision outperforms current serving systems by up to 1.6X in P99 latency for providing modeless service. Our FPGA prototype demonstrates similar performance at certain accuracy levels with up to a 3.34X reduction in power consumption.
Submitted 29 May, 2024;
originally announced May 2024.
-
Trajectory Tracking Control of a Flexible Spine Robot, With and Without a Reference Input
Authors:
Andrew P. Sabelhaus,
Shirley Huajing Zhao,
Mallory C. Daly,
Ellande Tang,
Edward Zhu,
Abishek K. Akella,
Zeerek A. Ahmad,
Vytas SunSpiral,
Alice M. Agogino
Abstract:
The Underactuated Lightweight Tensegrity Robotic Assistive Spine (ULTRA Spine) project is an ongoing effort to develop a flexible, actuated backbone for quadruped robots. In this work, model-predictive control is used to track a trajectory in the robot's state space, in simulation. The state trajectory used here corresponds to a bending motion of the spine, with translations and rotations of the moving vertebrae. Two different controllers are presented in this work: one that does not use a reference input but includes smoothing constraints, and a second one that uses a reference input without smoothing. For the smoothing controller, without reference inputs, the error converges to zero, while the simpler-to-tune controller with an input reference shows small errors but not complete convergence. It is expected that this controller will converge as it is improved further.
Submitted 24 August, 2018;
originally announced August 2018.
-
Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters
Authors:
Robert Grandl,
Srikanth Kandula,
Sriram Rao,
Aditya Akella,
Janardhan Kulkarni
Abstract:
We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources, and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to state-of-the-art schedulers, we speed up 50% of the jobs by over 30% each.
Submitted 25 April, 2016;
originally announced April 2016.