-
Improved Decision Module Selection for Hierarchical Inference in Resource-Constrained Edge Devices
Authors:
Adarsh Prasad Behera,
Roberto Morabito,
Joerg Widmer,
Jaya Prakash Champati
Abstract:
The Hierarchical Inference (HI) paradigm employs tiered processing: inferences on simple data samples are accepted at the end device, while complex data samples are offloaded to central servers. HI has recently emerged as an effective method for balancing inference accuracy, data processing, transmission throughput, and offloading cost. This approach proves particularly efficient in scenarios involving resource-constrained edge devices, such as IoT sensors and microcontroller units (MCUs), tasked with executing tinyML inference. Notably, it outperforms strategies such as local inference execution, inference offloading to edge servers or cloud facilities, and split inference (i.e., inference execution distributed between two endpoints). Building upon the HI paradigm, this work explores different techniques aimed at further optimizing inference task execution. We propose and discuss three distinct HI approaches and evaluate their utility for image classification.
Submitted 8 April, 2024;
originally announced June 2024.
-
The Case for Hierarchical Deep Learning Inference at the Network Edge
Authors:
Ghina Al-Atat,
Andrea Fresa,
Adarsh Prasad Behera,
Vishnu Narayanan Moothedath,
James Gross,
Jaya Prakash Champati
Abstract:
Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcontroller units, are expected to make intelligent decisions using Deep Learning (DL) inference at the edge of the network. Toward this end, there is a significant research effort in developing tinyML models - Deep Learning (DL) models with reduced computation and memory storage requirements - that can be embedded on these devices. However, tinyML models have lower inference accuracy. On a different front, DNN partitioning and inference offloading techniques were studied for distributed DL inference between EDs and Edge Servers (ESs). In this paper, we explore Hierarchical Inference (HI), a novel approach proposed by Vishnu et al. 2023, arXiv:2304.00891v1, for performing distributed DL inference at the edge. Under HI, for each data sample, an ED first uses a local algorithm (e.g., a tinyML model) for inference. Depending on the application, the ED offloads the data sample only if the inference provided by the local algorithm is incorrect or further assistance is required from large DL models on the edge or cloud. At the outset, HI seems infeasible as the ED, in general, cannot know whether the local inference is sufficient. Nevertheless, we present the feasibility of implementing HI for machine fault detection and image classification applications. We demonstrate its benefits using quantitative analysis and argue that using HI will result in low latency, bandwidth savings, and energy savings in edge AI systems.
Submitted 23 April, 2023;
originally announced April 2023.
-
Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge
Authors:
Vishnu Narayanan Moothedath,
Jaya Prakash Champati,
James Gross
Abstract:
We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of the L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the benefits of local inference: reduced latency, bandwidth savings, and energy efficiency. In order to get the best of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein S-ML inference is only accepted when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible as the correctness of the S-ML inference is not known to the ED. We propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the maximum softmax value output by S-ML for a data sample to decide whether to offload it or not. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with continuous expert space. We propose two different algorithms and prove sublinear regret bounds for them without any assumption on the smoothness of the loss function. We evaluate and benchmark the performance of the proposed algorithms for an image classification application using four datasets, namely, Imagenette, Imagewoof, MNIST, and CIFAR-10.
Submitted 15 February, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
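The offloading rule described in the abstract above (accept the S-ML inference when its maximum softmax value is high enough, otherwise offload the sample) can be sketched as follows. This is only a minimal illustration with a fixed threshold: the function names, the threshold value `theta = 0.8`, and the example logits are assumptions made for this sketch and are not taken from the paper, which learns the decision online rather than fixing it in advance.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw model outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hi_decision(logits, theta=0.8):
    """Return (predicted_class, confidence, offload).

    theta is a hypothetical fixed threshold; the sample is offloaded
    when the maximum softmax value falls below it.
    """
    probs = softmax(logits)
    confidence = max(probs)
    predicted_class = probs.index(confidence)
    offload = confidence < theta
    return predicted_class, confidence, offload


# Hypothetical S-ML outputs for two data samples.
confident = hi_decision([4.0, 0.5, 0.1])   # one class clearly dominates
uncertain = hi_decision([1.0, 0.9, 0.8])   # classes are nearly tied

print(confident)
print(uncertain)
```

With these inputs the first sample is accepted locally and the second is offloaded, since its maximum softmax value is well below the assumed threshold.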
-
Realistic Modeling of Human Timings for Wearable Cognitive Assistance
Authors:
Manuel O. J. Olguín Muñoz,
Vishnu N. Moothedath,
Jaya Prakash Champati,
Roberta Klatzky,
Mahadev Satyanarayanan,
James Gross
Abstract:
Wearable Cognitive Assistance (WCA) applications present a challenge to benchmark and characterize due to their human-in-the-loop nature. Employing user testing to optimize system parameters is generally not feasible, given the scope of the problem and the number of observations needed to detect small but important effects in controlled experiments. Considering the intended mass-scale deployment of WCA applications in the future, there exists a need for tools enabling human-independent benchmarking.
We present in this paper the first model for the complete end-to-end emulation of humans in WCA. We build this model through statistical analysis of data collected from previous work in this field, and demonstrate its utility by studying application task durations. Compared to first-order approximations, our model shows a ~36% larger gap between step execution times at high system impairment versus low. We further introduce a novel framework for stochastic optimization of resource consumption-responsiveness tradeoffs in WCA, and show that by combining this framework with our realistic model of human behavior, significant reductions of up to 50% in the number of processed frame samples and 20% in energy consumption can be achieved with respect to the state-of-the-art.
Submitted 12 December, 2022;
originally announced December 2022.
-
Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge
Authors:
SM Zobaed,
Ali Mokhtari,
Jaya Prakash Champati,
Mathieu Kourouma,
Mohsen Amini Salehi
Abstract:
Smart IoT-based systems often require continuous execution of multiple latency-sensitive Deep Learning (DL) applications. The edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications function based on bulky "neural network (NN) models" that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on the Bayesian theory to predict the inference requests for multi-tenant applications, and uses it to choose the appropriate NN models for loading, hence increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can stimulate the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss in the inference accuracy of the applications.
Submitted 14 November, 2022;
originally announced November 2022.
-
Offloading Algorithms for Maximizing Inference Accuracy on Edge Device Under a Time Constraint
Authors:
Andrea Fresa,
Jaya Prakash Champati
Abstract:
With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention. Motivated by the fact that an increasing number of applications are using Machine Learning (ML) inference, we study the problem of offloading inference jobs by considering the following novel aspects: 1) in contrast to a typical computational job, the processing time of an inference job depends on the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained devices provide the choice of scaling the model size. We formulate an assignment problem with the aim of maximizing the total inference accuracy of n data samples available at the ED, subject to a time constraint T on the makespan. We propose an approximation algorithm AMR2, and prove that it results in a makespan of at most 2T and achieves a total accuracy that is lower than the optimal total accuracy by a small constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNet and connected to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for an image classification application.
Submitted 21 December, 2021;
originally announced December 2021.
-
Energy Efficient Sampling Policies for Edge Computing Feedback Systems
Authors:
Vishnu Narayanan Moothedath,
Jaya Prakash Champati,
James Gross
Abstract:
We study the problem of finding efficient sampling policies in an edge-based feedback system, where sensor samples are offloaded to a back-end server that processes them and generates feedback to a user. Sampling the system at maximum frequency results in the detection of events of interest with minimum delay but incurs higher energy costs due to the communication and processing of redundant samples. On the other hand, a lower sampling frequency results in higher delay in detecting the event, thus increasing the idle energy usage and degrading the quality of experience. We quantify this trade-off as a weighted function between the number of samples and the sampling interval. We solve the minimisation problem for exponential and Rayleigh distributions of the random time to the event of interest. We prove the convexity of the objective functions by using novel techniques, which can be of independent interest elsewhere. We argue that adding an initial offset to the periodic sampling can further reduce the energy consumption, and we jointly compute the optimum offset and sampling interval. We apply our framework to two practically relevant applications and show energy savings of up to 36% when compared to an existing periodic scheme.
Submitted 7 September, 2021;
originally announced September 2021.
-
Estimating Active Cases of COVID-19
Authors:
Javier Álvarez,
Carlos Baquero,
Elisa Cabana,
Jaya Prakash Champati,
Antonio Fernández Anta,
Davide Frey,
Augusto García-Agúndez,
Chryssis Georgiou,
Mathieu Goessens,
Harold Hernández,
Rosa Lillo,
Raquel Menezes,
Raúl Moreno,
Nicolas Nicolaou,
Oluwasegun Ojo,
Antonio Ortega,
Jesús Rufino,
Efstathios Stavrakis,
Govind Jeevan,
Christin Glorioso
Abstract:
Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that the latter is a viable option in countries with reduced testing capacity or suboptimal infrastructures.
Submitted 6 August, 2021;
originally announced August 2021.
-
Scheduling of Wireless Edge Networks for Feedback-Based Interactive Applications
Authors:
Samuele Zoppi,
Jaya Prakash Champati,
James Gross,
Wolfgang Kellerer
Abstract:
Interactive applications with automated feedback will largely influence the design of future networked infrastructures. In such applications, status information about an environment of interest is captured and forwarded to a compute node, which analyzes the information and generates a feedback message. Timely processing and forwarding must ensure that the feedback information is still applicable; thus, the quality-of-service parameter for such applications is the end-to-end latency over the entire loop. By modelling the communication of a feedback loop as a two-hop network, we address the problem of allocating network resources in order to minimize the delay violation probability (DVP), i.e., the probability of the end-to-end latency exceeding a target value. We investigate the influence of the network queue states along the network path on the performance of semi-static and dynamic scheduling policies. The former determine the schedule prior to the transmission of the packet, while the latter benefit from feedback on the queue states as time evolves and reallocate time slots depending on the queue's evolution. The performance of the proposed policies is evaluated for variations in several system parameters and comparison baselines. Results show that the proposed semi-static policy achieves close-to-optimal DVP and the dynamic policy outperforms the state-of-the-art algorithms.
Submitted 27 March, 2021;
originally announced March 2021.
-
Detecting State Transitions of a Markov Source: Sampling Frequency and Age Trade-off
Authors:
Jaya Prakash Champati,
Mikael Skoglund,
James Gross
Abstract:
We consider a finite-state Discrete-Time Markov Chain (DTMC) source that can be sampled for detecting the events when the DTMC transits to a new state. Our goal is to study the trade-off between sampling frequency and staleness in detecting the events. We argue that, for the problem at hand, using Age of Information (AoI) for quantifying the staleness of a sample is conservative, and we therefore introduce the age penalty for this purpose. We study two optimization problems: minimize average age penalty subject to an average sampling frequency constraint, and minimize average sampling frequency subject to an average age penalty constraint; both are Constrained Markov Decision Problems. We solve them using a linear programming approach and compute Markov policies that are optimal among all causal policies. Our numerical results demonstrate that the computed Markov policies not only outperform optimal periodic sampling policies, but also achieve sampling frequencies close to or lower than that of an optimal clairvoyant (non-causal) sampling policy, if a small age penalty is allowed.
Submitted 5 May, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
-
On the Minimum Achievable Age of Information for General Service-Time Distributions
Authors:
Jaya Prakash Champati,
Ramana R. Avula,
Tobias J. Oechtering,
James Gross
Abstract:
There is a growing interest in analysing the freshness of data in networked systems. Age of Information (AoI) has emerged as a popular metric to quantify this freshness at a given destination. There has been a significant research effort in optimizing this metric in communication and networking systems under different settings. In contrast to previous works, we are interested in a fundamental question: what is the minimum achievable AoI in any single-server-single-source queuing system for a given service-time distribution? To address this question, we study a problem of optimizing AoI under service preemptions. Our main result is the characterization of the minimum achievable average peak AoI (PAoI). We obtain this result by showing that a fixed-threshold policy is optimal in the set of all randomized-threshold causal policies. We use the characterization to provide necessary and sufficient conditions for the service-time distributions under which preemptions are beneficial.
Submitted 19 January, 2020;
originally announced January 2020.
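For readers unfamiliar with the metric named in the abstract above, the sketch below computes peak AoI values on a sample path, using the standard definition of AoI as the current time minus the generation time of the freshest update received so far. The function name and the timestamps are hypothetical, and the sketch does not implement the preemption policy studied in the paper; it only illustrates what "average peak AoI" measures.

```python
def peak_aoi(updates):
    """Peak AoI values for a sequence of delivered updates.

    updates: list of (generation_time, delivery_time) pairs, ordered by
    delivery time, where each delivered update is fresher than the last.
    The age just before update k is delivered equals its delivery time
    minus the generation time of the previously delivered update.
    """
    peaks = []
    for (prev_generation, _), (_, delivery) in zip(updates, updates[1:]):
        peaks.append(delivery - prev_generation)
    return peaks


# Hypothetical sample path: (generation_time, delivery_time) in seconds.
updates = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]
peaks = peak_aoi(updates)
average_peak = sum(peaks) / len(peaks)

print(peaks)
print(average_peak)
```

The first update only initializes the age process, so a path with n delivered updates yields n - 1 peaks.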
-
Statistical Guarantee Optimization for AoI in Single-Hop and Two-Hop Systems with Periodic Arrivals
Authors:
Jaya Prakash Champati,
Hussein Al-Zubaidy,
James Gross
Abstract:
Age of Information (AoI) has proven to be a useful metric in networked systems where timely information updates are of importance. In the literature, minimizing "average age" has received considerable attention. However, various applications pose stricter age requirements on the updates, which demand knowledge of the AoI distribution. Furthermore, the analysis of the AoI distribution in a multi-hop setting, which is important for the study of Wireless Networked Control Systems (WNCS), has not been addressed before. Toward this end, we study the distribution of AoI in a WNCS with two hops and devise a problem of minimizing the tail of the AoI distribution with respect to the frequency of generating information updates, i.e., the sampling rate of monitoring a process, under the first-come-first-serve (FCFS) queuing discipline. We argue that computing an exact expression for the AoI distribution may not always be feasible; therefore, we opt for computing upper bounds on the tail of the AoI distribution. Using these upper bounds we formulate Upper Bound Minimization Problems (UBMP), namely, Chernoff-UBMP and alpha-relaxed Upper Bound Minimization Problem (alpha-UBMP), where alpha > 1 is an approximation factor, and solve them to obtain "good" heuristic rate solutions. We demonstrate the efficacy of our approach by solving the proposed UBMPs for three service distributions: geometric, exponential, and Erlang. Simulation results show that the rate solutions obtained are near-optimal for minimizing the tail of the AoI distribution for the considered distributions.
Submitted 4 October, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
On the Distribution of AoI for the GI/GI/1/1 and GI/GI/1/2* Systems: Exact Expressions and Bounds
Authors:
Jaya Prakash Champati,
Hussein Al-Zubaidy,
James Gross
Abstract:
Since Age of Information (AoI) was proposed as a metric that quantifies the freshness of information updates in a communication system, there has been a constant effort in understanding and optimizing different statistics of the AoI process for classical queueing systems. In addition to classical queueing systems, more recently, systems with no queue or a unit capacity queue storing the latest packet have been gaining importance, as storing and transmitting older packets does not reduce AoI at the receiver. Following this line of research, we study the distribution of AoI for the GI/GI/1/1 and GI/GI/1/2* systems, under non-preemptive scheduling. For any single-source-single-server queueing system, we derive, using sample path analysis, a fundamental result that characterizes the AoI violation probability, and use it to obtain closed-form expressions for D/GI/1/1 and M/GI/1/1, as well as for systems that use a zero-wait policy. Further, when exact results are not tractable, we present a simple methodology for obtaining upper bounds for the violation probability for both GI/GI/1/1 and GI/GI/1/2* systems. An interesting feature of the proposed upper bounds is that, if the departure rate is given, they overestimate the violation probability by at most a value that decreases with the arrival rate. Thus, given the departure rate and for a fixed average service, the bounds are tighter at higher utilization.
Submitted 10 May, 2019;
originally announced May 2019.
-
Performance Characterization Using AoI in a Single-loop Networked Control System
Authors:
Jaya Prakash Champati,
Mohammad H. Mamduhi,
Karl H. Johansson,
James Gross
Abstract:
The joint design of control and communication scheduling in a Networked Control System (NCS) is known to be a hard problem. Several research works have successfully designed optimal sampling and/or control strategies under simplified communication models, where transmission delays/times are negligible or fixed. However, considering sophisticated communication models, with random transmission times, results in highly coupled and difficult-to-solve optimal design problems due to the parameter inter-dependencies between estimation/control and communication layers. To tackle this problem, in this work, we investigate the applicability of Age-of-Information (AoI) for solving control/estimation problems in an NCS under i.i.d. transmission times. Our motivation for this investigation stems from the following facts: 1) recent results indicate that AoI can be tackled under relatively sophisticated communication models, and 2) a lower AoI in an NCS may result in a lower estimation/control cost. We study a joint optimization of sampling and scheduling for a single-loop stochastic LTI networked system with the objective of minimizing the time-average squared norm of the estimation error. We first show that, under mild assumptions on the information structure, the optimal control policy can be designed independently of the sampling and scheduling policies. We then derive a key result that minimizing the estimation error is equivalent to minimizing a function of AoI when the sampling decisions are independent of the state of the LTI system. Noting that minimizing the function of AoI is a stochastic combinatorial optimization problem and is hard to solve, we resort to heuristic algorithms obtained by extending existing algorithms in the AoI literature. We also identify a class of LTI system dynamics for which minimizing the estimation error is equivalent to minimizing the expected AoI.
Submitted 5 July, 2019; v1 submitted 20 January, 2019;
originally announced January 2019.
-
Transient Delay Bounds for Multi-Hop Wireless Networks
Authors:
Jaya Prakash Champati,
Hussein Al-Zubaidy,
James Gross
Abstract:
In this article, we investigate the transient behavior of a sequence of packets/bits traversing a multi-hop wireless network. Our work is motivated by novel applications from the domain of process automation, Machine-Type Communication (MTC) and cyber-physical systems, where short messages are communicated and statistical guarantees need to be provided on a per-message level. In order to optimize such a network, apart from understanding the stationary system dynamics, an understanding of the short-term dynamics (i.e., transient behavior) is also required. To this end, we derive novel Wireless Transient Bounds (WTB) for end-to-end delay and backlog in a multi-hop wireless network using a stochastic network calculus approach. WTB depends on the initial backlog at each node as well as the instantaneous channel states. We numerically compare WTB with State-Of-The-Art Transient bounds (SOTAT), which can be obtained by adapting existing stationary bounds, as well as with a simulation of the network. While SOTAT and stationary bounds are not able to capture the short-term system dynamics well, WTB provides a relatively tight upper bound and has a decay rate that closely matches the simulation. WTB achieves this with only a slight increase in computational complexity, by a factor of O(T + N), where T is the duration of the arriving sequence and N is the number of hops in the network. We believe that the presented analysis and the bounds can be used as a base for future work on transient network optimization, e.g., in massive MTC, critical MTC, edge computing and autonomous vehicles.
Submitted 8 June, 2018;
originally announced June 2018.