-
Structured Reinforcement Learning for Media Streaming at the Wireless Edge
Authors:
Archana Bura,
Sarat Chandra Bobbili,
Shreyas Rameshkumar,
Desik Rengarajan,
Dileep Kalathil,
Srinivas Shakkottai
Abstract:
Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP), and observe that by using a Lagrangian relaxation we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG) based algorithm that is derived using the structure of our problem converges to the globally optimal policy. We then develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to low computational complexity, leading to policy execution taking only about 15 $\mu$s. Using YouTube streaming experiments in a resource-constrained scenario, we demonstrate that the CRL approach can increase quality of experience (QoE) by over 30\%.
Submitted 16 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
The site linkage spectrum of data arrays
Authors:
Christopher Barrett,
Andrei Bura,
Fenix Huang,
Christian Reidys
Abstract:
A new perspective is introduced regarding the analysis of Multiple Sequence Alignments (MSA), representing aligned data defined over a finite alphabet of symbols. The framework is designed to produce a block decomposition of an MSA, where each block comprises sequences exhibiting a certain site-coherence. The key component of this framework is an information-theoretic potential defined on pairs of sites (links) within the MSA. This potential quantifies the expected drop in variation of information between the two constituent sites, where the expectation is taken with respect to all possible sub-alignments obtained by removing a finite, fixed collection of rows. It is proved that the potential is zero for linked sites representing columns whose symbols are in bijective correspondence, and that it is strictly positive otherwise. It is furthermore shown that the potential assumes its unique minimum for links at which each symbol pair appears with the same multiplicity. Finally, an application is presented regarding anomaly detection in an MSA composed of inverse fold solutions of a fixed tRNA secondary structure, where the anomalies are represented by inverse fold solutions of a different RNA structure.
Submitted 9 January, 2024;
originally announced January 2024.
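The abstract above builds its potential on the variation of information between two alignment columns. As a minimal sketch of that underlying quantity only (not the paper's potential, which is an expectation over sub-alignments), the standard definition VI(X, Y) = 2H(X, Y) - H(X) - H(Y) can be computed from empirical symbol frequencies. The function names and the toy columns below are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def variation_of_information(col_x, col_y):
    """VI(X, Y) = H(X|Y) + H(Y|X) = 2 H(X, Y) - H(X) - H(Y) for two columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of rows")
    h_xy = entropy(list(zip(col_x, col_y)))  # joint entropy of the symbol pairs
    return 2 * h_xy - entropy(col_x) - entropy(col_y)


# Toy columns (assumed data): col_b is col_a under the bijection A<->U, C<->G.
col_a = list("AACCGGUU")
col_b = list("UUGGCCAA")
col_c = list("ACACACAC")  # not in bijective correspondence with col_a

vi_bijective = variation_of_information(col_a, col_b)
vi_other = variation_of_information(col_a, col_c)
```

Columns whose symbols are in bijective correspondence determine each other exactly, so both conditional entropies vanish and the variation of information is zero; any other pair of columns gives a strictly positive value.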
-
A computational framework for weighted simplicial homology
Authors:
Andrei C. Bura,
Neelav S. Dutta,
Thomas J. X. Li,
Christian M. Reidys
Abstract:
We provide a bottom-up construction of torsion generators for weighted homology of a weighted complex over a discrete valuation ring $R=\mathbb{F}[[\pi]]$. This is achieved by starting from a basis for classical homology of the $n$-th skeleton for the underlying complex with coefficients in the residue field $\mathbb{F}$ and then lifting it to a basis for the weighted homology with coefficients in the ring $R$. Using the latter, a bijection is established between $(n+1)$- and $n$-dimensional simplices whose weight ratios provide the exponents of the $\pi$-monomials that generate each torsion summand in the structure theorem of the weighted homology modules over $R$. We present algorithms that subsume the torsion computation by reducing it to normalization over the residue field of $R$, and describe a Python package we implemented that takes advantage of this reduction and performs the computation efficiently.
Submitted 9 June, 2022;
originally announced June 2022.
-
DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning
Authors:
Archana Bura,
Aria HasanzadeZonuzy,
Dileep Kalathil,
Srinivas Shakkottai,
Jean-Francois Chamberland
Abstract:
Safe reinforcement learning is extremely challenging--not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations. We formulate this safe reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function, where we model the safety requirements as constraints on the expected cumulative costs that must be satisfied during all episodes of learning. We propose a model-based safe RL algorithm that we call Doubly Optimistic and Pessimistic Exploration (DOPE), and show that it achieves an objective regret $\tilde{O}(|\mathcal{S}|\sqrt{|\mathcal{A}| K})$ without violating the safety constraints during learning, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, and $K$ is the number of learning episodes. Our key idea is to combine a reward bonus for exploration (optimism) with a conservative constraint (pessimism), in addition to the standard optimistic model-based exploration. DOPE is not only able to improve the objective regret bound, but also shows a significant empirical performance improvement as compared to earlier optimism-pessimism approaches.
Submitted 17 October, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs
Authors:
Aria HasanzadeZonuzy,
Archana Bura,
Dileep Kalathil,
Srinivas Shakkottai
Abstract:
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy -- both objective maximization and constraint satisfaction -- in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that, compared to the best-known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
Submitted 1 March, 2021; v1 submitted 1 August, 2020;
originally announced August 2020.
-
Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms
Authors:
Archana Bura,
Desik Rengarajan,
Dileep Kalathil,
Srinivas Shakkottai,
Jean-Francois Chamberland-Tremblay
Abstract:
Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the "regret" in terms of the finite time difference between the hits achieved by a candidate caching algorithm with respect to a genie-aided scheme that places the most popular items in the cache. We first consider the Full Observation regime wherein all requests are seen by the cache. We show that the Least Frequently Used (LFU) algorithm is able to achieve order optimal regret, which is matched by an efficient counting algorithm design that we call LFU-Lite. We then consider the Partial Observation regime wherein only requests for items currently cached are seen by the cache, making it similar to an online learning problem related to the multi-armed bandit problem. We show how approaching this "caching bandit" using traditional approaches yields either high complexity or regret, but a simple algorithm design that exploits the structure of the distribution can ensure order optimal regret. We conclude by illustrating our insights using numerical simulations.
Submitted 1 April, 2020;
originally announced April 2020.
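The regret notion in the abstract above, hits of a genie that caches the truly most popular items minus hits of the candidate algorithm, can be illustrated with a small simulation of the Full Observation regime. This is a hedged sketch, not the paper's LFU-Lite or its analysis: the function name, the tie-breaking rule, and the i.i.d. request model driven by a fixed popularity vector are assumptions made here for illustration.

```python
import random
from collections import Counter


def simulate_lfu_regret(popularity, cache_size, horizon, seed=0):
    """Return (regret, lfu_hits, genie_hits) over `horizon` i.i.d. requests.

    popularity: non-negative weights, one per item (assumed request model).
    LFU caches the `cache_size` items with the highest request counts so far;
    ties are broken by item index (an arbitrary choice for this sketch).
    """
    rng = random.Random(seed)
    items = list(range(len(popularity)))
    # Genie-aided scheme: knows the true popularity and caches the top items.
    genie_cache = set(sorted(items, key=lambda i: (-popularity[i], i))[:cache_size])
    counts = Counter()
    lfu_hits = genie_hits = 0
    for _ in range(horizon):
        # Full observation: every past request was counted, hit or miss.
        lfu_cache = set(sorted(items, key=lambda i: (-counts[i], i))[:cache_size])
        request = rng.choices(items, weights=popularity)[0]
        lfu_hits += request in lfu_cache
        genie_hits += request in genie_cache
        counts[request] += 1
    return genie_hits - lfu_hits, lfu_hits, genie_hits


# Example: ten items with Zipf-like weights, a cache of three, 5000 requests.
zipf_weights = [1.0 / (rank + 1) for rank in range(10)]
regret, lfu_hits, genie_hits = simulate_lfu_regret(zipf_weights, 3, 5000)
```

Re-sorting the counts on every request keeps the sketch short at the cost of efficiency; a practical implementation would maintain the cache contents incrementally.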
-
QFlow: A Learning Approach to High QoE Video Streaming at the Wireless Edge
Authors:
Rajarshi Bhattacharyya,
Archana Bura,
Desik Rengarajan,
Mason Rumuly,
Bainan Xia,
Srinivas Shakkottai,
Dileep Kalathil,
Ricky K. P. Mok,
Amogh Dhamdhere
Abstract:
The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adoption, and this in turn implies that agile control policies can now be instantiated on access networks. The goal of this work is to design, develop and demonstrate QFlow, a learning approach to create a value chain from the application on one side, to algorithms operating over reconfigurable infrastructure on the other, so that applications are able to obtain necessary resources for optimal performance. Using YouTube video streaming as an example, we illustrate how QFlow is able to adaptively provide such resources and attain a high QoE for all clients at a wireless access point.
Submitted 13 May, 2020; v1 submitted 3 January, 2019;
originally announced January 2019.
-
D-chain tomography of networks: a new structure spectrum and an application to the SIR process
Authors:
Ricky X. F. Chen,
Christian M. Reidys,
Andrei C. Bura
Abstract:
The analysis of the dynamics on complex networks is closely connected to structural features of the networks. Features such as graph-cores and node degrees have been studied ubiquitously. Here we introduce the D-spectrum of a network, a novel framework that is based on a collection of nested chains of subgraphs within the network. Graph-cores and node degrees arise from merely two particular chains of the D-spectrum. Each chain gives rise to a ranking of nodes and, for a fixed node, the collection of these ranks provides us with the D-spectrum of the node. Besides a node deletion algorithm, we discover a connection between the D-spectrum of a network and some fixed points of certain graph dynamical systems (MC systems) on the network. As a quick application, we use the D-spectrum to identify nodes of similar spreading power in the susceptible-infectious-recovered (SIR) model on a collection of real-world networks. We then discuss our results and conclude that D-spectra represent a meaningful augmentation of graph-cores and node degrees.
Submitted 27 April, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
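The abstract above treats graph-cores as one of the chains captured by the D-spectrum. For orientation, the standard k-core (the largest subgraph in which every node has degree at least k) can be obtained by repeatedly deleting nodes of degree below k. The sketch below shows only this textbook peeling procedure, not the paper's D-chain node deletion algorithm; the function name and the adjacency-dictionary input format are assumptions.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
    (assumed symmetric, without self-loops).
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    stack = [v for v, nbrs in remaining.items() if len(nbrs) < k]
    while stack:
        v = stack.pop()
        if v not in remaining:
            continue  # already deleted via an earlier stack entry
        for u in remaining.pop(v):
            remaining[u].discard(v)
            if len(remaining[u]) < k:
                stack.append(u)  # u dropped below degree k, delete it later
    return set(remaining)


# Toy graph (assumed data): a triangle 1-2-3 with a pendant node 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cores = {k: k_core(graph, k) for k in (1, 2, 3)}
```

The cores are nested as k grows, which is the sense in which graph-cores form a chain of subgraphs.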
-
Throughput of TCP over Cognitive Radio Channels
Authors:
Sudheer Poojary,
Akash Agrawal,
Bhoomika Gupta,
Archana Bura,
Vinod Sharma
Abstract:
In this paper, we study the performance of a TCP connection over cognitive radio networks. In these networks, the network may not always be available for transmission. Also, the packets can be lost due to wireless channel impairments. We evaluate the throughput and packet retransmission timeout probability of a secondary TCP connection over an ON/OFF channel. We first assume that the ON and OFF time durations are exponential and later extend it to more general distributions. We then consider multiple TCP connections over the ON/OFF channel. We validate our theoretical models and the approximations made therein via ns2 simulations.
Submitted 16 November, 2016;
originally announced November 2016.