-
Percentile Criterion Optimization in Offline Reinforcement Learning
Authors:
Elita A. Lobo,
Cyrus Cousins,
Yair Zick,
Marek Petrik
Abstract:
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the \emph{percentile criterion}. The percentile criterion is approximately solved by constructing an \emph{ambiguity set} that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion is non-convex, constructing ambiguity sets is often challenging. Existing work uses \emph{Bayesian credible regions} as ambiguity sets, but they are often unnecessarily large and result in learning overly conservative policies. To overcome these shortcomings, we propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any ambiguity sets. Our theoretical and empirical results show that our algorithm implicitly constructs much smaller ambiguity sets and learns less conservative robust policies.
Submitted 7 April, 2024;
originally announced April 2024.
-
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Authors:
Elita Lobo,
Harvineet Singh,
Marek Petrik,
Cynthia Rudin,
Himabindu Lakkaraju
Abstract:
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to marginal adversarial perturbations to the data. We design a generic data poisoning attack framework leveraging influence functions from robust statistics to carefully construct perturbations that maximize error in the policy value estimates. We carry out extensive experimentation with multiple healthcare and control datasets. Our results demonstrate that many existing OPE methods are highly prone to generating value estimates with large errors when subject to data poisoning attacks, even for small adversarial perturbations. These findings question the reliability of policy values derived using OPE methods and motivate the need for developing OPE methods that are statistically robust to train-time data poisoning attacks.
Submitted 6 April, 2024;
originally announced April 2024.
-
Device-independent quantum key distribution based on routed Bell tests
Authors:
Tristan Le Roy-Deloison,
Edwin Peter Lobo,
Jef Pauwels,
Stefano Pironio
Abstract:
Photon losses are the main obstacle to fully photonic implementations of device-independent quantum key distribution (DIQKD). Motivated by recent work showing that routed Bell scenarios offer increased robustness to detection inefficiencies for the certification of long-range quantum correlations, we investigate DIQKD protocols based on a routed setup. In these protocols, in some of the test rounds, photons from the source are routed by an actively controlled switch to a nearby test device instead of the distant one. We show how to analyze the security of these protocols and compute lower bounds on the key rates using non-commutative polynomial optimization and the Brown-Fawzi-Fawzi method. We determine lower bounds on the asymptotic key rates of several simple two-qubit routed DIQKD protocols based on CHSH or BB84 correlations and compare their performance to standard protocols. We find that in an ideal case routed DIQKD protocols can significantly improve detection efficiency requirements, by up to $\sim 30\%$, compared to their non-routed counterparts. Notably, the routed BB84 protocol achieves a positive key rate with a detection efficiency as low as $50\%$ for the distant device, the minimal threshold for any QKD protocol featuring two untrusted measurements. However, the advantages we find are highly sensitive to noise and losses affecting the short-range correlations involving the additional test device.
Submitted 1 April, 2024;
originally announced April 2024.
-
Quantum Advantage: A Single Qubit's Experimental Edge in Classical Data Storage
Authors:
Chen Ding,
Edwin Peter Lobo,
Mir Alimuddin,
Xiao-Yue Xu,
Shuo Zhang,
Manik Banik,
Wan-Su Bao,
He-Liang Huang
Abstract:
We implement an experiment on a photonic quantum processor establishing the efficacy of an elementary quantum system in classical information storage. The advantage is established by considering a class of simple bipartite games played with the communication resource qubit and classical bit (c-bit), respectively. Conventional wisdom, as articulated by the no-go theorems of Holevo and Frenkel-Weiner, suggests that such a quantum advantage is unattainable in scenarios wherein sender and receiver possess shared randomness or classical correlation between them. Notably, the advantage we report is demonstrated in a scenario where participating players lack any form of shared randomness. Our experiment involves the development of a variational triangular polarimeter, enabling the realization of positive operator-valued measurements crucial for establishing the targeted quantum advantage. In addition to demonstrating a robust communication advantage of a single qubit, our experiment also opens avenues for immediate applications in near-term quantum technologies. Furthermore, it constitutes a semi-device-independent non-classicality certification scheme for the quantum encoding-decoding apparatus, underscoring the broader implications of our work beyond its immediate technological applications.
Submitted 5 March, 2024;
originally announced March 2024.
-
Measurement incompatibility at remote entangled parties is insufficient for Bell nonlocality in two-input and two-output setting
Authors:
Priya Ghosh,
Chirag Srivastava,
Swati Choudhary,
Edwin Peter Lobo,
Ujjwal Sen
Abstract:
Two important ingredients necessary for obtaining Bell nonlocal correlations between two spatially separated parties are an entangled state shared between them and an incompatible set of measurements employed by each of them. We focus on the relation of Bell nonlocality with incompatibility of the set of measurements employed by both the parties, in the two-input and two-output scenario. We first observe that Bell nonlocality can always be established in case both parties employ any set of incompatible projective measurements. On the other hand, going beyond projective measurements, we present a class of incompatible positive operator-valued measures, employed by both the observers, which can never activate Bell nonlocality. Next, we optimize the Clauser-Horne-Shimony-Holt Bell expression in the case where the parties share a fixed amount of pure two-qubit entanglement, with any incompatible set of projective measurements. This helps to find the minimum entanglement and degree of incompatibility of measurements that the parties should employ, in order to achieve Bell nonlocal correlations.
Submitted 25 December, 2023;
originally announced December 2023.
-
Certifying long-range quantum correlations through routed Bell tests
Authors:
Edwin Peter Lobo,
Jef Pauwels,
Stefano Pironio
Abstract:
Losses in the transmission channel, which increase with distance, pose a major obstacle to photonics demonstrations of quantum nonlocality and its applications. Recently, Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] introduced a variation of standard Bell experiments with the goal of extending the range over which quantum nonlocality can be demonstrated. In these experiments, which we call 'routed Bell experiments', Bob can route his quantum particle along two possible paths and measure it at two distinct locations - one near and another far from the source. The idea is that a Bell violation in the short-path should weaken the conditions required to detect nonlocal correlations in the long-path. Indeed, CVP showed that there are quantum correlations in routed Bell experiments such that the outcomes of the remote device cannot be classically predetermined, even when its detection efficiency is arbitrarily low. In this paper, we show that the correlations considered by CVP, though they cannot be classically predetermined, do not require the transmission of quantum systems to the remote device. This leads us to define the concept of 'short-range' and 'long-range' quantum correlations in routed Bell experiments. We show that these correlations can be characterized through standard semidefinite programming hierarchies for non-commutative polynomial optimization. We then explore the conditions under which short-range quantum correlations can be ruled out. We point out that there exist fundamental lower-bounds on the critical detection efficiency of the distant device, implying that routed Bell experiments cannot demonstrate long-range quantum nonlocality at arbitrarily large distances. However, we do find that routed Bell experiments allow for reducing the detection efficiency threshold. The improvements, though, are significantly smaller than those suggested by CVP's analysis.
Submitted 24 April, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Axiomatic Aggregations of Abductive Explanations
Authors:
Gagan Biradar,
Yacine Izza,
Elita Lobo,
Vignesh Viswanathan,
Yair Zick
Abstract:
The recent criticisms of the robustness of post hoc model approximation explanation methods (like LIME and SHAP) have led to the rise of model-precise abductive explanations. For each data point, abductive explanations provide a minimal subset of features that are sufficient to generate the outcome. While theoretically sound and rigorous, abductive explanations suffer from a major issue -- there can be several valid abductive explanations for the same data point. In such cases, providing a single abductive explanation can be insufficient; on the other hand, providing all valid abductive explanations can be incomprehensible due to their size. In this work, we solve this issue by aggregating the many possible abductive explanations into feature importance scores. We propose three aggregation methods: two based on power indices from cooperative game theory and a third based on a well-known measure of causal strength. We characterize these three methods axiomatically, showing that each of them uniquely satisfies a set of desirable properties. We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.
Submitted 12 October, 2023; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Overcoming Traditional No-Go Theorems: Quantum Advantage in Multiple Access Channels
Authors:
Ananya Chakraborty,
Sahil Gopalkrishna Naik,
Edwin Peter Lobo,
Ram Krishna Patra,
Samrat Sen,
Mir Alimuddin,
Amit Mukherjee,
Manik Banik
Abstract:
Extending the point-to-point communication model to multi-node configurations finds a plethora of applications in internet and telecommunication networks. Here, we establish a novel advantage of quantum communication in a commonly encountered network configuration known as the Multiple Access Channel (MAC). A MAC consists of multiple distant senders aiming to send their respective messages to a common receiver. Unlike the quantum superdense coding protocol, the advantage reported here is realized without invoking entanglement between the senders and the receiver. Notably, such an advantage is unattainable in traditional point-to-point communication involving one sender and one receiver, where the limitations imposed by the Holevo and Frenkel-Weiner no-go theorems come into play. Within the MAC setup, this distinctive advantage materializes through the receiver's unique ability to simultaneously decode the quantum systems received from multiple senders. Intriguingly, some of our MAC designs draw inspiration from various other constructs in quantum foundations, such as the Pusey-Barrett-Rudolph theorem and the concept of `nonlocality without entanglement', originally explored for entirely different purposes. Beyond its immediate applications in network communication, the presented quantum advantage hints at a profound connection with the concept of `quantum nonlocality without inputs' and holds the potential for semi-device-independent certification of entangled measurements.
Submitted 29 May, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Matching Table Metadata with Business Glossaries Using Large Language Models
Authors:
Elita Lobo,
Oktie Hassanzadeh,
Nhan Pham,
Nandana Mihindukulasooriya,
Dharmashankar Subramanian,
Horst Samulowitz
Abstract:
Enterprises often own large collections of structured data in the form of large databases or an enterprise data lake. Such data collections come with limited metadata and strict access policies that could limit access to the data contents and, therefore, limit the application of classic retrieval and analysis solutions. As a result, there is a need for solutions that can effectively utilize the available metadata. In this paper, we study the problem of matching table metadata to a business glossary containing data labels and descriptions. The resulting matching enables the use of an available or curated business glossary for retrieval and analysis without or before requesting access to the data contents. One solution to this problem is to use manually-defined rules or similarity measures on column names and glossary descriptions (or their vector embeddings) to find the closest match. However, such approaches need to be tuned through manual labeling and cannot handle many business glossaries that contain a combination of simple as well as complex and long descriptions. In this work, we leverage the power of large language models (LLMs) to design generic matching methods that do not require manual tuning and can identify complex relations between column names and glossaries. We propose methods that utilize LLMs in two ways: a) by generating additional context for column names that can aid with matching b) by using LLMs to directly infer if there is a relation between column names and glossary descriptions. Our preliminary experimental results show the effectiveness of our proposed methods.
Submitted 7 September, 2023;
originally announced September 2023.
-
Principle of information causality rationalizes quantum composition
Authors:
Ram Krishna Patra,
Sahil Gopalkrishna Naik,
Edwin Peter Lobo,
Samrat Sen,
Govind Lal Sidhardh,
Mir Alimuddin,
Manik Banik
Abstract:
The principle of information causality, proposed as a generalization of the no-signaling principle, has been efficiently applied to rule out beyond-quantum correlations as unphysical. In this letter we show that this principle, when utilized properly, can provide a physical rationale for the structural derivation of multipartite quantum systems. In accordance with the no-signaling condition, the state and effect spaces of a composite system allow different possible mathematical descriptions even when the descriptions of the individual systems are assumed to be quantum. While at one extreme, namely the maximal tensor product composition, the state space becomes quite exotic and permits composite states that are not allowed in quantum theory, the other extreme -- minimal tensor product composition -- contains only separable states and the resulting theory allows only Bell local correlations. As we show, neither of these compositions is commensurate with information causality, and hence both are invalidated as bona fide descriptions of nature. Information causality, therefore, promises an information-theoretic derivation of the self-duality of state and effect cones for composite quantum systems.
Submitted 24 February, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Timelike correlations and quantum tensor product structure
Authors:
Samrat Sen,
Edwin Peter Lobo,
Ram Krishna Patra,
Sahil Gopalkrishna Naik,
Anandamay Das Bhowmik,
Mir Alimuddin,
Manik Banik
Abstract:
The state space structure for a composite quantum system is postulated among several mathematically consistent possibilities that are compatible with local quantum description. For instance, unentangled Gleason's theorem allows a state space that includes density operators as a proper subset among all possible composite states. However, bipartite correlations obtained in Bell-type experiments from this broader state space are in fact quantum simulable, and hence such spacelike correlations cannot distinguish among different compositions. In this work we analyze the communication utilities of these different composite models and show that they can lead to distinct utilities in a simple communication game involving two players. Our analysis thus establishes that beyond-quantum composite structures can lead to beyond-quantum correlations in the timelike scenario, and hence calls for new principles to isolate the quantum correlations from the beyond-quantum ones. We also prove a no-go result: the classical information carrying capacity of such compositions cannot exceed that of the corresponding quantum composite systems.
Submitted 4 August, 2022;
originally announced August 2022.
-
Classical analogue of quantum superdense coding and communication advantage of a single quantum system
Authors:
Ram Krishna Patra,
Sahil Gopalkrishna Naik,
Edwin Peter Lobo,
Samrat Sen,
Tamal Guha,
Some Sankar Bhattacharya,
Mir Alimuddin,
Manik Banik
Abstract:
We analyze the utility of communication channels in the absence of any sort of quantum or classical correlation shared between the sender and the receiver. To this aim, we propose a class of two-party communication games, and show that the games cannot be won given a noiseless $1$-bit classical channel from the sender to the receiver. Interestingly, the goal can be perfectly achieved if the channel is assisted with classical shared randomness. This resembles an advantage similar to the quantum superdense coding phenomenon where pre-shared entanglement can enhance the communication utility of a perfect quantum communication line. Quite surprisingly, we show that a qubit communication without any assistance of classical shared randomness can achieve the goal, and hence establishes a novel quantum advantage in the simplest communication scenario. In pursuit of a deeper origin of this advantage, we show that an advantageous quantum strategy must invoke quantum interference both at the encoding step by the sender and at the decoding step by the receiver. We also study the communication utility of a class of non-classical toy systems described by symmetric polygonal state spaces. We come up with communication tasks that can be achieved neither with $1$-bit of classical communication nor by communicating a polygon system, whereas $1$-qubit communication yields a perfect strategy, establishing quantum advantage over them. Finally, we show that the quantum advantages are robust against imperfect encodings-decodings, making the protocols implementable with presently available quantum technologies.
Submitted 4 April, 2024; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Certifying beyond quantumness of locally quantum no-signalling theories through quantum input Bell test
Authors:
Edwin Peter Lobo,
Sahil Gopalkrishna Naik,
Samrat Sen,
Ram Krishna Patra,
Manik Banik,
Mir Alimuddin
Abstract:
Physical theories constrained with local quantum structure and satisfying the no-signalling principle can allow beyond-quantum global states. In a standard Bell experiment, correlations obtained from any such beyond-quantum bipartite state can always be reproduced by quantum states and measurements, suggesting local quantum structure and no-signalling to be the axioms to isolate quantum correlations. In this letter, however, we show that if the Bell experiment is generalized to allow local quantum inputs, then beyond-quantum correlations can be generated by every beyond-quantum state. This gives us a way to certify beyond-quantumness of locally quantum no-signalling theories and in turn suggests requirement of additional information principles along with local quantum structure and no-signalling principle to isolate quantum correlations. More importantly, our work establishes that the additional principle(s) must be sensitive to the quantum signature of local inputs. We also generalize our results to multipartite locally quantum no-signalling theories and further analyze some interesting implications.
Submitted 27 September, 2022; v1 submitted 7 November, 2021;
originally announced November 2021.
-
Local Quantum State Marking
Authors:
Samrat Sen,
Edwin Peter Lobo,
Sahil Gopalkrishna Naik,
Ram Krishna Patra,
Tathagata Gupta,
Subhendu B. Ghosh,
Sutapa Saha,
Mir Alimuddin,
Tamal Guha,
Some Sankar Bhattacharya,
Manik Banik
Abstract:
We propose the task of local state marking (LSM), where some multipartite quantum states chosen randomly from a known set of states are distributed among spatially separated parties without revealing the identities of the individual states. The collaborative aim of the parties is to correctly mark the identities of states under the restriction that they can perform only local quantum operations (LO) on their respective subsystems and can communicate with each other classically (CC) -- popularly known as the operational paradigm of LOCC. While mutually orthogonal states can always be marked exactly under global operations, this is in general not the case under LOCC. We show that the LSM task is distinct from the vastly explored task of local state distinguishability (LSD) -- perfect LSD always implies perfect LSM, whereas we establish that the converse does not hold in general. We also explore entanglement-assisted marking of states that are otherwise locally unmarkable and report an intriguing entanglement-assisted catalytic LSM phenomenon.
Submitted 7 March, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Composition of multipartite quantum systems: perspective from time-like paradigm
Authors:
Sahil Gopalkrishna Naik,
Edwin Peter Lobo,
Samrat Sen,
Ramkrishna Patra,
Mir Alimuddin,
Tamal Guha,
Some Sankar Bhattacharya,
Manik Banik
Abstract:
Figuring out the physical rationale behind natural selection of quantum theory is one of the most acclaimed quests in quantum foundational research. This pursuit has inspired several axiomatic initiatives to derive the mathematical formulation of the theory by identifying the general structure of the state and effect spaces of individual systems as well as specifying their composition rules. This generic framework can allow several consistent composition rules for a multipartite system even when the state and effect cones of individual subsystems are assumed to be quantum. Nevertheless, for any bipartite system, none of these compositions allows beyond-quantum space-like correlations. In this letter we show that such bipartite compositions can admit stronger-than-quantum correlations in the time-like domain and, hence, indicate pragmatically distinct roles carried out by state and effect cones. We discuss consequences of such correlations in a communication task, which accordingly opens up a possibility of testing the actual composition between elementary quanta.
Submitted 12 March, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Soft-Robust Algorithms for Batch Reinforcement Learning
Authors:
Elita A. Lobo,
Mohammad Ghavamzadeh,
Marek Petrik
Abstract:
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to balance the mean and percentile criterion better. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms to approximately optimize it. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than the existing approximate methods for optimizing the percentile-criterion.
Submitted 26 February, 2021; v1 submitted 29 November, 2020;
originally announced November 2020.
-
Soft Options Critic
Authors:
Elita Lobo,
Scott Jordan
Abstract:
The option-critic architecture (Bacon, Harb, and Precup 2017) and several variants have successfully demonstrated the use of the options framework proposed by Sutton et al. (Sutton, Precup, and Singh 1999) to scale learning and planning in hierarchical tasks. Although most of these frameworks use entropy as a regularizer to improve exploration, they do not maximize entropy along with returns at every time step. Haarnoja et al. (2018d) recently introduced an off-policy actor-critic algorithm in the Soft Actor Critic paper that maximizes returns while maximizing entropy in a constrained manner, thus enabling learning of robust options in continuous and discrete action spaces. In this paper we adopt the architecture of soft actor-critic to investigate the effect of maximizing the entropy of each option and of the inter-option policy in the options framework. We derive the soft options improvement theorem and propose a novel soft-options framework to incorporate maximization of entropy of actions and options in a constrained manner. Our experiments show that the modified option-critic framework generates robust policies that allow fast recovery when the environment is subjected to perturbations, and that it outperforms the vanilla option-critic framework in most hierarchical tasks.
Submitted 11 June, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Developing a Collaborative and Autonomous Training and Learning Environment for Hybrid Wireless Networks
Authors:
Jose Eduardo M. Lobo,
Jorge Luis Risco Becerra,
Matthias R. Brust,
Steffen Rothkugel,
Christian M. Adriano
Abstract:
With larger memory capacities and the ability to link into wireless networks, more and more students use palmtop and handheld computers for learning activities. However, existing software for Web-based learning is not well-suited for such mobile devices, due to both constrained user interfaces and the communication effort required. A new generation of applications for the learning domain that is explicitly designed to work on these kinds of small mobile devices has to be developed. For this purpose, we introduce CARLA, a cooperative learning system that is designed to act in hybrid wireless networks. As a cooperative environment, CARLA aims at disseminating teaching material, notes, and even components of itself through both fixed and mobile networks to interested nodes. Due to the mobility of nodes, CARLA deals with upcoming problems such as network partitions and synchronization of teaching material, resource dependencies, and time constraints.
Submitted 8 June, 2007;
originally announced June 2007.