Search | arXiv e-print repository

arXiv:2403.01538 [pdf]

A Preliminary Exploration of the Disruption of a Generative AI Systems: Faculty/Staff and Student Perceptions of ChatGPT and its Capability of Completing Undergraduate Engineering Coursework

Authors: Lance White, Trini Balart, Sara Amani, Dr. Kristi J. Shryock, Dr. Karan L. Watson

Abstract: The authors of this study aim to assess the capabilities of the OpenAI ChatGPT tool to understand just how effective such a system might be for students to utilize in their studies as well as deepen understanding of faculty/staff and student perceptions about ChatGPT in general. The purpose of what is learned from the study is to continue the design of a model to facilitate the development of facu… ▽ More The authors of this study aim to assess the capabilities of the OpenAI ChatGPT tool to understand just how effective such a system might be for students to utilize in their studies as well as deepen understanding of faculty/staff and student perceptions about ChatGPT in general. The purpose of what is learned from the study is to continue the design of a model to facilitate the development of faculty for becoming adept at embracing change, the DANCE model (Designing Adaptations for the Next Changes in Education). This model is used in this study to help faculty with examining the impact that a disruptive new tool, such as ChatGPT, can pose for the learning environment. The authors analyzed the performance of ChatGPT used to complete course assignments at a variety of levels by novice engineering students working as research assistants. Those completed works have been assessed by the faculty who created those assignments to understand how these completed assignments might compare with the performance of a typical student. A set of surveys conducted by the authors of this work are discussed where students, faculty, and staff respondents in March of 2023 addressed their perceptions of ChatGPT (A follow-up survey is being administered now, February 2024). These survey instruments were analyzed, and the data visualized in this work to bring attention to relevant findings by the researchers. This work reports the findings of the researchers with the purpose of sharing the current state of this work at Texas A&M University with the intention to provide insights to scholars both at our own institution and around the world. This work is not intended to be a finished work but reports these findings with full transparency that this work is currently continuing as the researchers gather new data and develop and validate various measurement instruments. △ Less

Submitted 3 March, 2024; originally announced March 2024.

Comments: 22 pages, 13 figures

arXiv:2307.05834 [pdf, other]

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

Authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang

Abstract: Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge… ▽ More Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $ε$-optimal policies for the tasks. Our research demonstrates that to achieve $ε$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}({d^3H^6(ε^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $ε$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3H^6Mε^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings. △ Less

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2304.14415 [pdf]

Generative AI Perceptions: A Survey to Measure the Perceptions of Faculty, Staff, and Students on Generative AI Tools in Academia

Authors: Sara Amani, Lance White, Trini Balart, Laksha Arora, Dr. Kristi J. Shryock, Dr. Kelly Brumbelow, Dr. Karan L. Watson

Abstract: ChatGPT is a natural language processing tool that can engage in human-like conversations and generate coherent and contextually relevant responses to various prompts. ChatGPT is capable of understanding natural text that is input by a user and generating appropriate responses in various forms. This tool represents a major step in how humans are interacting with technology. This paper specifically… ▽ More ChatGPT is a natural language processing tool that can engage in human-like conversations and generate coherent and contextually relevant responses to various prompts. ChatGPT is capable of understanding natural text that is input by a user and generating appropriate responses in various forms. This tool represents a major step in how humans are interacting with technology. This paper specifically focuses on how ChatGPT is revolutionizing the realm of engineering education and the relationship between technology, students, and faculty and staff. Because this tool is quickly changing and improving with the potential for even greater future capability, it is a critical time to collect pertinent data. A survey was created to measure the effects of ChatGPT on students, faculty, and staff. This survey is shared as a Texas A&M University technical report to allow other universities and entities to use this survey and measure the effects elsewhere. △ Less

Submitted 21 April, 2023; originally announced April 2023.

Comments: 17 pages, 3 figures

arXiv:2206.00270 [pdf, ps, other]

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

Authors: Sanae Amani, Lin F. Yang, Ching-An Cheng

Abstract: We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen b… ▽ More We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen based on the agent's past behaviors. Remarkably, our algorithm uses only sublinear number of planning calls, which means that the agent eventually learns a policy that is near optimal for multiple tasks (seen or unseen) without the need of deliberate planning. A key to this property is a new structural assumption that enables computation sharing across tasks during exploration. Specifically, for $K$ task episodes of horizon $H$, our algorithm has a regret bound $\tilde{\mathcal{O}}(\sqrt{(d^3+d^\prime d)H^4K})$ based on $\mathcal{O}(dH\log(K))$ number of planning calls, where $d$ and $d^\prime$ are the feature dimensions of the dynamics and rewards, respectively. This theoretical guarantee implies that our algorithm can enable a lifelong learning agent to accumulate experiences and learn to rapidly solve new tasks. △ Less

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.13170 [pdf, other]

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

Authors: Sanae Amani, Tor Lattimore, András György, Lin F. Yang

Abstract: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. W… ▽ More We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure. △ Less

Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2106.06239 [pdf, ps, other]

Safe Reinforcement Learning with Linear Function Approximation

Authors: Sanae Amani, Christos Thrampoulidis, Lin F. Yang

Abstract: Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unkn… ▽ More Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while with \emph{no safety violation}, achieve a $\tilde{\mathcal{O}}\left(κ\sqrt{d^3H^3T}\right)$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature map**, $κ$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings. △ Less

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2103.11489 [pdf, ps, other]

UCB-based Algorithms for Multinomial Logistic Regression Bandits

Authors: Sanae Amani, Christos Thrampoulidis

Abstract: Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logisitc bandits that are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing… ▽ More Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logisitc bandits that are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click', `show me later', `never show again', `no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1\geq 2$ possible outcomes (+1 stands for the `not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1\geq 2$ outcomes, say outcome $i$, with a multinomial logit (MNL) probabilistic model with corresponding unknown parameter $\bar{\boldsymbolθ}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $ρ_i$ and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm, that achieves regret $\tilde{\mathcal{O}}(dK\sqrt{T})$ with small dependency on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results. △ Less

Submitted 21 March, 2021; originally announced March 2021.

Comments: 27 pages, 5 figures

arXiv:2012.00314 [pdf, ps, other]

Decentralized Multi-Agent Linear Bandits with Safety Constraints

Authors: Sanae Amani, Christos Thrampoulidis

Abstract: We study decentralized stochastic linear bandits, where a network of $N$ agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a $d$-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm each agent chooses its actions following an upper co… ▽ More We study decentralized stochastic linear bandits, where a network of $N$ agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a $d$-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm each agent chooses its actions following an upper confidence bound (UCB) strategy and agents share information with their immediate neighbors through a carefully designed consensus procedure that repeats over cycles. Our analysis adjusts the duration of these communication cycles ensuring near-optimal regret performance $\mathcal{O}(d\log{NT}\sqrt{NT})$ at a communication rate of $\mathcal{O}(dN^2)$ per round. The structure of the network affects the regret performance via a small additive term - coined the regret of delay - that depends on the spectral gap of the underlying graph. Notably, our results apply to arbitrary network topologies without a requirement for a dedicated agent acting as a server. In consideration of situations with high communication cost, we propose RC-DLUCB: a modification of DLUCB with rare communication among agents. The new algorithm trades off regret performance for a significantly reduced total communication cost of $\mathcal{O}(d^3N^{2.5})$ over all $T$ rounds. Finally, we show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits. For the recently studied problem of linear bandits with unknown linear safety constraints, we propose the first safe decentralized algorithm. Our study contributes towards applying bandit techniques in safety-critical distributed systems that repeatedly deal with unknown stochastic environments. We present numerical simulations for various network topologies that corroborate our theoretical findings. △ Less

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2005.01936 [pdf, ps, other]

Regret Bounds for Safe Gaussian Process Bandit Optimization

Authors: Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Abstract: Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constrai… ▽ More Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms. △ Less

Submitted 4 May, 2020; originally announced May 2020.

Comments: 22 pages, 17 figures

arXiv:1911.02156 [pdf, other]

Safe Linear Thompson Sampling with Side Information

Authors: Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Abstract: The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this… ▽ More The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this problem and show a frequentist regret of order $\mathcal{O} (d^{3/2}\log^{1/2}d \cdot T^{1/2}\log^{3/2}T)$, which remarkably matches the results provided by (Abeille et al., 2017) for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round. △ Less

Submitted 29 February, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: Comparing with safe versions of linear UCB algorithms, Providing more intuition for proof sketch

arXiv:1908.05814 [pdf, other]

Linear Stochastic Bandits Under Safety Constraints

Authors: Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Abstract: Bandit algorithms have various application in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe action… ▽ More Bandit algorithms have various application in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively in ensuring that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes necessary modifications to respect safety constraints. The algorithm has two phases. During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set. Once this goal is achieved, the algorithm begins a safe exploration-exploitation phase where the learner gradually expands their estimate of the set of safe actions while controlling the growth of regret. We provide a general regret bound for the algorithm, as well as a problem dependent bound that is connected to the location of the optimal action within the safe set. We then propose a modified heuristic that exploits our problem dependent analysis to improve the regret. △ Less

Submitted 15 August, 2019; originally announced August 2019.

Comments: 23 pages, 7 figures

arXiv:1601.05520 [pdf, other]

COGENT: Certified Compilation for a Functional Systems Language

Authors: Liam O'Connor, Christine Rizkallah, Zilin Chen, Sidney Amani, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Alex Hixon, Gabriele Keller, Toby Murray, Gerwin Klein

Abstract: We present a self-certifying compiler for the COGENT systems language. COGENT is a restricted, polymorphic, higher-order, and purely functional language with linear types and without the need for a trusted runtime or garbage collector. It compiles to efficient C code that is designed to interoperate with existing C functions. The language is suited for layered systems code with minimal sharing suc… ▽ More We present a self-certifying compiler for the COGENT systems language. COGENT is a restricted, polymorphic, higher-order, and purely functional language with linear types and without the need for a trusted runtime or garbage collector. It compiles to efficient C code that is designed to interoperate with existing C functions. The language is suited for layered systems code with minimal sharing such as file systems or network protocol control code. For a well-typed COGENT program, the compiler produces C code, a high-level shallow embedding of its semantics in Isabelle/HOL, and a proof that the C code correctly implements this embedding. The aim is for proof engineers to reason about the full semantics of real-world systems code productively and equationally, while retaining the interoperability and leanness of C. We describe the formal verification stages of the compiler, which include automated formal refinement calculi, a switch from imperative update semantics to functional value semantics formally justified by the linear type system, and a number of standard compiler phases such as type checking and monomorphisation. The compiler certificate is a series of language-level meta proofs and per-program translation validation phases, combined into one coherent top-level theorem in Isabelle/HOL. △ Less

Submitted 21 January, 2016; originally announced January 2016.

arXiv:1511.04169 [pdf, other]

doi 10.4204/EPTCS.196.1

Specifying a Realistic File System

Authors: Sidney Amani, Toby Murray

Abstract: We present the most interesting elements of the correctness specification of BilbyFs, a performant Linux flash file system. The BilbyFs specification supports asynchronous writes, a feature that has been overlooked by several file system verification projects, and has been used to verify the correctness of BilbyFs's fsync() C implementation. It makes use of nondeterminism to be concise and is shal… ▽ More We present the most interesting elements of the correctness specification of BilbyFs, a performant Linux flash file system. The BilbyFs specification supports asynchronous writes, a feature that has been overlooked by several file system verification projects, and has been used to verify the correctness of BilbyFs's fsync() C implementation. It makes use of nondeterminism to be concise and is shallowly-embedded in higher-order logic. △ Less

Submitted 13 November, 2015; originally announced November 2015.

Comments: In Proceedings MARS 2015, arXiv:1511.02528

Journal ref: EPTCS 196, 2015, pp. 1-9

arXiv:1211.6185 [pdf, ps, other]

doi 10.4204/EPTCS.102.3

Automatic Verification of Message-Based Device Drivers

Authors: Sidney Amani, Peter Chubb, Alastair F. Donaldson, Alexander Legg, Leonid Ryzhyk, Yan** Zhu

Abstract: We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluati… ▽ More We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluation shows that this methodology amplifies the power of existing verification tools in detecting driver bugs, making it possible to verify properties beyond the reach of traditional techniques. △ Less

Submitted 26 November, 2012; originally announced November 2012.

Comments: In Proceedings SSV 2012, arXiv:1211.5873

ACM Class: D.4.4; B.4.2; D.2.4

Journal ref: EPTCS 102, 2012, pp. 4-17

Showing 1–14 of 14 results for author: Amani, S