-
Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship
Authors:
Zachary R. Baker,
Zarif L. Azher
Abstract:
This study introduces a novel approach to simulating legislative processes using LLM-driven virtual agents, focusing on the U.S. Senate Intelligence Committee. We developed agents representing individual senators and placed them in simulated committee discussions. The agents demonstrated the ability to engage in realistic debate, provide thoughtful reflections, and find bipartisan solutions under certain conditions. Notably, the simulation also showed promise in modeling shifts towards bipartisanship in response to external perturbations. Our results indicate that this LLM-driven approach could become a valuable tool for understanding and potentially improving legislative processes, supporting a broader pattern of findings highlighting how LLM-based agents can usefully model real-world phenomena. Future work will focus on enhancing agent complexity, expanding the simulation scope, and exploring applications in policy testing and negotiation.
Submitted 26 June, 2024;
originally announced June 2024.
-
Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students
Authors:
Valdemar Švábenský,
Mélina Verger,
Maria Mercedes T. Rodrigo,
Clarence James G. Monterozo,
Ryan S. Baker,
Miguel Zenon Nicanor Lerias Saavedra,
Sébastien Lallé,
Atsushi Shimada
Abstract:
Algorithmic bias is a major issue in machine learning models in educational contexts. However, it has not yet been studied thoroughly in Asian learning contexts, and only limited work has considered algorithmic bias based on regional (sub-national) background. As a step towards addressing this gap, this paper examines the population of 5,986 students at a large university in the Philippines, investigating algorithmic bias based on students' regional background. The university used the Canvas learning management system (LMS) in its online courses across a broad range of domains. Over the period of three semesters, we collected 48.7 million log records of the students' activity in Canvas. We used these logs to train binary classification models that predict student grades from the LMS activity. The best-performing model reached an AUC of 0.75 and a weighted F1-score of 0.79. Subsequently, we examined the data for bias based on students' region. Evaluation using three metrics (AUC, weighted F1-score, and MADD) showed consistent results across all demographic groups. Thus, no unfairness was observed against a particular student group in the grade predictions.
Submitted 16 May, 2024;
originally announced May 2024.
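The group-wise evaluation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc` and `auc_by_group` and the toy data are hypothetical, and only AUC is covered (weighted F1-score and MADD are left out). AUC is computed with the rank-sum formulation, i.e. the fraction of positive/negative pairs in which the positive example receives the higher score.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each value of a group attribute."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


# Hypothetical toy data: binary grade labels, model scores, and a region code.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.3, 0.4, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = auc_by_group(labels, scores, groups)
# Largest difference in AUC between any two groups; a small gap is one
# (partial) indication that the model ranks students similarly well in each group.
auc_gap = max(per_group.values()) - min(per_group.values())
print(per_group, auc_gap)
```

A single metric compared across groups is only one view of fairness, which is why the abstract reports three metrics side by side.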
-
On Fixing the Right Problems in Predictive Analytics: AUC Is Not the Problem
Authors:
Ryan S. Baker,
Nigel Bosch,
Stephen Hutt,
Andres F. Zambrano,
Alex J. Bowers
Abstract:
Recently, ACM FAccT published an article by Kwegyir-Aggrey and colleagues (2023), critiquing the use of AUC ROC in predictive analytics in several domains. In this article, we offer a critique of that article. Specifically, we highlight technical inaccuracies in that paper's comparison of metrics, mis-specification of the interpretation and goals of AUC ROC, the article's use of the accuracy metric as a gold standard for comparison to AUC ROC, and the article's application of critiques solely to AUC ROC for concerns that would apply to the use of any metric. We conclude with a re-framing of the very valid concerns raised in that article, and discuss how the use of AUC ROC can remain a valid and appropriate practice in a well-informed predictive analytics approach taking those concerns into account. We conclude by discussing the combined use of multiple metrics, including machine learning bias metrics, and AUC ROC's place in such an approach. Like broccoli, AUC ROC is healthy, but also like broccoli, researchers and practitioners in our field shouldn't eat a diet of only AUC ROC.
Submitted 10 April, 2024;
originally announced April 2024.
-
Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades
Authors:
Valdemar Švábenský,
Maciej Pankiewicz,
Jiayi Zhang,
Elizabeth B. Cloude,
Ryan S. Baker,
Eric Fouh
Abstract:
Programming courses can be challenging for first-year university students, especially for those without prior coding experience. Students initially struggle with code syntax, but as more advanced topics are introduced across a semester, the difficulty in learning to program shifts to learning computational thinking (e.g., debugging strategies). This study examined the relationships between students' rate of programming errors and their grades on two exams. Using an online integrated development environment, data were collected from 280 students in a Java programming course. The course had two parts. The first focused on introductory procedural programming and culminated with exam 1, while the second part covered more complex topics and object-oriented programming and ended with exam 2. To measure students' programming abilities, 51,095 code snapshots were collected from students while they completed assignments that were autograded based on unit tests. Compiler and runtime errors were extracted from the snapshots, and three measures -- Error Count, Error Quotient and Repeated Error Density -- were explored to identify the best measure explaining variability in exam grades. Models utilizing Error Quotient outperformed the models using the other two measures, in terms of the explained variability in grades and Bayesian Information Criterion. Compiler errors were significant predictors of exam 1 grades but not exam 2 grades; only runtime errors significantly predicted exam 2 grades. The findings indicate that leveraging Error Quotient with multiple error types (compiler and runtime) may be a better measure of students' introductory programming abilities, though still not explaining most of the observed variability.
Submitted 8 April, 2024;
originally announced April 2024.
-
Navigating Compiler Errors with AI Assistance -- A Study of GPT Hints in an Introductory Programming Course
Authors:
Maciej Pankiewicz,
Ryan S. Baker
Abstract:
We examined the efficacy of AI-assisted learning in an introductory programming course at the university level by using a GPT-4 model to generate personalized hints for compiler errors within a platform for automated assessment of programming assignments. The control group had no access to GPT hints. In the experimental condition GPT hints were provided when a compiler error was detected, for the first half of the problems in each module. For the latter half of the module, hints were disabled. Students highly rated the usefulness of GPT hints. In affect surveys, the experimental group reported significantly higher levels of focus and lower levels of confrustion (confusion and frustration) than the control group. For the six most commonly occurring error types we observed mixed results in terms of performance when access to GPT hints was enabled for the experimental group. However, in the absence of GPT hints, the experimental group's performance surpassed the control group for five out of the six error types.
Submitted 19 March, 2024;
originally announced March 2024.
-
Distributed Multi-Object Tracking Under Limited Field of View Heterogeneous Sensors with Density Clustering
Authors:
Fei Chen,
Hoa Van Nguyen,
Alex S. Leong,
Sabita Panicker,
Robin Baker,
Damith C. Ranasinghe
Abstract:
We consider the problem of tracking multiple, unknown, and time-varying numbers of objects using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor field-of-views (FoVs), sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracking algorithm involves solving an NP-hard multidimensional assignment problem either optimally for small-size problems or sub-optimally for general practical problems. For general problems, we propose an efficient distributed multi-object tracking algorithm that performs track-to-track fusion using a clustering-based analysis of the state space transformed into a density space to mitigate the complexity of the assignment problem. The proposed algorithm can more efficiently group local track estimates for fusion than existing approaches. To ensure we achieve globally consistent identities for tracks across a network of nodes as objects move between FoVs, we develop a graph-based algorithm to achieve label consensus and minimise track segmentation. Numerical experiments with a synthetic and a real-world trajectory dataset demonstrate that our proposed method is significantly more computationally efficient than state-of-the-art solutions, achieving similar tracking accuracy and bandwidth requirements but with improved label consistency.
Submitted 31 December, 2023;
originally announced January 2024.
-
Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems
Authors:
Conrad Borchers,
Jiayi Zhang,
Ryan S. Baker,
Vincent Aleven
Abstract:
Numerous studies demonstrate the importance of self-regulation during learning by problem-solving. Recent work in learning analytics has largely examined students' use of SRL concerning overall learning gains. Limited research has related SRL to in-the-moment performance differences among learners. The present study investigates SRL behaviors in relationship to learners' moment-by-moment performance while working with intelligent tutoring systems for stoichiometry chemistry. We demonstrate the feasibility of labeling SRL behaviors based on AI-generated think-aloud transcripts, identifying the presence or absence of four SRL categories (processing information, planning, enacting, and realizing errors) in each utterance. Using the SRL codes, we conducted regression analyses to examine how the use of SRL in terms of presence, frequency, cyclical characteristics, and recency relate to student performance on subsequent steps in multi-step problems. A model considering students' SRL cycle characteristics outperformed a model only using in-the-moment SRL assessment. In line with theoretical predictions, students' actions during earlier, process-heavy stages of SRL cycles exhibited lower moment-by-moment correctness during problem-solving than later SRL cycle stages. We discuss system re-design opportunities to add SRL support during stages of processing and paths forward for using machine learning to speed research depending on the assessment of SRL based on transcription of think-aloud data.
Submitted 9 December, 2023;
originally announced December 2023.
-
Cultural Bias and Cultural Alignment of Large Language Models
Authors:
Yan Tao,
Olga Viberg,
Ryan S. Baker,
Rene F. Kizilcec
Abstract:
Culture fundamentally shapes people's reasoning, behavior, and communication. As people increasingly use generative artificial intelligence (AI) to expedite and automate personal and professional tasks, cultural values embedded in AI models may bias people's authentic expression and contribute to the dominance of certain cultures. We conduct a disaggregated evaluation of cultural bias for five widely used large language models (OpenAI's GPT-4o/4-turbo/4/3.5-turbo/3) by comparing the models' responses to nationally representative survey data. All models exhibit cultural values resembling English-speaking and Protestant European countries. We test cultural prompting as a control strategy to increase cultural alignment for each country/territory. For recent models (GPT-4, 4-turbo, 4o), this improves the cultural alignment of the models' output for 71-81% of countries and territories. We suggest using cultural prompting and ongoing evaluation to reduce cultural bias in the output of generative AI.
Submitted 26 June, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Towards Generalizable Detection of Urgency of Discussion Forum Posts
Authors:
Valdemar Švábenský,
Ryan S. Baker,
Andrés Zambrano,
Yishan Zou,
Stefan Slater
Abstract:
Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. While the previous work on post urgency used only one data set, we evaluated the prediction across different data sets and courses. The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
Submitted 14 July, 2023;
originally announced July 2023.
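To make the reported error figures concrete, the sketch below shows how a root-mean-square error (RMSE) is computed for predictions on a 1-7 urgency scale. It is an illustration only, not the authors' pipeline: the function name `rmse` and the toy values are hypothetical, and no regressor or embedding model is involved.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical urgency labels (1 = not urgent, 7 = very urgent) and predictions.
true_urgency = [1, 4, 7, 2]
predicted_urgency = [2.0, 4.0, 5.0, 3.0]

error = rmse(true_urgency, predicted_urgency)
print(round(error, 3))
```

Because errors are squared before averaging, one prediction that is off by two points contributes as much as four predictions that are each off by one, so an RMSE of 1.4 on a 7-point scale does not mean every prediction misses by 1.4 points.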
-
Large Language Models (GPT) for automating feedback on programming assignments
Authors:
Maciej Pankiewicz,
Ryan S. Baker
Abstract:
Generating personalized feedback for programming assignments is demanding due to several factors, like the complexity of code syntax or different ways to correctly solve a task. In this experimental study, we automated the process of feedback generation by employing OpenAI's GPT-3.5 model to generate personalized hints for students solving programming assignments on an automated assessment platform. Students rated the usefulness of GPT-generated hints positively. The experimental group (with GPT hints enabled) relied less on the platform's regular feedback but performed better in terms of percentage of successful submissions across consecutive attempts for tasks where GPT hints were enabled. For tasks where the GPT feedback was made unavailable, the experimental group needed significantly less time to solve assignments. Furthermore, when GPT hints were unavailable, students in the experimental condition were initially less likely to solve the assignment correctly. This suggests potential over-reliance on GPT-generated feedback. However, students in the experimental condition were able to correct reasonably rapidly, reaching the same percentage correct after seven submission attempts. The availability of GPT hints did not significantly impact students' affective state.
Submitted 30 June, 2023;
originally announced July 2023.
-
Exploring players' experience of humor and snark in a grade 3-6 history practices game
Authors:
David J. Gagnon,
Ryan S. Baker,
Sarah Gagnon,
Luke Swanson,
Nick Spevacek,
Juliana Andres,
Erik Harpstead,
Jennifer Scianna,
Stefan Slater,
Maria O. C. Z. San Pedro
Abstract:
In this paper we use an existing history learning game with an active audience as a research platform for exploring how humor and "snarkiness" in the dialog script affect students' progression and attitudes about the game. We conducted a 2x2 randomized experiment with 11,804 anonymous 3rd-6th grade students. Using one-way ANOVA and Kruskal-Wallis tests, we find that changes to the script produced measurable results in the self-reported perceived humor of the game and the likeability of the player character. Different scripts did not produce significant differences in player completion of the game, or how much of the game was played. Perceived humor and enjoyment of the game and its main character contributed significantly to progress in the game, as did self-perceived reading skill.
Submitted 18 October, 2022;
originally announced October 2022.
-
Death By A Thousand COTS: Disrupting Satellite Communications using Low Earth Orbit Constellations
Authors:
Frederick Rawlins,
Richard Baker,
Ivan Martinovic
Abstract:
Satellites in Geostationary Orbit (GEO) provide a number of commercial, government, and military services around the world, offering everything from surveillance and monitoring to video calls and internet access. However, a dramatic lowering of the cost-per-kilogram to space has led to a recent explosion in real and planned constellations in Low Earth Orbit (LEO) of smaller satellites. These constellations are managed remotely, and it is important to consider a scenario in which an attacker gains control over the constituent satellites. In this paper we aim to understand what damage this attacker could cause, using the satellites to generate interference. To ground our analysis, we simulate a number of existing and planned LEO constellations against an example GEO constellation, and evaluate the relative effectiveness of each. Our model shows that with conservative power estimates, both current and planned constellations could disrupt GEO satellite services at every groundstation considered, with effectiveness varying considerably between locations. We analyse different patterns of interference, how they reflect the structures of the constellations creating them, and how effective they might be against a number of legitimate services. We found that real-time usage (e.g. calls, streaming) would be most affected, with 3 constellation designs able to generate thousands of outages of 30 seconds or longer over the course of the day across all groundstations.
Submitted 28 April, 2022;
originally announced April 2022.
-
Analyzing Adaptive Scaffolds that Help Students Develop Self-Regulated Learning Behaviors
Authors:
Anabil Munshi,
Gautam Biswas,
Ryan Baker,
Jaclyn Ocumpaugh,
Stephen Hutt,
Luc Paquette
Abstract:
Providing adaptive scaffolds to help learners develop self-regulated learning (SRL) processes has been an important goal for intelligent learning environments. Adaptive scaffolding is especially important in open-ended learning environments (OELE), where novice learners often face difficulties in completing their learning tasks. This paper presents a systematic framework for adaptive scaffolding in Betty's Brain, a learning-by-teaching OELE for middle school science, where students construct a causal model to teach a virtual agent, generically named Betty. We evaluate the adaptive scaffolding framework and discuss its implications on the development of more effective scaffolds for SRL in OELEs. We detect key cognitive/metacognitive inflection points, i.e., instances where students' behaviors and performance change as they work on their learning tasks. At such inflection points, Mr. Davis (a mentor agent) or Betty (the teachable agent) provide conversational feedback, focused on strategies to help students become productive learners. We conduct a classroom study with 98 middle schoolers to analyze the impact of adaptive scaffolds on students' learning behaviors and performance. Adaptive scaffolding produced mixed results, with some scaffolds (viz., strategic hints that supported debugging and assessment of causal models) being generally more useful to students than others (viz., encouragement prompts). We also note differences in learning behaviors of High and Low performers after receiving scaffolds. Overall, our findings suggest how adaptive scaffolding in OELEs like Betty's Brain can be further improved to narrow the gap between High and Low performers.
Submitted 1 June, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Brokenwire : Wireless Disruption of CCS Electric Vehicle Charging
Authors:
Sebastian Köhler,
Richard Baker,
Martin Strohmeier,
Ivan Martinovic
Abstract:
We present a novel attack against the Combined Charging System, one of the most widely used DC rapid charging technologies for electric vehicles (EVs). Our attack, Brokenwire, interrupts necessary control communication between the vehicle and charger, causing charging sessions to abort. The attack requires only temporary physical proximity and can be conducted wirelessly from a distance, allowing individual vehicles or entire fleets to be disrupted stealthily and simultaneously. In addition, it can be mounted with off-the-shelf radio hardware and minimal technical knowledge. By exploiting CSMA/CA behavior, only a very weak signal needs to be induced into the victim to disrupt communication - exceeding the effectiveness of broadband noise jamming by three orders of magnitude. The exploited behavior is a required part of the HomePlug Green PHY, DIN 70121 & ISO 15118 standards and all known implementations exhibit it. We first study the attack in a controlled testbed and then demonstrate it against eight vehicles and 20 chargers in real deployments. We find the attack to be successful in the real world, at ranges up to 47 m, for a power budget of less than 1 W. We further show that the attack can work between the floors of a building (e.g., multi-story parking), through perimeter fences, and from 'drive-by' attacks. We present a heuristic model to estimate the number of vehicles that can be attacked simultaneously for a given output power. Brokenwire has immediate implications for a substantial proportion of the around 12 million battery EVs on the roads worldwide - and profound effects on the new wave of electrification for vehicle fleets, both for private enterprise and crucial public services, as well as electric buses, trucks and small ships. As such, we conducted a disclosure to the industry and discussed a range of mitigation techniques that could be deployed to limit the impact.
Submitted 26 March, 2024; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Signal Injection Attacks against CCD Image Sensors
Authors:
Sebastian Köhler,
Richard Baker,
Ivan Martinovic
Abstract:
Since cameras have become a crucial part of many safety-critical systems and applications, such as autonomous vehicles and surveillance, a large body of academic and non-academic work has shown attacks against their main component - the image sensor. However, these attacks are limited to coarse-grained and often suspicious injections because light is used as an attack vector. Furthermore, due to the nature of optical attacks, they require line-of-sight between the adversary and the target camera.
In this paper, we present a novel post-transducer signal injection attack against CCD image sensors, as they are used in professional, scientific, and even military settings. We show how electromagnetic emanation can be used to manipulate the image information captured by a CCD image sensor with the granularity down to the brightness of individual pixels. We study the feasibility of our attack and then demonstrate its effects in the scenario of automatic barcode scanning. Our results indicate that the injected distortion can disrupt automated vision-based intelligent systems.
Submitted 13 December, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
They See Me Rollin': Inherent Vulnerability of the Rolling Shutter in CMOS Image Sensors
Authors:
Sebastian Köhler,
Giulio Lovisotto,
Simon Birnbach,
Richard Baker,
Ivan Martinovic
Abstract:
In this paper, we describe how the electronic rolling shutter in CMOS image sensors can be exploited using a bright, modulated light source (e.g., an inexpensive, off-the-shelf laser), to inject fine-grained image disruptions. We demonstrate the attack on seven different CMOS cameras, ranging from cheap IoT to semi-professional surveillance cameras, to highlight the wide applicability of the rolling shutter attack. We model the fundamental factors affecting a rolling shutter attack in an uncontrolled setting. We then perform an exhaustive evaluation of the attack's effect on the task of object detection, investigating the effect of attack parameters. We validate our model against empirical data collected on two separate cameras, showing that by simply using information from the camera's datasheet the adversary can accurately predict the injected distortion size and optimize their attack accordingly. We find that an adversary can hide up to 75% of objects perceived by state-of-the-art detectors by selecting appropriate attack parameters. We also investigate the stealthiness of the attack in comparison to a naïve camera blinding attack, showing that common image distortion metrics cannot detect the attack presence. Therefore, we present a new, accurate and lightweight enhancement to the backbone network of an object detector to recognize rolling shutter attacks. Overall, our results indicate that rolling shutter attacks can substantially reduce the performance and reliability of vision-based intelligent systems.
Submitted 1 December, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
A Review of Computational Approaches for Evaluation of Rehabilitation Exercises
Authors:
Yalin Liao,
Aleksandar Vakanski,
Min Xian,
David Paul,
Russell Baker
Abstract:
Recent advances in data analytics and computer-aided diagnostics stimulate the vision of patient-centric precision healthcare, where treatment plans are customized based on the health records and needs of every patient. In physical rehabilitation, the progress in machine learning and the advent of affordable and reliable motion capture sensors have been conducive to the development of approaches for automated assessment of patient performance and progress toward functional recovery. The presented study reviews computational approaches for evaluating patient performance in rehabilitation programs using motion capture systems. Such approaches will play an important role in supplementing traditional rehabilitation assessment performed by trained clinicians, and in assisting patients participating in home-based rehabilitation. The reviewed computational methods for exercise evaluation are grouped into three main categories: discrete movement score, rule-based, and template-based approaches. The review places an emphasis on the application of machine learning methods for movement evaluation in rehabilitation. Related work in the literature on data representation, feature engineering, movement segmentation, and scoring functions is presented. The study also reviews existing sensors for capturing rehabilitation movements and provides an informative listing of pertinent benchmark datasets. The significance of this paper is in being the first to provide a comprehensive review of computational methods for evaluation of patient performance in rehabilitation programs.
Submitted 19 March, 2020; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Extending Deep Knowledge Tracing: Inferring Interpretable Knowledge and Predicting Post-System Performance
Authors:
Richard Scruggs,
Ryan S. Baker,
Bruce M. McLaren
Abstract:
Recent student knowledge modeling algorithms such as Deep Knowledge Tracing (DKT) and Dynamic Key-Value Memory Networks (DKVMN) have been shown to produce accurate predictions of problem correctness within the same learning system. However, these algorithms do not attempt to directly infer student knowledge. In this paper we present an extension to these algorithms to also infer knowledge. We apply this extension to DKT and DKVMN, resulting in knowledge estimates that correlate better with a posttest than knowledge estimates from Bayesian Knowledge Tracing (BKT), an algorithm designed to infer knowledge, and another classic algorithm, Performance Factors Analysis (PFA). We also apply our extension to correctness predictions from BKT and PFA, finding that knowledge estimates produced with it correlate better with the posttest than BKT and PFA's standard knowledge estimates. These findings are significant since the primary aim of education is to prepare students for later experiences outside of the immediate learning activity.
Submitted 31 August, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Social Network Based Substance Abuse Prevention via Network Modification (A Preliminary Study)
Authors:
Aida Rahmattalabi,
Anamika Barman Adhikari,
Phebe Vayanos,
Milind Tambe,
Eric Rice,
Robin Baker
Abstract:
Substance use and abuse is a significant public health problem in the United States. Group-based intervention programs offer a promising means of preventing and reducing substance abuse. While effective, unfortunately, inappropriate intervention groups can result in an increase in deviant behaviors among participants, a process known as deviancy training. This paper investigates the problem of optimizing the social influence related to the deviant behavior via careful construction of the intervention groups. We propose a Mixed Integer Optimization formulation that decides on the intervention groups, captures the impact of the groups on the structure of the social network, and models the impact of these changes on behavior propagation. In addition, we propose a scalable hybrid meta-heuristic algorithm that combines Mixed Integer Programming and Large Neighborhood Search to find near-optimal network partitions. Our algorithm is packaged in the form of GUIDE, an AI-based decision aid that recommends intervention groups. Being the first quantitative decision aid of this kind, GUIDE is able to assist practitioners, in particular social workers, in three key areas: (a) GUIDE proposes near-optimal solutions that are shown, via extensive simulations, to significantly improve over the traditional qualitative practices for forming intervention groups; (b) GUIDE is able to identify circumstances when an intervention will lead to deviancy training, thus saving time, money, and effort; (c) GUIDE can evaluate current strategies of group formation and discard strategies that will lead to deviancy training. In developing GUIDE, we are primarily interested in substance use interventions among homeless youth as a high risk and vulnerable population. GUIDE is developed in collaboration with Urban Peak, a homeless-youth serving organization in Denver, CO, and is under preparation for deployment.
Submitted 31 January, 2019;
originally announced February 2019.
-
The Importance of Socio-Cultural Differences for Annotating and Detecting the Affective States of Students
Authors:
Eda Okur,
Sinem Aslan,
Nese Alyuz,
Asli Arslan Esme,
Ryan S. Baker
Abstract:
The development of real-time affect detection models often depends upon obtaining annotated data for supervised learning by employing human experts to label the student data. One open question in annotating affective data for affect detection is whether the labelers (i.e., human experts) need to be socio-culturally similar to the students being labeled, as this impacts the cost feasibility of obtaining the labels. In this study, we investigate the following research questions: For affective state annotation, how does the socio-cultural background of human expert labelers, compared to the subjects, impact the degree of consensus and distribution of affective states obtained? Secondly, how do differences in labeler background impact the performance of affect detection models that are trained using these labels?
Submitted 11 January, 2019;
originally announced January 2019.
-
Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining
Authors:
Josh Gardner,
Yuming Yang,
Ryan Baker,
Christopher Brooks
Abstract:
The use of machine learning techniques has expanded in education research, driven by the rich data from digital learning environments and institutional data warehouses. However, replication of machine learned models in the domain of the learning sciences is particularly challenging due to a confluence of experimental, methodological, and data barriers. We discuss the challenges of end-to-end machine learning replication in this context, and present an open-source software toolkit, the MOOC Replication Framework (MORF), to address them. We demonstrate the use of MORF by conducting a replication at scale, and provide a complete executable container, with unique DOIs documenting the configurations of each individual trial, for replication or future extension at https://github.com/educational-technology-collective/fy2015-replication. This work demonstrates an approach to end-to-end machine learning replication which is relevant to any domain with large, complex or multi-format, privacy-protected data with a consistent schema.
Submitted 10 July, 2018; v1 submitted 13 June, 2018;
originally announced June 2018.
-
MORF: A Framework for Predictive Modeling and Replication At Scale With Privacy-Restricted MOOC Data
Authors:
Josh Gardner,
Christopher Brooks,
Juan Miguel L. Andres,
Ryan Baker
Abstract:
Big data repositories from online learning platforms such as Massive Open Online Courses (MOOCs) represent an unprecedented opportunity to advance research on education at scale and impact a global population of learners. To date, such research has been hindered by poor reproducibility and a lack of replication, largely due to three types of barriers: experimental, inferential, and data. We present a novel system for large-scale computational research, the MOOC Replication Framework (MORF), to jointly address these barriers. We discuss MORF's architecture, an open-source platform-as-a-service (PaaS) which includes a simple, flexible software API providing for multiple modes of research (predictive modeling or production rule analysis) integrated with a high-performance computing environment. All experiments conducted on MORF use executable Docker containers which ensure complete reproducibility while allowing for the use of any software or language which can be installed in the linux-based Docker container. Each experimental artifact is assigned a DOI and made publicly available. MORF has the potential to accelerate and democratize research on its massive data repository, which currently includes over 200 MOOCs, as demonstrated by initial research conducted on the platform. We also highlight ways in which MORF represents a solution template to a more general class of problems faced by computational researchers in other domains.
Submitted 21 August, 2018; v1 submitted 16 January, 2018;
originally announced January 2018.
-
Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform
Authors:
Phil Trinder,
Natalia Chechina,
Nikolaos Papaspyrou,
Konstantinos Sagonas,
Simon Thompson,
Stephen Adams,
Stavros Aronis,
Robert Baker,
Eva Bihari,
Olivier Boudeville,
Francesco Cesarini,
Maurizio Di Stefano,
Sverker Eriksson,
Viktoria Fordos,
Amir Ghaffari,
Aggelos Giantsios,
Rickard Green,
Csaba Hoch,
David Klaftenegger,
Huiqing Li,
Kenneth Lundin,
Kenneth Mackenzie,
Katerina Roukounaki,
Yiannis Tsiouris,
Kjell Winblad
Abstract:
Distributed actor languages are an effective means of constructing scalable reliable systems, and the Erlang programming language has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits, and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems, and reports the work to improve the language's scalability.
We systematically study the scalability limits of Erlang and address the issues at the virtual machine (VM), language, and tool levels. More specifically: (1) We have evolved the Erlang VM so that it can work effectively in large scale single-host multicore and NUMA architectures. We have made important architectural improvements to the Erlang/OTP. (2) We have designed and implemented Scalable Distributed (SD) Erlang libraries to address language-level scalability issues, and provided and validated a set of semantics for the new language constructs. (3) To make large Erlang systems easier to deploy, monitor, and debug we have developed and made open source releases of five complementary tools, some specific to SD Erlang.
Throughout the article we use two case studies to investigate the capabilities of our new technologies and tools: a distributed hash table based Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments show that two versions of ACO survive random process failure and hence that SD Erlang preserves the Erlang reliability model. Even for programs with no global recovery data to maintain, SD Erlang partitions the network to reduce network traffic and hence improves performance of the Orbit and ACO benchmarks above 80 hosts. ACO measurements show that maintaining global recovery data dramatically limits scalability; however scalability is recovered by partitioning the recovery data.
Submitted 8 May, 2017; v1 submitted 24 April, 2017;
originally announced April 2017.
-
Stochastic Ordering based Carrier-to-Interference Ratio Analysis for the Shotgun Cellular Systems
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
A simple analytical tool based on stochastic ordering is developed to compare the distributions of carrier-to-interference ratio at the mobile station of two cellular systems where the base stations are distributed randomly according to certain non-homogeneous Poisson point processes. The comparison is conveniently done by studying only the base station densities without having to solve for the distributions of the carrier-to-interference ratio, which are often hard to obtain.
Submitted 14 October, 2011;
originally announced October 2011.
-
Multi-tier Network Performance Analysis using a Shotgun Cellular System
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
This paper studies the carrier-to-interference ratio (CIR) and carrier-to-interference-plus-noise ratio (CINR) performance at the mobile station (MS) within a multi-tier network composed of M tiers of wireless networks, with each tier modeled as the homogeneous n-dimensional (n-D, n=1,2, and 3) shotgun cellular system, where the base station (BS) distribution is given by the homogeneous Poisson point process in n-D. The CIR and CINR at the MS in a single tier network are thoroughly analyzed to simplify the analysis of the multi-tier network. For the multi-tier network with given system parameters, the following are the main results of this paper: (1) semi-analytical expressions for the tail probabilities of CIR and CINR; (2) a closed form expression for the tail probability of CIR in the range [1,Infinity); (3) a closed form expression for the tail probability of an approximation to CIR in the entire range [0,Infinity); (4) a lookup table based approach for obtaining the tail probability of CINR, and (5) the study of the effect of shadow fading and BSs with ideal sectorized antennas on the CIR and CINR. Based on these results, it is shown that, in a practical cellular system, the installation of additional wireless networks (microcells, picocells and femtocells) with low power BSs over the already existing macrocell network will always improve the CINR performance at the MS.
Submitted 14 October, 2011;
originally announced October 2011.
-
Downlink Performance Analysis for a Generalized Shotgun Cellular System
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
In this paper, we analyze the signal-to-interference-plus-noise ratio (SINR) performance at a mobile station (MS) in a random cellular network. The cellular network is formed by base-stations (BSs) placed in a one, two or three dimensional space according to a possibly non-homogeneous Poisson point process, which is a generalization of the so-called shotgun cellular system (SCS). We develop a sequence of equivalence relations for the SCSs and use them to derive semi-analytical expressions for the coverage probability at the MS when the transmissions from each BS may be affected by random fading with arbitrary distributions as well as attenuation following arbitrary path-loss models. For homogeneous Poisson point processes in the interference-limited case with power-law path-loss model, we show that the SINR distribution is the same for all fading distributions and is not a function of the base station density. In addition, the influence of random transmission powers, power control, and multiple channel reuse groups on the downlink performance is also discussed. The techniques developed for the analysis of SINR have applications beyond cellular networks and can be used in similar studies for cognitive radio networks, femtocell networks and other heterogeneous and multi-tier networks.
Submitted 12 December, 2012; v1 submitted 20 February, 2010;
originally announced February 2010.
-
Comparison of Characteristics and Practices amongst Spreadsheet Users with Different Levels of Experience
Authors:
Kenneth R. Baker,
Stephen G. Powell,
Barry Lawson,
Lynn Foster-Johnson
Abstract:
We developed an internet-based questionnaire on spreadsheet use that we administered to a large number of users in several companies and organizations to document how spreadsheets are currently being developed and used in business. In this paper, we discuss the results drawn from a comparison of responses from individuals with the most experience and expertise with those from individuals with the least. These results describe two views of spreadsheet design and use in organizations, and reflect gaps between these two groups and between these groups and the entire population of nearly 1600 respondents. Moreover, our results indicate that these gaps have multiple dimensions: they reflect not only the context, skill, and practices of individual users but also the policies of large organizations.
Submitted 2 March, 2008;
originally announced March 2008.
-
Impact of Errors in Operational Spreadsheets
Authors:
Stephen G. Powell,
Barry Lawson,
Kenneth R. Baker
Abstract:
All users of spreadsheets struggle with the problem of errors. Errors are thought to be prevalent in spreadsheets, and in some instances they have cost organizations millions of dollars. In a previous study of 50 operational spreadsheets we found errors in 0.8% to 1.8% of all formula cells, depending on how errors are defined. In the current study we estimate the quantitative impacts of errors in 25 operational spreadsheets from five different organizations. We find that many errors have no quantitative impact on the spreadsheet. Those that have an impact often affect unimportant portions of the spreadsheet. The remaining errors do sometimes have substantial impacts on key aspects of the spreadsheet. This paper provides the first fully-documented evidence on the quantitative impact of errors in operational spreadsheets.
Submitted 4 January, 2008;
originally announced January 2008.
-
GridMonitor: Integration of Large Scale Facility Fabric Monitoring with Meta Data Service in Grid Environment
Authors:
Rich Baker,
Dantong Yu,
Jason Smith,
Anthony Chan,
Kaushik De,
Patrick McGuigan
Abstract:
Grid computing consists of the coordinated use of large sets of diverse, geographically distributed resources for high performance computation. Effective monitoring of these computing resources is extremely important to allow efficient use on the Grid. The large number of heterogeneous computing entities available in Grids makes the task challenging. In this work, we describe a Grid monitoring system, called GridMonitor, that captures and makes available the most important information from a large computing facility. The Grid monitoring system consists of four tiers: local monitoring, archiving, publishing and harnessing. This architecture was applied on a large scale linux farm and network infrastructure. It can be used by many higher-level Grid services including scheduling services and resource brokering.
Submitted 13 June, 2003;
originally announced June 2003.
-
A Model for Grid User Management
Authors:
Richard Baker,
Dantong Yu,
Tomasz Wlodek
Abstract:
Registration and management of users in a large scale Grid computing environment presents new challenges that are not well addressed by existing protocols. Within a single Virtual Organization (VO), thousands of users will potentially need access to hundreds of computing sites, and the traditional model where users register for local accounts at each site will present significant scaling problems. However, computing sites must maintain control over access to the site and site policies generally require individual local accounts for every user. We present here a model that allows users to register once with a VO and yet still provides all of the computing sites the information they require with the required level of trust. We have developed tools to allow sites to automate the management of local accounts and the mappings between Grid identities and local accounts.
Submitted 13 June, 2003;
originally announced June 2003.