-
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Authors:
Anthony Costarelli,
Mat Allen,
Roman Hauksson,
Grace Sodunke,
Suhas Hariharan,
Carlson Cheng,
Wenjie Li,
Arjun Yadav
Abstract:
Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, a comprehensive framework for evaluating agents' performance across the various types of reasoning found in games is lacking. To address this gap, we introduce GameBench, a cross-domain benchmark for evaluating strategic reasoning abilities of LLM agents. We focus on 9 different game environments, where each covers at least one axis of key reasoning skill identified in strategy games, and select games for which strategy explanations are unlikely to form a significant portion of models' pretraining corpora. Our evaluations use GPT-3 and GPT-4 in their base form along with two scaffolding frameworks designed to enhance strategic reasoning ability: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). Our results show that none of the tested models match human performance, and at worst GPT-4 performs worse than random action. CoT and RAP both improve scores, but not to levels comparable to human performance.
Submitted 6 June, 2024;
originally announced June 2024.
-
M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data
Authors:
Matthew J Allen,
Francisco Dorr,
Joseph Alejandro Gallego Mejia,
Laura Martínez-Ferrer,
Anna Jungbluth,
Freddie Kalaitzis,
Raúl Ramos-Pollán
Abstract:
Satellite-based remote sensing has revolutionised the way we address global challenges in a rapidly evolving world. Huge quantities of Earth Observation (EO) data are generated by satellite sensors daily, but processing these large datasets for use in ML pipelines is technically and computationally challenging. Specifically, different types of EO data are often hosted on a variety of platforms, with differing availability for Python preprocessing tools. In addition, spatial alignment across data sources and data tiling can present significant technical hurdles for novice users. While some preprocessed EO datasets exist, their content is often limited to optical or near-optical wavelength data, which is ineffective at night or in adverse weather conditions. Synthetic Aperture Radar (SAR), an active sensing technique based on microwave length radiation, offers a viable alternative. However, the application of machine learning to SAR has been limited due to a lack of ML-ready data and pipelines, particularly for the full diversity of SAR data, including polarimetry, coherence and interferometry. We introduce M3LEO, a multi-modal, multi-label EO dataset that includes polarimetric, interferometric, and coherence SAR data derived from Sentinel-1, alongside Sentinel-2 RGB imagery and a suite of labelled tasks for model evaluation. M3LEO spans 17.5TB and contains approximately 10M data chips across six geographic regions. The dataset is complemented by a flexible PyTorch Lightning framework, with configuration management using Hydra. We provide tools to process any dataset available on popular platforms such as Google Earth Engine for integration with our framework. Initial experiments validate the utility of our data and framework, showing that SAR imagery contains information additional to that extractable from RGB data. Data at huggingface.co/M3LEO, and code at github.com/spaceml-org/M3LEO.
Submitted 6 June, 2024;
originally announced June 2024.
-
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Authors:
S. Mostafa Mousavi,
Marc Stogaitis,
Tajinder Gadh,
Richard M Allen,
Alexei Barski,
Robert Bosch,
Patrick Robertson,
Nivetha Thiruverahan,
Youngmin Cho,
Aman Raj
Abstract:
This paper presents a novel approach to extract scientifically valuable information about Earth's physical phenomena from unconventional sources, such as multi-modal social media posts. Employing a state-of-the-art large language model (LLM), Gemini 1.5 Pro (Reid et al. 2024), we estimate earthquake ground shaking intensity from these unstructured posts. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that LLMs, trained on vast internet data, may have developed a unique understanding of physical phenomena. Specifically, Google's Gemini models demonstrate a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, accurately describing observational data even though it's not identical to established models. These findings raise intriguing questions about the extent to which Gemini's training has led to a broader understanding of the physical world and its phenomena. The ability of Generative AI models like Gemini to generate results consistent with established scientific knowledge highlights their potential to augment our understanding of complex physical phenomena like earthquakes. The flexible and effective approach proposed in this study holds immense potential for enriching our understanding of the impact of physical phenomena and improving resilience during natural disasters. This research is a significant step toward harnessing the power of social media and AI for natural disaster mitigation, opening new avenues for understanding the emerging capabilities of Generative AI and LLMs for scientific applications.
Submitted 14 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease
Authors:
Tahmina Sultana Priya,
Fan Leng,
Anthony C. Luehrs,
Eric W. Klee,
Alina M. Allen,
Konstantinos N. Lazaridis,
Danfeng Yao,
Shulan Tian
Abstract:
Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by the excessive accumulation of fat in the liver in individuals who do not consume significant amounts of alcohol, with risk factors including obesity, insulin resistance, and type 2 diabetes. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for precision medicine. The genomic and phenotypic data (3,408 cases and 4,739 controls) for this study were gathered from participants in the Mayo Clinic Tapestry Study (IRB#19-000001) and their electronic health records, including their demographic, clinical, and comorbidity data, and the genotype information through whole exome sequencing performed at Helix using the Exome+$^\circledR$ Assay according to standard procedure (www.helix.com). Factors highly relevant to NAFLD were determined by the chi-square test and a stepwise backward-forward regression model. Latent class analysis (LCA) was performed on NAFLD cases using significant indicator variables to identify subgroups. The optimal clustering revealed 5 latent subgroups from 2,013 NAFLD patients (mean age 60.6 years and 62.1% women), while a polygenic risk score based on 6 single-nucleotide polymorphism (SNP) variants and disease outcomes were used to analyze the subgroups. The groups are characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Odds ratios were utilized to compare the risk of complex diseases, such as fibrosis, cirrhosis, and hepatocellular carcinoma (HCC), as well as liver failure between the clusters. Cluster 2 has a significantly higher risk of complex disease outcomes compared to the other clusters.
Keywords: Fatty liver disease; Polygenic risk score; Precision medicine; Deep phenotyping; NAFLD comorbidities; Latent class analysis.
Submitted 13 November, 2023;
originally announced November 2023.
-
Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery
Authors:
Joseph A. Gallego-Mejia,
Anna Jungbluth,
Laura Martínez-Ferrer,
Matt Allen,
Francisco Dorr,
Freddie Kalaitzis,
Raúl Ramos-Pollán
Abstract:
Self-supervised learning (SSL) models have recently demonstrated remarkable performance across various tasks, including image segmentation. This study delves into the emergent characteristics of the Self-Distillation with No Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR) imagery. We pre-train a vision transformer (ViT)-based DINO model using unlabeled SAR data, and later fine-tune the model to predict high-resolution land cover maps. We rigorously evaluate the utility of attention maps generated by the ViT backbone and compare them with the model's token embedding space. We observe a small improvement in model performance with pre-training compared to training from scratch and discuss the limitations and opportunities of SSL for remote sensing and land cover segmentation. Beyond small performance increases, we show that ViT attention maps hold great intrinsic value for remote sensing, and could provide useful inputs to other algorithms. With this, our work lays the groundwork for bigger and better SSL models for Earth Observation.
Submitted 2 December, 2023; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Exploring Generalisability of Self-Distillation with No Labels for SAR-Based Vegetation Prediction
Authors:
Laura Martínez-Ferrer,
Anna Jungbluth,
Joseph A. Gallego-Mejia,
Matt Allen,
Francisco Dorr,
Freddie Kalaitzis,
Raúl Ramos-Pollán
Abstract:
In this work we pre-train a DINO-ViT based model using two Synthetic Aperture Radar datasets (S1GRD or GSSIC) across three regions (China, Conus, Europe). We fine-tune the models on smaller labeled datasets to predict vegetation percentage, and empirically study the connection between the embedding space of the models and their ability to generalize across diverse geographic regions and to unseen data. For S1GRD, embedding spaces of different regions are clearly separated, while GSSIC's overlap. Positional patterns remain during fine-tuning, and greater distances in embeddings often result in higher errors for unfamiliar regions. With this, our work increases our understanding of generalizability for self-supervised models applied to remote sensing.
Submitted 2 December, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Large Scale Masked Autoencoding for Reducing Label Requirements on SAR Data
Authors:
Matt Allen,
Francisco Dorr,
Joseph A. Gallego-Mejia,
Laura Martínez-Ferrer,
Anna Jungbluth,
Freddie Kalaitzis,
Raúl Ramos-Pollán
Abstract:
Satellite-based remote sensing is instrumental in the monitoring and mitigation of the effects of anthropogenic climate change. Large scale, high resolution data derived from these sensors can be used to inform intervention and policy decision making, but the timeliness and accuracy of these interventions are limited by use of optical data, which cannot operate at night and is affected by adverse weather conditions. Synthetic Aperture Radar (SAR) offers a robust alternative to optical data, but its associated complexities limit the scope of labelled data generation for traditional deep learning. In this work, we apply a self-supervised pretraining scheme, masked autoencoding, to SAR amplitude data covering 8.7\% of the Earth's land surface area, and tune the pretrained weights on two downstream tasks crucial to monitoring climate change - vegetation cover prediction and land cover classification. We show that the use of this pretraining scheme reduces labelling requirements for the downstream tasks by more than an order of magnitude, and that this pretraining generalises geographically, with the performance gain increasing when tuned downstream on regions outside the pretraining set. Our findings significantly advance climate change mitigation by facilitating the development of task and region-specific SAR models, allowing local communities and organizations to deploy tailored solutions for rapid, accurate monitoring of climate change effects.
Submitted 2 December, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Fewshot learning on global multimodal embeddings for earth observation tasks
Authors:
Matt Allen,
Francisco Dorr,
Joseph A. Gallego-Mejia,
Laura Martínez-Ferrer,
Anna Jungbluth,
Freddie Kalaitzis,
Raúl Ramos-Pollán
Abstract:
In this work we pretrain a CLIP/ViT based model using three different modalities of satellite imagery across five AOIs covering over ~10\% of Earth's total landmass, namely Sentinel 2 RGB optical imagery, Sentinel 1 SAR radar amplitude and interferometric coherence. This model uses $\sim 250$ M parameters. Then, we use the embeddings produced for each modality with a classical machine learning method to attempt different downstream tasks for earth observation related to vegetation, built-up surface, croplands and permanent water. We consistently show how we reduce the need for labeled data by 99\%, so that with ~200-500 randomly selected labeled examples (around 4K-10K km$^2$) we reach performance levels analogous to those achieved with the full labeled datasets (about 150K image chips or 3M km$^2$ in each area of interest - AOI) on all modalities, AOIs and downstream tasks. This leads us to think that the model has captured significant earth features useful in a wide variety of scenarios. To enhance our model's usability in practice, its architecture allows inference in contexts with missing modalities and even missing channels within each modality. Additionally, we visually show that this embedding space, obtained with no labels, is sensitive to the different earth features represented by the labelled datasets we selected.
Submitted 2 December, 2023; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Understanding how the use of AI decision support tools affect critical thinking and over-reliance on technology by drug dispensers in Tanzania
Authors:
Ally Salim Jr,
Megan Allen,
Kelvin Mariki,
Kevin James Masoy,
Jafary Liana
Abstract:
The use of AI in healthcare is designed to improve care delivery and augment the decisions of providers to enhance patient outcomes. When deployed in clinical settings, the interaction between providers and AI is a critical component for measuring and understanding the effectiveness of these digital tools on broader health outcomes. Even in cases where AI algorithms have high diagnostic accuracy, healthcare providers often still rely on their experience and sometimes gut feeling to make a final decision. Other times, providers rely unquestioningly on the outputs of the AI models, which leads to a concern about over-reliance on the technology. The purpose of this research was to understand how reliant drug shop dispensers were on AI-powered technologies when determining a differential diagnosis for a presented clinical case vignette. We explored how the drug dispensers responded to technology that is framed as always correct in an attempt to measure whether they begin to rely on it without any critical thought of their own. We found that dispensers relied on the decision made by the AI 25 percent of the time, even when the AI provided no explanation for its decision.
Submitted 22 February, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
AI applications in forest monitoring need remote sensing benchmark datasets
Authors:
Emily R. Lines,
Matt Allen,
Carlos Cabo,
Kim Calders,
Amandine Debus,
Stuart W. D. Grieve,
Milto Miltiadou,
Adam Noach,
Harry J. F. Owen,
Stefano Puliti
Abstract:
With the rise in high resolution remote sensing technologies there has been an explosion in the amount of data available for forest monitoring, and an accompanying growth in artificial intelligence applications to automatically derive forest properties of interest from these datasets. Many studies use their own data at small spatio-temporal scales, and demonstrate an application of an existing or adapted data science method for a particular task. This approach often involves intensive and time-consuming data collection and processing, but generates results restricted to specific ecosystems and sensor types. There is a lack of widespread acknowledgement of how the types and structures of data used affect the performance and accuracy of analysis algorithms. To accelerate progress in the field more efficiently, benchmarking datasets upon which methods can be tested and compared are sorely needed.
Here, we discuss how lack of standardisation impacts confidence in estimation of key forest properties, and how considerations of data collection need to be accounted for in assessing method performance. We present pragmatic requirements and considerations for the creation of rigorous, useful benchmarking datasets for forest monitoring applications, and discuss how tools from modern data science can improve use of existing data. We list a set of example large-scale datasets that could contribute to benchmarking, and present a vision for how community-driven, representative benchmarking initiatives could benefit the field.
Submitted 19 December, 2022;
originally announced December 2022.
-
A Scalable Finite Difference Method for Deep Reinforcement Learning
Authors:
Matthew Allen,
John Raisbeck,
Hakho Lee
Abstract:
Several low-bandwidth distributable black-box optimization algorithms in the family of finite differences such as Evolution Strategies have recently been shown to perform nearly as well as tailored Reinforcement Learning methods in some Reinforcement Learning domains. One shortcoming of these black-box methods is that they must collect information about the structure of the return function at every update, and can often employ only information drawn from a distribution centered around the current parameters. As a result, when these algorithms are distributed across many machines, a significant portion of total runtime may be spent with many machines idle, waiting for a final return and then for an update to be calculated. In this work we introduce a novel method to use older data in finite difference algorithms, which produces a scalable algorithm that avoids significant idle time or wasted computation.
Submitted 19 January, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Account credibility inference based on news-sharing networks
Authors:
Bao Tran Truong,
Oliver Melbourne Allen,
Filippo Menczer
Abstract:
The spread of misinformation poses a threat to the social media ecosystem. Effective countermeasures to mitigate this threat require that social media platforms be able to accurately detect low-credibility accounts even before the content they share can be classified as misinformation. Here we present methods to infer account credibility from information diffusion patterns, in particular leveraging two networks: the reshare network, capturing an account's trust in other accounts, and the bipartite account-source network, capturing an account's trust in media sources. We extend network centrality measures and graph embedding techniques, systematically comparing these algorithms on data from diverse contexts and social media platforms. We demonstrate that both kinds of trust networks provide useful signals for estimating account credibility. Some of the proposed methods yield high accuracy, providing promising solutions to promote the dissemination of reliable information in online communities. Two kinds of homophily emerge from our results: accounts tend to have similar credibility if they reshare each other's content or share content from similar sources. Our methodology invites further investigation into the relationship between accounts and news sources to better characterize misinformation spreaders.
Submitted 24 January, 2024; v1 submitted 31 January, 2022;
originally announced February 2022.
-
Agent Spaces
Authors:
John C. Raisbeck,
Matthew W. Allen,
Hakho Lee
Abstract:
Exploration is one of the most important tasks in Reinforcement Learning, but it is not well-defined beyond finite problems in the Dynamic Programming paradigm (see Subsection 2.4). We provide a reinterpretation of exploration which can be applied to any online learning method. We come to this definition by approaching exploration from a new direction. After finding that concepts of exploration created to solve simple Markov decision processes with Dynamic Programming are no longer broadly applicable, we reexamine exploration. Instead of extending the ends of dynamic exploration procedures, we extend their means. That is, rather than repeatedly sampling every state-action pair possible in a process, we define the act of modifying an agent to itself be explorative. The resulting definition of exploration can be applied in infinite problems and non-dynamic learning methods, which the dynamic notion of exploration cannot tolerate. To understand the way that modifications of an agent affect learning, we describe a novel structure on the set of agents: a collection of distances (see footnote 7) $\{d_{a}\}_{a \in A}$, which represent the perspectives of each agent possible in the process. Using these distances, we define a topology and show that many important structures in Reinforcement Learning are well behaved under the topology induced by convergence in the agent space.
Submitted 10 November, 2021;
originally announced November 2021.
-
E(2) Equivariant Self-Attention for Radio Astronomy
Authors:
Micah Bowles,
Matthew Bromley,
Max Allen,
Anna Scaife
Abstract:
In this work we introduce group-equivariant self-attention models to address the problem of explainable radio galaxy classification in astronomy. We evaluate various orders of both cyclic and dihedral equivariance, and show that including equivariance as a prior both reduces the number of epochs required to fit the data and results in improved performance. We highlight the benefits of equivariance when using self-attention as an explainable model and illustrate how equivariant models statistically attend the same features in their classifications as human astronomers.
Submitted 30 November, 2021; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Crowdsourcing through Cognitive Opportunistic Networks
Authors:
M. Mordacchini,
A. Passarella,
M. Conti,
S. M. Allen,
M. J. Chorley,
G. B. Colombo,
V. Tanasescu,
R. M. Whitaker
Abstract:
Until recently crowdsourcing has been primarily conceived as an online activity to harness resources for problem solving. However, the emergence of opportunistic networking (ON) has opened up crowdsourcing to the spatial domain. In this paper we apply the ON model to potential crowdsourcing in the smart city environment. We introduce cognitive features to the ON that allow users' mobile devices to become aware of the surrounding physical environment. Specifically, we exploit cognitive psychology studies on dynamic memory structures and cognitive heuristics, i.e. mental models that describe how the human brain handles decision-making amongst complex and real-time stimuli. Combined with ON, these cognitive features allow devices to act as proxies in the cyber-world of their users and exchange knowledge to deliver awareness of places in an urban environment. This is done through tags associated with locations. They represent features that are perceived by humans about a place. We consider the extent to which this knowledge becomes available to participants, using interactions with locations and other nodes. This is assessed taking into account a wide range of cognitive parameters. Outcomes are important because this functionality could support a new type of recommendation system that is independent of the traditional forms of networking.
Submitted 30 September, 2021;
originally announced September 2021.
-
Developing an OpenAI Gym-compatible framework and simulation environment for testing Deep Reinforcement Learning agents solving the Ambulance Location Problem
Authors:
Michael Allen,
Kerry Pearn,
Tom Monks
Abstract:
Background and motivation: Deep Reinforcement Learning (Deep RL) is a rapidly developing field. Historically, most applications have been to games (such as chess, Atari games, and Go). Deep RL is now reaching the stage where it may offer value in real world problems, including optimisation of healthcare systems. One such problem is where to locate ambulances between calls in order to minimise time from emergency call to ambulance on-scene. This is known as the Ambulance Location problem.
Aim: To develop an OpenAI Gym-compatible framework and simulation environment for testing Deep RL agents.
Methods: A custom ambulance dispatch simulation environment was developed using OpenAI Gym and SimPy. Deep RL agents were built using PyTorch. The environment is a simplification of the real world, but allows control over the number of clusters of incident locations, number of possible dispatch locations, number of hospitals, and creating incidents that occur at different locations throughout each day.
Results: A range of Deep RL agents based on Deep Q networks were tested in this custom environment. All reduced time to respond to emergency calls compared with random allocation to dispatch points. Bagging Noisy Duelling Deep Q networks gave the most consistent performance. All methods had a tendency to lose performance if trained for too long, and so agents were saved at their optimal performance (and tested on independent simulation runs).
Conclusions: Deep RL agents, developed using simulated environments, have the potential to offer a novel approach to optimise the Ambulance Location problem. Creating open simulation environments should allow more rapid progress in this field.
Submitted 13 January, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
Provenance-Based Interpretation of Multi-Agent Information Analysis
Authors:
Scott Friedman,
Jeff Rye,
David LaVergne,
Dan Thomsen,
Matthew Allen,
Kyle Tunis
Abstract:
Analytic software tools and workflows are increasing in capability, complexity, number, and scale, and the integrity of our workflows is as important as ever. Specifically, we must be able to inspect the process of analytic workflows to assess (1) confidence of the conclusions, (2) risks and biases of the operations involved, (3) sensitivity of the conclusions to sources and agents, (4) impact and pertinence of various sources and agents, and (5) diversity of the sources that support the conclusions. We present an approach that tracks agents' provenance with PROV-O in conjunction with agents' appraisals and evidence links (expressed in our novel DIVE ontology). Together, PROV-O and DIVE enable dynamic propagation of confidence and counter-factual refutation to improve human-machine trust and analytic integrity. We demonstrate representative software developed for user interaction with that provenance, and discuss key needs for organizations adopting such approaches. We demonstrate all of these assessments in a multi-agent analysis scenario, using an interactive web-based information validation UI.
Submitted 8 November, 2020;
originally announced November 2020.
-
Integrating Deep Reinforcement Learning Networks with Health System Simulations
Authors:
Michael Allen,
Thomas Monks
Abstract:
Background and motivation: Combining Deep Reinforcement Learning (Deep RL) and Health Systems Simulations has significant potential, for both research into improving Deep RL performance and safety, and in operational practice. While individual toolkits exist for Deep RL and Health Systems Simulations, no framework to integrate the two has been established.
Aim: Provide a framework for integrating Deep RL Networks with Health System Simulations, and to ensure this framework is compatible with Deep RL agents that have been developed and tested using OpenAI Gym.
Methods: We developed our framework based on the OpenAI Gym framework, and demonstrate its use on a simple hospital bed capacity model. We built the Deep RL agents using PyTorch, and the Hospital Simulation using SimPy.
Results: We demonstrate example models using a Double Deep Q Network or a Duelling Double Deep Q Network as the Deep RL agent.
Conclusion: SimPy may be used to create Health System Simulations that are compatible with agents developed and tested on OpenAI Gym environments.
GitHub repository of code: https://github.com/MichaelAllen1966/learninghospital
Submitted 21 July, 2020;
originally announced August 2020.
-
Evolution Strategies Converges to Finite Differences
Authors:
John C. Raisbeck,
Matthew Allen,
Ralph Weissleder,
Hyungsoon Im,
Hakho Lee
Abstract:
Since the debut of Evolution Strategies (ES) as a tool for Reinforcement Learning by Salimans et al. 2017, there has been interest in determining the exact relationship between the Evolution Strategies gradient and the gradient of a similar class of algorithms, Finite Differences (FD) (Zhang et al. 2017, Lehman et al. 2018). Several investigations have examined the formal motivational differences between ES and FD (Lehman et al. 2018), as well as the differences on a standard benchmark problem in Machine Learning, the MNIST classification problem (Zhang et al. 2017). This paper proves that while the gradients are different, they converge as the dimension of the vector under optimization increases.
Submitted 27 December, 2019;
originally announced January 2020.
-
Earthquake Early Warning and Beyond: Systems Challenges in Smartphone-based Seismic Network
Authors:
Qingkai Kong,
Qin Lv,
Richard M. Allen
Abstract:
Earthquake Early Warning (EEW) systems can effectively reduce fatalities, injuries, and damage caused by earthquakes. Current EEW systems are mostly based on traditional seismic and geodetic networks, and exist only in a few countries due to the high cost of installing and maintaining such systems. The MyShake system takes a different approach and turns people's smartphones into portable seismic sensors to detect earthquake-like motions. However, to issue EEW messages with high accuracy and low latency in the real world, we need to address a number of challenges related to mobile computing. In this paper, we first summarize our experience building and deploying the MyShake system, then focus on two key challenges for smartphone-based EEW (sensing heterogeneity and user/system dynamics) and describe some preliminary exploration. We also discuss other challenges and new research directions associated with smartphone-based seismic networks.
Submitted 19 January, 2019;
originally announced January 2019.
-
Proof of Concept of Wireless TERS Monitoring
Authors:
Michael Allen,
Elena Gaura,
Ross Wilkins,
James Brusey,
Yuepeng Dong,
Andrew J. Whittle
Abstract:
Temporary earth retaining structures (TERS) help prevent collapse during construction excavation. To ensure that these structures are operating within design specifications, load forces on supports must be monitored. Current monitoring approaches are expensive, sparse, off-line, and thus difficult to integrate into predictive models. This work aims to show that wirelessly connected battery-powered sensors are feasible, practical, and have similar accuracy to existing sensor systems. We present the design and validation of ReStructure, an end-to-end prototype wireless sensor network for collection, communication, and aggregation of strain data. ReStructure was validated through a six-month deployment on a real-life excavation site, with all but one node producing valid and accurate strain measurements at higher frequency than existing ones. These results and the lessons learnt provide the basis for future widespread wireless TERS monitoring that increases measurement density and integrates closely with predictive models to provide timely alerts of damage or potential failure.
Submitted 10 May, 2017;
originally announced May 2017.
-
The Flip Diameter of Rectangulations and Convex Subdivisions
Authors:
Eyal Ackerman,
Michelle M. Allen,
Gill Barequet,
Maarten Löffler,
Joshua Mermelstein,
Diane L. Souvaine,
Csaba D. Tóth
Abstract:
We study the configuration space of rectangulations and convex subdivisions of $n$ points in the plane. It is shown that a sequence of $O(n\log n)$ elementary flip and rotate operations can transform any rectangulation to any other rectangulation on the same set of $n$ points. This bound is the best possible for some point sets, while $\Theta(n)$ operations are sufficient and necessary for others. Some of our bounds generalize to convex subdivisions of $n$ points in the plane.
Submitted 10 March, 2016; v1 submitted 16 December, 2013;
originally announced December 2013.
-
Surrogate Parenthood: Protected and Informative Graphs
Authors:
Barbara Blaustein,
Adriane Chapman,
Len Seligman,
M. David Allen,
Arnon Rosenthal
Abstract:
Many applications, including provenance and some analyses of social networks, require path-based queries over graph-structured data. When these graphs contain sensitive information, paths may be broken, resulting in uninformative query results. This paper presents innovative techniques that give users more informative graph query results; the techniques leverage a common industry practice of providing what we call surrogates: alternate, less sensitive versions of nodes and edges releasable to a broader community. We describe techniques for interposing surrogate nodes and edges to protect sensitive graph components, while maximizing graph connectivity and giving users as much information as possible. In this work, we formalize the problem of creating a protected account G' of a graph G. We provide a utility measure to compare the informativeness of alternate protected accounts and an opacity measure for protected accounts, which indicates the likelihood that an attacker can recreate the topology of the original graph from the protected account. We provide an algorithm to create a maximally useful protected account of a sensitive graph, and show through evaluation with the PLUS prototype that using surrogates and protected accounts adds value for the user, with no significant impact on the time required to generate results for graph queries.
Submitted 17 June, 2011;
originally announced June 2011.