-
An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging
Authors:
Sadjad Anzabi Zadeh,
W. Nick Street,
Barrett W. Thomas
Abstract:
Deep Reinforcement Learning is an effective tool for drug dosing for chronic condition management. However, the final protocol is generally a black box without any justification for its prescribed doses. This paper addresses this issue by proposing an explainable dosing protocol for warfarin using a Proximal Policy Optimization method combined with Policy Distillation. We introduce Action Forging as an effective tool to achieve explainability. Our focus is on the maintenance dosing protocol. Results show that the final model is as easy to understand and deploy as the current dosing protocols and outperforms the baseline dosing algorithms.
Submitted 26 April, 2024;
originally announced April 2024.
-
Optimizing Warfarin Dosing using Deep Reinforcement Learning
Authors:
Sadjad Anzabi Zadeh,
W. Nick Street,
Barrett W. Thomas
Abstract:
Warfarin is a widely used anticoagulant with a narrow therapeutic range. Dosing of warfarin should be individualized, since slight overdosing or underdosing can have catastrophic or even fatal consequences. Despite much research on warfarin dosing, current dosing protocols do not live up to expectations, especially for patients sensitive to warfarin. We propose a deep reinforcement learning-based dosing model for warfarin. To overcome the issue of relatively small sample sizes in dosing trials, we use a Pharmacokinetic/Pharmacodynamic (PK/PD) model of warfarin to simulate dose-responses of virtual patients. Applying the proposed algorithm to virtual test patients shows that this model outperforms a set of clinically accepted dosing protocols by a wide margin. We tested the robustness of our dosing protocol on a second PK/PD model and showed that its performance is comparable to the set of baseline protocols.
Submitted 23 December, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Same-Day Delivery with Fairness
Authors:
Xinwei Chen,
Tong Wang,
Barrett W. Thomas,
Marlin W. Ulmer
Abstract:
The demand for same-day delivery (SDD) has increased rapidly in the last few years and has particularly boomed during the COVID-19 pandemic. This fast growth is not without its challenges. In 2016, due to low concentrations of memberships and long distances from the depot, certain minority neighborhoods were excluded from receiving Amazon's SDD service, raising concerns about fairness. In this paper, we study the problem of offering fair SDD service to customers. The service area is partitioned into different regions. Over the course of a day, customers request SDD service, and the timing of requests and delivery locations are not known in advance. The dispatcher dynamically assigns vehicles to make deliveries to accepted customers before their delivery deadlines. In addition to the overall service rate (utility), we maximize the minimal regional service rate across all regions (fairness). We model the problem as a multi-objective Markov decision process and develop a deep Q-learning solution approach. We introduce a novel transformation of learning from rates to actual services, which creates a stable and efficient learning process. Computational results demonstrate the effectiveness of our approach in alleviating unfairness both spatially and temporally in different customer geographies. We also show that this effectiveness holds with different depot locations, providing businesses with an opportunity to achieve better fairness from any location. Further, we consider the impact of ignoring fairness in service, and results show that our policies eventually outperform the utility-driven baseline when customers have high expectations of the service level.
Submitted 22 December, 2021; v1 submitted 18 July, 2020;
originally announced July 2020.
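The two objectives named in the "Same-Day Delivery with Fairness" abstract above, overall service rate (utility) and minimal regional service rate (fairness), can be illustrated with a small sketch. This is not the authors' code: the function name `service_rates`, the `(region, served)` record format, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def service_rates(requests):
    """Compute utility and fairness from one day's requests.

    `requests` is an iterable of (region, served) pairs, where `served`
    is True if the request was accepted and delivered. This record
    format is an assumption made for the example.
    """
    total = Counter()
    served = Counter()
    for region, was_served in requests:
        total[region] += 1
        if was_served:
            served[region] += 1
    if not total:
        raise ValueError("at least one request is required")

    # Utility: share of all requests that were served.
    overall = sum(served.values()) / sum(total.values())
    # Fairness: the lowest service rate observed in any region.
    regional = {region: served[region] / total[region] for region in total}
    return overall, min(regional.values()), regional


# Hypothetical day with three regions.
day = [
    ("north", True), ("north", True), ("north", False),
    ("south", True), ("south", False), ("south", False),
    ("east", True), ("east", True),
]
overall, fairness, regional = service_rates(day)
print(f"overall={overall:.3f} fairness={fairness:.3f}")
```

In this sample the overall rate looks acceptable while one region is served far less often than the others, which is the gap the fairness objective is meant to expose.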
-
Deep Q-Learning for Same-Day Delivery with Vehicles and Drones
Authors:
Xinwei Chen,
Marlin W. Ulmer,
Barrett W. Thomas
Abstract:
In this paper, we consider same-day delivery with vehicles and drones. Customers make delivery requests over the course of the day, and the dispatcher dynamically dispatches vehicles and drones to deliver the goods to customers before their delivery deadlines. Vehicles can deliver multiple packages in one route but travel relatively slowly due to urban traffic. Drones travel faster, but they have limited capacity and require charging or battery swaps. To exploit the different strengths of the fleets, we propose a deep Q-learning approach. Our method learns the value of assigning a new customer to either drones or vehicles as well as the option to not offer service at all. In a systematic computational analysis, we show the superiority of our policy compared to benchmark policies and the effectiveness of our deep Q-learning approach. We also show that our policy can maintain effectiveness when the fleet size changes moderately. Experiments on data drawn from varied spatial/temporal distributions demonstrate that our trained policies can cope with changes in the input data.
Submitted 7 March, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
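The "Deep Q-Learning for Same-Day Delivery with Vehicles and Drones" abstract above describes learning the value of three choices for each new customer: assign to a vehicle, assign to a drone, or offer no service. A minimal sketch of how such learned values might be turned into a decision follows. The feasibility set, the epsilon-greedy exploration, the function name `choose_action`, and the example Q-values are assumptions for illustration; the abstract does not specify them.

```python
import random

# The three choices named in the abstract.
ACTIONS = ("vehicle", "drone", "no_service")


def choose_action(q_values, feasible, epsilon=0.0, rng=random):
    """Pick an action for a new customer from estimated Q-values.

    `q_values` maps each action to an estimated value, and `feasible`
    is the set of actions currently possible (for example, no drone
    may be charged). Both inputs are hypothetical; in a deep Q-learning
    setup the values would come from a trained network.
    """
    candidates = [a for a in ACTIONS if a in feasible]
    if not candidates:
        raise ValueError("at least one action must be feasible")
    # Assumed exploration rule: with probability epsilon, act randomly.
    if rng.random() < epsilon:
        return rng.choice(candidates)
    # Otherwise act greedily on the estimated values.
    return max(candidates, key=lambda a: q_values[a])


# Hypothetical values: the drone looks best but is unavailable.
q = {"vehicle": 0.4, "drone": 0.9, "no_service": 0.0}
print(choose_action(q, feasible={"vehicle", "no_service"}))
```

With `epsilon=0.0` the choice is purely greedy, which is how a trained policy would typically be evaluated; a positive `epsilon` would be used only while learning.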