-
An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging
Authors:
Sadjad Anzabi Zadeh,
W. Nick Street,
Barrett W. Thomas
Abstract:
Deep Reinforcement Learning is an effective tool for drug dosing for chronic condition management. However, the final protocol is generally a black box without any justification for its prescribed doses. This paper addresses this issue by proposing an explainable dosing protocol for warfarin using a Proximal Policy Optimization method combined with Policy Distillation. We introduce Action Forging as an effective tool to achieve explainability. Our focus is on the maintenance dosing protocol. Results show that the final model is as easy to understand and deploy as the current dosing protocols and outperforms the baseline dosing algorithms.
Submitted 26 April, 2024;
originally announced April 2024.
-
Optimizing Warfarin Dosing using Deep Reinforcement Learning
Authors:
Sadjad Anzabi Zadeh,
W. Nick Street,
Barrett W. Thomas
Abstract:
Warfarin is a widely used anticoagulant with a narrow therapeutic range. Dosing of warfarin should be individualized, since slight overdosing or underdosing can have catastrophic or even fatal consequences. Despite much research on warfarin dosing, current dosing protocols do not live up to expectations, especially for patients sensitive to warfarin. We propose a deep reinforcement learning-based dosing model for warfarin. To overcome the issue of relatively small sample sizes in dosing trials, we use a Pharmacokinetic/Pharmacodynamic (PK/PD) model of warfarin to simulate dose-responses of virtual patients. Applying the proposed algorithm to virtual test patients shows that this model outperforms a set of clinically accepted dosing protocols by a wide margin. We tested the robustness of our dosing protocol on a second PK/PD model and showed that its performance is comparable to the set of baseline protocols.
Submitted 23 December, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Does Parking Matter? The Impact of Search Time for Parking on Last-Mile Delivery Optimization
Authors:
Sara Reed,
Ann Melissa Campbell,
Barrett W. Thomas
Abstract:
Parking is a necessary component of traditional last-mile delivery practices, but finding parking can be difficult. Yet, the routing literature largely does not account for the need to find parking. In this paper, we address this challenge of finding parking through the Capacitated Delivery Problem with Parking (CDPP). Unlike other models in the literature, the CDPP accounts for the search time for parking in the objective and minimizes the completion time of the delivery tour. When we restrict the customer geography to a complete grid, we identify conditions for when a Traveling Salesman Problem (TSP) solution that parks at each customer is an optimal solution to the CDPP. We then determine when the search time for parking is large enough for the CDPP optimal solution to differ from this TSP solution. We also identify model improvements that allow reasonably sized instances of the CDPP to be solved exactly. We introduce a heuristic for the CDPP that quickly finds high-quality solutions to large instances. Computational experiments show that parking matters in last-mile delivery optimization. The CDPP outperforms industry practice and models in the literature, showing the greatest advantage when the search time for parking is high. This analysis provides immediate ways to improve routing in last-mile delivery.
Submitted 31 January, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Same-Day Delivery with Fairness
Authors:
Xinwei Chen,
Tong Wang,
Barrett W. Thomas,
Marlin W. Ulmer
Abstract:
The demand for same-day delivery (SDD) has increased rapidly in the last few years and has particularly boomed during the COVID-19 pandemic. The fast growth is not without its challenges. In 2016, due to low concentrations of memberships and long distances from the depot, certain minority neighborhoods were excluded from receiving Amazon's SDD service, raising concerns about fairness. In this paper, we study the problem of offering fair SDD service to customers. The service area is partitioned into different regions. Over the course of a day, customers request SDD service, and the timing of requests and delivery locations are not known in advance. The dispatcher dynamically assigns vehicles to make deliveries to accepted customers before their delivery deadline. In addition to the overall service rate (utility), we maximize the minimal regional service rate across all regions (fairness). We model the problem as a multi-objective Markov decision process and develop a deep Q-learning solution approach. We introduce a novel transformation of learning from rates to actual services, which creates a stable and efficient learning process. Computational results demonstrate the effectiveness of our approach in alleviating unfairness both spatially and temporally in different customer geographies. We also show this effectiveness is valid with different depot locations, providing businesses with an opportunity to achieve better fairness from any location. Further, we consider the impact of ignoring fairness in service, and results show that our policies eventually outperform the utility-driven baseline when customers have high expectations of service level.
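The two quantities named in this abstract, the overall service rate (utility) and the minimal regional service rate (fairness), can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `service_rates`, the region names, and the request counts are all hypothetical, and the abstract does not say how the two objectives are weighted, so the sketch only computes them.

```python
from typing import Dict, Tuple


def service_rates(requests: Dict[str, int], served: Dict[str, int]) -> Tuple[float, float]:
    """Return (overall service rate, minimum regional service rate).

    `requests` maps each region to the number of SDD requests received;
    `served` maps each region to the number of requests actually served.
    Both dictionaries are hypothetical inputs for illustration.
    """
    total_requests = sum(requests.values())
    total_served = sum(served.get(region, 0) for region in requests)
    # Utility: share of all requests that were served.
    overall = total_served / total_requests if total_requests else 0.0
    # Fairness: the worst service rate over regions that had any requests.
    regional = [served.get(region, 0) / n for region, n in requests.items() if n > 0]
    fairness = min(regional) if regional else 0.0
    return overall, fairness


# Hypothetical day: three regions with uneven service.
requests = {"north": 40, "south": 10, "east": 50}
served = {"north": 36, "south": 4, "east": 45}
utility, fairness = service_rates(requests, served)
print(f"utility={utility:.2f}, fairness={fairness:.2f}")  # utility=0.85, fairness=0.40
```

In this made-up example the overall rate looks healthy while one region is served far less often, which is the kind of gap a min-based fairness term is meant to expose.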
Submitted 22 December, 2021; v1 submitted 18 July, 2020;
originally announced July 2020.
-
Deep Q-Learning for Same-Day Delivery with Vehicles and Drones
Authors:
Xinwei Chen,
Marlin W. Ulmer,
Barrett W. Thomas
Abstract:
In this paper, we consider same-day delivery with vehicles and drones. Customers make delivery requests over the course of the day, and the dispatcher dynamically dispatches vehicles and drones to deliver the goods to customers before their delivery deadline. Vehicles can deliver multiple packages in one route but travel relatively slowly due to the urban traffic. Drones travel faster, but they have limited capacity and require charging or battery swaps. To exploit the different strengths of the fleets, we propose a deep Q-learning approach. Our method learns the value of assigning a new customer to either drones or vehicles as well as the option to not offer service at all. In a systematic computational analysis, we show the superiority of our policy compared to benchmark policies and the effectiveness of our deep Q-learning approach. We also show that our policy can maintain effectiveness when the fleet size changes moderately. Experiments on data drawn from varied spatial/temporal distributions demonstrate that our trained policies can cope with changes in the input data.
Submitted 7 March, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Special Polynomials and Exact Solutions of the Dispersive Water Wave and Modified Boussinesq Equations
Authors:
Peter A Clarkson,
Bryn W M Thomas
Abstract:
Exact solutions of the dispersive water wave and modified Boussinesq equations are expressed in terms of special polynomials associated with rational solutions of the fourth Painlevé equation, which arises as generalized scaling reductions of these equations. Generalized solutions that involve an infinite sequence of arbitrary constants are also derived; these are analogues of generalized rational solutions for the Korteweg-de Vries, Boussinesq and nonlinear Schrödinger equations.
Submitted 12 March, 2009;
originally announced March 2009.