Markov Decision Process and Approximate Dynamic Programming for a Patient Assignment Scheduling problem
Małgorzata M. O’Reilly
Sebastian Krasnicki
James Montgomery
Mojtaba Heydar
Richard Turner
Pieter Van Dam
Peter Maree
email: [email protected] of Natural Sciences, University of Tasmania, Hobart TAS 7001, Australia.
School of Information & Communication Technology, University of Tasmania, Hobart TAS 7001, Australia.
BHP, Perth, Western Australia.
School of Medicine, University of Tasmania, Hobart TAS 7001, Australia.
School of Nursing, University of Tasmania, Hobart TAS 7001, Australia.
Strategy and Planning, Department of Health, Hobart TAS 7001, Australia.
(June 26, 2024)
Abstract
We study the Patient Assignment Scheduling (PAS) problem in a random environment, which arises in the management of patient flow in hospital systems due to the stochastic nature of patient arrivals and of the Length of Stay distribution.
We develop a Markov Decision Process (MDP) which aims to assign the newly arrived patients in an optimal way so as to minimise the total expected long-run cost per unit time over an infinite horizon. We assume Poisson arrival rates that depend on patient types, and Length of Stay distributions that depend on whether patients stay in their primary wards or not.
Since the instances of realistic size of this problem are not easy to solve, we develop numerical methods based on Approximate Dynamic Programming. We illustrate the theory with numerical examples with parameters obtained by fitting to data from a tertiary referral hospital in Australia, and demonstrate the application potential of our methodology under practical considerations.
Funding: This research was supported by funding through the Australian Research Council Linkage Project LP140100152.
1 Introduction
In this paper we consider the Patient Assignment Scheduling (PAS) problem in which, at the start of each day, newly arrived patients are assigned to beds in different wards, considering their needs, priority, and available resources, in a way so as to optimise total expected daily cost over an infinite time horizon. We use the term ‘bed’ in the sense of the resource that consists of the staff (nurses and clinicians) available to attend to a patient in a physical bed, rather than the physical bed itself.
We assume that patients are assigned to the beds at the start of time period , see Figure 1. The allocation involves a group of patients waiting to be admitted to a suitable bed. Upon completing their treatments, patients are discharged. The duration of time between arrival and discharge is a random variable referred to as the Length of Stay (LoS). We assume an infinite horizon problem and develop a Markov Decision Process (MDP) to solve this stochastic problem so as to minimise the total expected long-run cost per unit time.
Here, without loss of generality we assume that the index corresponds to day , but note that this could denote some other time interval of interest such as an -hour time block.
Figure 1: Decision epochs of the PAS problem. At the start of time period we observe some state and then make some decision about how to allocate/transfer patients. This transforms the system into a post decision state and then the system evolves in a stochastic manner, until new state is observed at the start of the next time period .
Patient assignment is a key process in the management of hospitals, whereby bed managers allocate inbound patients to appropriate wards, rooms and beds. These decisions are made by teams of ward and bed managers at intermittent times throughout the day, using their experience and understanding of the natural dynamics of the hospital. From 2018 to 2019, the Australian hospital system recorded 11.5 million hospitalisations and 8.4 million emergency department presentations, resulting in 365,000 emergency admissions [2]. This corresponded to 31,500 patient assignment decisions made daily across Australia, with 1000 of these being unplanned [2].
These decisions are not trivial, since complications can arise due to variations in hospital policy and the highly stochastic and complex nature of healthcare. Potential patients can arrive at any time of day, with a variety of conditions and specific needs. Hospital policy may affect where patients can be assigned, on the basis of a patient’s gender or to comply with prescribed healthcare standards. Managers may also evaluate whether it is best to transfer patients between wards. Because of these complicating factors it can be unclear if the assignments made by bed managers are optimal.
The importance of maintaining good patient flow cannot be overstated. Carter, Pouch and Larson, in their literature review of emergency department (ED) crowding, found that ED overcrowding has a significant positive correlation with patient mortality and with patients leaving the hospital untreated [8]. Morley et al., in their literature review, also found an increase in patient mortality, as well as a higher exposure to error, poorer patient outcomes and increased patient length of stay, both in the ED and in the ward to which a patient is eventually assigned [20]. They also noted that the inability to quickly assign patients from the ED to an inpatient bed was a significant contributor to overcrowding.
Regarding overcrowding in non-emergency wards, there is a long-standing debate on what level of hospital occupancy is too high. Bagust et al. [4] found in their simulation-based approach that hospital occupancy above 85% created discernible risk, while occupancy above 90% caused regular bed crises. The 85% figure has since been adopted by many as a standard benchmark for hospital occupancy [9]. However, multiple other authors [5, 9] have emphasized that adopting 85% as the figure at which hospitals operate most efficiently misinterprets Bagust et al. [4], as this figure only applies to the specific example in [4]. Further, Bain et al. [5] state that using any single figure as a target for average occupancy is overly simplistic. Cummings et al. [9] note that while the results of [4] have been misinterpreted, there is a positive correlation between hospital occupancy and emergency department delays.
Solutions to these problems require improvements in the management of patient flow. Howell et al. [17] found that by assigning additional resources to the tasks of bed management and patient assignment, ED length of stay was reduced by an average of 98 minutes for admitted patients. Clearly, patients benefit tangibly when they are assigned correctly.
We call the mathematical formulation of patient assignment the Patient Assignment Scheduling (PAS) problem, in which we attempt to find optimal assignments of all patients as they arrive at the hospital. This problem was described by Demeester et al. in [11], Bilgin et al. in [6], and Vancroonenburg et al. in [26].
Demeester et al. [11] define a useful approach of hard and soft constraints. Patient assignments that violate hard constraints are infeasible, while assignments that violate soft constraints incur some cost, which contributes to the objective function. An example of a hard constraint is that two patients cannot be assigned to the same bed, while an example of a soft constraint is that a patient should be assigned to the ward that corresponds to their condition.
In their model, the variable for patient length of stay (LoS) used is deterministic, and so too is the cost of assigning a patient. This approach ignores the inherent stochasticity of a hospital system, but provides a useful framework for building upon.
Bilgin et al. [6] and Vancroonenburg et al. [26] use a very similar approach, defining a set of constraints to form an objective function, with each adding their own novel idea. Bilgin et al. [6] combine the PAS problem with a nurse scheduling problem, while Vancroonenburg et al. [26] introduce parameters to study the effect of uncertainty, and so their model is no longer fully deterministic. However, to truly capture the stochasticity of the system, consideration must be given to how the system evolves in time. The model must account for the impact of current assignment decisions on future assignments.
Using a similar integer programming approach to [6, 11, 26], Abera et al. [3] demonstrate how the PAS problem can be solved in a stochastic scenario. They consider first that a patient’s LoS is a random variable that may take any value on some distribution, and is unknown at the time of decision making. They then consider that patient arrivals to the system are also random, meaning that the cost of making any given assignment is stochastic. That is, the cost of an assignment depends on how long a patient stays, and what types of patients will arrive next. To evaluate the assignment cost, and hence the optimal assignment, Abera et al. [3] simulate the system several times to determine the expected cost over some planning horizon. The planning horizons used as examples in [3] are 14, 28 and 56 days.
One shared result in [3, 6, 11, 26] is that the integer linear program cannot be solved exactly for problems of any non-trivial size; that is, the exact optimal solution could not be found. Demeester et al. [11] found in their first example of a hospital with beds, an optimal solution was not found after a whole week of computation, which is an unacceptable amount of time for assigning patients. To overcome this, various heuristics were required in all of the above work. The authors of the papers listed above all begin by using a neighbourhood search technique, that is, starting with some initial solution and searching ‘neighbouring’ solutions to see if there is an improvement. The exact methodology of how the neighbourhood is defined and searched differs between authors. Demeester et al. [11] use a Tabu search, Abera et al. [3] use simulated annealing, while Bilgin et al. [6] use a hyper-heuristic approach (heuristics for selecting heuristics). Regardless of the approach, it is clear that heuristics are a necessity for solving problems of any realistic size.
While the formulation of the PAS problem as an integer program is well researched, alternative formulations exist. Hulshof et al. [18] and Dai and Shi [10] build models related to the PAS problem, based on Markovian Decision Processes (MDP). Such an alternative formulation allows for an exploration of possible decisions that a hospital manager can make. Hulshof et al. [18] explore resource assignment in hospitals and decision making on how many patients should be assigned to different queues. Dai and Shi [10] explore the use of patient overflow (allowing wards to ‘overflow’ into one another when full) and take a ‘higher-level’ approach to decision making. Rather than making assignment decisions for each patient, decisions are made on whether to allow patient overflow for each time period.
Similarly to the integer program approaches, the exact solutions of these MDPs cannot be obtained for large-size problems. Approximate Dynamic Programming (ADP) approaches are applied in [10, 18] to overcome what is called the curse of dimensionality. The ADP approach, described by Powell in [23], allows the cost in each state to be approximated by a set of features, which record some partial information about states, and corresponding weights. In addition to ADP, Dai and Shi [10] utilize a least-squares temporal difference (LSTD) learning algorithm to solve the problem.
We now discuss the work of Heydar et al. [15], which is the basis of the model proposed here. Heydar et al. [15] model the PAS as a continuous-time MDP, where the system is observed at every individual patient arrival or departure. The goal of the model is to identify the best possible patient assignment given information about the arriving patient and the occupancy of the hospital. Heterogeneity is included in the model by defining different patient classes and wards, and assuming that each patient class has an ideal ward, while other wards are unsuitable. In addition to assigning patients directly, Heydar et al. [15] also consider transfers of patients between wards to best fit patients, so as to minimise their appropriately defined objective function.
However, Heydar et al. [15] note that transfers are positively correlated with patient mortality and LoS, as shown in [14], meaning that transfers are given careful consideration and an additional cost in [15]. We adopt this feature as a key component in our model.
Similarly to Abera et al. [3], Heydar et al. [15] evaluate total expected cost over a finite horizon, meaning limited consideration is given to the state of the system in the long run. In order to address this gap, we generalise the model used in Heydar et al. [15] and extend it to an infinite horizon problem. In order to address the curse of dimensionality, we propose methodology based on ADP and the approximate policy iteration algorithm used by Dai and Shi in [10].
Furthermore, we assume that patient assignments are made at discrete time points such as at the start of each day. This is arguably a more realistic approach involving a group of patients to be assigned, which also means that assignment decisions are far more complex, as multiple assignments must be made, and an order of assignments must be decided.
Finally, we demonstrate that the algorithm used by Dai and Shi in [10] can be adapted to our model to find near-optimal solutions to the PAS problem.
The rest of this paper is structured as follows. In Section 2 we describe the key components of our model, such as state space, transition probabilities, decision variables, constraints, and cost variables. In Section 3 we give the details of the Approximate Dynamic Programming approach, and illustrate the application of the theory through our numerical examples in Section 4. This is followed by concluding remarks in Section 5.
2 Markov Decision Process
Let be the set of all patient types, where type may correspond to the medical needs of the patients, their age, gender, and other aspects of inclusive care, see e.g. [13]. We assume that type- patients arrive at a rate per day on day . Further, we assume that there are wards in the hospital, each with capacity , . The waiting room labelled has capacity .
The set of wards that are suitable for patients type is denoted , for some . We denote by the best ward for type- patient, by the second-best ward for type- patient, and so on, and let be the ordered sequence of wards in corresponding to type- patients. For example, if and , then this means that ward is the best ward for type- patient, ward is the second best, ward is the third best, and wards and are not suitable for type- patients.
In Section 2.1 below, we construct a model based on a suitable Markov Decision Process to find the optimal policy from the set of all policies consisting of decisions taken whenever state in state space is observed, so as to minimise the long-run expected cost per unit time,
(1)
where is a state of the system observed at the start of day , and is the cost of a decision then taken assuming policy is in place.
In practice, we find by solving Bellman’s optimality equation for all states ,
where is the minimum long-run average cost given current state , and is the set of all decisions that are possible when state is observed. We use Approximate Dynamic Programming methods [21, 22, 24] to solve Equation (2) in real-sized problems, in which the number of states and decisions are intractable, as noted in [10, 12, 15, 18].
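For a state space small enough to enumerate, an average-cost optimality equation of this kind can be solved exactly. The sketch below is illustrative only: it uses relative value iteration, a standard alternative to the Policy Iteration used later in the paper, and it assumes an aperiodic, unichain model. The names `costs`, `probs`, `gain` and `bias` are hypothetical and do not come from the paper.

```python
def relative_value_iteration(costs, probs, tol=1e-10, max_iter=10_000):
    """Solve a small average-cost MDP by relative value iteration.

    costs[s][a]    : immediate cost of decision a in state s (assumed layout)
    probs[s][a][t] : probability of moving from s to t under decision a
    Returns (gain, bias, policy); state 0 is an arbitrary reference state.
    """
    n_states = len(costs)
    bias = [0.0] * n_states
    gain = 0.0
    policy = [0] * n_states
    for _ in range(max_iter):
        backed_up = []
        for s in range(n_states):
            # One-step lookahead value of every decision available in s.
            q = [
                costs[s][a] + sum(p * h for p, h in zip(probs[s][a], bias))
                for a in range(len(costs[s]))
            ]
            policy[s] = min(range(len(q)), key=q.__getitem__)
            backed_up.append(q[policy[s]])
        # bias[0] is kept at zero, so the backed-up value of the
        # reference state estimates the long-run average cost.
        gain = backed_up[0]
        new_bias = [v - gain for v in backed_up]
        converged = max(abs(x - y) for x, y in zip(new_bias, bias)) < tol
        bias = new_bias
        if converged:
            break
    return gain, bias, policy


# Toy instance: two states, two decisions, identical transition rows.
toy_costs = [[1.0, 3.0], [4.0, 2.0]]
toy_probs = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
toy_gain, toy_bias, toy_policy = relative_value_iteration(toy_costs, toy_probs)
```

In the toy instance every decision leads to either state with probability one half, so the best policy simply picks the cheaper decision in each state. The enumeration over all states is exactly what becomes infeasible for realistic problem sizes.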
2.1 Model
Consider a discrete-time Markov chain , where is the state of the system at the start of day recording , the number of type- patients in ward , and , the number of newly arrived type- patients who are yet to be assigned to the wards. The state space of the process is given by,
with possibly additional constraints on the total number of accepted arrivals, as discussed below.
2.2 Decisions
Suppose that we observe state at the start of a day and choose a suitable decision from the set of available decisions , such that , where
•
is the number of type- newly arrived patients to assign to ward , and,
•
is the number of type- patients to be transferred from ward to ward .
Each decision should satisfy the following sets of constraints. First, all newly arrived type- patients have to be assigned to some ward, and so
(3)
Next, the total number of patients in ward , denoted , and given by,
(4)
(5)
where is the total number of type- patients in ward , cannot exceed the capacity of the ward, and so,
(6)
Further, we may only transfer the available patients, which gives,
(7)
and if a transfer occurs, it should be in one direction, that is,
(8)
where is an indicator function taking value if the statement in the brackets is true, and otherwise, which ensures that only one of and takes a nonzero value.
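The constraints described above can be sketched as a feasibility check. This is a minimal illustration based on the verbal description only: the array layout and the names `queue`, `occupancy`, `capacity`, `assign` and `transfer` are assumptions, all entries are taken to be non-negative integers, and the waiting room is not modelled.

```python
def post_decision_occupancy(occupancy, assign, transfer):
    """Return occupancy[k][i] after new assignments and transfers are applied.

    occupancy[k][i]   : type-i patients currently in ward k
    assign[i][k]      : newly arrived type-i patients assigned to ward k
    transfer[i][k][l] : type-i patients moved from ward k to ward l
    """
    n_wards, n_types = len(occupancy), len(occupancy[0])
    post = [row[:] for row in occupancy]
    for i in range(n_types):
        for k in range(n_wards):
            post[k][i] += assign[i][k]
            for l in range(n_wards):
                # transfers into ward k minus transfers out of ward k
                post[k][i] += transfer[i][l][k] - transfer[i][k][l]
    return post


def is_feasible(queue, occupancy, capacity, assign, transfer):
    """Check the four verbal constraints on a decision (assign, transfer)."""
    n_wards, n_types = len(occupancy), len(queue)
    for i in range(n_types):
        # Every newly arrived type-i patient is assigned to some ward.
        if sum(assign[i]) != queue[i]:
            return False
        for k in range(n_wards):
            # Only patients currently in ward k can be transferred out of it.
            if sum(transfer[i][k]) > occupancy[k][i]:
                return False
            for l in range(n_wards):
                # Transfers between two wards go in one direction only
                # (this also rules out a ward transferring to itself).
                if transfer[i][k][l] > 0 and transfer[i][l][k] > 0:
                    return False
    post = post_decision_occupancy(occupancy, assign, transfer)
    # No ward may exceed its capacity after the decision.
    return all(sum(post[k]) <= capacity[k] for k in range(n_wards))
```

Enumerating all decisions that pass such a check gives the decision set for a state, which is practical only for very small instances.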
2.3 Transition probabilities
Now, given decision and current state , we define the following key random variables:
•
, the number of type- patients who depart from ward during one day;
•
, the total number of departures;
•
, the total number of available beds after departures;
•
, the total number of arrivals;
which we use below to determine the transition probabilities of the process . The distribution of these variables depends on , as we discuss below.
Assuming state and decision , the resulting post-decision state is such that,
(9)
Next, the process transitions from state on day to some state on day , for , with probability given by
(10)
when the condition
(11)
is met for all , ; and otherwise.
We apply conditional probabilities in (10) since the number of accepted type- arrivals may depend on the number of available beds after departures, and consider the following alternative modelling approaches:
•
Suppose that there is no restriction on the total number of arrivals, as in Dai and Shi [10]. Then follows Poisson distribution, , and so
(12)
In this approach, arriving patients are not lost to the system, however the number of patients still waiting to be assigned and their total waiting times could be very large (and exceed many days). This may not be a realistic assumption since in practice the capacity of the system is limited (due to the availability of beds and staffing) and there are limits on the accepted maximum waiting times for various patient types (e.g. -hour limit for some types of emergency patients).
•
Suppose that the total number of arrivals may not exceed the total number of available beds. Then, with recording the total number of available beds after the departures, we have,
(13)
(14)
and, with ,
(15)
since the conditional distribution of given is multinomial, where is the probability that an arriving patient is of type . In this approach, patients arriving to a system with no free beds are redirected to other health systems. Such an approach may be more realistic and can be used to determine the rate of patients that are redirected to help decide a suitable size of the hospital system.
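The second modelling approach can be illustrated numerically. The sketch below assumes one plausible reading of it: the accepted total is the minimum of a Poisson count and the number of free beds, and, given the accepted total, the split across patient types is multinomial. The function names and arguments are hypothetical.

```python
import math


def poisson_pmf(n, lam):
    """P(A = n) for A ~ Poisson(lam)."""
    return math.exp(-lam) * lam**n / math.factorial(n)


def accepted_arrivals_pmf(lam_total, free_beds):
    """pmf of min(A, free_beds) with A ~ Poisson(lam_total).

    Assumption: arrivals beyond the free beds are redirected, so all
    probability mass at or above free_beds is lumped at free_beds.
    """
    pmf = [poisson_pmf(n, lam_total) for n in range(free_beds)]
    pmf.append(max(0.0, 1.0 - sum(pmf)))
    return pmf


def multinomial_pmf(counts, type_probs):
    """P(per-type counts | total = sum(counts)) under a multinomial split."""
    coeff = math.factorial(sum(counts))
    prob = 1.0
    for c, p in zip(counts, type_probs):
        coeff //= math.factorial(c)  # stays an exact integer at every step
        prob *= p**c
    return coeff * prob


# Probability of accepting one patient of each of two equally likely types,
# when the total arrival rate is 2 per day and 3 beds are free.
p_total_is_two = accepted_arrivals_pmf(2.0, 3)[2]
p_one_of_each = p_total_is_two * multinomial_pmf([1, 1], [0.5, 0.5])
```

The mass lumped at the free-bed count also gives the probability that at least one arrival is redirected or that the beds are exactly filled, which is the kind of quantity useful for sizing the system.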
To model random variables , we assume that departures are independent of one another and so follows Binomial distribution, , with
(16)
(17)
where is a random variable recording the remaining length of stay of type- patient that is in ward at the start of the day. That is, for each type- patient in ward , we perform a Bernoulli trial with probability of success to determine if the patient leaves the system on the following day or not.
2.4 Costs
There is an immediate cost associated with decision given current state , which may include costs of assignment, transfer, and patients being in nonprimary wards, defined as follows.
•
Assignment cost: .
•
Transfer cost: .
•
Penalty cost for being in a nonprimary ward: .
The total immediate cost of decision given state is then,
where and are given by Eqs. (10) and (18), respectively.
3 Approximate Dynamic Programming approach
Policy Iteration is a standard method of solving Equation (2) when the size of the state space is not too large, see Algorithm 1 in Appendix A, also refer to Puterman [25] for further details.
However, Policy Iteration is not suitable here, and so we apply Approximate Dynamic Programming methods introduced by Powell in [21, 22, 24] to address the curses of dimensionality in Eq. (2), as follows.
1.
First, similar to Heydar et al. [15], we apply basis functions , , to record some suitable information about states , referred to as state features, and then the approximation,
(20)
where is the vector of features, and is the vector of corresponding weights , evaluated at the -th iteration of an algorithm.
2.
Next, we apply the Approximate Policy Iteration presented by Dai and Shi in [10], in which the weights are recursively updated in an iteration that is repeated times. Each iteration involves a simulation of states for some large . We summarise this approach in Algorithms 2–3 in Appendix A. We note that the Markov chains applied in our models are irreducible and positive recurrent, and so the algorithms are guaranteed to converge as [10].
Consider our Model in Section 2.1. Suppose that the vector of features of state is the vector , and so a vector containing all information about state . We note that to compute the expression in Line 6 of Algorithm 2 in Appendix A and in Line 4 of Algorithm 3 in Appendix A, it is then convenient to apply the following equivalence. Assuming , and with , we have
(21)
(22)
(23)
(24)
where is given by Eq. (9), (binomial mean), is a random variable recording the number of type- patients in ward and a random variable recording the number of type- patients waiting to be assigned when state is observed, and is the total number of arrivals.
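Because the approximation is linear in the features, the expected approximate value of the next state follows from the expected next-day counts alone. The sketch below illustrates this under stated assumptions: the features are the ward counts and queue lengths themselves, each patient departs independently with a ward- and type-specific probability, and there is no constant term. All names are hypothetical.

```python
def expected_next_value(post, depart_prob, expected_arrivals, w_ward, w_queue):
    """Expected linear value of the next state, given a post-decision state.

    post[k][i]           : type-i patients in ward k after the decision
    depart_prob[k][i]    : probability such a patient departs within a day
    expected_arrivals[i] : expected number of accepted type-i arrivals
    w_ward[k][i]         : weight of the ward-count feature (assumed layout)
    w_queue[i]           : weight of the queue-length feature
    """
    value = 0.0
    for k, row in enumerate(post):
        for i, n in enumerate(row):
            # Binomial mean: n * p departures, so n * (1 - p) patients remain.
            value += w_ward[k][i] * n * (1.0 - depart_prob[k][i])
    for i, mean_arrivals in enumerate(expected_arrivals):
        value += w_queue[i] * mean_arrivals
    return value
```

With unrestricted Poisson arrivals, `expected_arrivals` would simply hold the arrival rates; in the restricted case it has to be computed from the distribution of free beds, as discussed next.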
If the number of arrivals is unrestricted with , then . Alternatively, if the number of arrivals is restricted so that we only allow arrivals that can be assigned, which can be written as,
(25)
then we have the following approach for evaluating .
Let be the random variable recording the total number of departures,
(26)
taking values , with . Let be the number of available beds for the new arrivals, given by
(27)
taking values . Therefore,
and so
(28)
where .
Further, in order to compute in (28), we note that is a sum of independent binomial random variables such that .
A method for computing the distribution of a sum of independent binomial random variables was discussed by Butler and Stephens in [7]. Below, we suggest an alternative method, which relies on simple matrix multiplications and standard theory of Markov chains.
Without loss of generality, suppose that is a random variable such that are independent Binomial random variables with some parameters and , for .
To compute , it is convenient to construct a discrete-time Markov chain corresponding to the Bernoulli trials such that a success at each trial results in the chain moving one step to the right, and failure results in the chain remaining at the original state. Then, the distribution of the chain after all trials will give the distribution of .
So, we consider a discrete-time Markov chain terminating at time , with state space and an initial state , which evolves as follows. We perform the first Bernoulli trials at times and let , . Then, the distribution of the chain after the first trials, and so at time , is given by
(29)
where with , and such that .
Next, we repeat this and perform Bernoulli trials at times and let , , for each . Then, the distribution after additional trials is given by the recursion
(30)
where , . To compute we apply the formula
(31)
where
It follows that, for ,
(33)
This method for computing the probabilities for a sum of independent nonnegative random variables that take values in finite sets, can be applied for any desired discrete distributions of , by replacing (31) with a suitable formula for .
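The Markov chain construction above can be sketched in a few lines. This version applies one Bernoulli trial at a time rather than one block of trials per binomial component, which gives the same final distribution; the function name and the `(n_j, p_j)` input layout are assumptions.

```python
def sum_of_binomials_pmf(params):
    """pmf of Z = X_1 + ... + X_J with independent X_j ~ Binomial(n_j, p_j).

    params is a list of (n_j, p_j) pairs. Entry k of the result is P(Z = k).
    """
    dist = [1.0]  # the chain starts in state 0 with probability 1
    for n_trials, p in params:
        for _ in range(n_trials):
            nxt = [0.0] * (len(dist) + 1)
            for state, mass in enumerate(dist):
                nxt[state] += mass * (1.0 - p)  # failure: stay in place
                nxt[state + 1] += mass * p      # success: one step right
            dist = nxt
    return dist


# Departures from two groups of patients with different daily
# departure probabilities (illustrative numbers only).
departures_pmf = sum_of_binomials_pmf([(3, 0.2), (4, 0.7)])
```

Each inner step is a multiplication of the current distribution by a bidiagonal transition matrix, so the cost grows quadratically in the total number of trials.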
Alternatively, can be computed by numerically inverting the probability generating function of the random variable , which here is given by
(34)
using a suitable inversion algorithm, for example see Abate and Whitt [1].
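As a rough illustration of the generating-function route, the sketch below inverts the probability generating function of a sum of independent binomial variables with a plain discrete Fourier transform over roots of unity. This is a simpler technique swapped in for illustration, not the Abate and Whitt algorithm; it is exact up to rounding here because the support is finite. The function names are hypothetical.

```python
import cmath


def pgf(z, params):
    """PGF of a sum of independent Binomial(n_j, p_j) variables at z."""
    value = 1.0 + 0.0j
    for n_trials, p in params:
        value *= (1.0 - p + p * z) ** n_trials
    return value


def invert_pgf(params):
    """Recover P(Z = k), k = 0..N, by evaluating the PGF at roots of unity."""
    size = sum(n for n, _ in params) + 1  # support is {0, ..., N}
    values = [
        pgf(cmath.exp(2j * cmath.pi * m / size), params) for m in range(size)
    ]
    pmf = []
    for k in range(size):
        total = sum(
            v * cmath.exp(-2j * cmath.pi * m * k / size)
            for m, v in enumerate(values)
        )
        pmf.append((total / size).real)
    return pmf
```

For large supports a fast Fourier transform would replace the double loop; the direct sum is kept here to stay dependency-free.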
4 Numerical examples
Here, we construct examples to illustrate the theory and the application of algorithms discussed above. First, in Example 1, we construct a Markov model with a small state space, and demonstrate that the approximate solution converges to the exact optimal solution (which we obtained by applying standard dynamic programming methods). Next, in Example 2, we construct a realistically-sized Markov model with assumptions driven by practical considerations in real-world hospitals, and demonstrate the application potential of our methodology.
4.1 Small-sized example
We consider the following simple example to illustrate the application of our Model in Section 2.1. As the size of the state space in this example is small enough to allow for the application of standard dynamic programming methods, we apply Algorithm 1 in Appendix A to find the exact solution and then compare it with the approximation obtained using Algorithms 2–3.
Example 1
Consider a healthcare facility with wards, with capacity for each ward , and patient types. Assuming that the maximum capacity of the waiting area is , that is , and that the arrivals are only accepted if there is available capacity in the system according to
(35)
we define the state space of the system as follows:
with
Further, assume that ward is the preferred ward for patient type , for , which is reflected by the expected values of the LoS, which are lower when patient type stays in ward for the whole duration of their stay. To model this, we assume that probabilities that patient type that is in ward today, leaves the hospital the next day, are given by,
(38)
and so the expected LoS for patient type is days if they stay in ward , or if they stay in ward , and some number between these two if the patient is transferred between the wards during their stay at the hospital. For patient type we have days if they stay in ward , or if they stay in ward , and some number between these two if the patient is transferred between the wards during their stay at the hospital.
We consider two decisions,
•
, assign arrived patients to their best available wards without transferring the patients between the wards, and
•
, assign arrived patients to their best available wards and allow transferring patients if required.
As an example, given state
decision will transform it into post-decision state
and then the process will move to state
if patient type departs from ward , or to state
if patient type departs from ward . Alternatively, decision will transform into post-decision state
and then the process will move to state
if patient type departs from ward , or to state
if patient type departs from ward . No arrivals to waiting area are permitted if they cannot be assigned, that is, when the system is already full.
We assume the following cost parameters:
•
Assignment cost to ward per type- patient: for all .
•
Transfer cost from ward to ward per type- patient: for all .
•
Penalty cost for being in a nonprimary ward per type- patient: for all .
We apply policy iteration summarised in Algorithm 1 in Appendix A to find the exact solution. To do so, we first write explicit expressions for the probability matrices and cost vectors for . The details of this derivation are given in Appendix B.
From policy iteration we identify the optimal policy in our states of interest to be , ‘assign the waiting patients to their most preferred ward by transferring patients currently in the ward’. We determine that this optimal policy invokes a long run average cost per stage of units.
We now present the results of the approximate policy iteration algorithm, summarised in Algorithms 2 and 3. For simulations, we assumed the starting value of each weight .
Figure 2: Values of in iteration of Algorithm 2 in Example 1. Simulations were run for and states, respectively.
Figure 3: Values of (blue) compared to the optimal value (red) in iteration of Algorithm 2 in Example 1. Simulations were run for and states, respectively.
First, we find that the estimated value of converges to the optimal value for large values of , after a small number of iterations. As expected, the values of and converge more tightly as , the number of states simulated, increases, as shown in Figure 3. Second, we observe that in this example, each value of converges to a distinct value, as shown in Figure 2. This is because the parameters and are asymmetric, so we do not expect patients of different types to be weighted equally.
4.2 Realistically-sized example
We consider a realistically-sized example of our Model in Section 2.1. The structure of the model here is influenced by the practical considerations within real-world hospitals, in which patients are allocated to their primary (preferred) wards based on their types (medical needs), and when these wards are full, suitable policies are applied to decide which nonprimary ward a patient should be allocated to instead. Furthermore, we assume that patients arriving to a hospital when the system is full (all wards are full) are redirected to another facility. Therefore, the number of daily arrivals is unrestricted; however, the number of arrivals admitted to the hospital is bounded by the system capacity. We fitted the parameters of the arrival and length of stay distributions to data, and focused on the application of our model to the analysis of how decision making affects the number of patients in nonprimary wards.
Example 2
Similarly to Dai and Shi [10], assume five types of patients, (): Orthopedic (Ortho), Cardio (Card), Surgery (Surg), General Medicine (GenMed), and Other Medicine (OthMed); and corresponding wards, (), most suitable to these patient types, respectively.
To estimate the key parameters of our model, we applied Matlab and performed the analysis of five-year patient flow data obtained from an Australian tertiary referral hospital, see Table 1.
(Ortho) (Card) (Surg) (GenMed) (OthMed)
Table 1: Model parameters in Example 2 ( and were fitted to data, and the remaining parameters were estimated): (daily arrival rate of patients type ), (probability of patient of type departing from ward within a day), and (capacity of ward ). The total capacity of the system is . We apply (probability of patient of type departing within a day, assuming they are in the most suitable ward). When , we let , for some (so that the LoS of patients that are in less suitable wards, is increased). Here, we set (and so the LoS of patients in less suitable wards increases by on average).
Further, to estimate , for each , we applied an M/M/N/N queueing model with arrival rate and service rate to find such that the proportion of time the queue is full is .
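The capacity estimate described above can be sketched with the Erlang B formula, which gives the stationary probability that all servers of an M/M/N/N queue are busy. The selection rule used here (the smallest capacity whose full-system probability does not exceed a target) and the parameter names are assumptions; the paper's exact target value and rule are not reproduced.

```python
def erlang_b(servers, offered_load):
    """Probability that an M/M/N/N queue with N = servers is full.

    offered_load is arrival rate divided by service rate. Uses the
    standard numerically stable recursion, starting from B(0) = 1.
    """
    blocking = 1.0
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def smallest_capacity(arrival_rate, service_rate, target_full, max_servers=10_000):
    """Smallest N for which the proportion of time the system is full
    does not exceed target_full (hypothetical selection rule)."""
    load = arrival_rate / service_rate
    blocking = 1.0
    for n in range(1, max_servers + 1):
        blocking = load * blocking / (n + load * blocking)
        if blocking <= target_full:
            return n
    raise ValueError("no capacity up to max_servers meets the target")
```

For instance, with an offered load of 1 the probability of a full system is 0.5 for one bed and 0.2 for two beds, so a target of 0.25 would select two beds.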
In order to focus on the impact of decision making on the number of patients observed in nonprimary wards, we applied the following cost parameters:
•
Transfer cost from ward to ward per type- patient: for all and assume large value at (to avoid such decision).
•
Penalty cost for being in a nonprimary ward per type- patient: for all .
(Here, we assumed the assignment costs for each admitted or redirected patient, but have not included these costs in the optimisation, since all patients need to be admitted to some facility, and our focus is on the number of patients in nonprimary wards.)
We assume that the capacity of the waiting area is unlimited with and so the total number of arrivals is unrestricted. However, the arrivals are only accepted if there is available capacity in the system according to , where is post-decision state. Patients that may not be admitted, are redirected to another facility. The state space of the system is then
Next, we assume the following order of assignment, represented as a matrix , where means that ward is the first choice for type , is the second choice for type , and so on, with
(45)
As an example, for the assignment of ‘Ortho’ patients, the considered order of wards is ‘Ortho’, then ‘Surg’, then ‘GenMed’, then ‘OthMed’, and ‘Card’ is the last choice (used when there are no other choices). This order is similar to the order of assignment in Dai and Shi [10, Figure 1].
Suppose that patient type represents the order of assignment, so that we assign patients in the order of their type, assigning type patients first, then type patients, and so on. We consider two kinds of decisions,
• , assign arrived patients to their best available wards, in the order of their types, without transferring patients between the wards, and
• , assign arrived patients to their best available wards, in the order of their types, and allow transferring patients if required, with the constraint of no more than transfers in total, respectively.
As an example, if
• , , for (there are type- patients and type- patients in the queue);
• and , , for (there are patients in ward , so it is full),
then applying means that
• type- patients will be assigned to ward , while type- patients will be assigned to their best available ward other than ward ,
• while type- patients will need to be transferred from ward to their best available ward (to make space for type- patients),
• and finally, type- patients will be assigned from the queue to their best available ward.
More precisely, the assignments of patients corresponding to decisions and are described in Algorithms 4-5 in Appendix A, respectively.
Next, we present the simulation output in Figures 4-6, focusing on the impact of these decisions on the number of patients observed in nonprimary wards (and ignoring the costs of these decisions for now). The simulation output suggests that, under the model parameters assumed in Table 1, a policy that applies decision a = 3 all the time performs better, in the sense that it reduces the number of patients in nonprimary wards, than applying decision a = 1 or a = 2 all the time. The performance of the system under decision a = 3 is, however, only slightly better than under decision a = 2.
To evaluate the performance of the system under these decisions, we also need to consider the costs. Since the size of the state space is very large, we apply the Approximate Policy Iteration summarised in Algorithms 2–3 in Appendix A.
We consider the vector of features of state defined as
(47)
that is, for each ward , the total number of type- patients in the primary ward ; , the total number of other-type patients in that ward; and for each patient type , the number of type- patients in the queue waiting to be assigned.
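The feature map described above can be sketched in code. The sketch assumes a hypothetical state encoding that is not taken from the paper: `occupancy[k][w]` is the number of type-k patients in ward w, `queue[k]` is the number of type-k patients waiting, and the primary ward of type k is ward k. The ordering of the entries in the returned vector is also an assumption.

```python
from typing import List, Sequence


def features(occupancy: Sequence[Sequence[int]], queue: Sequence[int]) -> List[int]:
    """Feature vector of a state, under the assumed encoding.

    Returns, in this (assumed) order:
      - for each ward w, the number of patients whose primary ward is w,
      - for each ward w, the number of other-type patients in ward w,
      - for each type k, the number of type-k patients in the queue.
    """
    num_types = len(occupancy)
    num_wards = len(occupancy[0])
    primary = [occupancy[w][w] for w in range(num_wards)]
    other = [
        sum(occupancy[k][w] for k in range(num_types) if k != w)
        for w in range(num_wards)
    ]
    return primary + other + list(queue)


# Two types and two wards: one type-0 patient sits in the nonprimary ward 1.
print(features([[3, 1], [0, 2]], [1, 0]))
```

Aggregating the state in this way keeps the number of parameters to be fitted linear in the number of wards and types, rather than growing with the size of the state space.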
Then, by (24), , and given state and decision , we have
which is the expression we use in our computations in Algorithms 2–3. The output presented in Figures 7-8 demonstrates that the algorithm converged as expected.
To analyse the impact of the algorithm on the performance of the system, we have then also simulated the evolution of a system in which the decision given current state is chosen according to
(48)
using the parameters obtained from the output of the algorithm, and compared the output with the performance of the system under decisions . Each simulation was run over a 5-year period. The output is presented in Figures 9-10 and Table 2 below.
| Policy | Mean cost | Mean nonprimary | Mean redirected |
|---|---|---|---|
| Apply a=1 always | 6.8177 | 34.0887 | 6.6719 |
| Apply a=2 always | 5.7643 | 20.1370 | 5.9918 |
| Apply a=3 always | 5.6983 | 18.9862 | 5.9408 |
| Near optimal solution | 5.6449 | 20.2566 | 5.9720 |

Table 2: The output from simulations in Example 2.
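The greedy rule in (48), which produces the near-optimal row of Table 2, can be sketched as follows. The exact form of (48) is not reproduced here. The sketch assumes that the value of a post-decision state is approximated by a linear combination of its features, and that the decision minimising the immediate cost plus this approximation is chosen. The names `greedy_decision`, `immediate_cost` and `post_decision_features` are hypothetical.

```python
from typing import Callable, Hashable, Sequence


def approx_value(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Linear approximation: inner product of fitted parameters and features."""
    return sum(t * f for t, f in zip(theta, phi))


def greedy_decision(
    state: Hashable,
    decisions: Sequence[int],
    theta: Sequence[float],
    immediate_cost: Callable[[Hashable, int], float],
    post_decision_features: Callable[[Hashable, int], Sequence[float]],
) -> int:
    """Pick the decision with the smallest immediate cost plus approximate value.

    Assumption: both callables are supplied by the model; they are placeholders
    here and do not correspond to named functions in the paper.
    """
    def score(decision: int) -> float:
        phi = post_decision_features(state, decision)
        return immediate_cost(state, decision) + approx_value(theta, phi)

    return min(decisions, key=score)
```

Because the rule only evaluates the small set of candidate decisions in the current state, it can be applied during simulation without enumerating the state space.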
We observe (Figure 9, Table 2) that the near-optimal policy obtained with this approach achieves the lowest mean cost, while always applying a = 1 gives the highest: 6.8177 (a = 1), 5.7643 (a = 2), 5.6983 (a = 3), and 5.6449 (our near-optimal solution). At the same time, the number of patients observed in nonprimary wards and the number of redirected patients per day are also improved relative to a = 1. The mean number of patients in nonprimary wards is significantly lower than under a = 1 (20.2566 compared with 34.0887), and similar to that under a = 2 (20.1370), but at a lower cost.
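To put the differences in mean cost into proportion, the relative reduction achieved by the near-optimal solution against each fixed decision can be computed directly from the mean costs in Table 2. The snippet below is a small illustrative calculation; the dictionary keys and the helper name are hypothetical labels, and the numbers are those reported in Table 2.

```python
# Mean costs as reported in Table 2.
mean_cost = {
    "a=1": 6.8177,
    "a=2": 5.7643,
    "a=3": 5.6983,
    "near-optimal": 5.6449,
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


for policy in ("a=1", "a=2", "a=3"):
    saving = relative_reduction(mean_cost[policy], mean_cost["near-optimal"])
    print(f"near-optimal vs {policy}: {100 * saving:.2f}% lower mean cost")
```

The reductions are roughly 17%, 2% and 1% against a = 1, a = 2 and a = 3, respectively, so most of the gain comes from allowing transfers at all.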
Next (Figure 10), under the near-optimal solution, of the time no additional transfers are required. Also, under the near-optimal solution, decisions are chosen approximately , and of the time, respectively.
Figure 4: Simulation of a system with limited capacity under policy that applies (no transfers): is the number of type patients in ward , is the number of patients in nonprimary wards on day , post decision. We apply (45) and the parameters from Table 1.
Figure 5: Simulation of a system with limited capacity under policy that applies (with no more than transfers): We note the reduction of the total number of patients in nonprimary wards in comparison to the output for in Figure 4.
Figure 6: Simulation of a system with limited capacity under policy that applies (with no more than transfers): We note the further reduction of the total number of patients in nonprimary wards in comparison to the output for in Figure 5.
Figure 7: Values of in iteration of Algorithm 2 in Example 2, for and .
Figure 8: Values of in iteration of Algorithm 2 in Example 2, for and .
Figure 9: Simulations under decisions (top row), (middle row), and near-optimal solution.
Figure 10: Percentage of decisions made under near-optimal solution (top left); and the number of additional transfers needed (if they were permitted without constraints).
5 Conclusion
We proposed a Markov Decision Process for a Patient Assignment Scheduling problem where, at the start of each time period (e.g. a day or an 8-hour block), patients are allocated to suitable wards, following relevant hospital policy. Due to the large size of the state space required to record the information about the hospital system, we developed a methodology based on Approximate Dynamic Programming for the computation of near-optimal solutions. We demonstrated the application potential of our model through examples with parameters fitted to data from a tertiary referral hospital in Australia.
Our model can be applied to a range of scenarios that are of interest in practical contexts in hospitals. We will analyse scheduling problems in data-driven examples; consider additional costs, such as waiting costs or redirecting costs, and decisions such as transfers of inpatients within the hospitals that are not triggered by arrivals; and take into account the duration of stay at the time of decision making. We will also consider scenarios with time-varying parameters, in particular with seasonal components. These analyses will be reported in our forthcoming and future work.
6 Statements and declarations
Data
Data used in the paper was obtained following ethical approval from the Tasmanian Health and Medical Human Research Ethics Committee (HREC No 23633) and site-specific approval from the Research Governance Office of the Tasmanian Health Service.
Authorship contribution statement
This research has contributed to the Honours thesis by Krasnicki [19]. The following are the contributions of the authors, Małgorzata M. O’Reilly (MMO), Sebastian Krasnicki (SK), James Montgomery (JM), Mojtaba Heydar (MH), Richard Turner (RT), Pieter Van Dam (PVD), Peter Maree (PM):
• Conceptualisation, Mathematical background: MMO, SK, MH, and JM;
• Sections 2-4, Appendix A-B, Methodology development: MMO, SK, and JM;
• Sections 2-4, Derivation of the probability formulae: MMO;
• Section 4, Example 1, Coding and analysis: SK, MMO, and MH;
• Section 4, Example 2, Coding and analysis: MMO and JM.
Declaration of competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
References
[1]
J. Abate and W. Whitt.
Numerical inversion of probability generating functions.
Operations Research Letters, 12(4):245–251, 1992.
[2]
Australia’s hospitals at a glance 2018-19. Cat. no. HSE 247. Canberra:
AIHW.
Australian Institute of Health and Welfare, 2020.
[3]
A. K. Abera, M. M. O’Reilly, M. Fackrell, B. R. Holland, and M. Heydar.
On the decision support model for the patient admission scheduling
problem with random arrivals and departures: A solution approach.
Stochastic Models, 36(2):312–336, 2020.
[4]
A. Bagust, M. Place, and J. W. Posnett.
Dynamics of bed use in accommodating emergency admissions: Stochastic
simulation model.
British Medical Journal, 318(7203):155–158, 1999.
[5]
C. A. Bain, P. G. Taylor, G. McDonnell, and A. Georgiou.
Myths of ideal hospital occupancy.
Medical Journal of Australia, 192(1):42–43, 2010.
[6]
B. Bilgin, P. Demeester, M. Misir, W. Vancroonenburg, and G. Vanden Berghe.
One hyperheuristic approach to two timetabling problems in health
care.
Journal of Heuristics, 18(3):401–434, 2012.
[7]
K. Butler and M. A. Stephens.
The distribution of a sum of independent binomial random variables.
Methodology and Computing in Applied Probability,
19(2):557–571, 2017.
[8]
E. J. Carter, S. M. Pouch, and E. L. Larson.
The relationship between emergency department crowding and patient
outcomes: A systematic review.
Journal of Nursing Scholarship, 46(2):106–115, 2014.
[9]
E. Cummings, L. Ellis, A. Georgiou, K. E., C. Showell, and P. Turner.
An evidence-based review and training resource on smooth patient
flow.
Ministry of Health, New South Wales Government, Australia,
pages 1–34, 2012.
[10]
J. G. Dai and P. Shi.
Inpatient overflow: An approximate dynamic programming approach.
Manufacturing and Service Operations Management,
21(4):894–911, 2019.
[11]
P. Demeester, W. Souffriau, P. De Causmaecker, and G. Vanden Berghe.
A hybrid tabu search algorithm for automatically assigning patients
to beds.
Artificial Intelligence in Medicine, 48(1):61–70, 2010.
[12]
Y. Gocgun.
Simulation-based approximate policy iteration for dynamic patient
scheduling for radiation therapy.
Health Care Management Science, 21(3):317–325, 2018.
[13]
R. Grant, A. K. J. Smith, L. Newett, M. Nash, R. Turner, and L. Owen.
Tasmanian healthcare professionals’ & students’ capacity for
LGBTI + inclusive care: A qualitative inquiry.
Health and Social Care in the Community, 29(4):957–966, 2021.
[14]
W. B. Hall, L. E. Willis, S. Medvedev, and S. S. Carson.
The implications of long-term acute care hospital transfer practices
for measures of in-hospital mortality and length of stay.
American Journal of Respiratory and Critical Care Medicine,
185(1):53–57, 2012.
[15]
M. Heydar, M. M. O’Reilly, E. Trainer, M. Fackrell, P. G. Taylor, and
A. Tirdad.
A stochastic model for the patient-bed assignment problem with random
arrivals and departures.
Annals of Operations Research, pages 1–33, 2021.
[16]
R. Howard.
Dynamic Programming and Markov Processes.
The MIT Press, Cambridge, 1960.
[17]
E. Howell, E. Bessman, S. Kravet, K. Kolodner, R. Marshall, and S. Wright.
Active bed management by hospitalists and emergency department
throughput.
Annals of Internal Medicine, 149(11):804–810, 2008.
[18]
P. J. H. Hulshof, M. R. K. Mes, R. J. Boucherie, and E. W. Hans.
Patient admission planning using approximate dynamic programming.
Flexible Services and Manufacturing Journal, 28(1-2):30–61,
2016.
[19]
S. Krasnicki.
Modelling of optimal decision making in healthcare systems, Honours
Thesis, The University of Tasmania, 2021.
[20]
C. Morley, M. Unwin, G. M. Peterson, J. Stankovich, and L. Kinsman.
Emergency department crowding: A systematic review of causes,
consequences and solutions.
PLoS ONE, 13(8), 2018.
[21]
W. Powell.
What you should know about approximate dynamic programming.
Naval Research Logistics, 56(3):239–249, 2009.
[22]
W. Powell.
Approximate Dynamic Programming: Solving the Curses of
Dimensionality: Second Edition.
Wiley, 2011.
[23]
W. B. Powell.
What you should know about approximate dynamic programming.
Naval Research Logistics, 56(3):239–249, 2009.
[24]
W. B. Powell.
Perspectives of approximate dynamic programming.
Annals of Operations Research, 241(1-2):319–356, 2016.
[25]
M. L. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic
Programming.
John Wiley, New York, 1994.
[26]
W. Vancroonenburg, P. De Causmaecker, and G. Vanden Berghe.
A study of decision support models for online patient-to-room
assignment planning.
Annals of Operations Research, 239:253–271, 2016.
5: Randomly generate state using model parameters and post-decision state .
6: Let .
7:endfor
8:Compute .
Algorithm 4 Decision . We make a decision based on priorities so as to guarantee the best possible bed for higher priority patients, without allowing transfers. Iteratively, we find the highest priority among the patients yet to be allocated and assign them to their best available wards.
1: Input .
2: Initialise and .
3: while do
4:   Let , current highest priority to allocate.
5:   while do (allocate type- patients)
6:     Find .
7:     Let , best ward for type- patient with an available bed.
8:     Let , and .
9:     Let .
10:   end while
11: end while
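The allocation logic of Algorithm 4 can be sketched in code. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: types are indexed in priority order, `preference[k]` lists the wards in order of preference for type k (in the spirit of the matrix in (45)), and the names `assign_without_transfers`, `queue` and `free_beds` are hypothetical.

```python
from typing import List, Sequence, Tuple


def assign_without_transfers(
    queue: Sequence[int],
    free_beds: Sequence[int],
    preference: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[int]]:
    """Greedy assignment by priority, with no transfers between wards.

    queue[k]      -- number of type-k patients waiting (type 0 has top priority)
    free_beds[w]  -- number of available beds in ward w
    preference[k] -- wards in decreasing order of suitability for type k

    Returns (assigned, unassigned), where assigned[k][w] counts type-k patients
    placed in ward w and unassigned[k] counts those left without a bed.
    """
    waiting = list(queue)
    free = list(free_beds)
    assigned = [[0] * len(free) for _ in waiting]

    for k in range(len(waiting)):  # highest priority type first
        while waiting[k] > 0:
            # Best ward for a type-k patient that still has an available bed.
            ward = next((w for w in preference[k] if free[w] > 0), None)
            if ward is None:  # the whole system is full
                break
            assigned[k][ward] += 1
            free[ward] -= 1
            waiting[k] -= 1
    return assigned, waiting


# Two types, one free bed per ward: both type-0 patients are placed first.
print(assign_without_transfers([2, 1], [1, 1], [[0, 1], [1, 0]]))
```

Patients left in `unassigned` correspond to those who would be redirected to another facility when no capacity remains.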
Algorithm 5 Decisions . We make a decision based on priorities so as to guarantee the best possible bed for higher priority patients, and allow up to transfers. Iteratively, we find the highest priority among the patients yet to be allocated or transferred (Line 6). Next, we transfer type- patients (Lines 7-13) before allocating newly arrived type- patients (Lines 14-30). When transferring patients, we find a type- patient that is in the worst ward and transfer them to a ward with the lowest transfer cost.
1: Input and the maximum number of transfers .
2: Initialise , current number of available transfers.
3: Initialise , patients to be transferred.
4: Initialise and .
5: while or do
6:   Let , current highest priority to allocate/transfer.
7:   while do (transfer type- patient from ward to )
8:     ;
9:     is the worst ward with a type- patient;
10:     has the lowest transfer cost.
11:     Let .
12:     Let , .
13:   end while
14:   while do (allocate type- patients)
15:     if (no more transfers allowed) then (allocate type- patient)
16:       ;
17:       is the best ward for patient type with an available bed.
18:       Let , and .
19:     end if
20:     if (transfers allowed) then (allocate type- patient to ward )
21:       ;
22:       is the best ward for patient type with a potential bed.
23:       Let , and .
24:       if then (must transfer patient out of ward )
25:         is the lowest priority patient in ward .
26:         Let , .
27:       end if
28:     end if
29:     Let .
30:   end while
31: end while
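A simplified variant of the idea behind Algorithm 5 can be sketched as follows. It is not a transcription of the algorithm. As assumptions made for illustration, a transfer is only used to free a bed in the arriving patient's first-choice ward; the displaced patient is the lowest-priority patient of another type for whom that ward is not the first choice; and the displaced patient moves immediately to their best available ward, rather than to the ward with the lowest transfer cost. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def assign_with_transfers(
    queue: Sequence[int],
    occupancy: Sequence[Sequence[int]],
    capacity: Sequence[int],
    preference: Sequence[Sequence[int]],
    max_transfers: int,
) -> Tuple[List[List[int]], List[int], int]:
    """Priority-based assignment that may displace nonprimary patients.

    occupancy[k][w] -- number of type-k patients currently in ward w
    Returns (new occupancy, patients left unassigned per type, transfers used).
    """
    occ = [list(row) for row in occupancy]
    waiting = list(queue)
    num_types = len(waiting)
    transfers_left = max_transfers

    def free(ward: int) -> int:
        return capacity[ward] - sum(occ[k][ward] for k in range(num_types))

    def best_free_ward(k: int, exclude: Optional[int] = None) -> Optional[int]:
        return next((w for w in preference[k] if w != exclude and free(w) > 0), None)

    for k in range(num_types):  # highest priority type first
        while waiting[k] > 0:
            first_choice = preference[k][0]
            if free(first_choice) == 0 and transfers_left > 0:
                # Lowest-priority patient of another type who is nonprimary here.
                victim = next(
                    (j for j in range(num_types - 1, -1, -1)
                     if j != k and occ[j][first_choice] > 0
                     and preference[j][0] != first_choice),
                    None,
                )
                target = None if victim is None else best_free_ward(victim, first_choice)
                if target is not None:
                    occ[victim][first_choice] -= 1
                    occ[victim][target] += 1
                    transfers_left -= 1
            ward = best_free_ward(k)
            if ward is None:  # the whole system is full
                break
            occ[k][ward] += 1
            waiting[k] -= 1
    return occ, waiting, max_transfers - transfers_left
```

Setting `max_transfers` to zero disables the displacement step, so the sketch then reduces to assignment to the best available ward without transfers.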
Appendix B Example 1 – matrices and vectors
We note that the set of all possible post-decision states is for both , with
(60)
where the difference between and is in places corresponding to , which are the only two states where choosing between or can make a difference.
The probability matrices of the discrete-time Markov chains corresponding to decisions are evaluated as follows. First, denote , the probability that type- patient does not depart from ward . Next, denote , , the probability of observing arrivals in a Poisson process with rate . Let
(61)
with , interpreted as the probability that at least arrivals were observed; and
(62)
with , interpreted as the conditional probability that, given patients have been admitted, there were patients of type , respectively.
Further, let be the probability that patients have been admitted and there were patients of type , respectively, with . Then, if the number of admissions is restricted by , where is the number of available beds, we have,
(63)
(64)
and otherwise, with .
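The effect of a capacity limit on the number of admissions can be illustrated numerically. The sketch below does not reproduce (61)-(64). It assumes a single aggregated arrival stream, which is Poisson with the total rate because a sum of independent Poisson variables is Poisson, and it assumes that the number admitted is the number of arrivals capped at the number of available beds. The function names are hypothetical.

```python
import math


def poisson_pmf(n: int, rate: float) -> float:
    """Probability of observing exactly n arrivals in a Poisson process."""
    return math.exp(-rate) * rate ** n / math.factorial(n)


def poisson_tail(n: int, rate: float) -> float:
    """Probability of observing at least n arrivals."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(n))


def admitted_pmf(n: int, rate: float, available_beds: int) -> float:
    """Probability that exactly n patients are admitted when at most
    `available_beds` can be accepted (assumed: admitted = min(arrivals, beds))."""
    if n < 0 or n > available_beds:
        return 0.0
    if n < available_beds:
        return poisson_pmf(n, rate)
    # All beds are filled whenever at least `available_beds` patients arrive.
    return poisson_tail(available_beds, rate)


# Hypothetical values: total arrival rate of 2 per day, 3 available beds.
print([round(admitted_pmf(n, 2.0, 3), 4) for n in range(4)])
```

The probabilities over n = 0, ..., b sum to one by construction, which gives a quick check when building the rows of the transition matrices.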
Also, denote,
(66)
(68)
Then, for , the nonzero entries of are,
(70)
for ,
(72)
(74)
for ,
(76)
(78)
for ,
(80)
(82)
(84)
(85)
for ,
(87)
(89)
(91)
(92)
for ,
(95)
(97)
(99)
(100)
for ,
(102)
(104)
for ,
(107)
(109)
(111)
(112)
and for ,
(114)
(116)
For , rows are the only rows in that differ from the rows of , with their nonzero entries given by,
(118)
(120)
(122)
(123)
and the remaining rows are the same as in .
Finally, costs are recorded as vectors corresponding to decisions , such that,
since the costs do not depend on the next observed state.