Markov Decision Process and Approximate Dynamic Programming for a Patient Assignment Scheduling problem

Małgorzata M. O’Reilly, Sebastian Krasnicki, James Montgomery,
Mojtaba Heydar, Richard Turner, Pieter Van Dam, Peter Maree

email: [email protected] of Natural Sciences, University of Tasmania, Hobart TAS 7001, Australia. School of Information & Communication Technology, University of Tasmania, Hobart TAS 7001, Australia. BHP, Perth, Western Australia. School of Medicine, University of Tasmania, Hobart TAS 7001, Australia. School of Nursing, University of Tasmania, Hobart TAS 7001, Australia. Strategy and Planning, Department of Health, Hobart TAS 7001, Australia.

(June 26, 2024)

Abstract

We study the Patient Assignment Scheduling (PAS) problem in a random environment, which arises in the management of patient flow in hospital systems due to the stochastic nature of patient arrivals and of the Length of Stay distribution.

We develop a Markov Decision Process (MDP) which aims to assign the newly arrived patients in an optimal way so as to minimise the total expected long-run cost per unit time over an infinite horizon. We assume Poisson arrival rates that depend on patient types, and Length of Stay distributions that depend on whether patients stay in their primary wards or not.

Since the instances of realistic size of this problem are not easy to solve, we develop numerical methods based on Approximate Dynamic Programming. We illustrate the theory with numerical examples with parameters obtained by fitting to data from a tertiary referral hospital in Australia, and demonstrate the application potential of our methodology under practical considerations.

Keywords: Patient Assignment Scheduling problem, Poisson arrivals, Length of Stay distribution, Markov chains, Markov Decision Process, Approximate Dynamic Programming.

Mathematics Subject Classification: 60J80 – 60J22 – 92D25 – 65H10

Funding: This research was supported by funding through the Australian Research Council Linkage Project LP140100152.

1 Introduction

In this paper we consider the Patient Assignment Scheduling (PAS) problem in which, at the start of each day, newly arrived patients are assigned to beds in different wards, considering their needs, priority, and available resources, in a way so as to optimise total expected daily cost over an infinite time horizon. We use the term ‘bed’ in the sense of the resource that consists of the staff (nurses and clinicians) available to attend to a patient in a physical bed, rather than the physical bed itself.

We assume that patients are assigned to the beds at the start of time period $d=0,1,2,\ldots$; see Figure 1. The allocation involves a group of patients waiting to be admitted to a suitable bed. Upon completing their treatments, patients are discharged. The duration of time between arrival and discharge is a random variable referred to as the Length of Stay (LoS). We assume an infinite horizon problem and develop a Markov Decision Process (MDP) to solve this stochastic problem so as to minimise the total expected long-run cost per unit time.

Here, without loss of generality we assume that the index $d=0,1,2,\ldots$ corresponds to day $d$ , but note that this could denote some other time interval of interest such as an $8$ -hour time block.

Figure 1: Decision epochs of the PAS problem. At the start of time period $d$ we observe some state $s$ and then make some decision $a$ about how to allocate/transfer patients. This transforms the system into a post-decision state $s^{(a)}$, and then the system evolves in a stochastic manner until a new state $s^{\prime}$ is observed at the start of the next time period $d+1$.

Patient assignment is a key process in the management of hospitals, whereby bed managers allocate inbound patients to appropriate wards, rooms and beds. These decisions are made by teams of ward and bed managers at intermittent times throughout the day, using their experience and understanding of the natural dynamics of the hospital. From 2018 to 2019, the Australian hospital system recorded 11.5 million hospitalisations and 8.4 million emergency department presentations, resulting in 365,000 emergency admissions [2]. This corresponded to 31,500 patient assignment decisions made daily across Australia, with 1000 of these being unplanned [2].

These decisions are not trivial, since complications can arise due to variations in hospital policy and the highly stochastic and complex nature of healthcare. Potential patients can arrive at any time of day, with a variety of conditions and specific needs. Hospital policy may affect where patients can be assigned, on the basis of a patient’s gender or to comply with prescribed healthcare standards. Managers may also evaluate whether it is best to transfer patients between wards. Because of these complicating factors it can be unclear if the assignments made by bed managers are optimal.

The importance of maintaining good patient flow cannot be overstated. Carter, Puch and Larson, in their literature review of emergency department (ED) crowding, found that ED overcrowding has a significant positive correlation with patient mortality and with patients leaving the hospital untreated [8]. Morley et al., in their literature review, also found an increase in patient mortality, as well as a higher exposure to error, poorer patient outcomes and increased patient length of stay, both in the ED and in the ward to which a patient is eventually assigned [20]. They also noted that the inability to quickly assign patients from the ED to an inpatient bed was a significant contributor to overcrowding.

Regarding overcrowding in non-emergency wards, there is a long-standing debate on what level of hospital occupancy is too high. Bagust et al. [4] found in their simulation-based approach that hospital occupancy above $85\%$ created discernible risk, while occupancy above $90\%$ caused regular bed crises. The $85\%$ figure has since been adopted by many as a standard benchmark for hospital occupancy [9]. However, multiple other authors [5, 9] have emphasized that adopting $85\%$ as the figure at which hospitals operate most efficiently misinterprets Bagust et al. [4], as this figure only applies to the specific example in [4]. Further, Bain et al. [5] state that using any single figure as a target for average occupancy is overly simplistic. Cummings et al. [9] note that while the results of [4] have been misinterpreted, there is a positive correlation between hospital occupancy and emergency department delays.

Solutions to these problems require improvements in the management of patient flow. Howell et al. [17] found that by assigning additional resources to the tasks of bed management and patient assignment, ED length of stay was reduced by an average of 98 minutes for admitted patients. Clearly, tangible benefits are made to patients if they are assigned correctly.

We call the mathematical formulation of patient assignment the Patient Assignment Scheduling (PAS) problem, where we attempt to find optimal assignments of all patients as they arrive at the hospital. This problem was described by Demeester et al. in [11], Bilgin et al. in [6], and Vancroonenburg et al. in [26].

Demeester et al. [11] define a useful approach of hard and soft constraints. Patient assignments that violate hard constraints are infeasible, while assignments that violate soft constraints incur some cost, which contributes to the objective function. An example of a hard constraint is that two patients cannot be assigned to the same bed, while an example of a soft constraint is that a patient should be assigned to the ward that corresponds to their condition. In their model, the variable used for patient length of stay (LoS) is deterministic, and so too is the cost of assigning a patient. This approach ignores the inherent stochasticity of a hospital system, but provides a useful framework to build upon.

Bilgin et al. [6] and Vancroonenburg et al. [26] use a very similar approach, defining a set of constraints to form an objective function, with each adding their own novel idea. Bilgin et al. [6] combine the PAS problem with a nurse scheduling problem, while Vancroonenburg et al. [26] introduce parameters to study the effect of uncertainty, so that their model is no longer fully deterministic. However, to truly capture the stochasticity of the system, consideration must be given to how the system evolves in time. The model must account for the impact of current assignment decisions on future assignments.

Using a similar integer programming approach to [6, 11, 26], Abera et al. [3] demonstrate how the PAS problem can be solved in a stochastic scenario. They consider first that a patient’s LoS is a random variable that may take any value on some distribution, and is unknown at the time of decision making. They then consider that patient arrivals to the system are also random, meaning that the cost of making any given assignment is stochastic. That is, the cost of an assignment depends on how long a patient stays, and what types of patients will arrive next. To evaluate the assignment cost, and hence the optimal assignment, Abera et al. [3] simulate the system several times to determine the expected cost over some planning horizon. The planning horizons used as examples in [3] are 14, 28 and 56 days.

One shared result found in [3, 6, 11, 26] is that the integer linear program for a problem of any non-trivial size cannot be solved exactly. That is, the exact optimal solution could not be found. Demeester et al. [11] found that, in their first example of a hospital with $\sim 120$ beds, an optimal solution was not found after a whole week of computation, which is an unacceptable amount of time for assigning patients. To overcome this, various heuristics were required in all of the above work. The authors of the papers listed above all begin by using a neighbourhood search technique, that is, starting with some initial solution and searching ‘neighbouring’ solutions to see if there is an improvement. The exact methodology of how the neighbourhood is defined and searched differs between authors. Demeester et al. [11] use a Tabu search, Abera et al. [3] use simulated annealing, while Bilgin et al. [6] use a hyper-heuristic approach (heuristics for selecting heuristics). Regardless of the approach, it is clear that heuristics are a necessity for solving problems of any realistic size.

While the formulation of the PAS problem as an integer program is well researched, alternative formulations exist. Hulshof et al. [18] and Dai and Shi [10] build models related to the PAS problem, based on Markovian Decision Processes (MDP). Such an alternative formulation allows for an exploration of possible decisions that a hospital manager can make. Hulshof et al. [18] explore resource assignment in hospitals and decision making on how many patients should be assigned to different queues. Dai and Shi [10] explore the use of patient overflow (allowing wards to ‘overflow’ into one another when full) and take a ‘higher-level’ approach to decision making. Rather than making assignment decisions for each patient, decisions are made on whether to allow patient overflow for each time period.

Similarly to the integer program approaches, the exact solutions of these MDPs cannot be obtained for large-size problems. Approximate Dynamic Programming (ADP) approaches are applied in [10, 18] to overcome what is called the curse of dimensionality. The ADP approach, described by Powell in [23], allows the cost in each state to be approximated by a set of features, which record some partial information about states, and corresponding weights. In addition to ADP, Dai and Shi [10] utilize a least-squares temporal difference (LSTD) learning algorithm to solve the problem.

We now discuss the work of Heydar et al. [15], which is the basis of the model proposed here. Heydar et al. [15] model the PAS problem as a continuous-time MDP, where the system is observed at every individual patient arrival or departure. The goal of the model is to identify the best possible patient assignment given information about the arriving patient and the occupancy of the hospital. Heterogeneity is included in the model by defining different patient classes and wards, and assuming that each patient class has an ideal ward, while other wards are unsuitable. In addition to assigning patients directly, Heydar et al. [15] also consider transfers of patients between wards to best fit patients, so as to minimise their appropriately defined objective function.

However, Heydar et al. [15] note that transfers are positively correlated with patient mortality and LoS, as shown in [14], meaning that transfers are given careful consideration and an additional cost in [15]. We adopt this feature as a key component in our model.

Similarly to Abera et al. [3], Heydar et al. [15] evaluate total expected cost over a finite horizon, meaning limited consideration is given to the state of the system in the long run. In order to address this gap, we generalise the model used in Heydar et al. [15] and extend it to an infinite horizon problem. In order to address the curse of dimensionality, we propose methodology based on ADP and the approximate policy iteration algorithm used by Dai and Shi in [10].

Furthermore, we assume that patient assignments are made at discrete time points such as at the start of each day. This is arguably a more realistic approach involving a group of patients to be assigned, which also means that assignment decisions are far more complex, as multiple assignments must be made, and an order of assignments must be decided.

Finally, we demonstrate that the algorithm used by Dai and Shi in [10] can be adapted to our model to find near-optimal solutions to the PAS problem.

The rest of this paper is structured as follows. In Section 2 we describe the key components of our model, such as state space, transition probabilities, decision variables, constraints, and cost variables. In Section 3 we give the details of the Approximate Dynamic Programming approach, and illustrate the application of the theory through our numerical examples in Section 4. This is followed by concluding remarks in Section 5.

2 Markov Decision Process

Let $\mathcal{I}=\{1,2,\dots,I\}$ be the set of all patient types, where type $i\in\mathcal{I}$ may correspond to the medical needs of the patients, their age, gender, and other aspects of inclusive care, see e.g. [13]. We assume that type- $i$ patients arrive at a rate $\lambda_{i}$ per day on day $d$ . Further, we assume that there are $K$ wards in the hospital, each with capacity $m_{k}$ , $k\in\mathcal{K}=\{1,\ldots,K\}$ . The waiting room labelled $(K+1)$ has capacity $m_{K+1}$ .

The set of wards that are suitable for type-$i$ patients is denoted $\mathcal{K}(i)$, for some $\mathcal{K}(i)\subset\mathcal{K}$. We denote by $w(1,i)\in\mathcal{K}(i)$ the best ward for a type-$i$ patient, by $w(2,i)\in\mathcal{K}(i)$ the second-best ward for a type-$i$ patient, and so on, and let $\mathcal{W}(i)=(w(1,i),\ldots,w(|\mathcal{K}(i)|,i))$ be the ordered sequence of wards in $\mathcal{K}(i)$ corresponding to type-$i$ patients. For example, if $\mathcal{K}(i)=\{1,2,3\}\subset\mathcal{K}=\{1,2,3,4,5\}$ and $\mathcal{W}(i)=(3,1,2)$, then ward $3$ is the best ward for a type-$i$ patient, ward $1$ is the second best, ward $2$ is the third best, and wards $4$ and $5$ are not suitable for type-$i$ patients.

In Section 2.1 below, we construct a model based on a suitable Markov Decision Process to find the optimal policy $\mbox{\boldmath$\pi$}^{*}$ from the set of all policies $\mbox{\boldmath$\pi$}=(a^{\mbox{\boldmath$\pi$}}(s))_{s\in\mathcal{S}}$ consisting of decisions $a^{\mbox{\boldmath$\pi$}}(s)$ taken whenever state $s$ in state space $\mathcal{S}$ is observed, so as to minimise the long-run expected cost per unit time,

\displaystyle E^{*}=\min_{\mbox{\boldmath$\pi$}}E^{\mbox{\boldmath$\pi$}}=\min_{\mbox{\boldmath$\pi$}}\lim_{D\to\infty}\left(\frac{1}{D}\ \mathbb{E}\left(\sum_{d=0}^{D-1}C(S_{d},a^{\mbox{\boldmath$\pi$}}(S_{d}))\right)\right),\quad\mbox{\boldmath$\pi$}^{*}=\arg\min_{\mbox{\boldmath$\pi$}}E^{\mbox{\boldmath$\pi$}},

(1)

where $S_{d}\in\mathcal{S}$ is the state of the system observed at the start of day $d$, and $C(S_{d},a^{\mbox{\boldmath$\pi$}}(S_{d}))$ is the cost of the decision then taken under policy $\mbox{\boldmath$\pi$}$.

In practice, we find $\mbox{\boldmath$\pi$}^{*}$ by solving Bellman’s optimality equation for all states $s\in\mathcal{S}$,

\displaystyle E^{*}+V(s)=\min_{a\in\mathcal{A}(s)}\Bigg\{C(s,a)+\mathbb{E}\left(V(s^{\prime})\ |\ (s,a)\right)\Bigg\}=\min_{a\in\mathcal{A}(s)}\Bigg\{C(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)V(s^{\prime})\Bigg\},

(2)

where $V(s)$ is the minimum long-run average cost given current state $s$, and $\mathcal{A}(s)$ is the set of all decisions that are possible when state $s$ is observed. We use Approximate Dynamic Programming methods [21, 22, 24] to solve Equation (2) in realistically sized problems, in which the numbers of states and decisions are too large for exact methods, as noted in [10, 12, 15, 18].
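To make Equation (2) concrete, the following minimal sketch solves it on a tiny toy MDP with relative value iteration. This is a standard textbook technique chosen here only for illustration; it is not the method used in this paper, and the transition matrices `P`, the cost table `C` and the reference state `ref` are all made-up assumptions.

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-10, max_iter=10_000):
    """Solve E* + V(s) = min_a {C(s,a) + sum_s' P(s'|s,a) V(s')} on a small MDP.

    P[a] is the (S x S) transition matrix under action a; C[s, a] is the cost.
    V is normalised so that V[ref] = 0.
    """
    n_actions = len(P)
    V = np.zeros(C.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = C(s, a) + E[V(s') | (s, a)]
        Q = np.stack([C[:, a] + P[a] @ V for a in range(n_actions)], axis=1)
        TV = Q.min(axis=1)
        gain = TV[ref]                 # current estimate of E*
        V_new = TV - gain
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return gain, V, Q.argmin(axis=1)

# Hypothetical toy instance: 2 states, 2 actions (all numbers are made up).
P = [np.array([[0.9, 0.1], [0.5, 0.5]]),
     np.array([[0.2, 0.8], [0.1, 0.9]])]
C = np.array([[1.0, 3.0],
              [4.0, 2.0]])
gain, V, policy = relative_value_iteration(P, C)
```

Here `gain` plays the role of $E^{*}$ and `policy[s]` is the minimising decision in state `s`. The sketch stores a value for every state, which is exactly what becomes infeasible for realistically sized instances.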

2.1 Model

Consider a discrete-time Markov chain $\{S_{d}:d=0,1,\ldots\}$, where $S_{d}=\left([N_{k,i}]_{\mathcal{K}\times\mathcal{I}},[Q_{i}]_{1\times\mathcal{I}}\right)\in\mathcal{S}$ is the state of the system at the start of day $d=0,1,2,\ldots$ recording $N_{k,i}$, the number of type-$i$ patients in ward $k$, and $Q_{i}$, the number of newly arrived type-$i$ patients who are yet to be assigned to the wards. The state space $\mathcal{S}$ of the process is given by,

\displaystyle\mathcal{S}=\Big\{\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right):n_{k,i}\geq 0,\ q_{i}\geq 0,\ \sum_{i=1}^{I}n_{k,i}\leq m_{k},\ \sum_{i=1}^{I}q_{i}\leq m_{K+1}\Big\},

with possibly additional constraints on the total number of accepted arrivals, as discussed below.
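The size of $\mathcal{S}$ can be illustrated with a short sketch. The function names and the example capacities below are hypothetical, and the count ignores any additional constraints on accepted arrivals. It uses the fact that the number of non-negative integer vectors of length $I$ with sum at most $m$ is $\binom{m+I}{I}$.

```python
from itertools import product
from math import comb

def count_states(capacities, waiting_capacity, n_types):
    """Number of states with sum_i n_{k,i} <= m_k and sum_i q_i <= m_{K+1}."""
    def bounded(total, parts):
        # non-negative integer vectors of length `parts` with sum <= total
        return comb(total + parts, parts)
    count = bounded(waiting_capacity, n_types)
    for m_k in capacities:
        count *= bounded(m_k, n_types)
    return count

def enumerate_states(capacities, waiting_capacity, n_types):
    """Brute-force enumeration of (wards, queue); only viable for toy instances."""
    def vectors(total):
        return [v for v in product(range(total + 1), repeat=n_types)
                if sum(v) <= total]
    ward_choices = [vectors(m_k) for m_k in capacities]
    for wards in product(*ward_choices):
        for q in vectors(waiting_capacity):
            yield wards, q

# Toy instance: two wards with 2 and 1 beds, waiting room of size 2, two types.
small = count_states((2, 1), 2, 2)
# A still modest hypothetical hospital: five 20-bed wards, four patient types.
large = count_states((20,) * 5, 10, 4)
```

Even the second, modest configuration yields far more states than can be stored or enumerated, which is the practical motivation for the approximation methods of Section 3.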

2.2 Decisions

Suppose that we observe state $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)\in\mathcal{S}$ at the start of a day and choose a suitable decision $a\in\mathcal{A}(s)$ from the set of available decisions $\mathcal{A}(s)$, such that $a=(x_{k,i},y_{k,\ell,i})_{i\in\mathcal{I};k,\ell=1,\ldots K}$, where

• $x_{k,i}$ is the number of type-$i$ newly arrived patients to assign to ward $k$, and,
• $y_{k,\ell,i}$ is the number of type-$i$ patients to be transferred from ward $k$ to ward $\ell$.

Each decision should satisfy the following sets of constraints. First, all newly arrived type- $i$ patients have to be assigned to some ward, and so

\displaystyle\sum_{k=1}^{K}x_{k,i}=q_{i},\ \forall i\in\mathcal{I}.

(3)

Next, the total number of patients in ward $k$ after the decision is applied, denoted $n_{k,\bullet}$ and given by,

\displaystyle n_{k,\bullet}=\sum_{i=1}^{I}n^{(a)}_{k,i},

(4)

\displaystyle n^{(a)}_{k,i}=n_{k,i}+x_{k,i}+\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}\big(y_{\ell,k,i}-y_{k,\ell,i}\big),

(5)

where $n^{(a)}_{k,i}$ is the total number of type-$i$ patients in ward $k$ after the decision is applied, cannot exceed the capacity of the ward, and so,

\displaystyle n_{k,\bullet}\leq m_{k},\ \forall k\in\mathcal{K}.

(6)

Further, we may only transfer the available patients, which gives,

\displaystyle\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}y_{k,\ell,i}\leq n_{k,i},\ \forall k\in\mathcal{K},\ \forall i\in\mathcal{I},

(7)

and if a transfer occurs, it should be in one direction, that is,

\displaystyle I\{y_{k,\ell,i}\not=0\}+I\{y_{\ell,k,i}\not=0\}\leq 1,\ \forall k,\ell\in\mathcal{K},\ \forall i\in\mathcal{I},

(8)

where $I\{\cdot\}$ is an indicator function taking value $1$ if the statement in the brackets is true, and $0$ otherwise, which ensures that only one of $y_{k,\ell,i}$ and $y_{\ell,k,i}$ takes a nonzero value.
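The constraints (3) and (6)–(8) can be checked mechanically for a candidate decision. The sketch below is one possible encoding and not the authors' implementation: the array layout (`n` and `x` of shape $K\times I$, `y` of shape $K\times K\times I$ with `y[k, l, i]` the transfers from ward `k` to ward `l`) and the function names are assumptions made for illustration.

```python
import numpy as np

def post_decision_counts(n, x, y):
    """Patients per (ward, type) after assignments and transfers are applied."""
    inflow = y.sum(axis=0)     # inflow[l, i]  = sum_k y[k, l, i]
    outflow = y.sum(axis=1)    # outflow[k, i] = sum_l y[k, l, i]
    return n + x + inflow - outflow

def is_feasible(n, q, x, y, m):
    """Check a decision (x, y) against constraints (3), (6), (7) and (8)."""
    K = n.shape[0]
    if (x < 0).any() or (y < 0).any():
        return False
    if any(y[k, k].any() for k in range(K)):              # no self-transfers
        return False
    if not np.array_equal(x.sum(axis=0), q):              # (3): assign all arrivals
        return False
    if (post_decision_counts(n, x, y).sum(axis=1) > m).any():   # (6): capacity
        return False
    if (y.sum(axis=1) > n).any():                         # (7): transfer only present patients
        return False
    if ((y > 0) & (np.swapaxes(y, 0, 1) > 0)).any():      # (8): one direction only
        return False
    return True

# Hypothetical example with K = 2 wards and I = 2 patient types.
n = np.array([[1, 0], [0, 1]])
q = np.array([1, 0])
m = np.array([2, 2])
no_transfers = np.zeros((2, 2, 2), dtype=int)
x_ok = np.array([[1, 0], [0, 0]])
feasible = is_feasible(n, q, x_ok, no_transfers, m)
```

Enumerating $\mathcal{A}(s)$ by filtering all candidate arrays with such a check is only practical for very small instances, since the number of candidates grows combinatorially in $K$ and $I$.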

2.3 Transition probabilities

Now, given decision $a$ and current state $s\in\mathcal{S}$ , we define the following key random variables:

• $Z_{k,i}$, the number of type-$i$ patients who depart from ward $k$ during one day;
• $Z=\sum_{k=1}^{K}\sum_{i=1}^{I}Z_{k,i}$, the total number of departures;
• $B=\sum_{k=1}^{K}m_{k}-\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}+Z$, the total number of available beds after departures, where $n^{(a)}_{k,i}$ is the post-decision number of type-$i$ patients in ward $k$ given by Eq. (9);
• $Q=\sum_{i=1}^{I}Q_{i}$, the total number of arrivals;

which we use below to determine the transition probabilities of the process $\{S_{d}:d=0,1,\ldots\}$ . The distribution of these variables depends on $(s,a)$ , as we discuss below.

Assuming state $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$ and decision $a=(x_{k,i},y_{k,\ell,i})_{i\in\mathcal{I};k,\ell=1,\ldots K}$, the resulting post-decision state is $s^{(a)}=\left([n^{(a)}_{k,i}]_{\mathcal{K}\times\mathcal{I}}\right)\equiv\left([n^{(a)}_{k,i}]_{\mathcal{K}\times\mathcal{I}},[0]_{1\times\mathcal{I}}\right)$ such that,

n^{(a)}_{k,i}=n_{k,i}+x_{k,i}+\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}\big(y_{\ell,k,i}-y_{k,\ell,i}\big),\ \forall k\in\mathcal{K},\ \forall i\in\mathcal{I}.

(9)

Next, the process transitions from state $s^{(a)}$ on day $d$ to some state $s^{\prime}=\left([n^{\prime}_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q^{\prime}_{i}]_{1\times\mathcal{I}}\right)$ on day $(d+1)$, for $d=0,1,2,\ldots$, with probability given by

\begin{aligned}
\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)&=\mathbb{P}\left((Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\mbox{ and }(Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\right)\\
&=\mathbb{P}\left((Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)\times\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ (Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)\\
&=\left(\prod_{k=1}^{K}\prod_{i=1}^{I}\mathbb{P}(Z_{k,i}=z_{k,i})\right)\times\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ (Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)
\end{aligned}

(10)

when the condition

n^{\prime}_{k,i}=n^{(a)}_{k,i}-z_{k,i}

(11)

is met for all $k\in\mathcal{K}$ , $i\in\mathcal{I}$ ; and $\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)=0$ otherwise.

We apply conditional probabilities $\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ (Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)$ in (10) since the number of accepted type-$i$ arrivals may depend on the number of available beds after departures, and consider the following alternative modelling approaches:

• Suppose that there is no restriction on the total number of arrivals, as in Dai and Shi [10]. Then $Q_{i}$ follows a Poisson distribution, $Q_{i}\sim Poi(\lambda_{i})$, and so

\displaystyle\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ (Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)=\prod_{i=1}^{I}\mathbb{P}(Q_{i}=q^{\prime}_{i})=\prod_{i=1}^{I}\dfrac{(\lambda_{i})^{q^{\prime}_{i}}e^{-\lambda_{i}}}{(q^{\prime}_{i})!}.

(12)

In this approach, arriving patients are not lost to the system, however the number of patients still waiting to be assigned and their total waiting times could be very large (and exceed many days). This may not be a realistic assumption since in practice the capacity of the system is limited (due to the availability of beds and staffing) and there are limits on the accepted maximum waiting times for various patient types (e.g. $24$ -hour limit for some types of emergency patients).

• Suppose that the total number of arrivals $Q$ may not exceed the total number of available beds. Then, with $\lambda=\sum_{i=1}^{I}\lambda_{i}$ denoting the total arrival rate and $b=\sum_{k=1}^{K}m_{k}-\sum_{k=1}^{K}\sum_{i=1}^{I}\big(n^{(a)}_{k,i}-z_{k,i}\big)$ recording the total number of available beds after the departures, we have,

\displaystyle\mathbb{P}(Q=q\ |\ B=b)=\dfrac{\lambda^{q}e^{-\lambda}}{q!},\quad q=0,1,2,\ldots,b-1,

(13)

\displaystyle\mathbb{P}(Q=b\ |\ B=b)=1-\sum_{q=0}^{b-1}\dfrac{\lambda^{q}e^{-\lambda}}{q!},

(14)

and, with $q=\sum_{i=1}^{I}q^{\prime}_{i}$,

\begin{aligned}
&\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ (Z_{k,i}=z_{k,i})_{k=1,\ldots,K;i=1,\ldots,I}\right)\\
&\quad=\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ B=b\right)\\
&\quad=\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ Q=q,B=b\right)\times\mathbb{P}(Q=q\ |\ B=b)\\
&\quad=\mathbb{P}\left((Q_{i}=q^{\prime}_{i})_{i=1,\ldots,I}\ |\ Q=q\right)\times\mathbb{P}(Q=q\ |\ B=b)\\
&\quad=\frac{q!}{\prod_{i=1}^{I}q^{\prime}_{i}!}\times\prod_{i=1}^{I}\left(\frac{\lambda_{i}}{\lambda}\right)^{q^{\prime}_{i}}\times\mathbb{P}(Q=q\ |\ B=b),
\end{aligned}

(15)

since the conditional distribution of $(Q_{i})_{i=1,\ldots,I}$ given $Q=q$ is multinomial, where $\lambda_{i}/\lambda$ is the probability that an arriving patient is of type $i$. In this approach, patients arriving to a system with no free beds are redirected to other health systems. Such an approach may be more realistic, and can be used to determine the rate at which patients are redirected, which helps in deciding a suitable size of the hospital system.
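The restricted-arrivals model can be evaluated numerically as follows. This is a minimal sketch of Eqs. (13)–(15) in which the function names and the example rates are assumptions; the total rate is taken to be the sum of the type-specific rates, consistent with $\lambda_{i}/\lambda$ being a probability.

```python
from itertools import product
from math import exp, factorial, prod

def poisson_pmf(q, lam):
    """P(X = q) for X ~ Poisson(lam)."""
    return lam**q * exp(-lam) / factorial(q)

def truncated_total_pmf(q, b, lam):
    """P(Q = q | B = b), Eqs. (13)-(14): Poisson mass at or above b is placed on b."""
    if q < 0 or q > b:
        return 0.0
    if q < b:
        return poisson_pmf(q, lam)
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(b))

def arrivals_pmf(q_vec, b, rates):
    """P((Q_i = q_i)_i | B = b), Eq. (15): multinomial split of the capped total."""
    lam, q = sum(rates), sum(q_vec)
    multinomial = factorial(q) / prod(factorial(qi) for qi in q_vec)
    split = prod((r / lam) ** qi for r, qi in zip(rates, q_vec))
    return multinomial * split * truncated_total_pmf(q, b, lam)

# Hypothetical rates for two patient types and b = 3 free beds.
rates, b = (1.5, 0.5), 3
total_mass = sum(arrivals_pmf(qv, b, rates)
                 for qv in product(range(b + 1), repeat=len(rates)))
```

Summing `arrivals_pmf` over all arrival vectors recovers a total probability of one (up to rounding), which is a quick sanity check that the capped Poisson and the multinomial split fit together.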

To model random variables $Z_{k,i}$, we assume that departures are independent of one another and so $Z_{k,i}$ follows a Binomial distribution on the post-decision count, $Z_{k,i}\sim Bin(n^{(a)}_{k,i},p_{k,i})$, with

\displaystyle\mathbb{P}(Z_{k,i}=z)=\binom{n^{(a)}_{k,i}}{z}(p_{k,i})^{z}(1-p_{k,i})^{n^{(a)}_{k,i}-z},\quad z=0,1,\ldots,n^{(a)}_{k,i},

(16)

\displaystyle p_{k,i}=\mathbb{P}(RLOS_{k,i}\leq 1),

(17)

where $RLOS_{k,i}$ is a random variable recording the remaining length of stay of type- $i$ patient that is in ward $k$ at the start of the day. That is, for each type- $i$ patient in ward $k$ , we perform a Bernoulli trial with probability of success $p_{k,i}$ to determine if the patient leaves the system on the following day or not.
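A one-day transition from a post-decision state can be simulated directly from these distributional assumptions. The sketch below is illustrative only: the function name, the argument names and the example numbers are hypothetical, and the departure probabilities `p` are taken as given rather than derived from a fitted Length of Stay distribution. With `restrict_arrivals=True` the total number of arrivals is capped at the number of free beds, which reproduces the restricted model of Eqs. (13)–(14).

```python
import numpy as np

def sample_next_state(n_post, p, rates, m_total, rng, restrict_arrivals=True):
    """Sample (n', q') given post-decision counts n_post of shape (K, I)."""
    rates = np.asarray(rates, dtype=float)
    z = rng.binomial(n_post, p)        # departures Z_{k,i}, one Bernoulli trial per patient
    n_next = n_post - z                # condition (11)
    q_total = int(rng.poisson(rates.sum()))
    if restrict_arrivals:
        free_beds = int(m_total - n_next.sum())
        q_total = min(q_total, free_beds)   # excess arrivals are redirected
    # Split the accepted total across types with probabilities lambda_i / lambda.
    q_next = rng.multinomial(q_total, rates / rates.sum())
    return n_next, q_next

# Hypothetical example: K = 2 wards, I = 2 types, 10 beds in total.
rng = np.random.default_rng(0)
n_post = np.array([[3, 1], [0, 4]])
p = np.array([[0.3, 0.2], [0.5, 0.25]])
n_next, q_next = sample_next_state(n_post, p, rates=[2.0, 1.0], m_total=10, rng=rng)
```

Repeating such draws along a trajectory is the kind of simulation on which sampling-based approximate methods rely.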

2.4 Costs

There is an immediate cost $C(s,a)$ associated with decision $a$ given current state $s\in\mathcal{S}$ , which may include costs of assignment, transfer, and patients being in nonprimary wards, defined as follows.

• Assignment cost: $\sum_{k=1}^{K}\sum_{i=1}^{I}x_{k,i}\times c^{(\sigma)}_{k,i}$.
• Transfer cost: $\sum_{k=1}^{K}\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}\sum_{i=1}^{I}y_{k,\ell,i}\times c^{(t)}_{k,\ell,i}$.
• Penalty cost for being in a nonprimary ward: $\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}\times c^{(p)}_{k,i}$, where $n^{(a)}_{k,i}$ is the post-decision count given by Eq. (9).

The total immediate cost of decision $a=(x_{k,i},y_{k,\ell,i})_{i\in\mathcal{I};k,\ell=1,\ldots K}$ given state $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$ is then,

\begin{aligned}
C(s,a)&=\sum_{k=1}^{K}\sum_{i=1}^{I}\bigg\{x_{k,i}\times c^{(\sigma)}_{k,i}+\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}y_{k,\ell,i}\times c^{(t)}_{k,\ell,i}+n^{(a)}_{k,i}\times c^{(p)}_{k,i}\bigg\}\\
&=\sum_{k=1}^{K}\sum_{i=1}^{I}\bigg\{x_{k,i}\times c^{(\sigma)}_{k,i}+\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}y_{k,\ell,i}\times c^{(t)}_{k,\ell,i}+\Big(n_{k,i}+x_{k,i}+\sum_{\begin{subarray}{c}\ell=1\\ \ell\neq k\end{subarray}}^{K}(y_{\ell,k,i}-y_{k,\ell,i})\Big)\times c^{(p)}_{k,i}\bigg\}.
\end{aligned}

(18)
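The immediate cost in Eq. (18) is a sum of three elementwise products, which the following sketch evaluates. The array layout (`y[k, l, i]` for transfers from ward `k` to ward `l`, with zero diagonal) and all cost values are assumptions chosen for illustration.

```python
import numpy as np

def immediate_cost(n, x, y, c_assign, c_transfer, c_penalty):
    """C(s, a) of Eq. (18); assumes y[k, k, i] = 0 for all k and i."""
    n_post = n + x + y.sum(axis=0) - y.sum(axis=1)   # post-decision counts, Eq. (9)
    assignment = (x * c_assign).sum()
    transfer = (y * c_transfer).sum()
    penalty = (n_post * c_penalty).sum()
    return float(assignment + transfer + penalty)

# Hypothetical example: K = 2 wards, I = 1 type, ward 0 is the primary ward.
n = np.array([[1], [0]])
x = np.array([[0], [1]])                  # one arrival assigned to ward 1
y = np.zeros((2, 2, 1), dtype=int)
y[0, 1, 0] = 1                            # one patient moved from ward 0 to ward 1
c_assign = np.array([[0.0], [5.0]])
c_transfer = np.full((2, 2, 1), 2.0)
c_penalty = np.array([[0.0], [3.0]])
cost = immediate_cost(n, x, y, c_assign, c_transfer, c_penalty)
```

In this example the assignment contributes 5, the transfer contributes 2, and the two patients who end up in the nonprimary ward contribute 6, for a total of 13.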

By the above, for all $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$, Eq. (2) can be written as,

\displaystyle E^{*}+V\left(s\right)=\min_{a\in\mathcal{A}(s)}\Bigg\{C(s,a)+\sum_{s^{\prime}=\left([n^{\prime}_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q^{\prime}_{i}]_{1\times\mathcal{I}}\right)\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)V\left(s^{\prime}\right)\Bigg\},

(19)

where $\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)$ and $C\left(s,a\right)$ are given by Eqs. (10) and (18), respectively.

3 Approximate Dynamic Programming approach

Policy Iteration is a standard method of solving Equation (2) when the size of the state space $\mathcal{S}$ is not too large, see Algorithm 1 in Appendix A, also refer to Puterman [25] for further details.

However, Policy Iteration is not suitable here, and so we apply Approximate Dynamic Programming methods introduced by Powell in [21, 22, 24] to address the curses of dimensionality in Eq. (2), as follows.

First, similar to Heydar et al. [15], we apply basis functions $\phi_{f}(s)$ , $f\in\mathcal{F}$ , to record some suitable information about states $s$ , referred to as state features, and then the approximation,

\displaystyle V(s)\approx\sum_{f\in\mathcal{F}}\phi_{f}(s)\theta^{(f)}_{n}=\mbox{\boldmath$\phi$}(s)^{T}\mbox{\boldmath$\theta$}_{n},

(20)

where $\mbox{\boldmath$\phi$}(s)=[\phi_{f}(s)]_{f\in\mathcal{F}}$ is the vector of features, and $\mbox{\boldmath$\theta$}_{n}=[\theta^{(f)}_{n}]_{f\in\mathcal{F}}$ is the vector of corresponding weights $\theta^{(f)}_{n}$ , evaluated at the $n$ -th iteration of an algorithm.


Next, we apply the Approximate Policy Iteration presented by Dai and Shi in [10], in which the weights $\theta^{(f)}_{n}$ are recursively updated in an iteration that is repeated $N$ times. Each iteration $n$ involves a simulation of $M$ states for some large $M$ . We summarise this approach in Algorithms 2–3 in Appendix A. We note that the Markov chains applied in our models are irreducible and positive recurrent, and so the algorithms are guaranteed to converge as $M\to\infty$ [10].

Consider our model in Section 2.1. Suppose that the vector of features of state $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$ is the vector $\vec{s}$, that is, a vector containing all information about state $s$. We note that to compute the expression in Line 6 of Algorithm 2 in Appendix A and in Line 4 of Algorithm 3 in Appendix A, it is then convenient to apply the following equivalence. Assuming $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$, and with $b=\sum_{k=1}^{K}m_{k}-\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}$, we have

\begin{aligned}
\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)\mbox{\boldmath$\phi$}(s^{\prime})^{T}\mbox{\boldmath$\theta$}&=\mathbb{E}\left(\mbox{\boldmath$\phi$}(s^{\prime})^{T}\mbox{\boldmath$\theta$}\ |\ (s,a)\right)&&(21)\\
&=\sum_{k=1}^{K}\sum_{i=1}^{I}\mathbb{E}(N_{k,i}^{\prime})\theta_{k,i}+\sum_{i=1}^{I}\mathbb{E}(Q_{i}^{\prime})\theta_{i}&&(22)\\
&=\sum_{k=1}^{K}\sum_{i=1}^{I}\left(n^{(a)}_{k,i}-\mathbb{E}(Z_{k,i})\right)\theta_{k,i}+\sum_{i=1}^{I}\mathbb{E}\left(\mathbb{E}(Q_{i}^{\prime}\ |\ Q)\right)\theta_{i}&&(23)\\
&=\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}(1-p_{k,i})\theta_{k,i}+\sum_{i=1}^{I}\frac{\lambda_{i}}{\lambda}\times\mathbb{E}(Q)\theta_{i},&&(24)
\end{aligned}

where $n^{(a)}_{k,i}$ is given by Eq. (9), $\mathbb{E}(Z_{k,i})=n^{(a)}_{k,i}p_{k,i}$ (binomial mean), $N_{k,i}^{\prime}$ is a random variable recording the number of type-$i$ patients in ward $k$ and $Q_{i}^{\prime}$ is a random variable recording the number of type-$i$ patients waiting to be assigned when state $s^{\prime}$ is observed, and $Q=\sum_{i=1}^{I}Q_{i}^{\prime}$ is the total number of arrivals.
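Because the features are the state components themselves, the expected approximate value of the next state in Eq. (24) needs no summation over $\mathcal{S}$. The sketch below evaluates it directly; the function name, the split of the weight vector into `theta_ward` and `theta_queue`, and the example numbers are assumptions, and `expected_q` must be supplied by the caller according to the chosen arrival model.

```python
import numpy as np

def expected_next_value(n_post, p, rates, expected_q, theta_ward, theta_queue):
    """Right-hand side of Eq. (24).

    n_post, p, theta_ward: (K, I) arrays; rates, theta_queue: length-I vectors;
    expected_q: E(Q), the expected total number of accepted arrivals.
    """
    rates = np.asarray(rates, dtype=float)
    lam = rates.sum()
    # Expected remaining patients: n^{(a)}_{k,i} (1 - p_{k,i}), weighted by theta_{k,i}.
    ward_term = (n_post * (1.0 - p) * theta_ward).sum()
    # Expected type-i arrivals: (lambda_i / lambda) E(Q), weighted by theta_i.
    queue_term = (rates / lam * expected_q * np.asarray(theta_queue)).sum()
    return float(ward_term + queue_term)

# Hypothetical example with unrestricted arrivals, so that E(Q) = lambda = 4.
n_post = np.array([[2, 0], [1, 1]])
p = np.full((2, 2), 0.5)
value = expected_next_value(n_post, p, rates=[1.0, 3.0], expected_q=4.0,
                            theta_ward=np.ones((2, 2)), theta_queue=[1.0, 1.0])
```

With unit weights the result is simply the expected number of patients in the system at the next decision epoch, here 2 remaining plus 4 arriving.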

If the number of arrivals is unrestricted with $m_{K+1}=\infty$ , then $\mathbb{E}(Q)=\lambda$ . Alternatively, if the number of arrivals is restricted so that we only allow arrivals that can be assigned, which can be written as,

\sum_{k=1}^{K}\sum_{i=1}^{I}n_{k,i}^{\prime}+\sum_{i=1}^{I}q_{i}^{\prime}\leq\sum_{k=1}^{K}m_{k}=m,

(25)

then we have the following approach for evaluating $\mathbb{E}(Q)$ .

Let $Z$ be the random variable recording the total number of departures,

$$Z=\sum_{k=1}^{K}\sum_{i=1}^{I}Z_{k,i}, \tag{26}$$

taking values $z=0,\ldots,z_{max}$ , with $z_{max}=\sum_{k=1}^{K}\sum_{i=1}^{I}n_{k,i}^{(a)}$ . Let $N$ be the number of available beds for the new arrivals, given by

$$N=m-\sum_{k=1}^{K}\sum_{i=1}^{I}N_{k,i}^{\prime}=m-z_{max}+Z, \tag{27}$$

taking values $n=m-z_{max},\ldots,m$ . Therefore,

\begin{align*}
\mathbb{E}\left(Q\mid Z=z\right)
&=\mathbb{E}\left(Q\mid N=m-z_{max}+z\right)\\
&=\sum_{k=0}^{m-z_{max}+z}k\times\frac{\lambda^{k}}{k!}e^{-\lambda}+\sum_{k=m-z_{max}+z+1}^{\infty}(m-z_{max}+z)\times\frac{\lambda^{k}}{k!}e^{-\lambda}\\
&=\sum_{k=0}^{m-z_{max}+z}k\times\frac{\lambda^{k}}{k!}e^{-\lambda}+(m-z_{max}+z)\left(1-\sum_{k=0}^{m-z_{max}+z}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)\\
&=\sum_{k=0}^{m-z_{max}+z-1}\lambda\times\frac{\lambda^{k}}{k!}e^{-\lambda}+(m-z_{max}+z)\left(1-\sum_{k=0}^{m-z_{max}+z}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)
\end{align*}

and so

\begin{align*}
\mathbb{E}(Q)
&=\sum_{z=0}^{z_{max}}\mathbb{E}\left(Q\mid Z=z\right)\mathbb{P}(Z=z) \tag{28}\\
&=\sum_{z=0}^{z_{max}}\left(\sum_{k=0}^{m-z_{max}+z}k\times\frac{\lambda^{k}}{k!}e^{-\lambda}+(m-z_{max}+z)\left(1-\sum_{k=0}^{m-z_{max}+z}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)\right)\mathbb{P}(Z=z)\\
&=\sum_{z=0}^{z_{max}}\mathbb{P}(Z=z)(m-z_{max}+z)\\
&\quad+\sum_{z=0}^{z_{max}}\left(\sum_{k=0}^{m-z_{max}+z-1}\lambda\times\frac{\lambda^{k}}{k!}e^{-\lambda}-(m-z_{max}+z)\left(\sum_{k=0}^{m-z_{max}+z}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)\right)\mathbb{P}(Z=z)\\
&=m-z_{max}+\mathbb{E}(Z)\\
&\quad+\sum_{z=0}^{z_{max}}\left(\lambda-(m-z_{max}+z)\right)\left(\sum_{k=0}^{m-z_{max}+z-1}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)\mathbb{P}(Z=z)\\
&\quad-\sum_{z=0}^{z_{max}}(m-z_{max}+z)\left(\frac{\lambda^{m-z_{max}+z}}{(m-z_{max}+z)!}e^{-\lambda}\right)\mathbb{P}(Z=z),
\end{align*}

where $\mathbb{E}(Z)=\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}p_{k,i}$ .
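The conditional expectation above is the mean of a Poisson($\lambda$) count capped at the number of free beds, and (28) averages it over the distribution of $Z$. A minimal sketch of this computation follows; the function names are hypothetical, the probabilities $\mathbb{P}(Z=z)$ are assumed to be supplied as a list `pmf_z`, and the Poisson terms are evaluated on the log scale (an implementation choice made here to avoid overflow for large $k$, not something prescribed by the derivation).

```python
import math


def poisson_pmf(k, lam):
    """P(A = k) for A ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def expected_admitted_given_beds(lam, n):
    """E(min(A, n)) for A ~ Poisson(lam), using the last line of the derivation."""
    if n <= 0:
        return 0.0
    cdf_n_minus_1 = sum(poisson_pmf(k, lam) for k in range(n))
    cdf_n = cdf_n_minus_1 + poisson_pmf(n, lam)
    return lam * cdf_n_minus_1 + n * (1.0 - cdf_n)


def expected_q(lam, m, z_max, pmf_z):
    """Eq. (28): average E(Q | Z = z) over the number of departures Z."""
    return sum(
        expected_admitted_given_beds(lam, m - z_max + z) * pmf_z[z]
        for z in range(z_max + 1)
    )


# Hypothetical inputs: total capacity m = 2, both beds occupied after the decision.
print(expected_q(2.0, 2, 2, [0.72, 0.26, 0.02]))
```

When $m-z_{max}+z$ is large relative to $\lambda$, the cap is rarely binding and the value approaches $\lambda$, in agreement with the unrestricted case.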

Further, in order to compute $\mathbb{P}(Z=z)$ in (28), we note that $Z=\sum_{k=1}^{K}\sum_{i=1}^{I}Z_{k,i}I(n_{k,i}^{(a)}\not=0)$ is a sum of independent binomial random variables $Z_{k,i}\sim Bin(n_{k,i}^{(a)},p_{k,i})$ such that $n_{k,i}^{(a)}\not=0$ .

A method for computing the distribution of a sum of independent binomial random variables was discussed by Butler and Stephens in [7]. Below, we suggest an alternative method, which relies on simple matrix multiplications and standard theory of Markov chains.

Without loss of generality, suppose that $Z=\sum_{i=1}^{M}Z_{i}$ is a random variable such that $Z_{i}\sim Bin(n_{i},p_{i})$ are independent binomial random variables with some parameters $n_{i}>0$ and $0<p_{i}<1$, for $i=1,\ldots,M$.

To compute $\mathbb{P}(Z=z)$ , it is convenient to construct a discrete-time Markov chain corresponding to the Bernoulli trials $n_{1},n_{2},\ldots,n_{M}$ such that a success at each trial results in the chain moving one step to the right, and failure results in the chain remaining at the original state. Then, the distribution of the chain after all $n_{1}+\ldots+n_{M}$ trials will give the distribution of $Z$ .

So, we consider a discrete-time Markov chain $\{(J(t)):t=0,1,2,\ldots,z_{max}\}$ terminating at time $z_{max}=\sum_{i=1}^{M}n_{i}$ , with state space $\mathcal{S}=\{0,\ldots,z_{max}\}$ and an initial state $J(0)=0$ , which evolves as follows. We perform the first $n_{1}$ Bernoulli trials at times $t=1,\ldots,n_{1}$ and let $\mathbb{P}(J(t)=J(t-1)+1)=p_{1}$ , $\mathbb{P}(J(t)=J(t-1))=1-p_{1}$ . Then, the distribution of the chain after the first $n_{1}$ trials, and so at time $t=n_{1}$ , is given by

$$\boldsymbol{\alpha}(n_{1})=\left[\boldsymbol{\alpha}(0)\quad{\bf 0}_{1\times n_{1}}\right]{\bf P}^{(1)}, \tag{29}$$

where $\boldsymbol{\alpha}(0)=[\alpha(0)_{j}]_{j=0}$ with $\alpha(0)_{0}=1$, and ${\bf P}^{(1)}=[P_{j,j^{\prime}}^{(1)}]_{j,j^{\prime}=0,1,\ldots,n_{1}}$ such that $P_{j,j^{\prime}}^{(1)}=\mathbb{P}(Z_{1}=j^{\prime}-j)$.

Next, we repeat this and perform $n_{i}$ Bernoulli trials at times $t=n_{1}+\ldots+n_{i-1}+1,\ldots,n_{1}+\ldots+n_{i}$ and let $\mathbb{P}(J(t)=J(t-1)+1)=p_{i}$ , $\mathbb{P}(J(t)=J(t-1))=1-p_{i}$ , for each $i=2,\ldots,M$ . Then, the distribution after additional $n_{i}$ trials is given by the recursion

$$\boldsymbol{\alpha}(n_{1}+\ldots+n_{i})=\left[\boldsymbol{\alpha}(n_{1}+\ldots+n_{i-1})\quad{\bf 0}_{1\times n_{i}}\right]{\bf P}^{(i)}, \tag{30}$$

where ${\bf P}^{(i)}=[P_{j,j^{\prime}}^{(i)}]_{j,j^{\prime}=0,1,\ldots,n_{1}+\ldots+n_{i}}$, $P_{j,j^{\prime}}^{(i)}=\mathbb{P}(Z_{i}=j^{\prime}-j)$. To compute ${\bf P}^{(i)}$ we apply the formula

$${\bf P}^{(i)}=\left({\bf A}^{(i)}\right)^{n_{i}}, \tag{31}$$

where

$${\bf A}^{(i)}=[A^{(i)}]_{j,j^{\prime}=0,1,\ldots,n_{1}+\ldots+n_{i}}=\left[\begin{array}{llllll}1-p_{i}&p_{i}&0&\ldots&\ldots&0\\ 0&1-p_{i}&p_{i}&\ldots&\ldots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ 0&\ldots&0&\ldots&1-p_{i}&p_{i}\end{array}\right].$$

It follows that, for $j=0,1,\ldots,z_{max}$ ,

$$\mathbb{P}(Z=j)=[\boldsymbol{\alpha}(n_{1}+\ldots+n_{M})]_{j}=\left[\left[\boldsymbol{\alpha}(n_{1}+\ldots+n_{M-1})\quad{\bf 0}_{1\times n_{M}}\right]{\bf P}^{(M)}\right]_{j}. \tag{33}$$

This method for computing the probabilities $\mathbb{P}(Z=z)$ for a sum $Z=\sum_{i=1}^{M}Z_{i}$ of independent nonnegative random variables $Z_{i}$ that take values in finite sets, can be applied for any desired discrete distributions of $Z_{i}$ , by replacing (31) with a suitable formula for ${\bf P}^{(i)}$ .
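The recursion (29)-(33) can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the code used for the examples: the function name is hypothetical, and instead of forming ${\bf P}^{(i)}$ as a matrix power and padding the vector with $n_i$ zeros in advance, the sketch applies the one-step transitions of ${\bf A}^{(i)}$ trial by trial and grows the vector by one entry per trial, which yields the same distribution.

```python
def sum_of_binomials_pmf(params):
    """Distribution of Z = Z_1 + ... + Z_M with independent Z_i ~ Bin(n_i, p_i).

    params : list of (n_i, p_i) pairs.
    Returns [P(Z = 0), ..., P(Z = z_max)] with z_max = n_1 + ... + n_M.
    """
    alpha = [1.0]  # alpha(0): the chain starts in state 0
    for n_i, p_i in params:
        for _ in range(n_i):
            # One Bernoulli trial, i.e. one step of the chain.
            nxt = [0.0] * (len(alpha) + 1)
            for j, mass in enumerate(alpha):
                nxt[j] += mass * (1.0 - p_i)  # failure: remain in state j
                nxt[j + 1] += mass * p_i      # success: move to state j + 1
            alpha = nxt
    return alpha


# Hypothetical example: one patient with p = 0.2 and one with p = 0.1.
pmf = sum_of_binomials_pmf([(1, 0.2), (1, 0.1)])
print(pmf)
```

The cost is quadratic in $z_{max}$, which is modest for ward sizes of the order considered in Section 4.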

Alternatively, $\mathbb{P}(Z=z)$ can be computed by numerically inverting the probability generating function $G_{Z}(s)$ of the random variable $Z$ , which here is given by

$$G_{Z}(s)=\prod_{i=1}^{M}G_{Z_{i}}(s)=\prod_{i=1}^{M}(1-p_{i}+sp_{i})^{n_{i}}, \tag{34}$$

using a suitable inversion algorithm, for example see Abate and Whitt [1].
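To illustrate the idea of inverting (34) numerically, the sketch below evaluates $G_{Z}$ at the $(z_{max}+1)$-th roots of unity and applies a plain discrete Fourier inversion. This is a simple stand-in chosen for illustration and is not the algorithm of [1]; it recovers the probabilities up to rounding error here because $G_{Z}$ is a polynomial of degree $z_{max}$. The function names are hypothetical.

```python
import cmath


def pgf(s, params):
    """G_Z(s) of Eq. (34) for params = [(n_1, p_1), ..., (n_M, p_M)]."""
    value = 1.0 + 0.0j
    for n_i, p_i in params:
        value *= (1.0 - p_i + s * p_i) ** n_i
    return value


def invert_pgf(params):
    """Recover [P(Z = 0), ..., P(Z = z_max)] from values of G_Z on the unit circle."""
    size = sum(n_i for n_i, _ in params) + 1  # z_max + 1 evaluation points
    roots = [cmath.exp(2j * cmath.pi * k / size) for k in range(size)]
    values = [pgf(root, params) for root in roots]
    pmf = []
    for z in range(size):
        acc = sum(value * root ** (-z) for value, root in zip(values, roots))
        pmf.append((acc / size).real)
    return pmf


# Hypothetical example: one patient with p = 0.2 and one with p = 0.1.
pmf_from_pgf = invert_pgf([(1, 0.2), (1, 0.1)])
print(pmf_from_pgf)
```

For large $z_{max}$ a fast Fourier transform would replace the double loop; the direct sum is kept here for readability.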

4 Numerical examples

Here, we construct examples to illustrate the theory and the application of the algorithms discussed above. First, in Example 1, we construct a Markov model with a small state space, and demonstrate that the approximate solution converges to the exact optimal solution (which we obtained by applying standard dynamic programming methods). Next, in Example 2, we construct a realistically-sized Markov model with assumptions driven by the practical conditions of real-world hospitals, and demonstrate the application potential of our methodology.

4.1 Small-sized example

We consider the following simple example to illustrate the application of our Model in Section 2.1. As the size of the state space $\mathcal{S}$ in this example is small enough to allow for the application of standard dynamic programming methods, we apply Algorithm 1 in Appendix A to find the exact solution and then compare it with the approximation obtained using Algorithms 2–3.

Example 1

Consider a healthcare facility with $K=2$ wards, with capacity $m_{k}=1$ for each ward $k$ , and $I=2$ patient types. Assuming that the maximum capacity of the waiting area is $m_{3}=2$ , that is $\sum_{i=1}^{I}q_{i}\leq m_{3}$ , and that the arrivals are only accepted if there is available capacity in the system according to

$$\sum_{k=1}^{K}\sum_{i=1}^{I}n_{k,i}+\sum_{i=1}^{I}q_{i}\leq\sum_{k=1}^{K}m_{k}=m, \tag{35}$$

we define the state space of the system as follows:

\begin{align*}
\mathcal{S}&=\left\{\left(\begin{bmatrix}n_{1,1}&n_{1,2}\\ n_{2,1}&n_{2,2}\end{bmatrix},\begin{bmatrix}q_{1}&q_{2}\end{bmatrix}\right):\sum_{j=1}^{2}n_{1,j}\leq 1,\ \sum_{j=1}^{2}n_{2,j}\leq 1,\ \sum_{i=1}^{2}\sum_{j=1}^{2}n_{i,j}+\sum_{i=1}^{2}q_{i}\leq 2\right\}\\
&=\left\{1,2,3,\ldots,22\right\},
\end{align*}

with

\begin{align*}
&1\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
2\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}1&0\end{bmatrix}\right),\quad
3\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right),\quad
4\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}2&0\end{bmatrix}\right),\\
&5\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}1&1\end{bmatrix}\right),\quad
6\equiv\left(\begin{bmatrix}0&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&2\end{bmatrix}\right),\quad
7\equiv\left(\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
8\equiv\left(\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\begin{bmatrix}1&0\end{bmatrix}\right),\\
&9\equiv\left(\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right),\quad
10\equiv\left(\begin{bmatrix}0&1\\ 0&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
11\equiv\left(\begin{bmatrix}0&1\\ 0&0\end{bmatrix},\begin{bmatrix}1&0\end{bmatrix}\right),\quad
12\equiv\left(\begin{bmatrix}0&1\\ 0&0\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right),\\
&13\equiv\left(\begin{bmatrix}0&0\\ 1&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
14\equiv\left(\begin{bmatrix}0&0\\ 1&0\end{bmatrix},\begin{bmatrix}1&0\end{bmatrix}\right),\quad
15\equiv\left(\begin{bmatrix}0&0\\ 1&0\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right),\quad
16\equiv\left(\begin{bmatrix}0&0\\ 0&1\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\\
&17\equiv\left(\begin{bmatrix}0&0\\ 0&1\end{bmatrix},\begin{bmatrix}1&0\end{bmatrix}\right),\quad
18\equiv\left(\begin{bmatrix}0&0\\ 0&1\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right),\quad
19\equiv\left(\begin{bmatrix}1&0\\ 1&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
20\equiv\left(\begin{bmatrix}0&1\\ 1&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\\
&21\equiv\left(\begin{bmatrix}1&0\\ 0&1\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right),\quad
22\equiv\left(\begin{bmatrix}0&1\\ 0&1\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right).
\end{align*}
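The size of this state space can be checked by brute-force enumeration. The sketch below is illustrative only: the function name and argument names are hypothetical, and the order in which states are generated need not match the labels $1,\ldots,22$ used above.

```python
from itertools import product


def enumerate_states(ward_capacities, n_types, queue_capacity, system_capacity):
    """List all states (wards, queue) satisfying the capacity constraints.

    wards[k][i] is n_{k,i} and queue[i] is q_i (0-based indices).
    """
    # Feasible occupancy rows for each ward: sum_i n_{k,i} <= m_k.
    ward_rows = [
        [row for row in product(range(m_k + 1), repeat=n_types) if sum(row) <= m_k]
        for m_k in ward_capacities
    ]
    states = []
    for wards in product(*ward_rows):
        occupied = sum(sum(row) for row in wards)
        for queue in product(range(queue_capacity + 1), repeat=n_types):
            waiting = sum(queue)
            if waiting <= queue_capacity and occupied + waiting <= system_capacity:
                states.append((wards, queue))
    return states


# Example 1: two wards with one bed each, two patient types, m_3 = 2, m = 2.
states = enumerate_states([1, 1], 2, 2, 2)
print(len(states))  # 22
```

The same routine becomes impractical very quickly as capacities grow, which is the motivation for the approximate methods used in Example 2.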

Further, assume that ward $i$ is the preferred ward for patient type $i$, for $i=1,2$, which is reflected by the expected values of the LoS being lower when a type-$i$ patient stays in ward $i$ for the whole duration of their stay. To model this, we assume that the probabilities $p_{k,i}=\mathbb{P}(RLOS_{k,i}\leq 1)$ that a type-$i$ patient who is in ward $k$ today leaves the hospital the next day are given by

$${\bf p}=[p_{k,i}]_{\mathcal{K}\times\mathcal{I}}=\left[\begin{array}{cc}1/5&1/4\\ 1/10&1/3\end{array}\right], \tag{38}$$

and so the expected LoS for patient type $i=1$ is $E(LoS_{k,i})=5$ days if they stay in ward $k=1$ , or $E(LoS_{k,i})=10$ if they stay in ward $k=2$ , and some number between these two if the patient is transferred between the wards during their stay at the hospital. For patient type $i=2$ we have $E(LoS_{k,i})=3$ days if they stay in ward $k=2$ , or $E(LoS_{k,i})=4$ if they stay in ward $k=1$ , and some number between these two if the patient is transferred between the wards during their stay at the hospital.

We consider two decisions,

• $a=1$, assign arrived patients to their best available wards without transferring the patients between the wards, and

• $a=2$, assign arrived patients to their best available wards and allow transferring patients if required.

As an example, given state $s=15\equiv\left(\begin{bmatrix}0&0\\ 1&0\end{bmatrix},\begin{bmatrix}0&1\end{bmatrix}\right)$, decision $a=1$ will transform it into post-decision state $s^{(a)}=\left(\begin{bmatrix}0&1\\ 1&0\end{bmatrix}\right)$, and then the process will move to state $s=10\equiv\left(\begin{bmatrix}0&1\\ 0&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right)$ if patient type $1$ departs from ward $2$, or to state $s=13\equiv\left(\begin{bmatrix}0&0\\ 1&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right)$ if patient type $2$ departs from ward $1$. Alternatively, decision $a=2$ will transform $s=15$ into post-decision state $s^{(a)}=\left(\begin{bmatrix}1&0\\ 0&1\end{bmatrix}\right)$, and then the process will move to state $s=16\equiv\left(\begin{bmatrix}0&0\\ 0&1\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right)$ if patient type $1$ departs from ward $1$, or to state $s=7\equiv\left(\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\begin{bmatrix}0&0\end{bmatrix}\right)$ if patient type $2$ departs from ward $2$. No arrivals to the waiting area are permitted if they cannot be assigned, that is, when the system is already full.

We assume the following cost parameters:

• Assignment cost to ward $k$ per type-$i$ patient: $c^{(\sigma)}_{k,i}=c^{(\sigma)}=1$ for all $k,i$.

• Transfer cost from ward $k$ to ward $\ell$ per type-$i$ patient: $c^{(t)}_{k,\ell,i}=c^{(t)}=1.1$ for all $k,\ell,i$.

• Penalty cost for being in a nonprimary ward $k$ per type-$i$ patient: $c^{(p)}_{k,i}=c^{(p)}=0.2$ for all $k,i$.

We apply policy iteration summarised in Algorithm 1 in Appendix A to find the exact solution. To do so, we first write explicit expressions for the probability matrices ${\bf P}^{(a)}$ and cost vectors ${\bf C}^{(a)}$ for $a=1,2$ . The details of this derivation are given in Appendix B.

From policy iteration we identify the optimal policy in our states of interest to be $a=2$, ‘assign the waiting patients to their most preferred ward by transferring patients currently in the ward’. We determine that this optimal policy incurs a long-run average cost per stage of $E^{*}=0.4098$ units.

We now present the results of the approximate policy iteration algorithm, summarised in Algorithms 2 and 3. For simulations, we assumed the starting value of each weight $\mbox{\boldmath$\theta$}_{0}=\boldsymbol{1}\times 10^{-4}$ .

Figure 2: Values of $\theta^{(f)}_{n}$ in iteration $n$ of Algorithm 2 in Example 1. Simulations were run for $M=10^{3},10^{4},10^{5}$ and $10^{6}$ states, respectively.

First, we find that the estimated value of $E$ converges to the optimal value $E^{*}=0.4098$ for large values of $M$, after a small number of iterations. As expected, the values of $\theta$ and $E$ converge more tightly as $M$, the number of states simulated, increases, as shown in Figure 3. Second, we observe that in this example, each value of $\theta$ converges to a distinct value, as shown in Figure 2. This is because the parameters $\boldsymbol{p}$ and $\boldsymbol{\lambda}$ are asymmetric, so we do not expect patients of different types $i$ to be weighted equally.

4.2 Realistically-sized example

We consider a realistically-sized example of our Model in Section 2.1. The structure of the model here is influenced by the practical considerations within real-world hospitals, in which patients are allocated to their primary (preferred) wards based on their types (medical needs), and when these wards are full, suitable policies are applied to decide which nonprimary ward a patient should be allocated to instead. Furthermore, we assume that patients arriving to a hospital when the system is full (all wards are full) are redirected to another facility. Therefore, the number of daily arrivals is unrestricted; however, the number of arrivals admitted to the hospital is bounded by the system capacity. We fitted the parameters of the arrival and length of stay distributions to data, and focused on the application of our model to the analysis of how decision making affects the number of patients in nonprimary wards.

Example 2

Similarly to Dai and Shi [10], assume five types of patients, $i=1,\ldots,5$ ($I=5$): Orthopedic (Ortho), Cardio (Card), Surgery (Surg), General Medicine (GenMed), and Other Medicine (OthMed); and corresponding wards, $k=i=1,\ldots,5$ ($K=5$), most suitable to these patient types, respectively.

To estimate the key parameters of our model, we applied Matlab and performed the analysis of five-year patient flow data obtained from an Australian tertiary referral hospital, see Table 1.

$i$	$1$ (Ortho)	$2$ (Card)	$3$ (Surg)	$4$ (GenMed)	$5$ (OthMed)
$\lambda_{i}$	$2.0252$	$3.3565$	$10.0159$	$11.7442$	$38.7853$
$\mathbb{E}(LoS_{i})$	$5.1473$	$4.1414$	$3.9373$	$4.5209$	$2.8505$
$p_{k,i}$ , $k=i$	$0.1943$	$0.2415$	$0.2540$	$0.2212$	$0.3508$
$m_{k}$ , $k=i$	$12$	$15$	$38$	$50$	$99$

Table 1: Model parameters in Example 2 ($\lambda_{i}$ and $\mathbb{E}(LoS_{i})$ were fitted to data, and the remaining parameters were estimated): $\lambda_{i}$ (daily arrival rate of patients type $i$), $p_{k,i}$ (probability of patient of type $i$ departing from ward $k$ within a day), and $m_{k}$ (capacity of ward $k$). The total capacity of the system is $m=\sum_{k}m_{k}=214$. We apply $p_{i,i}=1/\mathbb{E}(LoS_{i})$ (probability of patient of type $i$ departing within a day, assuming they are in the most suitable ward). When $k\not=i$, we let $p_{k,i}=\beta\times p_{i,i}$, for some $\beta<1$ (so that the LoS of patients that are in less suitable wards is increased). Here, we set $\beta=1/1.25$ (and so the LoS of patients in less suitable wards increases by $25\%$ on the average). Further, to estimate $m_{k}$, for each $k=i$, we applied an M/M/N/N queueing model with arrival rate $\lambda_{i}$ and service rate $\mu_{i}=1/\mathbb{E}(LoS_{i})$ to find $N=m_{k}$ such that the proportion of time the queue is full is $\pi_{N}<0.15$.
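The capacity rule described in the caption can be sketched with the standard Erlang B recursion for the blocking probability $\pi_{N}$ of an M/M/N/N queue. The sketch assumes that $m_{k}$ is taken to be the smallest $N$ meeting the threshold; the caption does not state this explicitly, and the function names are hypothetical.

```python
def erlang_b(offered_load, servers):
    """Blocking probability pi_N of an M/M/N/N queue via the Erlang B recursion."""
    blocking = 1.0  # pi_0 = 1 when there are no servers
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def smallest_capacity(arrival_rate, mean_los, threshold=0.15):
    """Smallest N with pi_N < threshold (assumed selection rule)."""
    offered_load = arrival_rate * mean_los  # lambda_i / mu_i
    servers = 0
    while erlang_b(offered_load, servers) >= threshold:
        servers += 1
    return servers


# Parameters of the first two patient types in Table 1.
print(smallest_capacity(2.0252, 5.1473))  # 12
print(smallest_capacity(3.3565, 4.1414))  # 15
```

Under this assumed rule, the first two columns of Table 1 are reproduced ($m_{1}=12$ and $m_{2}=15$).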

In order to focus on the impact of decision making on the number of patients observed in nonprimary wards, we applied the following cost parameters:

• Transfer cost from ward $k$ to ward $\ell$ per type-$i$ patient: $c^{(t)}_{k,\ell,i}=c^{(t)}=1.1$ for all $k,\ell\not=k,i$ and assume large value at $k=\ell$ (to avoid such decision).

• Penalty cost for being in a nonprimary ward $k$ per type-$i$ patient: $c^{(p)}_{k,i}=c^{(p)}=0.2$ for all $k,i$.

(Here, we assumed the assignment costs $c^{(\sigma)}=1$ for each admitted or redirected patient, but have not included these costs in the optimisation, since all patients need to be admitted to some facility, and our focus is on the number of patients in nonprimary wards.)

We assume that the capacity of the waiting area is unlimited, with $m_{K+1}=\infty$, and so the total number of arrivals $\sum_{i=1}^{I}q_{i}\leq m_{K+1}$ is unrestricted. However, arrivals are only accepted if there is available capacity in the system according to $\sum_{i=1}^{I}n^{\prime}_{k,i}\leq m_{k}$, where $\left([n^{\prime}_{k,i}]_{\mathcal{K}\times\mathcal{I}}\right)$ is the post-decision state. Patients who cannot be admitted are redirected to another facility. The state space of the system is then

$$\mathcal{S}=\left\{\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right):n_{k,i}\geq 0,\ q_{i}\geq 0,\ \sum_{i=1}^{I}n_{k,i}\leq m_{k}\right\},$$

while the set of post-decision states is

$$\mathcal{S}^{\prime}=\left\{\left([n^{\prime}_{k,i}]_{\mathcal{K}\times\mathcal{I}}\right):\sum_{i=1}^{I}n^{\prime}_{k,i}\leq m_{k}\right\},$$

with $|\mathcal{S}^{\prime}|=\prod_{k=1}^{K}\sum_{g=0}^{m_{k}}\binom{g+I-1}{I-1}=\prod_{k=1}^{K}\binom{m_{k}+I}{I}$ by Heydar et al. [15]. Here, $|\mathcal{S}^{\prime}|\approx 2.9544\times 10^{28}$.
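The count of post-decision states can be reproduced with exact integer arithmetic. The short sketch below uses the ward capacities from Table 1 and checks both forms of the formula; the function names are hypothetical.

```python
from math import comb, prod


def ward_configurations(m_k, n_types):
    """Number of ways to fill a ward of capacity m_k: sum over occupancy g = 0..m_k."""
    return sum(comb(g + n_types - 1, n_types - 1) for g in range(m_k + 1))


def post_decision_state_count(ward_capacities, n_types):
    """|S'| as the product over wards of binom(m_k + I, I)."""
    return prod(comb(m_k + n_types, n_types) for m_k in ward_capacities)


capacities = [12, 15, 38, 50, 99]  # m_k from Table 1
count = post_decision_state_count(capacities, 5)
print(f"{count:.4e}")  # 2.9544e+28
```

The two expressions agree ward by ward, since $\sum_{g=0}^{m}\binom{g+I-1}{I-1}=\binom{m+I}{I}$.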

Next, we assume the following order of assignment, represented as a matrix ${\bf O}=[O_{i,k}]$ , where $O_{i,k}=1$ means that ward $k$ is the first choice for type $i$ , $O_{i,k}=2$ is the second choice for type $i$ , and so on, with

$${\bf O}=\left[\begin{array}{l||ccccc}
&k=\mbox{Ortho}&k=\mbox{Card}&k=\mbox{Surg}&k=\mbox{GenMed}&k=\mbox{OthMed}\\ \hline\hline
i=\mbox{Ortho}&1&5&2&3&4\\ \hline
i=\mbox{Card}&3&1&2&4&5\\ \hline
i=\mbox{Surg}&2&3&1&5&4\\ \hline
i=\mbox{GenMed}&3&5&4&1&2\\ \hline
i=\mbox{OthMed}&3&5&4&2&1
\end{array}\right]. \tag{45}$$

As an example, for the assignment of ‘Ortho’ patients, the considered order of wards is ‘Ortho’, then ‘Surg’, then ‘GenMed’, then ‘OthMed’, and ‘Card’ is the last choice (used when there are no other choices). This order is similar to the order of assignment in Dai and Shi [10, Figure 1].
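Reading a ward order off the matrix ${\bf O}$ amounts to sorting the wards by their rank in the row of the given patient type. A minimal sketch follows, with the matrix entries copied from (45); the dictionary layout and the function name are choices made for this illustration.

```python
WARDS = ["Ortho", "Card", "Surg", "GenMed", "OthMed"]

# Rows of the matrix O in (45): rank of each ward for the given patient type.
ORDER_MATRIX = {
    "Ortho": [1, 5, 2, 3, 4],
    "Card": [3, 1, 2, 4, 5],
    "Surg": [2, 3, 1, 5, 4],
    "GenMed": [3, 5, 4, 1, 2],
    "OthMed": [3, 5, 4, 2, 1],
}


def ward_preference_order(patient_type):
    """Wards sorted from first choice to last choice for the given patient type."""
    ranks = ORDER_MATRIX[patient_type]
    return [ward for _, ward in sorted(zip(ranks, WARDS))]


print(ward_preference_order("Ortho"))
# ['Ortho', 'Surg', 'GenMed', 'OthMed', 'Card']
```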

Suppose that patient type represents the order of assignment, and so we assign patients in the order of their type, assigning type $1$ patients first, then type $2$ patients, and so on. We consider the following decisions,

• $a=1$, assign arrived patients to their best available wards, in the order of their types, without transferring patients between the wards, and

• $a=2,3$, assign arrived patients to their best available wards, in the order of their types, and allow transferring patients if required, with the constraint of no more than $y=4,10$ transfers in total, respectively.

As an example, if

• $q_{1}=7$, $q_{2}=3$, $q_{i}=0$ for $i=3,4,5$ (there are $7$ type-$1$ patients and $3$ type-$2$ patients in the queue);

• and $n_{1,1}=11$, $n_{1,2}=4$, $n_{1,i}=0$ for $i=3,4,5$ (there are $22$ patients in ward $1$, so it is full),

then applying $a=2$ means that

• $4$ type-$1$ patients will be assigned to ward $1$, while $3$ type-$1$ patients will be assigned to their best available ward other than ward $1$,

• while $4$ type-$2$ patients will need to be transferred from ward $1$ to their best available ward (to make space for type-$1$ patients),

• and finally, $3$ type-$2$ patients will be assigned from the queue to their best available ward.

More precisely, the assignments of patients corresponding to decisions $a=1$ and $a=2,3$ , are described in Algorithms 4-5 in Appendix A, respectively.

Next, we present the simulation output in Figures 4-6, focusing on the impact of these decisions on the number of patients observed in nonprimary wards (and ignoring the costs of these decisions for now). Simulation output suggests that under the model parameters assumed in Table 1, a policy in which we apply decision $a=3$ all the time results in a better performance of the system, in the sense that it reduces the number of patients in nonprimary wards when compared with applying decision $a=1$ or $a=2$ all the time. However, the performance of the system under decision $a=3$ is only slightly better than under decision $a=2$.

In order to evaluate the performance of the system under decisions $a=1,2,3$, we also need to consider the costs. Since the size of the state space is very large, we apply the Approximate Policy Iteration summarised in Algorithms 2–3 in Appendix A.

We consider the vector of features $\mbox{\boldmath$\phi$}(s)$ of state $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$ defined as

$$\boldsymbol{\phi}(s)=\left[\begin{array}{cccccccc}n_{1,1},&\sum_{i\not=1}n_{1,i},&\ldots,&n_{5,5},&\sum_{i\not=5}n_{5,i},&q_{1},&\ldots,&q_{5}\end{array}\right], \tag{47}$$

that is, for each ward $k=1,\ldots,5$, $n_{k,k}$ is the number of type-$k$ patients in their primary ward $k$, and $\sum_{i\not=k}n_{k,i}$ is the total number of other-type patients in that ward; and for each patient type $i=1,\ldots,5$, $q_{i}$ is the number of type-$i$ patients in the queue waiting to be assigned.
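The feature map (47) compresses the $K\times I$ occupancy matrix into two numbers per ward, followed by the queue lengths. A minimal sketch of this construction is given below, assuming $K=I$ with ward $k$ primary for type $k$, as in Example 2; the function name and the sample state are hypothetical.

```python
def feature_vector(n, q):
    """Features of Eq. (47) for state s = (n, q), assuming ward k is primary for type k.

    n[k][i] is the number of type-i patients in ward k (0-based indices);
    q[i] is the number of type-i patients waiting to be assigned.
    """
    features = []
    for k, row in enumerate(n):
        primary = row[k]              # n_{k,k}
        others = sum(row) - primary   # sum over i != k of n_{k,i}
        features.extend([primary, others])
    features.extend(q)
    return features


# Hypothetical state with K = I = 5.
n = [
    [8, 4, 0, 0, 0],
    [0, 10, 1, 0, 0],
    [2, 0, 30, 0, 1],
    [0, 0, 0, 45, 5],
    [0, 0, 3, 6, 80],
]
q = [7, 3, 0, 0, 0]
print(feature_vector(n, q))
```

With five wards and five types this gives $2K+I=15$ features, and hence $15$ weights in $\boldsymbol{\theta}$, regardless of the ward capacities.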

Then, by (24), $\mathbb{E}(Q)=\lambda$ , and given state $s$ and decision $a$ , we have

$$\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\mid(s,a)\right)\boldsymbol{\phi}(s^{\prime})\boldsymbol{\theta}=\mathbb{E}\left(\boldsymbol{\phi}(s^{\prime})\boldsymbol{\theta}\mid(s,a)\right)=\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}(1-p_{k,i})\theta_{k,i}+\sum_{i=1}^{I}\lambda_{i}\theta_{i},$$

which is the expression we use in our computations in Algorithms 2–3. The output presented in Figures 7-8 demonstrates that the algorithm converged as expected.

To analyse the impact of the algorithm on the performance of the system, we have then also simulated the evolution of a system in which the decision $a(s)$ given current state $s$ is chosen according to

$$a(s)=\operatorname*{arg\,min}_{a\in\mathcal{A}(s)}\left\{C(s,a)+\sum_{k=1}^{K}\sum_{i=1}^{I}n^{(a)}_{k,i}(1-p_{k,i})\theta_{k,i}+\sum_{i=1}^{I}\lambda_{i}\theta_{i}\right\}, \tag{48}$$

using the parameters $\theta$ obtained from the output of the algorithm, and compared the output with the performance of the system under decisions $a=1,2,3$ . Each simulation $M=1,\ldots,1000$ , was executed over a 5-year period. The output is presented in Figures 9-10 and Table 2 below.

Policy	Mean cost	Mean nonprimary	Mean redirected
Apply a=1 always	6.8177	34.0887	6.6719
Apply a=2 always	5.7643	20.1370	5.9918
Apply a=3 always	5.6983	18.9862	5.9408
Near optimal solution	5.6449	20.2566	5.9720

Table 2: The output from simulations in Example 2.

We observe (Figure 9, Table 2) that under the near-optimal policy obtained by the application of this approach, the costs are minimised (costs are the lowest under near-optimal policy, and are the highest under $a=1$ ), with mean costs: $6.8177$ ( $a=1$ ), $5.7643$ ( $a=2$ ), $5.6983$ ( $a=3$ ), and ${\bf 5.6449}$ (our near optimal solution). Simultaneously, the performance of the system in the sense of reducing the number of patients observed in the nonprimary wards and the number of redirected patients per day, is also improved. The number of patients in nonprimary wards is significantly lower than under $a=1$ , and similar to $a=2$ but with lower costs than under $a=2,3$ .

Next (Figure 10), under the near-optimal solution, $82.1918\%$ of the time no additional transfers are required. Also, under the near-optimal solution, decisions $a=1,2,3$ are chosen approximately $50\%$, $40\%$ and $10\%$ of the time, respectively.

5 Conclusion

We proposed a Markov Decision Process for a Patient Assignment Scheduling problem where, at the start of each time period (e.g. day, 8-hour block etc), patients are allocated to suitable wards, following relevant hospital policy. Due to the large size of the state space required to record the information about the hospital system, we developed methodology based on Approximate Dynamic Programming, for the computation of near-optimal solutions. We demonstrated the application potential of our model through examples with parameters fitted to data from a tertiary referral hospital in Australia.

Our model can be applied to a range of scenarios that are of interest in practical contexts in hospitals. We will analyse scheduling problems in data-driven examples, and consider additional costs, such as waiting costs or redirecting costs, and decisions such as transfers of inpatients within the hospitals that are not triggered by arrivals, and take into account the duration of stay at the time of decision making. We will also consider scenarios with time-varying parameters, in particular with seasonal components. These analyses will be reported in our forthcoming and future work.

6 Statements and declarations

Data

Data used in the paper was obtained following ethical approval from the Tasmanian Health and Medical Human Research Ethics Committee (HREC No 23633) and site-specific approval from the Research Governance Office of the Tasmanian Health Service.

Authorship contribution statement

This research has contributed to the Honours thesis by Krasnicki [19]. The following are the contributions of the authors, Małgorzata M. O’Reilly (MMO), Sebastian Krasnicki (SK), James Montgomery (JM), Mojtaba Heydar (MH), Richard Turner (RT), Pieter Van Dam (PVD), Peter Maree (PM):

• Conceptualisation, Mathematical background: MMO, SK, MH, and JM;

• Conceptualisation, Clinical background: RT, PVD, PM;

• Sections 2-4, Appendix A-B, Methodology development: MMO, SK, and JM;

• Sections 2-4, Derivation of the probability formulae: MMO;

• Section 4, Example 1, Coding and analysis: SK, MMO, and MH;

• Section 4, Example 2, Coding and analysis: MMO and JM.

Declaration of competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

References

[1] J. Abate and W. Whitt. Numerical inversion of probability generating functions. Operations Research Letters, 12(4):245–251, 1992.
[2] Australia’s hospitals at a glance 2018-19. Cat. no. HSE 247. Canberra: AIHW. Australian Institute of Health and Welfare, 2020.
[3] A. K. Abera, M. M. O’Reilly, M. Fackrell, B. R. Holland, and M. Heydar. On the decision support model for the patient admission scheduling problem with random arrivals and departures: A solution approach. Stochastic Models, 36(2):312–336, 2020.
[4] A. Bagust, M. Place, and J. W. Posnett. Dynamics of bed use in accommodating emergency admissions: Stochastic simulation model. British medical journal, 318(7203):155–158, 1999.
[5] C. A. Bain, P. G. Taylor, G. McDonnell, and A. Georgiou. Myths of ideal hospital occupancy. Medical Journal of Australia, 192(1):42–43, 2010.
[6] B. Bilgin, P. Demeester, M. Misir, W. Vancroonenburg, and G. Vanden Berghe. One hyperheuristic approach to two timetabling problems in health care. Journal of Heuristics, 18(3):401–434, 2012.
[7] K. Butler and M. A. Stephens. The distribution of a sum of independent binomial random variables. Methodology and Computing in Applied Probability, 19(2):557–571, 2017.
[8] E. J. Carter, S. M. Pouch, and E. L. Larson. The relationship between emergency department crowding and patient outcomes: A systematic review. Journal of Nursing Scholarship, 46(2):106–115, 2014.
[9] E. Cummings, L. Ellis, A. Georgiou, K. E., C. Showell, and P. Turner. An evidence-based review and training resource on smooth patient flow. Ministry of Health, New South Wales Government, Australia, pages 1–34, 2012.
[10] J. G. Dai and P. Shi. Inpatient overflow: An approximate dynamic programming approach. Manufacturing and Service Operations Management, 21(4):894–911, 2019.
[11] P. Demeester, W. Souffriau, P. De Causmaecker, and G. Vanden Berghe. A hybrid tabu search algorithm for automatically assigning patients to beds. Artificial Intelligence in Medicine, 48(1):61–70, 2010.
[12] Y. Gocgun. Simulation-based approximate policy iteration for dynamic patient scheduling for radiation therapy. Health Care Management Science, 21(3):317–325, 2018.
[13] R. Grant, A. K. J. Smith, L. Newett, M. Nash, R. Turner, and L. Owen. Tasmanian healthcare professionals’ & students’ capacity for LGBTI + inclusive care: A qualitative inquiry. Health and Social Care in the Community, 29(4):957–966, 2021.
[14] W. B. Hall, L. E. Willis, S. Medvedev, and S. S. Carson. The implications of long-term acute care hospital transfer practices for measures of in-hospital mortality and length of stay. American Journal of Respiratory and Critical Care Medicine, 185(1):53–57, 2012.
[15] M. Heydar, M. M. O’Reilly, E. Trainer, M. Fackrell, P. G. Taylor, and A. Tirdad. A stochastic model for the patient-bed assignment problem with random arrivals and departures. Annals of Operations Research, pages 1–33, 2021.
[16] R. Howard. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, 1960.
[17] E. Howell, E. Bessman, S. Kravet, K. Kolodner, R. Marshall, and S. Wright. Active bed management by hospitalists and emergency department throughput. Annals of Internal Medicine, 149(11):804–810, 2008.
[18] P. J. H. Hulshof, M. R. K. Mes, R. J. Boucherie, and E. W. Hans. Patient admission planning using approximate dynamic programming. Flexible Services and Manufacturing Journal, 28(1-2):30–61, 2016.
[19] S. Krasnicki. Modelling of optimal decision making in healthcare systems, Honours Thesis, The University of Tasmania, 2021.
[20] C. Morley, M. Unwin, G. M. Peterson, J. Stankovich, and L. Kinsman. Emergency department crowding: A systematic review of causes, consequences and solutions. PLoS ONE, 13(8), 2018.
[21] W. Powell. What you should know about approximate dynamic programming. Naval Research Logistics, 56(3):239–249, 2009.
[22] W. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality: Second Edition. Wiley, 2011.
[23] W. B. Powell. What you should know about approximate dynamic programming. Naval Research Logistics, 56(3):239–249, 2009.
[24] W. B. Powell. Perspectives of approximate dynamic programming. Annals of Operations Research, 241(1-2):319–356, 2016.
[25] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York, 1994.
[26] W. Vancroonenburg, P. De Causmaecker, and G. Vanden Berghe. A study of decision support models for online patient-to-room assignment planning. Annals of Operations Research, 239:253–271, 2016.

Appendix A Appendix: Algorithms

Algorithm 1 Policy Iteration (Howard [16])

1: Initialise an arbitrary policy $\widehat{\boldsymbol{\pi}}=(a^{\widehat{\boldsymbol{\pi}}}(s))_{s\in\mathcal{S}}$.

2: Let $v(m)=0$, where $\mathcal{S}=\{1,\ldots,m\}$.

3: Solve $E+v(s)-\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a^{\widehat{\boldsymbol{\pi}}}(s))\right)v(s^{\prime})=C(s,a^{\widehat{\boldsymbol{\pi}}}(s))$ for all $s=1,\ldots,m-1$.

4: Solve $\boldsymbol{\pi}=\arg\min_{a\in\mathcal{A}(s)}\left\{C(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)v(s^{\prime})\right\}$.

5: If $\boldsymbol{\pi}=\widehat{\boldsymbol{\pi}}$, then go to Step 6. Else, let $\widehat{\boldsymbol{\pi}}=\boldsymbol{\pi}$ and go to Step 2.

6: The optimal policy is $\boldsymbol{\pi}^{*}=\boldsymbol{\pi}$.
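As an illustration, the steps of Algorithm 1 can be sketched in Python for a small MDP stored as dense arrays. The function name `policy_iteration` and the array layout `P[a, s, s']`, `C[a, s]` are assumptions of this sketch, as is the simplification that every decision is admissible in every state and that each policy induces a single recurrent class. The evaluation step writes one equation per state, which gives $m$ equations in the $m$ unknowns $E, v(1),\ldots,v(m-1)$, with $v(m)=0$.

```python
import numpy as np

def policy_iteration(P, C, max_iter=1000):
    """Average-cost policy iteration (sketch of Algorithm 1).

    P: array of shape (n_actions, m, m), P[a, s, s'] = P(s' | (s, a)).
    C: array of shape (n_actions, m), C[a, s] = C(s, a).
    Returns (policy, E, v), with the normalisation v[m-1] = 0.
    """
    P, C = np.asarray(P, dtype=float), np.asarray(C, dtype=float)
    m = P.shape[1]
    states = np.arange(m)
    policy = np.zeros(m, dtype=int)           # Step 1: arbitrary initial policy
    for _ in range(max_iter):
        P_pi = P[policy, states]              # row s is P[policy[s], s, :]
        C_pi = C[policy, states]
        # Steps 2-3: unknowns are (E, v(1), ..., v(m-1)), with v(m) = 0.
        A = np.hstack([np.ones((m, 1)), (np.eye(m) - P_pi)[:, : m - 1]])
        x = np.linalg.solve(A, C_pi)
        E, v = x[0], np.append(x[1:], 0.0)
        # Step 4: greedy improvement.
        Q = C + P @ v                         # shape (n_actions, m)
        new_policy = Q.argmin(axis=0)
        # Keep the current decision on ties so the iteration cannot cycle.
        tie = np.isclose(Q[policy, states], Q[new_policy, states])
        new_policy[tie] = policy[tie]
        # Step 5: stop when the policy no longer changes.
        if np.array_equal(new_policy, policy):
            return policy, E, v
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")
```

For instance, with two states, identical uniform transitions under both decisions, and costs `[[1, 3], [2, 1]]`, the sketch selects the cheaper decision in each state and returns an average cost of 1.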

Algorithm 2 (Adapted from Dai and Shi [10])

1: Initialise $\boldsymbol{\theta}=\boldsymbol{\theta}_{0}$, $n=0$. Choose large $N$.

2: Initialise $E_{0}$ using Algorithm 3 for $E$ with $\boldsymbol{\theta}=\boldsymbol{\theta}_{0}$.

3: for $n=0,1,2,\ldots,N-1$ do

4:   Randomly initialise starting state $s_{0}$. Let $\boldsymbol{A}={\bf O}$, $\boldsymbol{b}=\boldsymbol{0}$.

5:   for $m=0,1,2,\ldots,M-1$ do

6:     Given $s=s_{m}$, take decision $a(s)=\operatorname*{arg\,min}_{a\in\mathcal{A}(s)}\left\{C(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)\boldsymbol{\phi}(s^{\prime})\boldsymbol{\theta}\right\}$.

7:     Randomly generate state $s_{m+1}$ using model parameters and post-decision state $s_{m}^{(a)}$.

8:     Compute $\boldsymbol{A}_{m}=\frac{1}{M}\boldsymbol{\phi}(s_{m})^{T}(\boldsymbol{\phi}(s_{m})-\boldsymbol{\phi}(s_{m+1}))$.

9:     Compute $\boldsymbol{b}_{m}=\frac{1}{M}\boldsymbol{\phi}(s_{m})^{T}(C(s_{m},a(s_{m}))-E_{n})$.

10:     Let $\boldsymbol{A}=\boldsymbol{A}+\boldsymbol{A}_{m}$.

11:     Let $\boldsymbol{b}=\boldsymbol{b}+\boldsymbol{b}_{m}$.

12:   end for

13:   Solve the set of linear equations $\boldsymbol{A}\boldsymbol{\theta}_{n+1}=\boldsymbol{b}$ for $\boldsymbol{\theta}_{n+1}$.

14:   Update $\boldsymbol{\theta}=\boldsymbol{\theta}_{n+1}$.

15:   Let $n=n+1$.

16:   Compute $E_{n}$ using Algorithm 3 for $E$ with $\boldsymbol{\theta}=\boldsymbol{\theta}_{n}$.

17: end for

18: $E^{\boldsymbol{\pi}^{*}}\approx E_{N}$.
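The inner loop of Algorithm 2 (Steps 8 to 13) accumulates a least-squares system from one simulated trajectory and solves it for the new weights. A minimal sketch of that update is given below, assuming the feature row vectors $\boldsymbol{\phi}(s_{0}),\ldots,\boldsymbol{\phi}(s_{M})$ have already been stacked into an array and the incurred costs recorded. The name `lstd_update` is hypothetical, and the sketch calls `numpy.linalg.lstsq` instead of a plain linear solve, a substitution made here so that a singular $\boldsymbol{A}$ does not raise an error.

```python
import numpy as np

def lstd_update(features, costs, E_n):
    """One weight update from a single trajectory (Steps 8-13 of Algorithm 2).

    features: array of shape (M + 1, d); row m is phi(s_m).
    costs:    length-M sequence; entry m is C(s_m, a(s_m)).
    E_n:      current estimate of the long-run average cost.
    """
    features = np.asarray(features, dtype=float)
    costs = np.asarray(costs, dtype=float)
    M = len(costs)
    phi_now, phi_next = features[:-1], features[1:]
    # Sum over m of (1/M) phi(s_m)^T (phi(s_m) - phi(s_{m+1})).
    A = phi_now.T @ (phi_now - phi_next) / M
    # Sum over m of (1/M) phi(s_m)^T (C(s_m, a(s_m)) - E_n).
    b = phi_now.T @ (costs - E_n) / M
    # Least-squares solution of A theta = b (assumption: replaces an exact solve).
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta
```

When the recorded costs satisfy $C(s_{m},a(s_{m}))=E_{n}+\boldsymbol{\phi}(s_{m})\boldsymbol{\theta}-\boldsymbol{\phi}(s_{m+1})\boldsymbol{\theta}$ exactly for some $\boldsymbol{\theta}$ and $\boldsymbol{A}$ is nonsingular, the update recovers that $\boldsymbol{\theta}$.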

Algorithm 3 (Adapted from Dai and Shi [10])

1: Input $\boldsymbol{\theta}$. Let $E=0$.

2: Randomly initialise starting state $s_{0}$.

3: for $m=0,1,2,\ldots,M-1$ do

4:   Given state $s=s_{m}$, take decision $a(s)=\operatorname*{arg\,min}_{a\in\mathcal{A}(s)}\left\{C(s,a)+\sum_{s^{\prime}\in\mathcal{S}}\mathbb{P}\left(s^{\prime}\ |\ (s,a)\right)\boldsymbol{\phi}(s^{\prime})\boldsymbol{\theta}\right\}$.

5:   Randomly generate state $s_{m+1}$ using model parameters and post-decision state $s_{m}^{a(s)}$.

6:   Let $E=E+C(s,a(s))$.

7: end for

8: Compute $E=E/M$.
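Algorithm 3 is a Monte Carlo estimate of the average cost under the policy that is greedy with respect to $\boldsymbol{\phi}(s)\boldsymbol{\theta}$. The sketch below assumes a state space small enough to hold the transition matrices explicitly and samples $s_{m+1}$ directly from $\mathbb{P}(\cdot\ |\ (s,a))$, which is a simplification of generating the next state from the post-decision state and the model parameters. The function name and array layout are hypothetical.

```python
import numpy as np

def estimate_average_cost(theta, P, C, phi, M, rng):
    """Simulation estimate of E for the greedy policy (sketch of Algorithm 3).

    P:   array (n_actions, m, m), P[a, s, s'] = P(s' | (s, a)).
    C:   array (n_actions, m), C[a, s] = C(s, a).
    phi: array (m, d); row s is the feature vector phi(s).
    """
    m = P.shape[1]
    v_hat = phi @ theta                      # approximate value phi(s) theta
    greedy = (C + P @ v_hat).argmin(axis=0)  # Step 4, precomputed for all states
    s = rng.integers(m)                      # Step 2: random starting state
    total = 0.0
    for _ in range(M):
        a = greedy[s]
        total += C[a, s]                     # Step 6
        s = rng.choice(m, p=P[a, s])         # Step 5 (direct sampling)
    return total / M                         # Step 8
```

A call such as `estimate_average_cost(theta, P, C, phi, M=10_000, rng=np.random.default_rng(0))` returns a single noisy estimate; the accuracy depends on $M$ and on how quickly the chain mixes.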

Algorithm 4 Decision $a=1$. We make a decision based on priorities so as to guarantee the best possible bed for higher-priority patients, without allowing transfers. Recursively, we find the highest-priority patients yet to be allocated and assign them to their best available wards.

1: Input $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$.

2: Initialise $s^{(a)}=[n_{k,i}^{(a)}]_{\mathcal{K}\times\mathcal{I}}=[n_{k,i}]_{\mathcal{K}\times\mathcal{I}}$ and $[x_{k,i}]_{\mathcal{K}\times\mathcal{I}}={\bf 0}$.

3: while $\sum_{i=1}^{I}q_{i}>0$ do

4:   Let $i=\min\{j:q_{j}>0\}$, current highest priority to allocate.

5:   while $q_{i}>0$ do (allocate type-$i$ patients)

6:     Find $\ell^{*}=\min\{\ell:\sum_{j=1}^{I}n_{w(\ell,i),j}^{(a)}<m_{w(\ell,i)}\}$.

7:     Let $k^{*}=w(\ell^{*},i)$, best ward for type-$i$ patient with an available bed.

8:     Let $x_{k^{*},i}=x_{k^{*},i}+1$, and $n_{k^{*},i}^{(a)}=n_{k^{*},i}^{(a)}+1$.

9:     Let $q_{i}=q_{i}-1$.

10:   end while

11: end while
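A compact Python sketch of Algorithm 4 follows. It assumes zero-based indices, with a lower type index meaning higher priority, and represents the ranking $w(\ell,i)$ as a list `ward_rank[i]` of wards ordered from best to worst for type $i$. These names and the data layout are illustrative assumptions. The sketch also assumes that enough beds are free for everyone in the queue; otherwise `next` raises `StopIteration`.

```python
def allocate_without_transfers(n, q, capacity, ward_rank):
    """Priority-based allocation with no transfers (sketch of Algorithm 4).

    n[k][i]:      number of type-i patients currently in ward k.
    q[i]:         number of newly arrived type-i patients to allocate.
    capacity[k]:  number of beds m_k in ward k.
    ward_rank[i]: wards ordered from best to worst for type i, i.e. w(l, i).
    Returns (n_post, x): post-decision occupancy and allocation counts.
    """
    n_post = [row[:] for row in n]                 # s^(a), initialised to n
    x = [[0] * len(q) for _ in n]                  # x_{k,i}, initialised to 0
    q = list(q)
    for i in range(len(q)):                        # highest priority first
        while q[i] > 0:
            # Best-ranked ward for type i that still has a free bed.
            k_star = next(k for k in ward_rank[i]
                          if sum(n_post[k]) < capacity[k])
            x[k_star][i] += 1
            n_post[k_star][i] += 1
            q[i] -= 1
    return n_post, x
```

Looping over types in index order is equivalent to repeatedly selecting $i=\min\{j:q_{j}>0\}$, because allocating type-$i$ patients never increases the queue of a higher-priority type.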

Algorithm 5 Decisions $a=2,3$. We make a decision based on priorities so as to guarantee the best possible bed for higher-priority patients, and allow up to $y$ transfers. Recursively, we find the highest priority of patients yet to be allocated or transferred (Line 6). Next, we transfer type-$i$ patients (Lines 7-13) before allocating newly arrived type-$i$ patients (Lines 14-30). When transferring patients, we find a type-$i$ patient that is in the worst ward and transfer them to a ward with the lowest transfer cost.

1: Input $s=\left([n_{k,i}]_{\mathcal{K}\times\mathcal{I}},[q_{i}]_{1\times\mathcal{I}}\right)$ and the maximum number of transfers $y$.

2: Initialise $y^{*}=y$, current number of available transfers.

3: Initialise $[y_{k,i}]_{\mathcal{K}\times\mathcal{I}}={\bf 0}$, patients to be transferred.

4: Initialise $s^{(a)}=[n_{k,i}^{(a)}]_{\mathcal{K}\times\mathcal{I}}=[n_{k,i}]_{\mathcal{K}\times\mathcal{I}}$ and $(x_{k,i},y_{k,\ell,i})_{i\in\mathcal{I};k,\ell=1,\ldots,K}={\bf 0}$.

5: while $\sum_{i=1}^{I}q_{i}>0$ or $\sum_{i=1}^{I}\sum_{k=1}^{K}y_{k,i}>0$ do

6:   Let $i=\min\{j:q_{j}>0\mbox{ or }\sum_{k=1}^{K}y_{k,j}>0\}$, current highest priority to allocate/transfer.

7:   while $\sum_{k=1}^{K}y_{k,i}>0$ do (transfer type-$i$ patient from ward $k^{*}$ to ward $\ell^{*}$)

8:     $h^{*}=\max\{\ell:y_{w(\ell,i),i}>0\}$;

9:     $k^{*}=w(h^{*},i)$ is the worst ward with a type-$i$ patient;

10:     $\ell^{*}=\arg\min_{\ell=1,\ldots,K}\{c^{(t)}_{k^{*},\ell,i}:\sum_{j=1}^{I}n_{\ell,j}^{(a)}<m_{\ell}\}$ has the lowest transfer cost.

11:     Let $n_{\ell^{*},i}^{(a)}=n_{\ell^{*},i}^{(a)}+1$.

12:     Let $y_{k^{*},\ell^{*},i}=y_{k^{*},\ell^{*},i}+1$, and $y_{k^{*},i}=y_{k^{*},i}-1$.

13:   end while

14:   while $q_{i}>0$ do (allocate type-$i$ patients)

15:     if $y^{*}=0$ (no more transfers allowed) then (allocate type-$i$ patient)

16:       $\ell^{*}=\min\{\ell:\sum_{j=1}^{I}n_{w(\ell,i),j}^{(a)}<m_{w(\ell,i)}\}$;

17:       $k^{*}=w(\ell^{*},i)$ is the best ward for patient type $i$ with an available bed.

18:       Let $x_{k^{*},i}=x_{k^{*},i}+1$, and $n_{k^{*},i}^{(a)}=n_{k^{*},i}^{(a)}+1$.

19:     end if

20:     if $y^{*}>0$ (transfers allowed) then (allocate type-$i$ patient to ward $k^{*}$)

21:       $\ell^{*}=\min\{\ell:\sum_{j=1}^{i}n_{w(\ell,i),j}^{(a)}<m_{w(\ell,i)}\}$;

22:       $k^{*}=w(\ell^{*},i)$ is the best ward for patient type $i$ with a potential bed.

23:       Let $x_{k^{*},i}=x_{k^{*},i}+1$, and $n_{k^{*},i}^{(a)}=n_{k^{*},i}^{(a)}+1$.

24:       if $\sum_{j=1}^{I}n_{k^{*},j}^{(a)}=m_{k^{*}}+1$ then (must transfer patient $i^{*}$ out of ward $k^{*}$)

25:         $i^{*}=\max\{j:j=i+1,\ldots,I;n_{k^{*},j}^{(a)}\not=0\}$ is the lowest priority patient in ward $k^{*}$.

26:         Let $y_{k^{*},i^{*}}=y_{k^{*},i^{*}}+1$, and $y^{*}=y^{*}-1$.

27:       end if

28:     end if

29:     Let $q_{i}=q_{i}-1$.

30:   end while

31: end while

Appendix B Appendix: Example 1 – matrices ${\bf P}^{(a)}$ and vectors ${\bf C}^{(a)}$

We note that the set of all possible post-decision states $s^{(a)}$ is $\{1,7,10,13,16,19,20,21,22\}$ for both $a=1,2$, with

\begin{equation}
s^{(1)}=\left\{\begin{array}{lll}1&\mbox{ for }&s=1\\ 7&\mbox{ for }&s=2,7\\ 10&\mbox{ for }&s=10\\ 13&\mbox{ for }&s=13\\ 16&\mbox{ for }&s=3,16\\ 19&\mbox{ for }&s=4,8,14,19\\ 20&\mbox{ for }&s=11,15,20\\ 21&\mbox{ for }&s=5,9,17,21\\ 22&\mbox{ for }&s=6,12,18,22\end{array}\right.\quad,\quad s^{(2)}=\left\{\begin{array}{lll}s^{(1)}&\mbox{ for }&s\not=11,15\\ 21&\mbox{ for }&s=11,15\end{array}\right.\quad, \tag{60}
\end{equation}

where the difference between $s^{(1)}$ and $s^{(2)}$ is in places corresponding to $s=11,15$, which are the only two states where choosing between $a=1$ or $a=2$ can make a difference.

The probability matrices ${\bf P}^{(a)}$ of the discrete-time Markov chains corresponding to decisions $a=1,2$ are evaluated as follows. First, denote by $q_{k,i}=1-p_{k,i}$ the probability that a type-$i$ patient does not depart from ward $k$. Next, denote by $p(k)=\frac{\lambda^{k}}{k!}e^{-\lambda}$, $k=0,1,2,\ldots$, the probability of observing $k$ arrivals in a Poisson process with rate $\lambda$. Let

\begin{equation}
p(\widetilde{n})=1-\sum_{k=0}^{n-1}p(k)=1-\sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!}e^{-\lambda},\quad n=1,2,\ldots, \tag{61}
\end{equation}

with $p(\widetilde{0})=1$, interpreted as the probability that at least $n$ arrivals were observed; and

\begin{equation}
p(\ell_{1},\ldots,\ell_{I}\ |\ n)=\frac{n!}{\prod_{i=1}^{I}\ell_{i}!}\times\prod_{i=1}^{I}\left(\frac{\lambda_{i}}{\lambda}\right)^{\ell_{i}}=\frac{n!}{\lambda^{n}}\times\prod_{i=1}^{I}\frac{\lambda_{i}^{\ell_{i}}}{\ell_{i}!},\quad n=1,2,\ldots, \tag{62}
\end{equation}

with $\ell_{1}+\ldots+\ell_{I}=n$, interpreted as the conditional probability that, given $n$ patients have been admitted, there were $\ell_{1},\ldots,\ell_{I}$ patients of type $1,\ldots,I$, respectively.

Further, let $p^{(n)}_{(\ell_{1},\ldots,\ell_{I})}$ be the probability that $n$ patients have been admitted and there were $\ell_{1},\ldots,\ell_{I}$ patients of type $1,\ldots,I$, respectively, with $\ell_{1}+\ldots+\ell_{I}=n$. Then, if the number of admissions is restricted by $\ell_{1}+\ldots+\ell_{I}\leq N$, where $N$ is the number of available beds, we have

\begin{align}
p^{(n)}_{(\ell_{1},\ldots,\ell_{I})}&=p(\ell_{1},\ldots,\ell_{I}\ |\ n)\,p(n)=e^{-\lambda}\times\prod_{i=1}^{I}\frac{\lambda_{i}^{\ell_{i}}}{\ell_{i}!}\quad\mbox{ for }n<N, \tag{63}\\
p^{(\widetilde{N})}_{(\ell_{1},\ldots,\ell_{I})}&=p(\ell_{1},\ldots,\ell_{I}\ |\ N)\,p(\widetilde{N})=\left(1-\sum_{k=0}^{N-1}\frac{\lambda^{k}}{k!}e^{-\lambda}\right)\times\frac{N!}{\lambda^{N}}\times\prod_{i=1}^{I}\frac{\lambda_{i}^{\ell_{i}}}{\ell_{i}!}, \tag{64}
\end{align}

and $p^{(n)}_{(\ell_{1},\ldots,\ell_{I})}=0$ otherwise, with $p^{(\widetilde{0})}_{(0,0)}=1$.

Also, denote

\begin{align}
{\bf p}&=\left[\begin{array}{cccccc}p^{(0)}_{(0,0)}&p^{(1)}_{(1,0)}&p^{(1)}_{(0,1)}&p^{(\widetilde{2})}_{(2,0)}&p^{(\widetilde{2})}_{(1,1)}&p^{(\widetilde{2})}_{(0,2)}\end{array}\right], \tag{66}\\
{\bf w}&=\left[\begin{array}{ccc}p^{(0)}_{(0,0)}&p^{(\widetilde{1})}_{(1,0)}&p^{(\widetilde{1})}_{(0,1)}\end{array}\right]. \tag{68}
\end{align}
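The admission probabilities (63) and (64), and hence the vectors ${\bf p}$ and ${\bf w}$, can be evaluated numerically as in the following sketch. The function name `admission_prob` and the arrival rates used in the example are hypothetical values chosen for illustration, not parameters from the paper. Each vector sums to one because the truncated term absorbs the whole Poisson tail.

```python
from math import exp, factorial, prod

def admission_prob(ells, lambdas, N):
    """Probability that (l_1, ..., l_I) patients of each type are admitted
    when at most N beds are available, following (63) and (64)."""
    n = sum(ells)
    lam = sum(lambdas)
    if n > N:
        return 0.0                                     # more admissions than beds
    mix = prod(l_i ** e / factorial(e) for l_i, e in zip(lambdas, ells))
    if n < N:
        return exp(-lam) * mix                         # equation (63)
    # n == N: at least N arrivals, equation (61) for the tail
    tail = 1.0 - sum(lam ** k / factorial(k) * exp(-lam) for k in range(N))
    return tail * factorial(N) / lam ** N * mix        # equation (64)

# Hypothetical arrival rates for two patient types.
lambdas = (0.6, 0.4)
p_vec = [admission_prob(l, lambdas, 2)
         for l in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]]
w_vec = [admission_prob(l, lambdas, 1) for l in [(0, 0), (1, 0), (0, 1)]]
```

Here `p_vec` corresponds to ${\bf p}$ (two free beds) and `w_vec` to ${\bf w}$ (one free bed).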

Then, for $a=1$, the nonzero entries of ${\bf P}^{(1)}$ are

\begin{equation}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{1,1}&\ldots&{\bf P}^{(1)}_{1,6}\end{array}\right]={\bf p}, \tag{70}
\end{equation}

for $s=2,7$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{1,1}\times{\bf p}, \tag{72}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,7}&\ldots&{\bf P}^{(1)}_{s,9}\end{array}\right]&=q_{1,1}\times{\bf w}, \tag{74}
\end{align}

for $s=3,16$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{2,2}\times{\bf p}, \tag{76}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,16}&\ldots&{\bf P}^{(1)}_{s,18}\end{array}\right]&=q_{2,2}\times{\bf w}, \tag{78}
\end{align}

for $s=4,8,14,19$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{1,1}p_{2,1}\times{\bf p}, \tag{80}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,7}&\ldots&{\bf P}^{(1)}_{s,9}\end{array}\right]&=q_{1,1}p_{2,1}\times{\bf w}, \tag{82}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,13}&\ldots&{\bf P}^{(1)}_{s,15}\end{array}\right]&=p_{1,1}q_{2,1}\times{\bf w}, \tag{84}\\
{\bf P}^{(1)}_{s,19}&=q_{1,1}q_{2,1}, \tag{85}
\end{align}

for $s=5,9,17,21$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{1,1}p_{2,2}\times{\bf p}, \tag{87}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,7}&\ldots&{\bf P}^{(1)}_{s,9}\end{array}\right]&=q_{1,1}p_{2,2}\times{\bf w}, \tag{89}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,16}&\ldots&{\bf P}^{(1)}_{s,18}\end{array}\right]&=p_{1,1}q_{2,2}\times{\bf w}, \tag{91}\\
{\bf P}^{(1)}_{s,21}&=q_{1,1}q_{2,2}, \tag{92}
\end{align}

for $s=6,12,18,22$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{1,2}p_{2,2}\times{\bf p}, \tag{95}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,10}&\ldots&{\bf P}^{(1)}_{s,12}\end{array}\right]&=q_{1,2}p_{2,2}\times{\bf w}, \tag{97}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,16}&\ldots&{\bf P}^{(1)}_{s,18}\end{array}\right]&=p_{1,2}q_{2,2}\times{\bf w}, \tag{99}\\
{\bf P}^{(1)}_{s,22}&=q_{1,2}q_{2,2}, \tag{100}
\end{align}

for $s=10$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{10,1}&\ldots&{\bf P}^{(1)}_{10,6}\end{array}\right]&=p_{1,2}\times{\bf p}, \tag{102}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{10,10}&\ldots&{\bf P}^{(1)}_{10,12}\end{array}\right]&=q_{1,2}\times{\bf w}, \tag{104}
\end{align}

for $s=11,15,20$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,1}&\ldots&{\bf P}^{(1)}_{s,6}\end{array}\right]&=p_{1,2}p_{2,1}\times{\bf p}, \tag{107}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,10}&\ldots&{\bf P}^{(1)}_{s,12}\end{array}\right]&=q_{1,2}p_{2,1}\times{\bf w}, \tag{109}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{s,13}&\ldots&{\bf P}^{(1)}_{s,15}\end{array}\right]&=p_{1,2}q_{2,1}\times{\bf w}, \tag{111}\\
{\bf P}^{(1)}_{s,20}&=q_{1,2}q_{2,1}, \tag{112}
\end{align}

and for $s=13$,

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(1)}_{13,1}&\ldots&{\bf P}^{(1)}_{13,6}\end{array}\right]&=p_{2,1}\times{\bf p}, \tag{114}\\
\left[\begin{array}{ccc}{\bf P}^{(1)}_{13,13}&\ldots&{\bf P}^{(1)}_{13,15}\end{array}\right]&=q_{2,1}\times{\bf w}. \tag{116}
\end{align}

For $a=2$, rows $s=11,15$ are the only rows in ${\bf P}^{(2)}$ that differ from the rows of ${\bf P}^{(1)}$, with their nonzero entries given by

\begin{align}
\left[\begin{array}{ccc}{\bf P}^{(2)}_{s,1}&\ldots&{\bf P}^{(2)}_{s,6}\end{array}\right]&=p_{1,1}p_{2,2}\times{\bf p}, \tag{118}\\
\left[\begin{array}{ccc}{\bf P}^{(2)}_{s,7}&\ldots&{\bf P}^{(2)}_{s,9}\end{array}\right]&=q_{1,1}p_{2,2}\times{\bf w}, \tag{120}\\
\left[\begin{array}{ccc}{\bf P}^{(2)}_{s,16}&\ldots&{\bf P}^{(2)}_{s,18}\end{array}\right]&=p_{1,1}q_{2,2}\times{\bf w}, \tag{122}\\
{\bf P}^{(2)}_{s,21}&=q_{1,1}q_{2,2}, \tag{123}
\end{align}

and the remaining rows are the same as in ${\bf P}^{(1)}$.
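The block structure above can be assembled programmatically, which also gives a quick check that every row of ${\bf P}^{(a)}$ sums to one. The sketch below is an illustration under stated assumptions: the occupancy label attached to each post-decision state (which patient type, if any, sits in each of the two wards) is inferred here from the pattern of the equations rather than quoted from the model definition, and the names `OCCUPANCY`, `POST_1`, `POST_2` and `transition_matrix` are hypothetical. The vectors `p` and `w` and the departure probabilities `depart[(k, i)]` $=p_{k,i}$ are supplied by the caller.

```python
from itertools import product
import numpy as np

# (type in ward 1, type in ward 2) for each post-decision state; 0 = empty bed.
# Assumption: inferred from the block pattern of the equations above.
OCCUPANCY = {1: (0, 0), 7: (1, 0), 10: (2, 0), 13: (0, 1), 16: (0, 2),
             19: (1, 1), 20: (2, 1), 21: (1, 2), 22: (2, 2)}
FIRST_STATE = {occ: s for s, occ in OCCUPANCY.items()}

# Post-decision state for each state s under a = 1 and a = 2, as in (60).
POST_1 = {1: 1, 2: 7, 7: 7, 10: 10, 13: 13, 3: 16, 16: 16,
          4: 19, 8: 19, 14: 19, 19: 19, 11: 20, 15: 20, 20: 20,
          5: 21, 9: 21, 17: 21, 21: 21, 6: 22, 12: 22, 18: 22, 22: 22}
POST_2 = {**POST_1, 11: 21, 15: 21}

def transition_matrix(post, p, w, depart):
    """Assemble the 22 x 22 matrix from the post-decision map `post`."""
    arrivals = {2: np.asarray(p), 1: np.asarray(w), 0: np.ones(1)}
    P = np.zeros((22, 22))
    for s, s_post in post.items():
        types = OCCUPANCY[s_post]
        for stays in product([False, True], repeat=2):
            if any(stay and t == 0 for stay, t in zip(stays, types)):
                continue                      # an empty bed has nobody to keep
            prob = 1.0
            for ward, (t, stay) in enumerate(zip(types, stays), start=1):
                if t:                         # occupied bed: stay or depart
                    prob *= (1 - depart[ward, t]) if stay else depart[ward, t]
            remaining = tuple(t if stay else 0 for t, stay in zip(types, stays))
            first = FIRST_STATE[remaining] - 1
            block = arrivals[remaining.count(0)]   # p, w, or no arrivals
            P[s - 1, first:first + len(block)] += prob * block
    return P
```

Calling `transition_matrix(POST_1, p, w, depart)` and `transition_matrix(POST_2, p, w, depart)` gives two matrices that differ only in the rows for $s=11,15$.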

Finally, costs $C(s,a)$ are recorded as vectors ${\bf C}^{(a)}=[C(s,a)]_{s=1,\ldots,22}$ corresponding to decisions $a=1,2$, such that

\begin{align*}
C(s,1)&=2c^{(\sigma)}I\{s=4,5,6\}+c^{(\sigma)}I\{s=2,3,8,9,11,12,14,15,17,18\}\\
&\quad+c^{(p)}I\{s=4,6,8,10,12,13,14,18,19,22\}+2c^{(p)}I\{s=11,15,20\},\\
C(s,2)&=2c^{(\sigma)}I\{s=4,5,6\}+c^{(\sigma)}I\{s=2,3,8,9,11,12,14,15,17,18\}\\
&\quad+c^{(p)}I\{s=4,6,8,10,12,13,14,18,19,22\}+2c^{(p)}I\{s=20\}+c^{(t)}I\{s=11,15\},
\end{align*}

since the costs do not depend on the next observed state.
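The two cost vectors can be written out directly from the indicator sets. In the sketch below the function name `cost_vectors` and the numerical cost values in the usage note are hypothetical; the arguments `c_sigma`, `c_p` and `c_t` stand for $c^{(\sigma)}$, $c^{(p)}$ and $c^{(t)}$.

```python
import numpy as np

def cost_vectors(c_sigma, c_p, c_t):
    """Return (C1, C2), the vectors C^(1) and C^(2) over states s = 1..22."""
    two_sigma = {4, 5, 6}
    one_sigma = {2, 3, 8, 9, 11, 12, 14, 15, 17, 18}
    one_p = {4, 6, 8, 10, 12, 13, 14, 18, 19, 22}
    C1, C2 = np.zeros(22), np.zeros(22)
    for s in range(1, 23):
        # Terms shared by both decisions.
        shared = (2 * c_sigma * (s in two_sigma)
                  + c_sigma * (s in one_sigma)
                  + c_p * (s in one_p))
        C1[s - 1] = shared + 2 * c_p * (s in {11, 15, 20})
        C2[s - 1] = shared + 2 * c_p * (s == 20) + c_t * (s in {11, 15})
    return C1, C2
```

With, say, `cost_vectors(1.0, 10.0, 3.0)`, the two vectors agree everywhere except at $s=11,15$, where decision $a=2$ replaces the term $2c^{(p)}$ by $c^{(t)}$.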
