Tiered Service Architecture for Remote Patient Monitoring

Siddharth Chandak1, Isha Thapa2, Nicholas Bambos1,2 and David Scheinker2,3
1 Department of Electrical Engineering, Stanford University, USA.
2 Department of Management Science & Engineering, Stanford University, USA.
3 School of Medicine, Stanford University, USA.
{chandaks, ishadt, bambos, dscheink}@stanford.edu
Abstract

We develop a remote patient monitoring (RPM) service architecture, which has two tiers of monitoring: ordinary and intensive. The patient’s health state improves or worsens in each time period according to certain probabilities, which depend on the monitoring tier. The patient incurs a “loss of quality of life” cost or an “invasiveness” cost, which is higher under intensive monitoring than under ordinary. On the other hand, their health improves faster under intensive monitoring than under ordinary. In each period, the service decides which monitoring tier to use, based on the health of the patient. We investigate the optimal policy for making that choice by formulating the problem using dynamic programming. We first provide analytic conditions for selecting ordinary vs intensive monitoring in the asymptotic regime where the number of health states is large. In the general case, we investigate the optimal policy numerically. We observe a threshold behavior: when the patient’s health drops below a certain threshold, the service switches them to intensive monitoring, while ordinary monitoring is used during adequately good health states of the patient. The modeling and analysis provide a general framework for managing RPM services for various health conditions with medically/clinically defined system parameters.

I Introduction

Remote Patient Monitoring (RPM) is increasingly receiving attention as a method for monitoring patients with certain medical conditions in their normal living/working environments to increase their quality of life and the level of delivered health care [1, 2, 3]. This is becoming feasible via advancements in wearable medical devices, for example, wearable glucose monitors [4], smart watches with vital sign monitoring capabilities (e.g., heart rate, ECG, pulse oximetry) [5, 6], and other such sensors. Further, such devices are increasingly networked and can transmit and receive data over the Internet and act as edge devices in communication with computation servers in the Cloud.

Studies have shown the effectiveness of RPM for various medical conditions. For example, continuous glucose monitoring has been shown to improve glycemic control in patients with diabetes [7, 8]. Smart watches have been effectively used to monitor stress, movement disorders, sleep patterns, blood pressure, heart disease, and COVID-19 [5]. Other RPM devices have also been used to manage and track cardiac conditions, such as heart failure, arrhythmia, and hypertension [3]. These studies highlight the potential of RPM to improve patient outcomes and quality of life by allowing for timely interventions and personalized care.

However, the question remains of how intensively to monitor patients. Intensive monitoring schemes could range from remotely collecting more data on the patient health state and administering more medical intervention remotely (e.g., alerting the patient to increase medication dosage) to calling the patient into an urgent care facility.

While aggressive monitoring may provide more comprehensive data, it can be resource intensive, draining the wearable device battery [6] faster and requiring clinicians to review more RPM data. From the patient’s perspective, intensive monitoring naturally carries a higher “loss of quality of life” or “invasiveness” cost, since it would normally be more invasive to their personal lifestyle. On the other hand, intensive monitoring (and correspondingly elevated medical intervention) would enable early detection of, and intervention for, adverse events, and hence the patient’s health is expected to improve faster than under ordinary monitoring. Therefore, to account for this trade-off between the invasiveness cost and the possibility of an early intervention, there is an inherent need for a systematic approach to determining the appropriate level of monitoring, based on the patient’s health state.

In this paper we develop an RPM service architecture, where the patient is placed under less or more intensive levels or tiers of monitoring, based on their health state. We then study the optimal monitoring strategy for this model and how it varies with different parameters. One would intuitively expect that a patient would be placed under intensive monitoring when their health state deteriorates; on the other hand, they would be returned to ordinary monitoring when their health state improves enough. The decision to switch from ordinary to intensive monitoring (and/or vice versa) requires a systematic analysis and depends heavily on the various parameters of the service in a rather subtle and complicated way, as analyzed in the following sections.

Of course, the design of an RPM service for a specific medical condition is highly dependent on the specifics of that condition and requires specialized medical knowledge. The point of this paper is not to design a particular RPM service but to provide a general framework and systematic methodology for RPM services based on tunable parameters (e.g., health improvement or deterioration probabilities, monitoring options and invasiveness costs) so as to make justifiable monitoring choices. The parameters will have to be decided on and tuned by medical/clinical experts for condition-specific RPM services. While we work with a simplified model here, the intuition gained from the analysis can help clinicians take more informed monitoring decisions.

In section II, we develop the model of evolution of the patient health state under ordinary and intensive monitoring and demonstrate how it can be managed, using the methodology of dynamic programming. In section III, we provide some analytical results on the optimal management policy in the asymptotic regime of a large number of health states and provide conditions on the parameters for choosing ordinary vs intensive monitoring. In section IV, we numerically investigate the structure of optimal monitoring policies and demonstrate that they place the patient under intensive monitoring when the health state deteriorates below a certain threshold; otherwise they use ordinary monitoring. Finally, in section V, we discuss some extensions. Sketches of some necessary proofs are presented in the Appendix.

II The RPM Service Model

Consider a patient who can be in a health state $h_t \in \mathcal{H} = \{0,1,2,3,\dots,H\}$ in each service time period $t \in \{0,1,2,3,\dots\}$. The RPM service places the patient in a monitoring/intervention state $m_t \in \mathcal{M} = \{o,i\}$ in each time period $t$, abbreviated to monitoring state, where $o$ denotes ordinary monitoring and $i$ intensive monitoring. Thus, one can view the monitoring-patient joint state $s_t = (m_t, h_t) \in \mathcal{M} \times \mathcal{H}$ as the system or service state at time $t$.

A higher patient health state $h \in \{0,1,2,3,\dots,H\}$ corresponds to the patient having better health. In particular, the lowest health state $0$ is critical in the sense that, when the patient drifts into that state, they go beyond the scope of the current service model; at that point other emergency and/or more severe medical interventions are required. Because of that, the states $(i,0)$ and $(o,0)$ are absorbing for the Markovian evolution of the health state, as explained below. Indeed, when the patient enters health state $0$ under either monitoring state $i$ or $o$, the service evolution stops, as other medical measures/interventions are initiated.

As seen below, in defining costs incurred at the various states, we first take the patient’s quality of life point of view. Under ordinary monitoring, the patient incurs a constant cost $C_o \geq 0$ at any state $(o,h)$ with $h \in \{1,2,\dots,H\}$. Correspondingly, under intensive monitoring, the patient incurs a constant cost $C_i \geq 0$ at any state $(i,h)$ with $h \in \{1,2,\dots,H\}$. These costs reflect the invasiveness loss for the patient. One may argue about the costs in more elaborate ways, for example, including patient risk factors and operational considerations of the service. For simplicity, we focus here on the invasiveness argument mentioned above.

Of special interest is the critical health state $h=0$, where this model ceases to apply. In either state $(o,0)$ or $(i,0)$, where the patient’s health is critical, a cost $C_c$ is incurred.

We finally define the transition costs $C_{o\to i}$ and $C_{i\to o}$, which are associated with the service transitioning the patient from ordinary to intensive monitoring and vice versa, respectively.

We model the system as a controlled Markov chain, trying to stay as simple as possible, yet still capture the essence of the problem and get insights into its solution. We discuss the more general case in the final section. At the beginning of every time period $t$, the service takes the decision/action (control) to either keep the monitoring state the same (as in the previous time period) or switch it to the alternate monitoring state. Formally, the decision/action space is $\mathcal{A}=\{o,i\}$ and each (state, action) pair is associated with a cost given by the function $c:\mathcal{S}\times\mathcal{A}\mapsto\mathbb{R}^{+}$, where $\mathcal{S}=\mathcal{M}\times\mathcal{H}$ is the state space. The transition probabilities are given by $p(s'|s,a)$ where $s',s\in\mathcal{S}$ and $a\in\mathcal{A}$. The cost functions and transition probabilities are defined as follows.

1. At health state $\mathbf{h=0}$:

No action is taken, as the service ceases operation. A cost of $C_c$ is incurred.

2. At health states $\mathbf{1\leq h\leq H}$:

  a) Ordinary Monitoring ($m=o$), no Switching ($a=o$): Does not induce a monitoring change, and the system state transitions as follows:

    i) $(o,h)\xrightarrow{a=o}(o,\min\{h+1,H\})$ with prob. $\lambda_o$,

    ii) $(o,h)\xrightarrow{a=o}(o,h-1)$ with prob. $\mu_o=1-\lambda_o$,

    and a cost $c\big((o,h),o\big)=C_o$ is incurred. Note that $\min\{h+1,H\}$ above is used to account for $(o,H)\to(o,H)$ with prob. $\lambda_o$, since $H$ is the highest health state.

  b) Ordinary Monitoring ($m=o$) with Switching ($a=i$): Induces a switch to intensive monitoring $m=i$, and the system state transitions as follows:

    i) $(o,h)\xrightarrow{a=i}(i,\min\{h+1,H\})$ with prob. $\lambda_i$,

    ii) $(o,h)\xrightarrow{a=i}(i,h-1)$ with prob. $\mu_i=1-\lambda_i$,

    and a cost $c\big((o,h),i\big)=C_{o\to i}+C_i$ is incurred.

  c) Intensive Monitoring ($m=i$), no Switching ($a=i$): Does not induce a monitoring change, and the system state transitions as follows:

    i) $(i,h)\xrightarrow{a=i}(i,\min\{h+1,H\})$ with prob. $\lambda_i$,

    ii) $(i,h)\xrightarrow{a=i}(i,h-1)$ with prob. $\mu_i=1-\lambda_i$,

    and a cost $c\big((i,h),i\big)=C_i$ is incurred.

  d) Intensive Monitoring ($m=i$) with Switching ($a=o$): Induces a switch to ordinary monitoring $m=o$, and the system state transitions as follows:

    i) $(i,h)\xrightarrow{a=o}(o,\min\{h+1,H\})$ with prob. $\lambda_o$,

    ii) $(i,h)\xrightarrow{a=o}(o,h-1)$ with prob. $\mu_o=1-\lambda_o$,

    and a cost $c\big((i,h),o\big)=C_{i\to o}+C_o$ is incurred.
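The transition and cost rules listed above can be sketched in a few lines of Python. This is a minimal illustration under our reading of the rules, not an implementation from this work; the names `step_distribution`, `stage_cost`, `lam`, `C` and `C_switch` are hypothetical and introduced only for this example.

```python
from typing import Dict, List, Tuple

State = Tuple[str, int]  # (monitoring tier 'o' or 'i', health level h)


def step_distribution(state: State, action: str, H: int,
                      lam: Dict[str, float]) -> List[Tuple[State, float]]:
    """Return [(next_state, probability), ...] for one service period.

    The chosen action becomes the next monitoring tier; health moves up
    with probability lam[action] (capped at H) and down otherwise.
    """
    _, h = state
    if h == 0:
        # Critical state: the service stops, so there is no further transition.
        return []
    up = (action, min(h + 1, H))
    down = (action, h - 1)
    return [(up, lam[action]), (down, 1.0 - lam[action])]


def stage_cost(state: State, action: str, C: Dict[str, float],
               C_switch: Dict[Tuple[str, str], float]) -> float:
    """Cost c(s, a): tier cost of the chosen action plus a switching cost if a != m."""
    m, h = state
    if h == 0:
        return C["c"]  # critical cost, incurred once when the service stops
    switch = C_switch[(m, action)] if action != m else 0.0
    return switch + C[action]
```

Here `lam` maps each tier to its improvement probability ($\lambda_o$, $\lambda_i$), `C` holds $C_o$, $C_i$ and $C_c$ under the keys `"o"`, `"i"` and `"c"`, and `C_switch` holds $C_{o\to i}$ and $C_{i\to o}$ keyed by (current tier, chosen tier).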

We can easily incorporate health state dependent costs and transition probabilities, but for simplicity we assume constant ones here. We make the following natural assumptions.

Assumption 1.

  (a) The transition probabilities satisfy: $\lambda_i\geq\lambda_o$.

  (b) The costs satisfy: $0\leq C_o\leq C_i\leq C_c$.

Assumption 1(a) intuitively states that the patient’s health improves faster under intensive monitoring than under ordinary monitoring. Regarding Assumption 1(b), it is naturally expected that $C_o\leq C_i$, as the patient’s “annoyance” is higher under intensive monitoring/intervention than under ordinary. Further, given the severity of entering the critical state $h=0$, it is naturally expected that $C_o\leq C_i\leq C_c$, and practically $C_c$ may be much larger than $C_i$.

II-A Optimal Monitoring Control

We study this problem under the discounted cost setting of the dynamic programming methodology [9]; hence, costs incurred $t$ time periods into the future (with respect to the present) are discounted by a factor of $\gamma^{t}$ with $0<\gamma<1$. Starting from state $s_0=s\in\mathcal{M}\times\mathcal{H}$, the total expected (discounted) cost to be incurred is

$$\mathbb{E}\Big[\sum_{t=0}^{T-1}\gamma^{t}c(s_{t},a_{t})+\gamma^{T}C_{c}\ \Big|\ s_{0}=s\Big]$$

when control action $a_t$ is taken by the service, at cost $c(s_t,a_t)$ introduced above, when its state is $s_t=(m_t,h_t)$ at time $t$, until the patient enters the critical state $h=0$ at time $T$ and the service ceases operation. Hence, $T$ is the time the patient spends in the service, and at time $T$ the critical cost $C_c$ is incurred, discounted to $\gamma^{T}C_c$. Thus, discounting by $\gamma$ implicitly reflects the patient’s desire to stay longer in service and hence to incur the critical cost further in the future, where it is discounted more heavily.
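To make the discounted objective concrete, the sketch below evaluates the expression inside the expectation for one realized trajectory. It is a small illustration only; the function name `discounted_cost` and its arguments are hypothetical.

```python
from typing import Sequence


def discounted_cost(stage_costs: Sequence[float], C_c: float, gamma: float) -> float:
    """Compute sum_{t=0}^{T-1} gamma^t * c_t + gamma^T * C_c for one trajectory.

    stage_costs[t] is the cost c(s_t, a_t) paid in period t, and
    T = len(stage_costs) is the period in which health state 0 is reached.
    """
    total = 0.0
    for t, c in enumerate(stage_costs):
        total += (gamma ** t) * c
    return total + (gamma ** len(stage_costs)) * C_c
```

For example, with zero per-period costs, $C_c=100$ and $\gamma=0.9$, a patient who reaches the critical state at $T=2$ contributes $0.9^2\cdot 100=81$, whereas reaching it at $T=10$ contributes less, which is the sense in which discounting rewards a longer stay in service.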

A (stationary) monitoring policy $\pi(s)$ is a rule mapping each state $s=(m,h)\in\mathcal{M}\times\mathcal{H}$ to a control action $a\in\mathcal{A}=\{o,i\}$ to be taken at that state. The value function $V_{\pi}(s)$ of a policy $\pi(s)$ is the total expected (discounted) cost the system incurs until it reaches the critical state $0$ and stops, when it starts from state $s$ at time $t=0$. That is,

$$V_{\pi}(s)=\mathbb{E}\left[\sum_{t=0}^{T-1}\gamma^{t}c\Big(s_{t},\pi(s_{t})\Big)+\gamma^{T}C_{c}\ \Big|\ s_{0}=s\right]$$

and satisfies the dynamic programming equation [9]

$$V_{\pi}(s)=c\Big(s,\pi(s)\Big)+\gamma\sum_{s'\in\mathcal{S}}\mathbb{P}\Big(s'\mid s,\pi(s)\Big)V_{\pi}(s'),$$

for all $s\in\mathcal{S}=\mathcal{M}\times\mathcal{H}$, given the Markovian evolution dynamics of the system, specified by the state transition probabilities defined above.
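For a fixed policy, the dynamic programming equation above can be solved by repeatedly applying its right-hand side until the values stop changing. The following is a minimal sketch under stated assumptions: the names `evaluate_policy`, `lam`, `C` and `C_switch` are hypothetical, the boundary value $V_\pi(m,0)=C_c$ is imposed directly, and plain fixed-point iteration is used in place of a linear solver.

```python
from typing import Callable, Dict, Tuple

State = Tuple[str, int]  # (monitoring tier 'o' or 'i', health level h)


def evaluate_policy(policy: Callable[[str, int], str], H: int,
                    lam: Dict[str, float], C: Dict[str, float],
                    C_switch: Dict[Tuple[str, str], float],
                    gamma: float, tol: float = 1e-10) -> Dict[State, float]:
    """Iterate V <- c(s, pi(s)) + gamma * E[V(s')] until the largest update is below tol."""
    V = {(m, h): 0.0 for m in "oi" for h in range(1, H + 1)}
    V[("o", 0)] = V[("i", 0)] = C["c"]  # boundary value at the critical state
    while True:
        new_V = dict(V)
        delta = 0.0
        for m in "oi":
            for h in range(1, H + 1):
                a = policy(m, h)
                cost = C[a] + (C_switch[(m, a)] if a != m else 0.0)
                nxt = (lam[a] * V[(a, min(h + 1, H))]
                       + (1.0 - lam[a]) * V[(a, h - 1)])
                new_V[(m, h)] = cost + gamma * nxt
                delta = max(delta, abs(new_V[(m, h)] - V[(m, h)]))
        V = new_V
        if delta < tol:
            return V
```

As a sanity check, under the always-ordinary policy with $C_o=0$ and $\lambda_o=0$ the patient deteriorates one level per period, so the routine should return $V_\pi(o,h)=\gamma^{h}C_c$. Because $0<\gamma<1$, the iteration is a contraction and converges from any starting values.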

The goal is to find an optimal policy $\pi^{*}$ minimizing $V_{\pi}$ over all policies $\pi$, that is, $V_{\pi^{*}}(s)\leq V_{\pi}(s)$ for every $s\in\mathcal{S}$ and every policy $\pi$. For simplicity we write $V^{*}(s)=V_{\pi^{*}}(s)$, which satisfies the following dynamic programming equation [9]:

$$V^{*}(s)=\min_{a\in\{o,i\}}\Bigg\{c(s,a)+\gamma\sum_{s'\in\mathcal{S}}\mathbb{P}\Big(s'\mid s,a\Big)V^{*}(s')\Bigg\}.$$

This equation can be solved numerically to yield the optimal policy $\pi^{*}(s)=\pi^{*}(m,h)$, that is, the optimal decision to take when the patient is in health state $h$ under monitoring $m$. For the state transition probabilities and costs defined before, this dynamic programming equation unfolds into:

  (i) For health states $1\leq h\leq H-1$,

    $$\begin{aligned}
    V^{*}(i,h)=\min\bigg\{&C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,h{+}1)+\mu_{i}V^{*}(i,h{-}1)\Big],\\
    &C_{i\to o}+C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,h{+}1)+\mu_{o}V^{*}(o,h{-}1)\Big]\bigg\},\\
    V^{*}(o,h)=\min\bigg\{&C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,h{+}1)+\mu_{o}V^{*}(o,h{-}1)\Big],\\
    &C_{o\to i}+C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,h{+}1)+\mu_{i}V^{*}(i,h{-}1)\Big]\bigg\}.
    \end{aligned}$$

  (ii) At health state $H$,

    $$\begin{aligned}
    V^{*}(i,H)=\min\bigg\{&C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H{-}1)\Big],\\
    &C_{i\to o}+C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H{-}1)\Big]\bigg\},\\
    V^{*}(o,H)=\min\bigg\{&C_{o}+\gamma\Big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H{-}1)\Big],\\
    &C_{o\to i}+C_{i}+\gamma\Big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H{-}1)\Big]\bigg\}.
    \end{aligned}$$

  (iii) At the critical health state $h=0$,

    $$V^{*}(i,0)=V^{*}(o,0)=C_{c}.$$

Note that in every $\min\{\cdot,\cdot\}$ above, the first term corresponds to the control keeping the existing monitoring state, while the second corresponds to the control switching to the alternate monitoring state and incurring the switching cost.
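One standard way to solve these equations numerically is value iteration, which applies the $\min\{\cdot,\cdot\}$ update to every state until convergence. The sketch below is one possible approach, not necessarily the solver used for the results in this paper; the names `q_value`, `value_iteration`, `lam`, `C` and `C_switch` are hypothetical, and ties are broken in favor of keeping the current tier, which is our assumption.

```python
from typing import Dict, Tuple

State = Tuple[str, int]  # (monitoring tier 'o' or 'i', health level h)


def q_value(V: Dict[State, float], m: str, h: int, a: str, H: int,
            lam: Dict[str, float], C: Dict[str, float],
            C_switch: Dict[Tuple[str, str], float], gamma: float) -> float:
    """Cost of taking action a in state (m, h) and continuing according to V."""
    cost = C[a] + (C_switch[(m, a)] if a != m else 0.0)
    nxt = lam[a] * V[(a, min(h + 1, H))] + (1.0 - lam[a]) * V[(a, h - 1)]
    return cost + gamma * nxt


def value_iteration(H: int, lam: Dict[str, float], C: Dict[str, float],
                    C_switch: Dict[Tuple[str, str], float], gamma: float,
                    tol: float = 1e-10
                    ) -> Tuple[Dict[State, float], Dict[State, str]]:
    """Return (V*, pi*) for health states 1..H; V*(m, 0) is fixed to C_c."""
    args = (H, lam, C, C_switch, gamma)
    V = {(m, h): 0.0 for m in "oi" for h in range(1, H + 1)}
    V[("o", 0)] = V[("i", 0)] = C["c"]
    while True:
        new_V = dict(V)
        for m in "oi":
            for h in range(1, H + 1):
                new_V[(m, h)] = min(q_value(V, m, h, a, *args) for a in "oi")
        delta = max(abs(new_V[s] - V[s]) for s in V)
        V = new_V
        if delta < tol:
            break
    policy: Dict[State, str] = {}
    for m in "oi":
        other = "i" if m == "o" else "o"
        for h in range(1, H + 1):
            keep = q_value(V, m, h, m, *args)
            switch = q_value(V, m, h, other, *args)
            policy[(m, h)] = m if keep <= switch else other  # tie: keep tier
    return V, policy
```

A quick check of the logic: if $\lambda_i=\lambda_o$ and $C_i>C_o$ with zero switching costs, intensive monitoring offers no benefit and the returned policy should select ordinary monitoring in every state.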

II-B Simplified RPM Service

In order to reduce the number of parameters for tractability of the analysis, we make the following simplification.

Definition 1 (Simplified RPM Service).

  (a) The cost for ordinary monitoring is set to zero: $C_{o}=0$.

  (b) The switching costs are set to zero: $C_{i\to o}=C_{o\to i}=0$.

Assumption 1 is still satisfied.

Given the limited space of this short paper, we work with the simplified RPM service below (analysis and numerical results), where interesting insights emerge. The zero cost of invasiveness under ordinary monitoring has no significant impact on the results, and is merely a technical assumption to make our analysis easier. The zero transition cost is realistic in several applications where intensive monitoring just involves a higher rate of collecting data about the patient’s health. We briefly comment on how non-zero transition costs affect our results in Section V.
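Under Definition 1, neither the cost nor the dynamics of an action depend on the current monitoring tier, so a numerical sketch of the simplified service only needs to track the health state. The code below is an illustrative value iteration under that observation; the name `solve_simplified`, its arguments, and the tie-breaking rule in favor of ordinary monitoring are assumptions made for this example.

```python
from typing import List, Tuple


def solve_simplified(H: int, lam_o: float, lam_i: float, C_i: float,
                     C_c: float, gamma: float,
                     tol: float = 1e-12) -> Tuple[List[float], List[str]]:
    """Value iteration over health states only (C_o = 0, no switching costs).

    Returns (V, policy): V[h] is the optimal cost-to-go from health state h,
    and policy[h] is 'o' or 'i' for h >= 1 ('-' at h = 0, where service stops).
    """
    def q_values(V: List[float], h: int) -> Tuple[float, float]:
        up, down = V[min(h + 1, H)], V[h - 1]
        q_o = gamma * (lam_o * up + (1.0 - lam_o) * down)        # ordinary
        q_i = C_i + gamma * (lam_i * up + (1.0 - lam_i) * down)  # intensive
        return q_o, q_i

    V = [C_c] + [0.0] * H
    while True:
        new_V = [C_c] + [min(q_values(V, h)) for h in range(1, H + 1)]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break

    policy = ["-"]
    for h in range(1, H + 1):
        q_o, q_i = q_values(V, h)
        policy.append("o" if q_o <= q_i else "i")  # tie: ordinary (assumption)
    return V, policy
```

Two limiting cases help validate such a routine: with $\lambda_i=\lambda_o$ and $C_i>0$ it should never choose intensive monitoring, and with $C_i=0$ and $\lambda_i>\lambda_o$ it should choose intensive monitoring in every state $h\geq 1$.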

The simplified RPM service is illustrated in Figure 1 below.

Figure 1: The Simplified RPM Service. The blue and red arrows represent the possible transitions from state $(o,3)$ for actions $o$ and $i$, respectively. The arrows are labeled with the probability of transition and the cost incurred.

The dynamic programming equations for $V^{*}$ given above reduce in this case to the following:

  1. (i)

    For health states $1\leq h\leq H-1$,

    $$V^{*}(i,h)=V^{*}(o,h)=\min\Big\{C_{i}+\gamma\big[\lambda_{i}V^{*}(i,h+1)+\mu_{i}V^{*}(i,h-1)\big],\; C_{o}+\gamma\big[\lambda_{o}V^{*}(o,h+1)+\mu_{o}V^{*}(o,h-1)\big]\Big\} \tag{1}$$

  2. (ii)

    At health state $H$,

    $$V^{*}(o,H)=V^{*}(i,H)=\min\Big\{C_{i}+\gamma\big[\lambda_{i}V^{*}(i,H)+\mu_{i}V^{*}(i,H-1)\big],\; C_{o}+\gamma\big[\lambda_{o}V^{*}(o,H)+\mu_{o}V^{*}(o,H-1)\big]\Big\} \tag{2}$$

  3. (iii)

    At the critical health state $h=0$,

    $$V^{*}(i,0)=V^{*}(o,0)=C_{c}. \tag{3}$$

Note that, in the absence of switching costs (that is, $C_{o\to i}=C_{i\to o}=0$), for any health state $h$ the value $V^{*}$ is the same under both ordinary and intensive monitoring.
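Equations (1)-(3) can be solved numerically in a few lines. The following is a minimal sketch using plain value iteration; the function name `solve_simplified_rpm`, the stopping tolerance, and the tie-breaking rule (ties go to ordinary monitoring) are assumptions made for illustration and are not taken from the paper.

```python
def solve_simplified_rpm(H, lam_o, lam_i, C_c, C_i, gamma, C_o=0.0,
                         tol=1e-10, max_iter=100_000):
    """Value iteration for equations (1)-(3) of the simplified RPM service.

    Returns (V, policy), where V[h] approximates V*(h) and policy[h] is
    'o' or 'i' for h = 1..H (policy[0] is None: state 0 is terminal).
    """
    mu_o, mu_i = 1.0 - lam_o, 1.0 - lam_i
    V = [0.0] * (H + 1)
    V[0] = C_c                      # equation (3): cost at the critical state
    policy = [None] * (H + 1)
    for _ in range(max_iter):
        new_V = V[:]
        for h in range(1, H + 1):
            up = min(h + 1, H)      # at h = H an "improvement" stays at H, eq. (2)
            q_i = C_i + gamma * (lam_i * V[up] + mu_i * V[h - 1])
            q_o = C_o + gamma * (lam_o * V[up] + mu_o * V[h - 1])
            # Assumed tie-break: prefer ordinary monitoring when costs are equal.
            if q_i < q_o:
                new_V[h], policy[h] = q_i, "i"
            else:
                new_V[h], policy[h] = q_o, "o"
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V, policy


if __name__ == "__main__":
    # Parameter set reported for Figure 2(b): H=6, lam_o=0.2, lam_i=0.3, C_c=60, C_i=1.
    V, policy = solve_simplified_rpm(6, 0.2, 0.3, 60.0, 1.0, 0.9)
    print(policy[1:])
```

Because the discount factor is below one, the iteration is a contraction and converges from any starting guess. The printed policy can be compared against the policies reported in the paper's figures.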

III Asymptotic Analysis

The optimal policy $\pi^{*}$ and value function $V^{*}$ can be computed numerically (in general) from the dynamic programming equations (1)-(3) for any given set of system parameters. But to develop intuition and characterize the optimal policy, in this section, we analyse $\pi^{*}$ in the asymptotic regime of a large number of health states $H\gg 1$, i.e., a dynamic range of health states $H\to\infty$. This allows for analytic tractability of the optimal policy and closed-form conditions on our policies of interest. We make the following assumptions in this section.

Assumption 2.

Large $H$ Asymptotic Regime.

  1. (a)

    The number of health states is very large, i.e., the system operates in the asymptotic regime of $H\to\infty$.

  2. (b)

    Under ordinary monitoring the patient’s health drifts downwards, i.e., the improvement probability is $\lambda_{o}<0.5$ and the worsening one is $\mu_{o}=1-\lambda_{o}>0.5$.

Assumption 2.a allows us to use tools from random walk analysis, as done in Lemma 1, making the analysis tractable. Assumption 2.b allows for the Markov chain to remain stable (positive recurrent) in the asymptotic regime.

The results obtained in this asymptotic regime can be thought of as an approximation for the RPM service with finite $H$, when $H$ grows large. In the next section, we numerically demonstrate that this asymptotic approximation tracks the optimal policy and the value function for our simplified RPM for a number of health states as low as $H=5$.

We define $\widetilde{\Pi}$ as the set of policies under which the service chooses the same action irrespective of the monitoring state, that is,

$$\widetilde{\Pi}=\{\pi\mid\pi(i,h)=\pi(o,h),\;\forall h\geq 1\}.$$

Based on (1), for all $h\geq 1$, we then have

$$\begin{aligned}\pi^{*}(o,h)&=\pi^{*}(i,h)\\ &=\operatorname*{arg\,min}_{\{i,o\}}\big\{C_{i}+\gamma\lambda_{i}V^{*}(i,h+1)+\gamma(1-\lambda_{i})V^{*}(i,h-1),\\ &\qquad\quad C_{o}+\gamma\lambda_{o}V^{*}(o,h+1)+\gamma(1-\lambda_{o})V^{*}(o,h-1)\big\}.\end{aligned}$$

This implies that $\pi^{*}\in\widetilde{\Pi}$ and we can restrict our attention to the set $\widetilde{\Pi}$. For simplicity, we introduce the notation $V^{*}(h)=V^{*}(o,h)=V^{*}(i,h)$ for $h\in\mathcal{H}$ and similarly the notation $\pi^{*}(h)$.

We next define an important policy $\pi_{o}$ where the patient stays under ordinary monitoring at all health states, i.e., $\pi_{o}(i,h)=\pi_{o}(o,h)=o$ for all $h\geq 1$. Note that $\pi_{o}\in\widetilde{\Pi}$ and we define the corresponding value function $V_{o}(h)$. Our first lemma gives the value function for this policy and presents an important property of the optimal policy.

Lemma 1.

For the simplified RPM (Definition 1) and under Assumption 2,

  1. (a)

    The value function $V_{o}$ for the policy $\pi_{o}$ is given by:

    $$V_{o}(h)=\phi^{h}C_{c},$$

    where

    $$\phi=\frac{1-\sqrt{1-4\lambda_{o}\mu_{o}\gamma^{2}}}{2\lambda_{o}\gamma}.$$

    Note that $\phi<1$ for $\gamma<1$.

  2. (b)

    For any choice of parameters, there exists $h^{\prime}$ such that, under the optimal policy, the patient prefers to stay in ordinary monitoring above health state $h^{\prime}$, i.e., $\pi^{*}(h)=o$, for all $h\geq h^{\prime}$.

Proof.

See Appendix A. ∎

For the simplified RPM, the cost of invasiveness under ordinary monitoring is zero. Then $V_{o}(h)=\mathbb{E}[\gamma^{T}C_{c}\mid h_{0}=h]$, where $T$ is the time taken to reach health state $0$, when started at health state $h$ and when the patient always stays under ordinary monitoring. $T$ here is precisely the hitting time of state $0$ for a random walk initiated at state $h$. The proof of this lemma then follows from the moment generating function of the hitting time for an infinite-state random walk.
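As a complementary sketch (not necessarily the argument used in Appendix A), the closed form in Lemma 1(a) can also be recovered by substituting a geometric ansatz into the recursion that $V_{o}$ satisfies for $h\geq 1$ when $C_{o}=0$ and $H\to\infty$. The labels $\phi_{\pm}$ and $f$, and the boundedness argument used to select the root, are introduced here for illustration only.

```latex
\begin{align*}
V_o(h) &= \gamma\big[\lambda_o V_o(h+1) + \mu_o V_o(h-1)\big], \qquad V_o(0) = C_c, \\
\text{ansatz } V_o(h) = \phi^{h} C_c \;&\Longrightarrow\;
f(\phi) := \lambda_o\gamma\,\phi^{2} - \phi + \mu_o\gamma = 0, \\
\phi_{\pm} &= \frac{1 \pm \sqrt{1 - 4\lambda_o\mu_o\gamma^{2}}}{2\lambda_o\gamma}, \\
f(0) = \mu_o\gamma > 0, \quad f(1) = \gamma - 1 < 0
\;&\Longrightarrow\; \phi_{-}\in(0,1), \quad \phi_{+} > 1 .
\end{align*}
```

Since $0\leq V_{o}(h)=\mathbb{E}[\gamma^{T}C_{c}\mid h_{0}=h]\leq C_{c}$ for every $h$, the growing root $\phi_{+}$ is ruled out, which leaves $\phi=\phi_{-}$ as stated in the lemma.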

(a) Optimal policy is $\pi_{o}$ (always in ordinary monitoring)
(b) Optimal policy is $\pi_{t,\bar{h}}$ (threshold policy with $\bar{h}=3$)
Figure 2: Optimal policies (numerically computed) for $H=6$ and two different parameter sets. (a) For $\lambda_{o}=0.2$, $\lambda_{i}=0.3$, $C_{c}=20$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$ the optimal policy is $\pi_{o}$ (using ordinary monitoring at all health states). (b) For $\lambda_{o}=0.2$, $\lambda_{i}=0.3$, $C_{c}=60$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$ the optimal policy is $\pi_{t,\bar{h}}$ with $\bar{h}=3$, that is, ordinary monitoring is used for $h>3$ and intensive for $h\leq 3$.

An important implication of the above lemma is that the policy where the patient chooses to stay in intensive monitoring for all health states is never optimal. Our next theorem shows that the policy $\pi_{o}$ is actually optimal for a wide range of parameters.

Theorem 1.

Under Assumption 2, the policy $\pi_{o}$ is optimal ($\pi^{*}=\pi_{o}$) for the simplified RPM (Definition 1) when the parameters satisfy

$$\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq\frac{C_{i}}{C_{c}}.$$
Proof.

See Appendix A. ∎
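One way to see where the condition in Theorem 1 comes from, offered as a heuristic sketch rather than as the proof in Appendix A, is a one-step deviation check. At a health state $h\geq 1$, compare a single period of intensive monitoring followed by $\pi_{o}$ against following $\pi_{o}$ throughout, using $V_{o}(h)=\phi^{h}C_{c}$ from Lemma 1 and $C_{o}=0$. The symbols $Q_{i}$ and $Q_{o}$ for these two costs are introduced here and do not appear in the paper.

```latex
\begin{align*}
Q_i(h) - Q_o(h)
 &= C_i + \gamma\big[\lambda_i V_o(h+1) + \mu_i V_o(h-1)\big]
        - \gamma\big[\lambda_o V_o(h+1) + \mu_o V_o(h-1)\big] \\
 &= C_i - \gamma(\lambda_i - \lambda_o)\big[V_o(h-1) - V_o(h+1)\big] \\
 &= C_i - \gamma(\lambda_i - \lambda_o)(1 - \phi^{2})\,\phi^{h-1} C_c .
\end{align*}
```

Assuming $\lambda_{i}>\lambda_{o}$, and since $\phi<1$, the subtracted term is largest at $h=1$. The deviation therefore never lowers the cost at any health state exactly when $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}/C_{c}$, which is the condition of the theorem.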

We next define a threshold policy $\pi_{t,\bar{h}}$ characterized by the health state $\bar{h}>0$. These are the policies under which there exists a threshold $\bar{h}$ such that the patient stays in intensive monitoring when the patient’s health is below or at the threshold $\bar{h}$ and in ordinary monitoring when their health is better than $\bar{h}$. Note that $\pi_{t,\bar{h}}\in\widetilde{\Pi}$. So $\pi_{t,\bar{h}}(h)=i$ for $1\leq h\leq\bar{h}$ and $\pi_{t,\bar{h}}(h)=o$ for $h>\bar{h}$. Our next theorem gives a set of conditions under which $\pi_{t,\bar{h}}$ is the optimal policy for some threshold $\bar{h}$.

Theorem 2.

Under Assumption 2, the policy $\pi_{t,\bar{h}}$ is optimal ($\pi^{*}=\pi_{t,\bar{h}}$) for some threshold $\bar{h}$ for the simplified RPM (Definition 1) when the following two conditions are satisfied:

  1. (a)

    $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>\frac{C_{i}}{C_{c}}$.

  2. (b)

    $\frac{\gamma\mu_{o}(1+\gamma\mu_{o})}{1-\gamma^{2}\lambda_{o}\mu_{o}}\leq 1$.

Proof.

See Appendix A. ∎

Condition (a) here is precisely the complement of the condition in Theorem 1. Condition (b) is an additional condition which our proof requires for the threshold policy to be optimal. In the asymptotic regime, we strongly believe that condition (b) is not necessary, and condition (a) alone is sufficient. Hence we believe that in the asymptotic regime, condition (a) alone dictates what the optimal policy is and that the optimal policy can only take one of two forms: $\pi_{o}$ or $\pi_{t,\bar{h}}$. This is reinforced by the numerical analysis presented in the next section.
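The threshold policies $\pi_{t,\bar{h}}$ defined above can also be evaluated directly for a finite number of health states. The sketch below computes the discounted cost of a given threshold by iterating the fixed-policy recursion; the function name `evaluate_threshold_policy`, the tolerance, and the convention that `h_bar = 0` stands for $\pi_{o}$ are assumptions made for illustration.

```python
def evaluate_threshold_policy(H, h_bar, lam_o, lam_i, C_c, C_i, gamma,
                              tol=1e-12, max_iter=200_000):
    """Discounted cost V[h] of the threshold policy with threshold h_bar.

    Intensive monitoring (cost C_i per period) is used for 1 <= h <= h_bar,
    ordinary monitoring (cost 0, simplified RPM) for h > h_bar.
    h_bar = 0 corresponds to the always-ordinary policy pi_o.
    """
    V = [0.0] * (H + 1)
    V[0] = C_c                          # cost incurred at the critical state
    for _ in range(max_iter):
        new_V = V[:]
        for h in range(1, H + 1):
            intensive = h <= h_bar
            lam = lam_i if intensive else lam_o
            cost = C_i if intensive else 0.0
            up = min(h + 1, H)          # the top state H cannot improve further
            new_V[h] = cost + gamma * (lam * V[up] + (1.0 - lam) * V[h - 1])
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


if __name__ == "__main__":
    # Compare the cost at health state h = 2 across all thresholds for H = 6.
    for h_bar in range(0, 7):
        V = evaluate_threshold_policy(6, h_bar, 0.2, 0.3, 60.0, 1.0, 0.9)
        print(h_bar, round(V[2], 4))
```

With `h_bar = 0` and a large `H`, the result at low health states should approach $\phi^{h}C_{c}$ from Lemma 1, which gives a simple sanity check on the implementation.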

IV Performance

In this section, we glean insights on the optimal policy by numerically solving the dynamic programming equations (1)-(3).

Figure 2 depicts the two policies discussed in the last section and a sample set of parameters under which they are optimal. Figure 2(a) shows the policy $\pi_{o}$, under which the patient stays in ordinary monitoring at all health states. Figure 2(b) shows the policy $\pi_{t,\bar{h}}$ with threshold $\bar{h}=3$, where the patient stays in intensive monitoring for health states $h\leq 3$ and in ordinary monitoring for health states $h>3$. Let model 2(a) use the set of parameters $\lambda_{o}=0.2$, $\lambda_{i}=0.3$, $C_{c}=20$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$ and model 2(b) use the set of parameters $\lambda_{o}=0.2$, $\lambda_{i}=0.3$, $C_{c}=60$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$. Then $\pi_{o}$ and $\pi_{t,\bar{h}}$ with $\bar{h}=3$ are optimal for models 2(a) and 2(b), respectively.

Note that the parameters for model 2(a) satisfy $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})\leq C_{i}/C_{c}$, which is sufficient for Theorem 1 to hold. Similarly the parameters for model 2(b) satisfy $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})>C_{i}/C_{c}$, which is condition (a) in Theorem 2. Note, however, that the parameters in model 2(b) do not satisfy condition (b) of Theorem 2, implying that the condition is not necessary.
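These three statements can be checked with a few lines of arithmetic. The sketch below evaluates $\phi$ from Lemma 1 and the left-hand sides of the conditions in Theorems 1 and 2 for the parameters of models 2(a) and 2(b); the helper names are hypothetical and chosen only for this illustration.

```python
import math


def phi(lam_o, gamma):
    """Decay factor phi from Lemma 1(a)."""
    mu_o = 1.0 - lam_o
    return (1.0 - math.sqrt(1.0 - 4.0 * lam_o * mu_o * gamma ** 2)) / (2.0 * lam_o * gamma)


def theorem1_lhs(lam_o, lam_i, gamma):
    """gamma * (lam_i - lam_o) * (1 - phi^2), to be compared with C_i / C_c."""
    return gamma * (lam_i - lam_o) * (1.0 - phi(lam_o, gamma) ** 2)


def theorem2_cond_b_lhs(lam_o, gamma):
    """Left-hand side of condition (b) in Theorem 2, to be compared with 1."""
    mu_o = 1.0 - lam_o
    return gamma * mu_o * (1.0 + gamma * mu_o) / (1.0 - gamma ** 2 * lam_o * mu_o)


lhs = theorem1_lhs(0.2, 0.3, 0.9)                # roughly 0.025
print(lhs <= 1.0 / 20.0)                         # model 2(a): True
print(lhs > 1.0 / 60.0)                          # model 2(b), condition (a): True
print(theorem2_cond_b_lhs(0.2, 0.9) <= 1.0)      # model 2(b), condition (b): False
```

For these parameters $\phi\approx 0.85$ and the left-hand side of condition (b) is roughly $1.42$, so the condition fails by a clear margin rather than marginally.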

When $H$ is finite, there also exist instances where the optimal policy is $\pi_{i}$, where the patient chooses to stay in intensive monitoring for all health states $h$. But we observed that this policy is optimal only in extreme cases where $H$ is very small or $\gamma$ is very close to $1$. Hence we do not further analyse this policy here.

Figure 3: The optimal value function $V^{*}(h)$ (blue, numerically computed) compared to its asymptotic counterpart (red, $V_{o}(h)$ obtained from Lemma 1) for various health states $h$ in a system with $H=5$, $\lambda_{o}=0.2$, $\lambda_{i}=0.4$, $C_{c}=5$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$. Note the close proximity of the two plots.
(a) Variation in optimal policy with $C_{c}/C_{i}$
(b) Variation in optimal policy with $\lambda_{i}$
(c) Variation in optimal policy with $\gamma$
Figure 4: Dependence of the (numerically computed) optimal monitoring policy on (a) the cost ratio $C_{c}/C_{i}$ (with fixed $\lambda_{o}=0.2$, $\lambda_{i}=0.4$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$); on (b) $\lambda_{i}$ (with fixed $\lambda_{o}=0.2$, $C_{c}=50$, $C_{i}=1$, $C_{o}=0$, $\gamma=0.9$); and on (c) $\gamma$ (with fixed $\lambda_{o}=0.2$, $\lambda_{i}=0.4$, $C_{c}=50$, $C_{i}=1$, $C_{o}=0$). We set $H=10$ in all cases. Below the vertical orange dashed line, the optimal policy is $\pi_{o}$ (ordinary monitoring is used for all health states). This line is positioned at the point where the condition of Theorem 1 achieves equality. Above this value the policy changes to $\pi_{t,\bar{h}}$ and each threshold $\bar{h}$ is marked in blue. Note that the threshold $\bar{h}$ increases as $C_{c}/C_{i}$ gets higher in (a), i.e., the patient stays “longer” under intensive monitoring. The same behavior is seen in (b) and (c).

Figure 3 shows how closely our asymptotic analysis in Section III relates to the actual solutions of the dynamic programming equations. The model parameters here are chosen such that the optimal policy is $\pi_{o}$. We calculate $V^{*}(h)$ for this model, and compare it with the value function $V_{o}(h)=\phi^{h}C_{c}$ obtained for the asymptotic case $H\to\infty$ (Lemma 1). As observed in the plot, the value function obtained for $H=5$ is almost identical to the asymptotic approximation $V_{o}(h)$, and they differ slightly only near the top health state $H$. The accuracy of the asymptotic approximation in predicting the optimal policy is further demonstrated by our next result.

We next study the impact of different parameters on the optimal policy $\pi^{*}$. Figure 4(a) shows how the optimal policy (numerically computed) varies with the cost ratio $C_{c}/C_{i}$. For $C_{c}/C_{i}<20.02$ the optimal policy is $\pi_{o}$, while for $C_{c}/C_{i}>20.02$ it is $\pi_{t,\bar{h}}$ with varied thresholds. Note that $C_{c}/C_{i}=20.02$ satisfies the condition in Theorem 1 with equality (the value of $C_{c}/C_{i}$ which satisfies $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})=C_{i}/C_{c}$). This shows that the condition obtained under the asymptotic assumption is a good indicator for our original problem with finite health states. As the ratio $C_{c}/C_{i}$ grows, the cost incurred on reaching the critical health state increases as compared to the cost of invasiveness, and it becomes optimal for the patient to stay under intensive monitoring until their health significantly improves.

Figure 4(b) shows how the optimal policy varies as the probability $\lambda_{i}$ increases. The optimal policy is $\pi_{o}$ for $\lambda_{i}<0.28$ and $\pi_{t,\bar{h}}$ with varied thresholds for $\lambda_{i}>0.28$. Again, $\lambda_{i}=0.28$ solves the condition in Theorem 1 with equality. As $\lambda_{i}$ increases, the patient’s health is more likely to improve under intensive monitoring, incentivizing the patient to stay under intensive monitoring for longer. Finally, Figure 4(c) shows the impact of $\gamma$ on the optimal policy. As $\gamma$ increases, the patient incurs a higher discounted cost on reaching the critical state, and hence they stay under intensive monitoring for a broader range of states.
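The two break-even values quoted above follow from solving the Theorem 1 condition with equality for the parameter of interest. The sketch below does this for the cost ratio and for $\lambda_{i}$; the function names are hypothetical, and the inversion formulas are a direct rearrangement of $\gamma(\lambda_{i}-\lambda_{o})(1-\phi^{2})=C_{i}/C_{c}$.

```python
import math


def phi(lam_o, gamma):
    """Decay factor phi from Lemma 1(a)."""
    mu_o = 1.0 - lam_o
    return (1.0 - math.sqrt(1.0 - 4.0 * lam_o * mu_o * gamma ** 2)) / (2.0 * lam_o * gamma)


def break_even_cost_ratio(lam_o, lam_i, gamma):
    """Value of C_c / C_i at which the Theorem 1 condition holds with equality."""
    return 1.0 / (gamma * (lam_i - lam_o) * (1.0 - phi(lam_o, gamma) ** 2))


def break_even_lam_i(lam_o, cost_ratio, gamma):
    """Value of lam_i at which the Theorem 1 condition holds with equality,
    for a given cost ratio C_c / C_i."""
    return lam_o + 1.0 / (cost_ratio * gamma * (1.0 - phi(lam_o, gamma) ** 2))


print(break_even_cost_ratio(0.2, 0.4, 0.9))   # about 20.03, Figure 4(a) setting
print(break_even_lam_i(0.2, 50.0, 0.9))       # about 0.280, Figure 4(b) setting
```

Both values depend on $\lambda_{o}$ and $\gamma$ only through $\phi$, so moving either break-even point requires changing the ordinary-monitoring dynamics or the discount factor.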

V Conclusions and Extensions

We have developed a two-tier service architecture for remote patient monitoring (RPM), where the service policy decides whether to place the patient under ordinary or intensive monitoring, given their health state. The optimal policy is first analyzed in asymptotic regimes and conditions are established for choosing ordinary vs intensive monitoring. The policy is then numerically computed and the dependence of its behavior on various key parameters is investigated.

An important extension would be to consider the general monitoring model, which includes non-zero transition costs. Based on numerical experiments performed in the general case, the optimal policy in this case would be a threshold policy with two thresholds instead of the one observed in this paper. A patient under ordinary monitoring would be switched to intensive when their health state deteriorates below a certain lower health threshold, and a patient under intensive monitoring would switch to ordinary when their health state improves above an upper health threshold.

There are many other direct extensions that we could not include in the limited space of this short paper. One can consider a service architecture with more than two tiers, where a patient can be under ordinary monitoring (tier m=0), or intensive monitoring tier m=1, or (more) intensive monitoring tier m=2 and so on, with corresponding health care features and attributes. The costs and probabilities of transitions could also be made dependent on the health state, allowing for a more realistic model.

References

  • [1] F. A. C. d. Farias, C. M. Dagostini, Y. d. A. Bicca, V. F. Falavigna, and A. Falavigna, “Remote patient monitoring: a systematic review,” Telemedicine and e-Health, vol. 26, no. 5, pp. 576–583, 2020.
  • [2] L. P. Malasinghe, N. Ramzan, and K. Dahal, “Remote patient monitoring: a comprehensive study,” Journal of Ambient Intelligence and Humanized Computing, vol. 10, pp. 57–76, 2019.
  • [3] A. Zinzuwadia, J. M. Goldberg, M. A. Hanson, and J. D. Wessler, “Continuous cardiology: the intersection of telehealth and remote patient monitoring,” in Emerging Practices in Telehealth.   Elsevier, 2023, pp. 97–115.
  • [4] I. Lee, D. Probst, D. Klonoff, and K. Sode, “Continuous glucose monitoring systems-current status and future perspectives of the flagship technologies in biosensor research,” Biosensors and Bioelectronics, vol. 181, p. 113054, 2021.
  • [5] M. Masoumian Hosseini, S. T. Masoumian Hosseini, K. Qayumi, S. Hosseinzadeh, and S. S. Sajadi Tabar, “Smartwatches in healthcare medicine: assistance and monitoring; a scoping review,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, p. 248, 2023.
  • [6] Y. B. David, T. Geller, I. Bistritz, I. Ben-Gal, N. Bambos, and E. Khmelnitsky, “Wireless body area network control policies for energy-efficient health monitoring,” Sensors, vol. 21, no. 12, p. 4245, 2021.
  • [7] M. I. Maiorino, S. Signoriello, A. Maio, P. Chiodini, G. Bellastella, L. Scappaticcio, M. Longo, D. Giugliano, and K. Esposito, “Effects of continuous glucose monitoring on metrics of glycemic control in diabetes: a systematic review with meta-analysis of randomized controlled trials,” Diabetes Care, vol. 43, no. 5, pp. 1146–1156, 2020.
  • [8] P. Prahalad, D. Scheinker, M. Desai, V. Y. Ding, F. K. Bishop, M. Y. Lee, J. Ferstad, D. P. Zaharieva, A. Addala, R. Johari et al., “Equitable implementation of a precision digital health program for glucose management in individuals with newly diagnosed type 1 diabetes,” Nature Medicine, pp. 1–9, 2024.
  • [9] D. Bertsekas, Dynamic programming and optimal control.   Athena scientific, 2012, vol. II.
  • [10] W. Feller, An introduction to probability theory and its applications, Volume 2.   John Wiley & Sons, 1991, vol. 81.

Appendix A Proofs

A-A Proof for Lemma 1

Proof.
  (a) For the policy $\pi_o$, the value function is

    \[
    V_o(h)=\mathbb{E}\left[\sum_{t=0}^{T-1}\gamma^{t}c(s_t,\pi_o(s_t))+\gamma^{T}C_c \,\middle|\, h_0=h\right].
    \]

    Here $T$ denotes the time at which the patient reaches health state $0$. Since $\pi_o(h)=o$, $c(s_t,\pi_o(s_t))=0$ for all $t<T$. This implies that $V_o(h)=C_c\,\mathbb{E}[\gamma^{T}\mid h_0=h]$, where $T$ is the hitting time of state $0$ for a random walk initialized at state $h$. Since $\mu_o>\lambda_o$, the probability that the patient reaches health state $0$ under this policy is $1$. Then [10, Chapter XIV, eqn. 4.8] gives us $V_o(h)=C_c\phi^{h}$, where

    \[
    \phi=\frac{1-\sqrt{1-4\lambda_o\mu_o\gamma^{2}}}{2\lambda_o\gamma}.
    \]

  (b) Let $h'=\lceil \log(C_i/C_c)/\log(\phi)\rceil+1$. Then for any $h\geq h'$, $V_o(h)\leq V_o(h')\leq \phi C_i$. Consider a policy $\pi'$ such that $\pi'(h)=i$. Then $V_{\pi'}(h')\geq C_i$. Now for any $h\geq h'$, $V_o(h)<V_{\pi'}(h)$ as $\phi<1$ for $\gamma<1$. This implies that the policy $\pi'$ cannot be optimal, and hence under the optimal policy, the patient prefers to stay in ordinary monitoring for all health states $h\geq h'$. This completes the proof for Lemma 1.
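The two parts of Lemma 1 can be checked numerically. The sketch below, with hypothetical function names and parameter values, verifies that $V_o(h)=C_c\phi^h$ satisfies the one-step recursion $V_o(h)=\gamma(\lambda_oV_o(h+1)+\mu_oV_o(h-1))$ under ordinary monitoring, and that the state $h'$ from part (b) satisfies $V_o(h')\leq\phi C_i$. It assumes $\mu_o=1-\lambda_o$.

```python
import math


def phi(lam_o, gamma):
    """Root phi from Lemma 1(a), assuming mu_o = 1 - lam_o."""
    mu_o = 1.0 - lam_o
    return (1.0 - math.sqrt(1.0 - 4.0 * lam_o * mu_o * gamma ** 2)) / (2.0 * lam_o * gamma)


def V_o(h, C_c, lam_o, gamma):
    """Closed-form value of always using ordinary monitoring from state h."""
    return C_c * phi(lam_o, gamma) ** h


def h_prime(C_i, C_c, lam_o, gamma):
    """State h' from Lemma 1(b)."""
    return math.ceil(math.log(C_i / C_c) / math.log(phi(lam_o, gamma))) + 1


# Hypothetical parameters, for illustration only.
lam_o, gamma, C_i, C_c = 0.3, 0.95, 1.0, 100.0

# Part (a): the closed form satisfies the recursion at each state.
for h in range(1, 6):
    lhs = V_o(h, C_c, lam_o, gamma)
    rhs = gamma * (lam_o * V_o(h + 1, C_c, lam_o, gamma)
                   + (1.0 - lam_o) * V_o(h - 1, C_c, lam_o, gamma))
    print(h, round(lhs, 6), round(rhs, 6))

# Part (b): at h' the ordinary-monitoring value is at most phi * C_i.
hp = h_prime(C_i, C_c, lam_o, gamma)
print(hp, V_o(hp, C_c, lam_o, gamma) <= phi(lam_o, gamma) * C_i)
```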

A-B Proof for Theorem 1

Proof.

We know that a policy $\pi$ is optimal if and only if

\begin{align*}
&c(s,\pi(s))+\gamma\sum_{s'}p(s'\mid s,\pi(s))V_{\pi}(s')\\
&\qquad=\min_{a}\Big(c(s,a)+\gamma\sum_{s'}p(s'\mid s,a)V_{\pi}(s')\Big),
\end{align*}

holds for all states $s$ [9, Proposition 2.2 and 2.3]. Hence policy $\pi_o$ is optimal if and only if

\begin{equation}
\gamma(\lambda_oV_o(h+1)+\mu_oV_o(h-1))\leq C_i+\gamma(\lambda_iV_o(h+1)+\mu_iV_o(h-1)), \tag{4}
\end{equation}

for all $h\geq 1$. Using Lemma 1, we have that $\pi_o$ is optimal if and only if, for all $h\geq 1$,

\begin{align*}
& C_c\gamma(\lambda_o\phi^{h+1}+\mu_o\phi^{h-1})\leq C_i+C_c\gamma(\lambda_i\phi^{h+1}+\mu_i\phi^{h-1})\\
\iff{}& C_c\gamma\phi^{h-1}\big((\lambda_o-\lambda_i)\phi^{2}+(\mu_o-\mu_i)\big)\leq C_i\\
\stackrel{(b)}{\iff}{}& C_c\gamma\phi^{h-1}(\lambda_i-\lambda_o)(1-\phi^{2})\leq C_i.
\end{align*}

Here (b) follows from the definition that $\mu_o=1-\lambda_o$ and $\mu_i=1-\lambda_i$. Note that if

\[
\gamma(\lambda_i-\lambda_o)(1-\phi^{2})\leq\frac{C_i}{C_c}
\]

is true, then, since $\phi^{h-1}\leq 1$ for all $h\geq 1$, the last inequality above is satisfied for all $h\geq 1$, which implies that the policy $\pi_o$ is optimal. This completes the proof for Theorem 1. ∎
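The role of $h=1$ as the binding state in this argument can be seen numerically. The following sketch, using hypothetical names and parameter values, evaluates the slack (right-hand side minus left-hand side) of inequality (4) with $V_o(h)=C_c\phi^h$ and compares its sign across states with the closed-form condition of Theorem 1. It assumes $\mu_o=1-\lambda_o$, $\mu_i=1-\lambda_i$ and $\lambda_i>\lambda_o$.

```python
import math


def phi(lam_o, gamma):
    """Root phi from Lemma 1, assuming mu_o = 1 - lam_o."""
    mu_o = 1.0 - lam_o
    return (1.0 - math.sqrt(1.0 - 4.0 * lam_o * mu_o * gamma ** 2)) / (2.0 * lam_o * gamma)


def slack(h, lam_o, lam_i, gamma, C_i, C_c):
    """Right-hand side minus left-hand side of inequality (4) at state h."""
    p = phi(lam_o, gamma)
    v_up, v_down = C_c * p ** (h + 1), C_c * p ** (h - 1)
    lhs = gamma * (lam_o * v_up + (1.0 - lam_o) * v_down)
    rhs = C_i + gamma * (lam_i * v_up + (1.0 - lam_i) * v_down)
    return rhs - lhs


def theorem1_condition(lam_o, lam_i, gamma, C_i, C_c):
    """Closed-form condition gamma (lam_i - lam_o)(1 - phi^2) <= C_i / C_c."""
    p = phi(lam_o, gamma)
    return gamma * (lam_i - lam_o) * (1.0 - p ** 2) <= C_i / C_c


# Hypothetical parameters: one cost ratio on each side of the boundary.
for C_c in (5.0, 100.0):
    args = dict(lam_o=0.3, lam_i=0.5, gamma=0.95, C_i=1.0, C_c=C_c)
    holds_everywhere = all(slack(h, **args) >= 0.0 for h in range(1, 51))
    print(C_c, theorem1_condition(**args), holds_everywhere)
```

Because the slack increases with $h$ when $\lambda_i>\lambda_o$, checking the inequality at $h=1$ is enough, which is what the closed-form condition does.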

A-C Proof for Theorem 2

Proof.

Since $\gamma(\lambda_i-\lambda_o)(1-\phi^{2})>\frac{C_i}{C_c}$, the final inequality in the proof of Theorem 1 is not satisfied for $h=1$, and hence $\pi_o$ is not optimal, which implies that staying under intensive monitoring is optimal for some state. Define $Q^*(h,i)=C_i+\gamma(\lambda_iV^*(h+1)+\mu_iV^*(h-1))$ and $Q^*(h,o)=\gamma(\lambda_oV^*(h+1)+\mu_oV^*(h-1))$. These are the Q-functions, where action $i$ (or $o$, respectively) is taken when initialized at health state $h$ and actions are then taken using the optimal policy. Note that $\pi^*(h)=o$ if $Q^*(h,o)<Q^*(h,i)$, and $\pi^*(h)=i$ otherwise. If we show that $Q^*(h,o)-Q^*(h,i)$ is monotonically decreasing as $h$ increases, then whenever action $o$ is optimal at some $h$, it is also optimal at $h+1$, and so on. As condition (a) already enforces that policy $\pi_o$ is not optimal, and we have shown in Lemma 1 part (b) that there exists $h'$ such that the optimal action for health states above $h'$ is $o$, the optimal policy has to be $\pi_{t,\bar{h}}$ for some $\bar{h}$. We can show that

\[
Q^*(h,o)-Q^*(h,i)=\gamma(\lambda_i-\lambda_o)\big(V^*(h-1)-V^*(h+1)\big)-C_i.
\]

Now,

\begin{align*}
&\big(Q^*(h+1,o)-Q^*(h+1,i)\big)-\big(Q^*(h,o)-Q^*(h,i)\big)\\
&\qquad=\gamma(\lambda_i-\lambda_o)\Big(V^*(h)-V^*(h+2)-V^*(h-1)+V^*(h+1)\Big).
\end{align*}

Hence if $V^*(h)+V^*(h+1)\leq V^*(h-1)+V^*(h+2)$ holds for all $h\geq 1$, then, since $\lambda_i>\lambda_o$, $Q^*(h,o)-Q^*(h,i)$ is monotonically decreasing with $h$. Now we know using [9, Proposition 2.2] that

\[
V^*(h)\leq\gamma\lambda_oV^*(h+1)+\gamma\mu_oV^*(h-1),
\]

and

\[
V^*(h+1)\leq\gamma\lambda_oV^*(h+2)+\gamma\mu_oV^*(h).
\]

With some further manipulation, we can show that

\begin{align*}
V^*(h)+V^*(h+1)\leq{}&\frac{\gamma\lambda_o(1+\gamma\lambda_o)}{1-\gamma^{2}\lambda_o\mu_o}V^*(h+2)\\
&+\frac{\gamma\mu_o(1+\gamma\mu_o)}{1-\gamma^{2}\lambda_o\mu_o}V^*(h-1).
\end{align*}

We can show that $\frac{\gamma\lambda_o(1+\gamma\lambda_o)}{1-\gamma^{2}\lambda_o\mu_o}$ is always less than $1$ for $\lambda_o\leq 0.5$. Hence if $\frac{\gamma\mu_o(1+\gamma\mu_o)}{1-\gamma^{2}\lambda_o\mu_o}\leq 1$, then $V^*(h)+V^*(h+1)\leq V^*(h-1)+V^*(h+2)$, and hence $Q^*(h,o)-Q^*(h,i)$ is monotonically decreasing with $h$. Under the additional condition that $\gamma(\lambda_i-\lambda_o)(1-\phi^{2})>\frac{C_i}{C_c}$, this implies that $\pi_{t,\bar{h}}$ is the optimal policy for some threshold $\bar{h}$. ∎
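The two coefficient claims used in the proof of Theorem 2 are stated without derivation. A short worked simplification, assuming $\mu_o=1-\lambda_o$ and $0<\gamma<1$ as in the proofs above, is sketched below; it is offered as a reading aid rather than as part of the original argument.

```latex
% Assumes \mu_o = 1 - \lambda_o and 0 < \gamma < 1.
% The denominator is positive: \lambda_o\mu_o \le 1/4, so
% 1 - \gamma^2\lambda_o\mu_o \ge 1 - \gamma^2/4 > 0.
\begin{align*}
\frac{\gamma\mu_o(1+\gamma\mu_o)}{1-\gamma^{2}\lambda_o\mu_o}\leq 1
&\iff \gamma\mu_o+\gamma^{2}\mu_o^{2}+\gamma^{2}\lambda_o\mu_o\leq 1\\
&\iff \gamma\mu_o+\gamma^{2}\mu_o(\mu_o+\lambda_o)\leq 1\\
&\iff \gamma(1+\gamma)\mu_o\leq 1.
\end{align*}
% The same steps with \lambda_o and \mu_o exchanged give
\begin{align*}
\frac{\gamma\lambda_o(1+\gamma\lambda_o)}{1-\gamma^{2}\lambda_o\mu_o}<1
\iff \gamma(1+\gamma)\lambda_o<1,
\end{align*}
% which holds for \lambda_o \le 0.5 and \gamma < 1,
% because then \gamma(1+\gamma)\lambda_o < 2 \cdot 0.5 = 1.
```

In this form, the second sufficient condition reads $\gamma(1+\gamma)\mu_o\leq 1$, which is straightforward to check for given $\gamma$ and $\mu_o$.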