Stochastic Guidance of Buoyancy Controlled Vehicles under Ice Shelves using Ocean Currents

Federico Rossi1, Andrew Branch1, Michael P. Schodlok1, Timothy Stanton2,
Ian G. Fenty1, Joshua Vander Hook1, Evan B. Clark1
1 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 91109; {federico.rossi, andrew.branch, michael.p.schodlok, ian.fenty, hook, evan.clark}@jpl.nasa.gov.
2 Moss Landing Marine Laboratories; [email protected]
Abstract

We propose a novel technique for guidance of buoyancy-controlled vehicles in uncertain under-ice ocean flows. In-situ melt rate measurements collected at the grounding zone of Antarctic ice shelves, where the ice shelf meets the underlying bedrock, are essential to constrain models of future sea level rise. Buoyancy-controlled vehicles, which control their vertical position in the water column through internal actuation but have no means of horizontal propulsion, offer an affordable and reliable platform for such in-situ data collection. However, reaching the grounding zone requires vehicles to traverse tens of kilometers under the ice shelf, with approximate position knowledge and no means of communication, in highly variable and uncertain ocean currents. To address this challenge, we propose a partially observable MDP approach that exploits model-based knowledge of the under-ice currents and, critically, of their uncertainty, to synthesize effective guidance policies. The approach uses approximate dynamic programming to model uncertainty in the currents, and QMDP to address localization uncertainty. Numerical experiments show that the policy can deliver up to 88.8% of underwater vehicles to the grounding zone – a 33% improvement compared to state-of-the-art guidance techniques, and a 262% improvement over uncontrolled drifters. Collectively, these results show that model-based under-ice guidance is a highly promising technique for exploration of under-ice cavities, and has the potential to enable cost-effective and scalable access to these challenging and rarely observed environments.

I Introduction

Ice shelves are vast, floating slabs of ice that fringe 75 percent of the Antarctic coastline and act as “corks in the bottle”, preventing the rest of the ice on the continent from sliding into the ocean and catastrophically raising global sea levels. By the end of the century, the collapse of Antarctic ice shelves could trigger a meter or more of sea level rise, with profound effects for hundreds of millions of people worldwide. In total, Antarctic ice shelves hold back enough ice to raise global sea levels by more than fifty meters [1]. Yet a lack of detailed understanding about how ice shelves will behave in a warming climate remains a primary obstacle to accurate projections of sea level rise.

Current state-of-the-art sea-level rise projections have extremely large uncertainties. Specifically, the latest IPCC special report states that, by the end of the century, sea-level rise could range between 0.29 m and 1.1 m, depending on emission scenarios, associated climate policy, and the response of the Antarctic Ice Sheet as the world continues to warm [2]. Some studies suggest that sea-level rise of as much as two meters is possible by 2100 [3].

The single largest drivers of sea-level rise uncertainty are poorly-constrained numerical models of ice shelf melt and collapse, which suffer from a dearth of in-situ measurements to provide ground-truth for basal melt rates under ice shelves. Although various assets, ranging from underwater vehicles to borehole-deployed instruments, have managed to collect some in-situ data sets beneath ice shelves, these data sets usually do not observe basal melt rate, and are also severely limited in duration, location, and spatial distribution. Basal melt rates can be estimated from remote sensing by examining the residual differences between surface elevation change data from satellite altimeters and dynamical volume convergence calculations from combined ice field velocity, surface snow/ice mass flux, and firn compaction estimates [4]. However, large uncertainties remain due to unknown local atmospheric properties and firn behavior; critically, such techniques are inadequate at large ice-shelf grounding zones, which are primary contributors to melt, due to severe surface crevassing which does not allow for easy estimation of the aforementioned effects.

Figure 1: IceNode concept of operations and mission phases

The grounding zone of an ice shelf is the area where the shelf first becomes buoyantly decoupled from the bedrock below it. Since the grounding zone is where ice first comes in contact with the warm, high-salinity, dense water entering at depth under the ice shelf, it is critical to capture the ice-ocean interactions as close as practical to this region. Due to large strain gradients created by the ice shelf “hinge” formed at this transition, grounding zones are typically heavily crevassed, making melt rate estimations from remote sensing difficult. While very limited melt rate and boundary layer turbulent fluxes have been measured under selected Antarctic ice shelves using borehole-deployed instruments [5] [6], no measurements have been made in active, warm ice shelves in close proximity to the grounding zone, as crevassing prevents borehole drilling operations. Furthermore, borehole based deployments are logistically complicated, expensive, and difficult to scale across multiple measurement sites concurrently for comparative studies.

In line with this, a recent Keck Institute for Space Studies workshop, The Sleeping Giant: Measuring Ocean-Ice Interactions in Antarctica [7], identified long-duration measurements of melt rate as critically important to understand grounding zone dynamics and future sea-level rise, and recommended development of autonomous guidance capabilities to enable under-ice vehicles to conduct cost-effective, practical, and persistent monitoring, returning long-duration ground-truth datasets from near the grounding zone.

In this paper we describe and demonstrate in simulation an autonomous guidance technique for a novel underwater vehicle, IceNode, designed to drift underneath ice shelves on melt-driven exchange currents, using buoyancy control but no propulsion, and then land against the underside of the ice to directly measure basal melt rate.

Our technique enables IceNode vehicles, deployed at the shelf edge by an ice breaker, to navigate to regions near the grounding zone to conduct landed science, and then return to open ocean to relay the collected data to scientists. Although the proposed technique is designed and evaluated for the IceNode vehicle, the approach is generally applicable to under-ice buoyancy-controlled vehicles navigating in melt-driven exchange flows, and has the potential to enable reliable and cost-effective access to these challenging and rarely observed environments.

I-A IceNode: System Description

IceNode is a buoyancy-driven vehicle with similar functionality to vertically profiling floats (e.g. [8]), but specifically designed to gather in-situ melt rate observations at the basal melt interface of large, difficult to access ice shelves. IceNode controls its vertical position in the water column using a variable buoyancy system (VBS) which pumps oil between an external bladder and the internal pressure hull to change density, and thus sink deeper or float higher. Compared to vertically profiling floats, IceNode offers the additional capability to buoyantly land against the underside of the ice shelf to acquire stable fixed measurements of turbulent heat, salt, and momentum fluxes using direct eddy-correlation techniques, which can be used to calculate in-situ melt rate [5]. Figure 1 shows the distinct stages of an IceNode mission. First, IceNode is deployed in open ocean near the shelf edge by an ice breaker. Next, in the ingress phase, IceNode exploits currents at different depths to navigate to a pre-specified target area near the grounding zone (this technique is the subject of this paper). While drifting in the water column, IceNode is localized using acoustic multilateration from moored sound sources placed at the shelf edge, using the same technique successfully demonstrated with EM-APEX floats in [9]. Once beneath the target area, IceNode ascends to a fixed standoff from the ice and uses an upward-looking Doppler Velocity Log (DVL) to locate a suitable landing location by examining surface slope and roughness. Once an appropriate landing location is found, IceNode deploys landing legs, releases a ballast weight to become highly positively buoyant, and lands against the underside of the ice. The vehicle then collects in-situ measurements of heat, salt, and momentum fluxes at the basal melt interface for a year. 
Once the landed phase is complete, IceNode jettisons its highly positively buoyant landing legs to achieve near-neutral buoyancy again and exploits melt-driven exchange currents to egress back to open water. Finally, the vehicle surfaces and transmits its mission data back to scientists over an Iridium link (the vehicle is not physically recovered). IceNodes are designed to be cheap (by underwater vehicle standards) and expendable, and multiple IceNodes are concurrently deployed at the shelf edge by an ice breaker. Individual vehicles can be directed to land in different regions under the shelf, and thus form an array of instrument platforms that acquires long-duration, concurrent, well-distributed time series of basal melt rate. The capability of IceNode to land and acquire direct melt rate measurements at the basal melt interface is unique among underwater vehicles, and the drift-based access technique enables long mission duration and cost-effective targeting of areas near the grounding zone not achievable with traditional borehole-deployed instrument packages.

I-B State of the Art

The cavities underneath ice shelves are notoriously difficult to access and return safely from, and are cut off from communication with the outside world by up to a thousand meters of ice overhead. IceNode’s concept of operations draws heavy inspiration from the successful 2019 University of Washington Applied Physics Lab (APL-UW) campaign of four EM-APEX floats beneath the Dotson Ice Shelf [9]. This campaign depended on melt-driven exchange flow to move the vehicles into, around, and out of the cavity. After deployment at the shelf edge by an ice breaker, the EM-APEX floats used a VBS to descend to a depth where they were swept underneath the cavity by deep inflow currents. During the ingress phase, the floats maintained a depth corresponding to 75% of the water column depth, periodically computed by bumping against the seafloor and the ice shelf base. After a pre-set timer elapsed, the floats transitioned to egress, and moved to 25% of the cavity depth to be swept out to sea on shallow outflowing currents. Throughout the mission, the floats collected conductivity, temperature, pressure, and current data, and recorded ranging signals from a set of three RAFOS acoustic moorings placed at the shelf edge. Using this technique, all four EM-APEX floats eventually emerged from the cavity, after spending multiple months and collectively traveling hundreds of km under the shelf, demonstrating that riding melt-driven exchange flows from the shelf edge is a viable technique for exploring ice shelf cavities.

Other successful missions have been conducted using AUVs [10] [11] [12] and gliders [13] from the shelf edge, and cabled instrumentation [5] [6] and HROVs deployed through boreholes [14] [15]. However, with the exception of long-duration borehole-deployed instruments and the APL-UW Dotson gliders and floats, these missions are typically short-lived (on the order of hours to days) and only deploy a single vehicle or asset, and none provide long-duration, spatially distributed, concurrent melt rate data sets directly at the basal melt interface.

Much research exists related to path planning of under-actuated marine vehicles in flow fields using ocean circulation models, including efficient long-range path planning of gliders in the presence of currents [16] [17], stationkeeping of gliders [18] and vertically profiling floats [19] near a location of interest, avoiding glider surfacing in dangerous locations [20], and optimizing float coverage across oceans [21]. Similar techniques exist for path planning of aerostats in wind fields, including Google’s Project Loon using superpressure stratospheric balloons to provide internet connectivity [22], and planetary mission concepts for future Venus [23] and Titan [24] missions. The majority of these works assume that the circulation model is known (either numerically or through accurate measurements), or that updated model predictions based on external measurements can be periodically communicated to the vehicle, enabling the use of deterministic path planning algorithms; these are reasonable assumptions for atmospheric and surface navigation, but unrealistic ones for the communication-denied environment considered in this paper. The work in [23] does use a stochastic model of the flow field, but it employs an extremely simple probabilistic model to capture the variability of currents.

I-C Contribution

Our contribution is twofold. First, we propose a novel approach for model-based under-ice guidance under model uncertainty. The approach, based on approximate dynamic programming, exploits model information to compute policies that exploit the currents for guidance with only vertical actuation; critically, it accommodates model uncertainty, rather than relying on a (possibly inaccurate or outdated) deterministic representation of under-ice currents. The approach is heavily inspired by [23]; we extend this work by (i) proposing a rigorous and systematic way of capturing the flow distribution from time-varying model data, and (ii) accommodating position uncertainty.

Second, we validate the approach through extensive numerical experiments set beneath the Pine Island Glacier ice shelf in Antarctica. Simulation results show that the proposed approach can deliver up to 88.8% of underwater vehicles to the grounding zone – a 33% improvement compared to state-of-the-art under-ice guidance techniques for buoyant vehicles, and a 262% improvement over uncontrolled vehicles. The fraction of vehicles that reaches the grounding zone can be further increased up to 95% as localization uncertainty is reduced. Collectively, these results show that model-based under-ice guidance holds promise to enable previously-infeasible measurements of melt rates at the grounding zone of Antarctic glaciers in a cost-effective manner, providing critical in-situ measurements to improve sea-level rise models, and informing climate science and public policy.

I-D Organization

The rest of this paper is organized as follows. In Section II we formally state the under-ice guidance problem we wish to solve, and discuss assumptions. In Section III we describe the numerical model used to characterize under-ice flows, and discuss its relevance to the planning problem. Section IV presents the proposed approach to model-based under-ice guidance. The effectiveness of the proposed approach is assessed through numerical simulations in Section V. Finally, in Section VI, we present our conclusions and lay out directions for future research.

II Problem Statement

The goal of this paper is to provide an efficient algorithm for autonomous under-ice guidance of buoyancy-controlled vehicles in a partially-unknown flow field.

We assume that a stochastic model of the flow field in a region of interest is available. For each point in the domain, the model specifies the probabilistic distribution of the flow field that may be encountered at that location. Such a model can be obtained through, e.g., numerical simulations with state-of-the-art circulation models, numerically capturing uncertainty due to the initial conditions and assessing its impact on cavity flow uncertainty.

An autonomous vehicle navigates in the flow field. The vehicle can control its vertical location in the water column using a variable-buoyancy mechanism. The vehicle has no other means of propulsion, and its horizontal position evolves according to the flow field as a semi-Lagrangian tracer. The vehicle has access to a probabilistic distribution (or belief) of its likely location in the flow. A number of regions in the flow field are designated as end regions; each region is associated with a reward, which captures the scientific interest of reaching that region. The vehicle expends energy to control its position; we assume that the energy expenditure has a constant component which characterizes the “hotel load” for non-propulsion functions, and a variable component which captures the energy cost to ascend in the water column.

We are now in a position to formalize the problem that we wish to solve.

Problem 1 (Autonomous buoyancy-controlled guidance in uncertain flow field).

Given a stochastic model of a flow field, a set of end regions of interest, and costs capturing the vehicle’s energy expenditure, compute an optimal policy (i.e., a mapping from beliefs about the vehicle location to desired controlled depths) that maximizes the total discounted reward obtained by the vehicle, i.e., the expected discounted reward for reaching a region of interest minus the expected discounted energy cost incurred along the trajectory.
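One way to write the objective of Problem 1 in symbols is sketched below. The notation is introduced here only for illustration and is an assumption, not the paper's own: $\pi$ is a policy mapping a belief $b_t$ to a commanded depth, $T$ is the (random) time step at which the vehicle reaches an end region, $R$ is the reward of that region, $c$ is the per-step energy cost, and $\gamma \in (0,1]$ is a discount factor.

```latex
\pi^{\star} \in \arg\max_{\pi} \;
\mathbb{E}\left[ \gamma^{T} R(s_{T})
  \;-\; \sum_{t=0}^{T-1} \gamma^{t}\, c\bigl(s_{t}, \pi(b_{t})\bigr) \right]
```

The first term is the expected discounted reward for reaching a region of interest, and the sum is the expected discounted energy cost along the trajectory.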

III Ice Shelf Cavity Modeling

Availability of high-quality stochastic models of under-ice circulation is critical to the proposed guidance technique.

Figure 2: Instantaneous currents under the Pine Island ice shelf at 500 m depth and across one vertical slice for one time step.

In this paper, we simulate ice-ocean interaction using the Massachusetts Institute of Technology general circulation model (MITgcm), which includes a dynamic/thermodynamic sea-ice model [25] and captures the temporal evolution of under-ice currents, temperature, and salinity. Freezing and melting processes in the sub-ice-shelf cavity are represented by the three-equation thermodynamics of Hellmer and Olbers [26] with modifications by Jenkins [27]. The model domain, the Pine Island Ice Shelf (shown in Figure 2), is derived from the global cube-sphere configuration (CS510) used by the ECCO2 project [28], with a nominal horizontal grid spacing of 280 m and 250 vertical levels, each with 5 m thickness. The bathymetry and ice shelf draft are provided by BedMachine Antarctica [29]. Initial conditions and boundary conditions for hydrography (temperature, salinity, and horizontal velocity components $u$ and $v$) and sea ice (concentration, ice thickness, and snow thickness) are derived from a global, coarser resolution ($\sim 20$ km horizontal grid spacing), data-constrained solution for the period of 2009–2012. Due to the difference in resolution between the global model and the 280 m model domain, a relaxation (10 grid points into the model domain) is applied to temperature and salinity at the boundaries to avoid artifacts such as wave energy radiating into the model interior; similarly, a 5-grid-point relaxation is used for sea ice variables. Surface forcing is provided by the ERA-Interim reanalysis project [30]. Similar configurations have been successfully applied to study ice shelf ocean interaction on the cube sphere grid (e.g., [31], [32], [33]).

The model uses an Arakawa C-Grid, where the three current velocity components are not co-located in a grid cell [34]. Four-dimensional $(x,y,z,\text{time})$ interpolation is applied for each velocity component independently. To determine the portion of the model containing navigable water, the portion of each cell that contains water (as opposed to land or ice) is computed. The vertical depth boundaries of the model are determined using this data from the center of each cell, linearly interpolated in the $x$ and $y$ dimensions. The horizontal bounds of the navigable portion of the model are determined as any location where 4D gridded depth interpolation is possible. This functionally truncates the bounds of the model by half a grid cell ($\sim 140$ m). These limitations are acceptable as the affected areas are below the resolution of the model, and only along the boundaries of the model, where it is undesirable for vehicles to travel.

Figure 2 shows a snapshot of the model output (specifically, the cavity flows) for one time step.

The simulated flow is consistent with the geophysical fluid dynamical and ice melt equations encoded in the numerical model, with the prescribed seafloor and ice-shelf cavity geometry, and with external atmospheric and open-boundary forcing. While significant uncertainties remain in the ice shelf cavity geometry and in atmospheric and open boundary forcing, we have confidence in many aspects of the simulated ice shelf cavity circulation, such as the propensity for the balanced flow to follow contours to conserve potential vorticity ($f/h$, where $h$ is the floor-ceiling column thickness), and the cavity-scale overturning circulation in which relatively dense, salty warm waters flow towards the grounding line at depth, and return towards the cavity entrance near the cavity ceiling as relatively lighter fresh cool waters. Of course, specific aspects of the time-variable, fine-scale turbulent circulation (i.e., the instantaneous arrangement of meso- and submeso-scale eddies and filaments) are assumed to be realistic and representative only in a statistical sense. Therefore, while the true distribution of velocities within the cavity is unknown (and will remain so for all practical purposes), the numerical model provides a reasonable and useful approximation for the flow distribution that a float would encounter.

IV Guidance in uncertain flow fields

We are now in a position to describe the proposed approach to solve Problem 1. First, we present a continuous-space MDP formulation that leverages the MITgcm flow solution, which we solve through approximate dynamic programming (ADP) [35] to compute an optimal policy when the vehicle’s position is exactly known. Next, we discuss how QMDP [36] can be used to extend the applicability of the policy to the case where the vehicle’s position is only approximately known.

IV-A Continuous-space Markov Decision Process

We formalize the under-ice guidance problem as a continuous-state Markov Decision Process by defining its states, actions, transitions, rewards, and final states. We discretize time according to a discrete time step $\delta$.

States

The state of the vehicle is the vehicle’s location under the ice. Formally, the set of states is:

$$\mathcal{S}=\{(x,y,z) \mid (x,y,z)\in\text{navigable water}\cap z>\underline{z}\},$$

where the navigable region under the ice is computed according to the model described in Section III, and the maximum allowable depth $\underline{z}$ captures the vehicle’s depth rating.

Actions

The vehicle can choose to move to a different depth $a$ through a buoyancy control mechanism. The vehicle’s ascent and descent rates are bounded by a given maximum rate $\overline{\dot{z}}$ and minimum rate $\underline{\dot{z}}$, respectively, and the desired depth must lie within navigable waters. Formally, the actions available in state $s=(x,y,z)$ are the set of depths:

$$\mathcal{A}((x,y,z))=\{a \mid \delta\underline{\dot{z}}\leq(a-z)<\delta\overline{\dot{z}}\cap(x,y,a)\in\mathcal{S}\}$$
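As a minimal sketch of the action set above, the following Python function filters a list of candidate depths by the rate constraint and a navigability test. The candidate list, the `is_navigable` callback, and the convention that $z$ is positive upward (negative below the sea surface) are illustrative assumptions; the paper does not specify how the action set is enumerated.

```python
from typing import Callable, List, Sequence


def feasible_depths(
    x: float,
    y: float,
    z: float,
    delta: float,
    z_dot_min: float,  # minimum vertical rate (negative = descent), assumed m/s
    z_dot_max: float,  # maximum vertical rate (positive = ascent), assumed m/s
    candidates: Sequence[float],
    is_navigable: Callable[[float, float, float], bool],
) -> List[float]:
    """Return the candidate depths a that satisfy
    delta * z_dot_min <= a - z < delta * z_dot_max
    and for which (x, y, a) lies in navigable water."""
    lower = delta * z_dot_min
    upper = delta * z_dot_max
    return [
        a
        for a in candidates
        if lower <= (a - z) < upper and is_navigable(x, y, a)
    ]
```

Note that the interval is half-open, as in the definition: a depth change equal to $\delta\underline{\dot{z}}$ is allowed, while one equal to $\delta\overline{\dot{z}}$ is not.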

Final States

Certain states $\mathcal{F}\subset\mathcal{S}$ are denoted as final states: when the vehicle reaches one of these states $f\in\mathcal{F}$, it receives a lump reward $r(f)$ and transitions to landing mode. The final states denote the desired landing regions for the vehicle, and the corresponding reward captures the scientific interest of the landing zone. We also model all infeasible states $(x,y,z)\not\in\mathcal{S}$ (i.e., all states not in navigable water or outside the domain of the model) as final states, associated with a negative reward.

Transitions

We model the vehicle as a semi-Lagrangian tracer, where the horizontal dynamics are driven by the flow field, and the vertical dynamics are controlled through the variable buoyancy mechanism. We leverage the flow field model described in Section III to capture the stochastic dynamics that the vehicle may encounter. Specifically, we assume that the model captures the likely distribution of the flow field at every point in the state space. Due to uncertainty in the initial and boundary conditions, the model cannot accurately reproduce the flow that will be encountered by the vehicle at a specific time; however, we assume that the empirical temporal distribution of flow velocities encountered over the simulation is representative of the probabilistic distribution of velocities that the vehicle may encounter. Rigorously, define $\vec{v}(x,y,z)$ as a random variable denoting the flow velocity encountered by the vehicle at state $(x,y,z)$, and denote as $\tilde{v}(t,x,y,z)$ the flow velocity predicted by the numerical model described in Section III. We assume that

$$P(\vec{v}(x,y,z)=v)\propto\int_{t_{0}}^{t_{f}}1_{\tilde{v}(t,x,y,z)=v}\,dt$$

where $t_{0}$ and $t_{f}$ are the temporal boundaries of the cavity model simulation, and $1_{x}$ is a Boolean function assuming value 1 if $x$ is true and 0 otherwise.
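When the model output is stored as snapshots equally spaced in time, the integral above reduces to counting how often each velocity occurs. The following Python sketch illustrates this for a single location, under assumptions that are ours: velocities are horizontal pairs $(v_x, v_y)$, they are grouped into square bins of width `bin_width`, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Velocity = Tuple[float, float]  # (vx, vy) at one location, one model snapshot
Bin = Tuple[int, int]


def empirical_distribution(
    snapshots: Sequence[Velocity], bin_width: float
) -> Dict[Bin, float]:
    """Bin equally spaced model snapshots and return bin index -> probability.

    Each snapshot carries equal weight, which is the discrete counterpart of
    integrating the indicator function over the simulation window.
    """
    counts = Counter(
        (round(vx / bin_width), round(vy / bin_width)) for vx, vy in snapshots
    )
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def sample_velocity(snapshots: Sequence[Velocity], rng: random.Random) -> Velocity:
    """Draw one velocity from the empirical distribution (uniform over snapshots)."""
    return rng.choice(list(snapshots))


# Illustrative, made-up velocities in m/s (not model output).
example_snapshots: List[Velocity] = [
    (0.10, 0.02),
    (0.11, 0.01),
    (-0.05, 0.00),
    (0.09, 0.02),
]
example_distribution = empirical_distribution(example_snapshots, bin_width=0.05)
```

Sampling uniformly from the stored snapshots avoids binning altogether, which may be convenient when transitions are simulated rather than tabulated.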

For a given state $(x,y,z)\in\mathcal{S}$, action $a\in\mathcal{A}((x,y,z))$, and realization of the velocity field $\vec{v}(x,y,z)$, we model the vehicle transition as

$$s^{\prime}=(x^{\prime},y^{\prime},z^{\prime})=[x+\vec{v}_{x}(x,y,z)\delta,\; y+\vec{v}_{y}(x,y,z)\delta,\; a], \tag{1}$$

where $\vec{v}_{x}$ and $\vec{v}_{y}$ are the components of the velocity vector $\vec{v}$ along $x$ and $y$ respectively. If $s^{\prime}\not\in\mathcal{S}$, the vehicle transitions to a final state associated with a negative reward, as discussed above. Accordingly, the probability of transitioning to state $(x^{\prime},y^{\prime},z^{\prime})\in\mathcal{S}$ from state $(x,y,z)\in\mathcal{S}$ with action $a\in\mathcal{A}(s)$ can be computed as

$$P((x^{\prime},y^{\prime},z^{\prime}) \mid (x,y,z),a)= 1_{z^{\prime}=a}\cdot\int_{\zeta:(x,y,\zeta)\in\mathcal{S}}P\left(\vec{v}(x,y,z)=[(x^{\prime}-x)/\delta,\,(y^{\prime}-y)/\delta,\,\zeta]\right)d\zeta, \tag{2}$$

that is, the probability of transitioning from $(x,y,z)$ to $(x^{\prime},y^{\prime},z^{\prime})$ equals the probability of encountering a flow velocity $\vec{v}$ such that $\vec{v}_{x}\delta=x^{\prime}-x$ and $\vec{v}_{y}\delta=y^{\prime}-y$ if the commanded depth is $a=z^{\prime}$, and is zero otherwise.
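The one-step transition of Equation (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: the velocity is drawn uniformly from a list of model snapshots at the current location, `is_navigable` is a hypothetical membership test for $\mathcal{S}$, and an infeasible next state is signalled by returning `None`.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[float, float, float]  # (x, y, z)
Velocity = Tuple[float, float]      # (vx, vy)


def step(state: State, action_depth: float, velocity: Velocity, delta: float) -> State:
    """Apply Equation (1): horizontal drift with the flow over one time step,
    vertical position set to the commanded depth."""
    x, y, _ = state
    vx, vy = velocity
    return (x + vx * delta, y + vy * delta, action_depth)


def sample_transition(
    state: State,
    action_depth: float,
    velocity_samples: Sequence[Velocity],
    delta: float,
    is_navigable: Callable[[float, float, float], bool],
    rng: random.Random,
) -> Optional[State]:
    """Sample a next state; return None if it falls outside navigable water
    (the infeasible final state associated with a negative reward)."""
    velocity = rng.choice(list(velocity_samples))
    next_state = step(state, action_depth, velocity, delta)
    return next_state if is_navigable(*next_state) else None
```

Repeating `sample_transition` many times from the same state and action gives a Monte Carlo estimate of the transition probabilities in Equation (2).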

Rewards

Each state-action pair is associated with a reward that captures the energy cost of the action undertaken. The energy cost consists of a constant term $e_h$ that captures the “hotel load” required by the vehicle for non-propulsion purposes (e.g., computing and localization), and a variable term $e_b$ that captures the energy used by the buoyancy control mechanism. The buoyancy control mechanism consumes virtually no energy to descend (as water pressure is used to force oil from the external bladder to the pressure vessel after a valve is opened); in contrast, when the vehicle ascends, a pump works against the water pressure. In this paper, we adopt a simple model where the energy cost is proportional to the desired change in depth, with proportionality constant $\alpha_b$; the adoption of a more sophisticated energy model is an interesting direction for future research. Formally, the reward for the state-action pair $(x,y,z)\in\mathcal{S}$, $a\in\mathcal{A}(s)$ is

$$r\big((x,y,z),a\big) = e_h + \alpha_b \cdot \max(a - z, 0).$$
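A minimal sketch of this reward is given below. The numerical values of `e_h` and `alpha_b` are hypothetical, and their negative sign is an assumption made here so that energy use is penalized when the reward is maximized; the section does not state the sign convention. Depths are taken as negative downward, as in the figures (e.g., $z=-500$ m), so that $a - z > 0$ corresponds to an ascent.

```python
def reward(z, a, e_h, alpha_b):
    """Reward r((x, y, z), a) = e_h + alpha_b * max(a - z, 0).

    z       -- current depth in metres (negative downward)
    a       -- commanded depth in metres (negative downward)
    e_h     -- constant hotel-load term per step
    alpha_b -- proportionality constant for ascent energy
    """
    ascent = max(a - z, 0.0)  # only ascents cost pump energy
    return e_h + alpha_b * ascent


# Hypothetical parameters, chosen only for illustration.
E_H = -1.0
ALPHA_B = -0.01

r_ascend = reward(z=-500.0, a=-400.0, e_h=E_H, alpha_b=ALPHA_B)   # 100 m ascent
r_descend = reward(z=-400.0, a=-500.0, e_h=E_H, alpha_b=ALPHA_B)  # 100 m descent
print(r_ascend, r_descend)
```

With these values, the descent incurs only the hotel load, while the ascent adds a term proportional to the 100 m change in depth.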

Discussion

A few comments are in order. First, the proposed approach strongly relies on the cavity flow model to capture the probabilistic distribution of the flow dynamics. Accordingly, a representative model that is able to characterize both the likely under-ice flow and its variability is critical to achieve good performance. Second, the approach does not exploit spatial or temporal correlations in the model: that is, knowledge of the flow encountered by the vehicle at one location is not used to update transition probabilities at other nearby locations, and seasonal effects are averaged out. While this choice helps avoid overfitting the model output, exploiting spatial and temporal correlations in a principled way can help improve performance, and is a highly promising direction for future research. Third, we use a simple one-step integration scheme to compute transitions in Equation (1). A more refined integration scheme that updates vertical velocities within the time step may offer additional fidelity, and is an interesting direction for future research. Finally, we use a simple model for energy costs; we remark that the proposed modeling approach can accommodate arbitrarily sophisticated energy models with no structural changes.

IV-B Approximate Dynamic Programming Solution

We are now in a position to solve the continuous MDP through approximate dynamic programming (ADP). We discretize the state space into a discrete set of states $\tilde{\mathcal{S}}$ forming a uniform lattice. We remark that the ADP discretization need not correspond to the discretization used in the cavity model.

For each state $\tilde{s}\in\tilde{\mathcal{S}}$, the optimal value of the state (i.e., the optimal discounted expected reward that an agent will obtain when departing from that state) can be computed through the Bellman equation as

$$V^{\star}(\tilde{s}) = \max_{a\in\mathcal{A}(\tilde{s})} \left( r(\tilde{s},a) + \gamma\, \mathbb{E}_{s'\sim P(s'|\tilde{s},a)}\!\left[V^{\star}(s')\right] \right), \tag{3}$$

and the optimal action for state $\tilde{s}$ is

$$a^{\star}(\tilde{s}) = \arg\max_{a\in\mathcal{A}(\tilde{s})} \left( r(\tilde{s},a) + \gamma\, \mathbb{E}_{s'\sim P(s'|\tilde{s},a)}\!\left[V^{\star}(s')\right] \right), \tag{4}$$

where $\gamma\in(0,1)$ is the discount factor.

The value of states $s'\notin\tilde{\mathcal{S}}$ is computed by linearly interpolating the values of states in $\tilde{\mathcal{S}}$. Recall that the states in $\tilde{\mathcal{S}}$ form a regular lattice. Denote as $\tilde{\mathcal{N}}(s')\subset\tilde{\mathcal{S}}$ the states that form the vertices of the lattice cell that contains $s'$. Then the optimal value of $s'$ is approximated as

$$V^{\star}(s') = \sum_{\tilde{s}\in\tilde{\mathcal{N}}(s')} \lambda_{\tilde{s}}\, V^{\star}(\tilde{s}), \quad \text{where} \tag{5a}$$

$$\lambda_{\tilde{s}} \propto \frac{1}{\|s'-\tilde{s}\|} \quad \forall \tilde{s}\in\tilde{\mathcal{N}}(s') \quad \text{and} \quad \sum_{\tilde{s}\in\tilde{\mathcal{N}}(s')} \lambda_{\tilde{s}} = 1. \tag{5b}$$
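The weights in Equation (5b) can be sketched as follows. Taken literally, (5b) describes normalized inverse-distance weights over the vertices of the enclosing lattice cell, and that is what the code implements. The handling of a query that coincides exactly with a vertex (where the inverse distance is undefined) is an assumption of this sketch, as are the function names.

```python
import math


def interpolation_weights(s_query, vertices):
    """Normalized weights proportional to 1 / ||s' - s~||, as in Eq. (5b)."""
    dists = [math.dist(s_query, v) for v in vertices]
    # Assumption: if the query sits exactly on a vertex, use that vertex alone.
    for v, d in zip(vertices, dists):
        if d == 0.0:
            return {v: 1.0}
    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    return {v: w / total for v, w in zip(vertices, inverse)}


def interpolate_value(s_query, vertices, V):
    """Approximate V*(s') as a weighted sum of vertex values, as in Eq. (5a)."""
    weights = interpolation_weights(s_query, vertices)
    return sum(lam * V[v] for v, lam in weights.items())


# Toy 2-D cell with hypothetical vertex values.
cell = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = {cell[0]: 0.0, cell[1]: 1.0, cell[2]: 2.0, cell[3]: 3.0}

v_center = interpolate_value((0.5, 0.5), cell, values)
v_vertex = interpolate_value((0.0, 0.0), cell, values)
print(v_center, v_vertex)
```

At the cell center all vertices are equidistant, so the result is the plain average of the vertex values; at a vertex the stored value is returned unchanged.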

Equations (3)-(5) are solved via value iteration, yielding an optimal policy for under-ice guidance that provides an optimal action $a^{\star}(\tilde{s})$ for every state $\tilde{s}\in\tilde{\mathcal{S}}$. For states not in $\tilde{\mathcal{S}}$, a nearest-neighbor approach is used whereby the policy corresponding to the closest state in $\tilde{\mathcal{S}}$ is used.
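For readers less familiar with value iteration, the following sketch applies the Bellman update (3) and the greedy action selection (4) to a toy three-state chain. The chain, its rewards, and all names are hypothetical and chosen only so that the fixed point can be checked by hand; the sketch omits the lattice interpolation of Equation (5), since in the toy problem every successor state is itself a lattice state.

```python
def value_iteration(states, actions, transition, reward, gamma,
                    tol=1e-9, max_iter=10_000):
    """Iterate the Bellman update until the largest change falls below tol."""

    def q_value(s, a, V):
        # r(s, a) + gamma * E[V(s')] under the transition model.
        expected = sum(p * V[s2] for s2, p in transition(s, a).items())
        return reward(s, a) + gamma * expected

    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        V_new = {s: max(q_value(s, a, V) for a in actions(s)) for s in states}
        change = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if change < tol:
            break
    # Greedy policy with respect to the converged values, as in Eq. (4).
    policy = {s: max(actions(s), key=lambda a: q_value(s, a, V))
              for s in states}
    return V, policy


GOAL = 2  # absorbing goal state of the toy chain


def actions(s):
    return ["wait", "move"]


def transition(s, a):
    if s == GOAL or a == "wait":
        return {s: 1.0}
    return {s + 1: 0.8, s: 0.2}  # "move" succeeds with probability 0.8


def reward(s, a):
    return 1.0 if (s == 1 and a == "move") else 0.0


V, policy = value_iteration([0, 1, 2], actions, transition, reward, gamma=0.9)
print(V, policy)
```

Solving the fixed-point equations by hand gives $V(1) = 1/0.82$ and $V(0) = 0.72/0.82^2$, which the iteration reproduces to within the tolerance.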

IV-C State uncertainty: a QMDP approach

The policy computed in Section IV-B requires perfect knowledge of the location of the IceNode. In contrast, the location of underwater vehicles typically presents a significant degree of uncertainty. IceNodes can estimate their location through acoustic multilateration from moored sound sources placed at the shelf edge; the technique yields uncertainties on the order of $0.5$ km in the radial direction and $D/40$ in the azimuthal direction from the moored buoys, where $D$ is the distance from the buoy [9].

To address this uncertainty, we propose using the QMDP algorithm [36], which is well-suited for the embedded, power-constrained IceNode platform due to its modest computational requirements. Intuitively, for a given belief over the vehicle location, QMDP selects the action that yields the best expected value, where the expectation is taken over the states where the vehicle may be. Rigorously, let the vehicle’s belief over its location be the probability distribution $\mathcal{B}$. Let the set of available actions be

$$\mathcal{A}(\mathcal{B}) = \bigcap_{s\in\mathcal{S}:\,\mathcal{B}(s)>0} \mathcal{A}(s).$$

Then, we select the action for belief $\mathcal{B}$ as:

$$a^{\star}(\mathcal{B}) = \arg\max_{a\in\mathcal{A}(\mathcal{B})} \mathbb{E}_{s\sim\mathcal{B}}\left[ r(s,a) + \gamma\, \mathbb{E}_{s'\sim P(s'|s,a)}\left[V^{\star}(s')\right] \right], \tag{6}$$

where $V^{\star}(s')$ is computed according to (3).

We remark that evaluating the optimal policy (6) requires minimal computational effort, since the optimal state values $V^{\star}$ can be pre-computed and stored: therefore, the proposed approach is well-suited for on-board guidance of vehicles with highly limited computation resources.
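The action selection in Equation (6) can be sketched in a few lines once $V^{\star}$ is available as a lookup table. The example below is a toy illustration with hypothetical states, actions, and values; it assumes a discrete belief stored as a dictionary, which is one possible representation and not necessarily the one used on the vehicle.

```python
def qmdp_action(belief, actions, transition, reward, V, gamma):
    """Pick the action maximizing the belief-weighted one-step lookahead."""
    support = [s for s, p in belief.items() if p > 0.0]
    if not support:
        raise ValueError("belief has empty support")
    # Admissible actions: intersection over all states with nonzero belief.
    common = set(actions(support[0]))
    for s in support[1:]:
        common &= set(actions(s))
    if not common:
        raise ValueError("no action is admissible in every possible state")

    def expected_q(a):
        total = 0.0
        for s in support:
            future = sum(q * V[s2] for s2, q in transition(s, a).items())
            total += belief[s] * (reward(s, a) + gamma * future)
        return total

    # Sorting makes tie-breaking deterministic.
    return max(sorted(common), key=expected_q)


# Hypothetical pre-computed values and a two-state toy model.
V = {"good": 10.0, "bad": 0.0}


def actions(s):
    return ["down", "up"]


def transition(s, a):
    favorable = (s == "A" and a == "up") or (s == "B" and a == "down")
    return {"good": 1.0} if favorable else {"bad": 1.0}


def reward(s, a):
    return 0.0


likely_a = qmdp_action({"A": 0.7, "B": 0.3}, actions, transition, reward, V, 0.9)
likely_b = qmdp_action({"A": 0.2, "B": 0.8}, actions, transition, reward, V, 0.9)
print(likely_a, likely_b)
```

The selected action follows the more probable state, and the cost of each decision is a weighted sum over the belief support, with no planning over future beliefs.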

A key limitation of the QMDP formulation is that it assumes that all uncertainty will disappear at the next time step: hence, the approach is unable to perform information-gathering actions (e.g., improving localization by steering towards areas where the flow is well-characterized, and then comparing the actual motion experienced by the vehicle with the model). An interesting direction for future research will encompass the use of more sophisticated POMDP algorithms such as Monte Carlo Tree Search [37] to assess the effectiveness of such information-gathering actions.

V Numerical Experiments

We characterize the performance of the proposed approach through numerical simulations. Due to space constraints, we focus our analysis on the problem of reaching the grounding zone; the dual problem of egress from the grounding zone to open sea will be the subject of future studies. We solve the ADP problem (3)-(5) on a lattice with a stride of $840 \times 840 \times 25$ m, which we empirically found to provide a good balance between computational and storage cost and policy performance. To compute the transition probabilities, we use 20% of all available time steps (i.e., 1752 steps, or one step every five hours), to capture the fact that model knowledge may not perfectly reproduce the actual flow, especially with regard to short-term and small-scale dynamics. The resulting state values and optimal policy are shown in Figures 3 and 4 for one selected depth.

Figure 3: MDP problem state value for $z=-500$ m. The color of each location denotes the expected discounted reward obtained when following the optimal policy from that location.
Figure 4: MDP problem policy for $z=-500$ m. Color denotes the change in depth prescribed by the optimal policy.
Figure 5: Policy rollouts and time required to reach the grounding zone for successful rollouts. For each policy, 500 IceNode trajectories are simulated. The color of the trajectory shows the change in depth, either through a control action or vertical forcing due to current: yellow corresponds to an ascent, blue captures constant-depth drifting, and cyan shows a descent. Red dots show the vehicles’ final locations.

We compare the performance of the proposed QMDP policy with three other policies:

  • an uncontrolled policy where the vehicle drifts in the current with no active buoyancy control;

  • the state-of-the-art constant depth fraction policy implemented by the 2019 APL-UW Dotson Ice Shelf EM-APEX campaign [9], where the vehicle controls its buoyancy to float at a depth corresponding to 75% of the cavity depth. In the implementation, we assume that the vehicle has perfect knowledge of its location and the seafloor and basal ice depth, resulting in an upper bound on the effectiveness of the policy.

  • the MDP policy where the vehicle follows Equation (4) with perfect knowledge of its location. The MDP policy represents an upper bound on the performance of the QMDP policy, and it allows us to quantitatively assess the value of knowledge about the vehicle’s position.

For each policy, we perform 500 rollouts. In each rollout, we pick a random initial time and let the cavity flow (and the vehicle position) evolve according to the MITgcm model from that time onwards. The simulation uses all available time steps, capturing shorter-term dynamics that are not available to the MDP model. Vehicles that have not reached the grounding zone after three months are assumed to be lost. All rollouts begin at a manually-selected starting location that mimics state-of-the-art deployment strategies for under-ice vehicles. Specifically, the starting location and depth are selected to be close to the inlet of the cavity, in the region where the most robust inflow current exists, maximizing the likelihood that the vehicle will be dragged deep beneath the shelf. For the QMDP policy, the vehicle’s belief about its location follows a Gaussian distribution with $\sigma_x = \sigma_y = 1000$ m and $\sigma_z = 3$ m, which is consistent with the localization performance demonstrated by the EM-APEX campaign [9].
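To make the Gaussian belief concrete, the sketch below builds a discrete belief over a small patch of lattice nodes using the standard deviations quoted above and the $840 \times 840 \times 25$ m lattice stride. How the belief is actually represented in the experiments is not described here; normalizing Gaussian weights over lattice nodes, the patch size, and the function name are assumptions of this example.

```python
import math


def gaussian_belief(mean, sigmas, lattice_states):
    """Discrete belief: axis-aligned Gaussian weights, normalized to sum to 1."""
    weights = {}
    for s in lattice_states:
        exponent = sum(((si - mi) / sg) ** 2
                       for si, mi, sg in zip(s, mean, sigmas))
        weights[s] = math.exp(-0.5 * exponent)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("all lattice states are too far from the mean")
    return {s: w / total for s, w in weights.items()}


# A 3 x 3 x 3 patch of lattice nodes around a hypothetical position estimate.
lattice = [(x, y, z)
           for x in (-840.0, 0.0, 840.0)
           for y in (-840.0, 0.0, 840.0)
           for z in (-525.0, -500.0, -475.0)]

belief = gaussian_belief(mean=(0.0, 0.0, -500.0),
                         sigmas=(1000.0, 1000.0, 3.0),
                         lattice_states=lattice)
most_likely = max(belief, key=belief.get)
print(most_likely, belief[most_likely])
```

With these numbers the horizontal uncertainty spans several lattice nodes, whereas the vertical standard deviation of 3 m is much smaller than the 25 m vertical stride, so nearly all of the belief mass falls on a single depth layer.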

Results are shown in Figure 5 and in Table I.

TABLE I: Performance of underwater guidance policies.

| Policy | Reached grounding zone | Time to GZ, median [h] | Time to GZ, std. dev. [h] |
|---|---|---|---|
| Uncontrolled | 33.8% | 725 | 435 |
| Const. depth fraction | 66.6% | 890 | 383 |
| MDP | 95.4% | 517 | 300 |
| QMDP | 88.8% | 702 | 325 |

The proposed QMDP policy delivers close to 90% of all vehicles to the grounding zone, a success rate well in excess of that of the state-of-the-art constant depth fraction policy and over 2.5 times that of the uncontrolled policy. The proposed approach also delivers IceNodes to the grounding zone 26%, or eight days, faster in median time than the constant depth fraction policy, resulting in increased science returns. Imperfect position knowledge results in a 6.6 percentage point reduction in the success rate of the proposed guidance policy, and a 36% increase in the median navigation time, compared to the MDP policy; this motivates the study of model-based localization techniques to further reduce position uncertainty and approach the performance of the MDP policy.
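The comparisons above can be reproduced directly from Table I. The short script below recomputes them; the variable names are illustrative. Note that the 188 h difference in median time is about 21% of the constant depth fraction median and about 27% of the QMDP median, so the quoted figure of 26% appears to use the QMDP median as its base.

```python
# (success rate in %, median time to grounding zone in hours), from Table I.
table = {
    "Uncontrolled": (33.8, 725),
    "Const. depth fraction": (66.6, 890),
    "MDP": (95.4, 517),
    "QMDP": (88.8, 702),
}

q_rate, q_time = table["QMDP"]
c_rate, c_time = table["Const. depth fraction"]
u_rate, _ = table["Uncontrolled"]
m_rate, m_time = table["MDP"]

summary = {
    # Ratio of success rates.
    "success_vs_const_depth": q_rate / c_rate,
    "success_vs_uncontrolled": q_rate / u_rate,
    # Absolute loss in success rate due to position uncertainty.
    "success_drop_vs_mdp_points": m_rate - q_rate,
    # Ratio of median navigation times.
    "time_vs_mdp": q_time / m_time,
    # Median time saved relative to the constant depth fraction policy.
    "days_saved_vs_const_depth": (c_time - q_time) / 24,
    "const_depth_time_vs_qmdp": c_time / q_time,
    "fraction_of_const_depth_time_saved": (c_time - q_time) / c_time,
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```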

Figure 5 shows the trajectories produced by the four policies. The MDP policy sharply exploits the structured nature of under-ice currents, and the resulting trajectories are highly clustered around two favorable sets of paths. Remarkably, the trajectories produced by the QMDP policy present a similar qualitative distribution; however, position uncertainty results in several IceNodes being swept out to sea or into side cavities. The constant depth fraction policy is highly effective at delivering vehicles under the ice shelf; however, only a fraction of the vehicles make it to the grounding zone, whereas many more are swept into side cavities or grounded against the sides of the cavity. Finally, despite the selection of a favorable starting location, the uncontrolled policy is only marginally effective at delivering vehicles to the grounding zone, with the majority of IceNodes adrift, lost to side cavities, or swept out to sea.

VI Conclusions

We presented a novel approach for guidance of buoyancy-controlled vehicles under ice shelves in uncertain ocean currents. The proposed technique estimates the probabilistic distribution of ocean currents by leveraging numerical simulations of the ice cavity flow, and it can cope with realistic uncertainty in the vehicle’s localization estimate. Numerical simulations show that the technique significantly outperforms existing under-ice guidance techniques, and holds promise to allow reliable and cost-effective access to ice shelf grounding zones, which hold the key to better understanding ice shelf melt rates and improving predictions of future sea level rise.

A number of directions for future research are of interest. First, we plan to further extend the approach to capture the effect of bathymetry uncertainty. Bathymetry uncertainty introduces two sources of error: first, regions that are assumed to be navigable may be occupied by ice or rock, and vice versa; second, uncertainty in the bathymetry profile induces significant uncertainty in the currents, especially near the boundaries. We will quantify both effects by leveraging numerical simulations on reduced-resolution cavity models, and incorporate these sources of uncertainty in the MDP model to mitigate their impact on the policy’s performance. Second, we will further explore partially observable MDP approaches, and assess whether IceNode’s observations can be used to improve the knowledge of its location by exploiting the cavity flow model. Third, we will consider reinforcement learning approaches where the IceNode’s observations are used to improve the flow field model during navigation, leveraging spatial and temporal correlations in the flow field. To support this, we will consider fast online algorithms to re-solve the guidance problem on board the vehicle with minimal energy and time expenditure. Finally, we plan to validate the approach through field tests in open ocean, first using a virtually injected ice shelf, and later during field deployments beneath real-world ice shelves.

Acknowledgements

Part of this work was carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). We gratefully acknowledge support from the NASA Cryospheric Sciences Programs. High-end computing resources were provided by the NASA Advanced Supercomputing (NAS) Division of the Ames Research Center.

References

  • [1] C. Harig and F. J. Simons, “Accelerated West Antarctic ice mass loss continues to outpace east antarctic gains,” Earth Planet. Sci. Lett., vol. 415, pp. 134–141, 2015.
  • [2] M. Oppenheimer, B. Glavovic, J. Hinkel et al., “Sea level rise and implications for low lying islands, coasts and communities,” in IPCC Special Report on the Ocean and Cryosphere in a Changing Climate, H.-O. Pörtner, D. Roberts, V. Masson-Delmotte et al., Eds.   The Intergovernmental Panel on Climate Change, Jun. 2019, ch. 4.
  • [3] J. A. Church, P. U. Clark, A. Cazenave et al., “Climate change 2013: the physical science basis. contribution of working group I to the fifth assessment report of the intergovernmental panel on climate change,” Sea level change, pp. 1137–1216, 2013.
  • [4] S. Adusumilli, H. A. Fricker, B. Medley, L. Padman, and M. R. Siegfried, “Interannual variations in meltwater input to the southern ocean from antarctic ice shelves,” Nature geoscience, vol. 13, no. 9, pp. 616–620, 2020.
  • [5] T. P. Stanton, W. J. Shaw, M. Truffer, H. F. J. Corr, L. E. Peters, K. L. Riverman, R. Bindschadler, D. M. Holland, and S. Anandakrishnan, “Channelized ice melting in the ocean boundary layer beneath pine island glacier, antarctica,” Science, vol. 341, no. 6151, pp. 1236–1239, 2013.
  • [6] P. Davis, K. Nicholls, and D. Holland, “Turbulence observations in the grounding zone region of thwaites glacier,” in EGU General Assembly 2020, May 2020, p. 50.
  • [7] A. F. Thompson, J. Willis, and A. Payne, “The sleeping giant: Measuring ocean-ice interactions in Antarctica,” Keck Institute for Space Studies, Pasadena, CA, Tech. Rep., Dec. 2015.
  • [8] T. B. Sanford, J. H. Dunlap, J. A. Carlson, D. C. Webb, and J. B. Girton, “Autonomous velocity and density profiler: EM-APEX,” in IEEE/OES Working Conference on Current Measurement Technology., Jun. 2005, pp. 152–156.
  • [9] J. B. Girton, K. Christianson, J. Dunlap, P. Dutrieux, J. Gobat, C. Lee, and L. Rainville, “Buoyancy-adjusting profiling floats for exploration of heat transport, melt rates, and mixing in the ocean cavities under floating ice shelves,” in OCEANS 2019 MTS/IEEE SEATTLE, Oct. 2019, pp. 1–6.
  • [10] S. D. McPhail, M. E. Furlong, M. Pebody, J. R. Perrett, P. Stevenson, A. Webb, and D. White, “Exploring beneath the PIG ice shelf with the Autosub3 AUV,” in OCEANS 2009-EUROPE, May 2009, pp. 1–8.
  • [11] D. Davies, R. G. Bingham, A. G. C. Graham, M. Spagnolo, P. Dutrieux, D. G. Vaughan, A. Jenkins, and F. O. Nitsche, “High‐resolution sub‐ice‐shelf seafloor records of twentieth century ungrounding and retreat of Pine Island Glacier, West Antarctica,” J. Geophys. Res. Earth Surf., vol. 122, no. 9, pp. 1698–1714, 2017.
  • [12] S. McPhail, R. Templeton, M. Pebody, D. Roper, and R. Morrison, “Autosub long range AUV missions under the Filchner and Ronne ice shelves in the Weddell Sea, Antarctica - an engineering perspective,” in OCEANS 2019 - Marseille, Jun. 2019, pp. 1–8.
  • [13] C. Lee, L. Rainville, J. I. Gobat, J. B. Girton, P. Dutrieux, K. A. Christianson, T. W. Kim, and S. H. Lee, “Sustained, autonomous observations beneath ice shelves,” in AGU Fall Meeting, Dec. 2018.
  • [14] J. Lawrence, B. Schmidt, P. Washam et al., “ROV Icefin at Ross ice shelf grounding zone: 5 km of ice, ocean, seafloor, and crevasse exploration,” in AGU Fall Meeting, Dec. 2020.
  • [15] B. E. Schmidt, P. Washam, P. E. D. Davis et al., “Melting at the grounding zone of Thwaites Glacier observed by Icefin,” in AGU Fall Meeting, Dec. 2020.
  • [16] D. Rao and S. B. Williams, “Large-scale path planning for underwater gliders in ocean currents,” in Australasian Conference on Robotics and Automation (ACRA), 2009, pp. 2–4.
  • [17] D. R. Thompson, S. Chien, Y. Chao et al., “Spatiotemporal path planning in strong, dynamic, uncertain currents,” in 2010 IEEE Int. Conf. Robotics and Automation, May 2010, pp. 4778–4783.
  • [18] E. B. Clark, A. Branch, S. Chien et al., “Station-keeping underwater gliders using a predictive ocean circulation model and applications to SWOT calibration and validation,” IEEE J. Oceanic Eng., pp. 1–14, 2019.
  • [19] M. Troesch, S. Chien, Y. Chao, J. Farrara, J. Girton, and J. Dunlap, “Autonomous control of marine floats in the presence of dynamic, uncertain ocean currents,” Rob. Auton. Syst., vol. 108, pp. 100–114, 2018.
  • [20] A. A. Pereira, J. Binney, G. A. Hollinger, and G. S. Sukhatme, “Risk-aware path planning for autonomous underwater vehicles using predictive ocean models,” J. Field Robotics, vol. 30, no. 5, pp. 741–762, 2013.
  • [21] K. P. Dahl, D. R. Thompson, D. McLaren, Y. Chao, and S. Chien, “Current-sensitive path planning for an underactuated free-floating ocean sensorweb,” in 2011 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2011, pp. 3140–3146.
  • [22] M. G. Bellemare, S. Candido, P. S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, “Autonomous navigation of stratospheric balloons using reinforcement learning,” Nature, vol. 588, no. 7836, pp. 77–82, 2020.
  • [23] M. T. Wolf, L. Blackmore, Y. Kuwata, N. Fathpour, A. Elfes, and C. Newman, “Probabilistic motion planning of balloons in strong, uncertain wind fields,” in 2010 IEEE Int. Conf. Robotics and Automation, 2010, pp. 1123–1129.
  • [24] N. Fathpour, L. Blackmore, Y. Kuwata, C. Assad, M. T. Wolf, C. Newman, A. Elfes, and K. Reh, “Feasibility studies on guidance and global path planning for wind-assisted montgolfière in Titan,” IEEE Systems Journal, vol. 8, no. 4, pp. 1112–1125, 2014.
  • [25] M. Losch, D. Menemenlis, J.-M. Campin, P. Heimbach, and C. Hill, “On the formulation of sea-ice models. Part 1: effects of different solver implementations and parameterizations,” Ocean Modelling, vol. 33(1–2), p. 129–144, 2010.
  • [26] H. H. Hellmer and D. J. Olbers, “A two-dimensional model of the thermohaline circulation under an ice shelf,” Antarct. Sci., vol. 1(4), p. 325–336, 1989.
  • [27] A. Jenkins, H. H. Hellmer, and D. M. Holland, “The role of meltwater advection in the formulation of conservative boundary conditions at an ice-ocean interface,” J. Phys. Oceanogr., vol. 31, p. 285–296, 2001.
  • [28] D. Menemenlis, J.-M. Campin, P. Heimbach, C. Hill, T. Lee, A. Nguyen, M. Schodlok, and H. Zhang, “ECCO2: High resolution global ocean and sea ice data synthesis,” Mercator Ocean Quarterly Newsletter, vol. 31, 2008.
  • [29] M. Morlighem, E. Rignot, T. Binder et al., “Deep glacial troughs and stabilizing ridges unveiled beneath the margins of the Antarctic ice sheet,” Nature Geoscience, vol. 13, pp. 132–137, 2020.
  • [30] D. Dee, S. Uppala, A. Simmons et al., “The ERA-Interim reanalysis: configuration and performance of the data assimilation system,” Q.J.R. Meteorol. Soc., vol. 137, p. 553–597, 2011.
  • [31] M. P. Schodlok, D. Menemenlis, E. Rignot, and M. Studinger, “Sensitivity of the ice‐shelf/ocean system to the sub‐ice‐shelf cavity shape measured by NASA IceBridge in Pine Island Glacier, West Antarctica,” Ann. Glaciol., vol. 53(60), p. 156–162, 2012.
  • [32] M. P. Schodlok, D. Menemenlis, and E. Rignot, “Ice shelf basal melt rates around Antarctica from simulations and observations,” J. Geophys. Res., vol. 120, 2016.
  • [33] A. Khazendar, M. Schodlok, I. Fenty, S. Ligtenberg, E. Rignot, and M. van den Broeke, “Observed thinning of Totten glacier is linked to coastal polynya variability,” Nat. Comm., vol. 4, 2013.
  • [34] A. Arakawa and V. R. Lamb, “Computational design of the basic dynamical processes of the UCLA general circulation model,” General circulation models of the atmosphere, vol. 17, no. Supplement C, pp. 173–265, 1977.
  • [35] D. P. Bertsekas, Dynamic programming and optimal control, 4th ed.   Athena Scientific, 2012, vol. 2.
  • [36] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling, “Learning policies for partially observable environments: Scaling up,” in Int. Conf. Machine Learning.   San Francisco, CA: Elsevier, 1995, pp. 362–370.
  • [37] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Trans. Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.