Deep Learning of Multivariate Extremes
via a Geometric Representation


Callum J. R. Murphy-Barltrop1,2∗, Reetam Majumder3, Jordan Richards4

1 Technische Universität Dresden, Institut für Mathematische Stochastik, Dresden, Germany
2 Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
3 Southeast Climate Adaptation Science Center, North Carolina State University, USA
4 School of Mathematics, University of Edinburgh, UK
∗ Corresponding author: [email protected]

June 28, 2024

Abstract

The study of geometric extremes, where extremal dependence properties are inferred from the deterministic limiting shapes of scaled sample clouds, provides an exciting approach to modelling the extremes of multivariate data. These shapes, termed limit sets, link together several popular extremal dependence modelling frameworks. Although the geometric approach is becoming an increasingly popular modelling tool, current inference techniques are limited to a low dimensional setting ($d\leq 3$), and generally require rigid modelling assumptions. In this work, we propose a range of novel theoretical results to aid with the implementation of the geometric extremes framework and introduce the first approach to modelling limit sets using deep learning. By leveraging neural networks, we construct asymptotically-justified yet flexible semi-parametric models for extremal dependence of high-dimensional data. We showcase the efficacy of our deep approach by modelling the complex extremal dependencies between meteorological and oceanographic variables in the North Sea off the coast of the UK.

Keywords: extremal dependence; geometric extremes; limit sets; neural networks

1 Introduction

Multivariate extreme value theory is a branch of statistics that deals with understanding relationships between the extremes of multiple variables. A wide variety of modelling approaches for multivariate extremes (or, equivalently, extremal dependence) have been proposed; classical approaches use the framework of regular variation (see, e.g., Tawn, 1990; Rootzén and Tajvidi, 2006; Einmahl and Segers, 2009), but these models are restrictive in the forms of extremal dependence that they can capture (Huser et al., 2024). In particular, they can only capture asymptotic dependence (Coles et al., 1999), where extremes of a random vector occur when all of its components are jointly large. Assuming this form of extremal dependence for data is unrealistic in many applications, and numerous works have advocated against the use of regular variation models for environmental studies (Opitz, 2016; Dawkins and Stephenson, 2018; Huser et al., 2024).

The first approach to modelling non-asymptotically dependent data was provided by Ledford and Tawn (1996, 1997); see, also, hidden regular variation (Resnick, 2002). Letting $\bm{X}_{E}:=(X_{E,1},\dots,X_{E,d})$ denote a random vector with standard exponential margins, Ledford and Tawn (1996) assume that, as $u\to\infty$,

$$\Pr\left(\min_{i\in\mathcal{D}}\{X_{E,i}\}>u\right)=L(e^{u})\exp\{-u/\eta\}, \tag{1}$$

where $\mathcal{D}:=\{1,\ldots,d\}$, $L(\cdot)$ is a slowly varying function, i.e., $\lim_{u\to\infty}L(cu)/L(u)=1$ for any constant $c>0$, and $\eta\in(0,1]$ is termed the coefficient of tail dependence. Under asymptotic dependence, we have $\eta=1$ and $\lim_{u\to\infty}L(u)>0$, with other extremal dependence structures arising when these conditions are not satisfied. In a practical setting, model (1) is only applicable when all components of $\bm{X}_{E}$ are large simultaneously. To overcome this limitation, Wadsworth and Tawn (2013) introduced the angular dependence function (ADF), which generalises the coefficient $\eta$.
Consider any angle $\bm{w}:=(w_{1},\dots,w_{d})^{T}\in\mathcal{S}^{d-1}_{+}$, where $\mathcal{S}_{+}^{d-1}:=\{\bm{x}\in\mathbb{R}_{+}^{d}:\|\bm{x}\|=1\}$ is the strictly positive part of the unit $(d-1)$-sphere with $\|\cdot\|$ the Euclidean norm; Wadsworth and Tawn (2013) assume that

$$\Pr\left(\min_{i\in\mathcal{D}}\{X_{E,i}/w_{i}\}>u\right)=L(e^{u};\bm{w})e^{-\lambda(\bm{w})u},\;\;\lambda(\bm{w})\geq\max(\bm{w}), \tag{2}$$

as $u\to\infty$, where $L(\cdot\;;\bm{w})$ is a slowly varying function and $\lambda(\cdot)$ denotes the ADF; the latter provides information about the joint tail of $\bm{X}_{E}$, and we have $\eta=\{\sqrt{d}\lambda(d^{-1/2},\ldots,d^{-1/2})\}^{-1}$. Model (2) can capture both extremal dependence regimes, with asymptotic dependence implying the lower bound, $\lambda(\bm{w})=\max(\bm{w})$ for all $\bm{w}\in\mathcal{S}^{d-1}_{+}$. Loosely speaking, the angle $\bm{w}$ is the direction in $\mathbb{R}_{+}^{d}$ for which the joint tail region in (2) is defined. This model has been successfully applied in environmental applications by, e.g., Murphy-Barltrop et al. (2023), Murphy-Barltrop and Wadsworth (2024), and Murphy-Barltrop et al. (2024b).
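The relation between $\eta$ and the ADF can be checked numerically in two simple cases. The sketch below is illustrative only: the function names are hypothetical, and the independence ADF $\lambda(\bm{w})=\sum_i w_i$ is an assumption that follows from $\Pr(\min_i X_{E,i}/w_i>u)=\exp(-u\sum_i w_i)$ for independent standard exponential components.

```python
import math

def eta_from_adf(adf, d):
    """Coefficient of tail dependence from an ADF evaluated on the diagonal angle."""
    w_diag = [d ** -0.5] * d                 # the angle (d^{-1/2}, ..., d^{-1/2})
    return 1.0 / (math.sqrt(d) * adf(w_diag))

def adf_asymptotic_dependence(w):
    # Lower bound of model (2): lambda(w) = max(w)
    return max(w)

def adf_independence(w):
    # Assumed example: independent standard exponential components,
    # for which Pr(min_i X_i / w_i > u) = exp(-u * sum(w)), so lambda(w) = sum(w)
    return sum(w)

eta_ad = eta_from_adf(adf_asymptotic_dependence, d=3)   # close to 1
eta_ind = eta_from_adf(adf_independence, d=3)           # close to 1/3
```

Under the lower bound the coefficient equals one, while under independence it equals $1/d$, which matches the range $\eta\in(0,1]$ stated for model (1).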

Several of the aforementioned models are defined for random vectors exhibiting standard margins with finite lower bounds, e.g., Pareto, Fréchet, or exponential, which limits the study of extremal dependence. In particular, when applied to random vectors with double-tailed margins, such as Laplace, these modelling approaches reduce the study of extremal dependence to data observed only in the positive orthant, $\mathbb{R}^{d}_{+}$, of $\mathbb{R}^{d}$; see Section 2 for further discussion. This can be restrictive in practical applications where different regions of low joint probability mass may be of interest. Consequently, many recent works have introduced modelling approaches for data on standard Laplace margins. For example, Keef et al. (2013) extended the model of Heffernan and Tawn (2004) to Laplace margins, with the resulting framework providing greater flexibility and interpretability. Moreover, Mackay and Jonathan (2023), Papastathopoulos et al. (2024) and Murphy-Barltrop et al. (2024a) demonstrate that models on Laplace margins permit evaluation of the joint tail behaviour of random vectors in all $2^{d}$ orthants of $\mathbb{R}^{d}$. To demonstrate this, Figure 1 illustrates extreme density contours for a bivariate Gaussian copula with correlation parameter $\rho=-0.5$ on both standard exponential and standard Laplace margins.
Hereafter, we use $\bm{X}:=(X_{1},\dots,X_{d})^{T}$ to denote a $d$-dimensional random vector with standard Laplace margins, with distribution function $F_{\bm{X}}(\cdot)$ and continuous density function $f_{\bm{X}}(\cdot)$.

Figure 1: Density contours for a bivariate random vector $\bm{X}$ with a Gaussian copula (with negative correlation) and with standard exponential (left) or standard Laplace (right) margins. The density levels for each colour are given in the legend in the left panel.

Recent theoretical developments for multivariate extremes have focused on geometric extremes, whereby extremal dependence properties of $\bm{X}$ can be inferred directly from the deterministic limiting shapes of scaled sample clouds. Let $C_{n}:=\{\bm{X}_{i}/r_{n}\}^{n}_{i=1}$ denote $n$ independent copies of $\bm{X}$, scaled by a suitably chosen positive sequence $(r_{n})_{n\in\mathbb{N}}$ satisfying $r_{n}\to\infty$ as $n\to\infty$. Under mild conditions, $C_{n}$ converges in probability, with respect to the Hausdorff metric, onto the compact, star-shaped limit set $\mathcal{G}:=\{\bm{x}:g(\bm{x})\leq 1\}\subset[-1,1]^{d}$, where $g:\mathbb{R}^{d}\mapsto\mathbb{R}_{+}$ is the gauge function of $\mathcal{G}$ (Fisher, 1969; Davis et al., 1988; Kinoshita and Resnick, 1991). A sufficient condition for convergence onto $\mathcal{G}$ is that

$$-\log f_{\bm{X}}(t\bm{x})\sim tg(\bm{x}),\;\;\bm{x}\in\mathbb{R}^{d},\;\;t\to\infty, \tag{3}$$

or, equivalently, $g(\bm{x})=\lim_{t\to\infty}[-\log f_{\bm{X}}(t\bm{x})]/t$ (Balkema and Nolde, 2010). Following Nolde (2014), we also define the unit-level, or boundary, set $\partial\mathcal{G}:=\{\bm{x}:g(\bm{x})=1\}\subset\mathcal{G}$. For standard Laplace margins, it suffices to set $r_{n}=\log(n/2)$ to achieve the required convergence (Papastathopoulos et al., 2024). To demonstrate this, Figure 2 illustrates the limit set $\mathcal{G}$ and unit-level set $\partial\mathcal{G}$ for three copulas, alongside $n=10{,}000$ samples $\{\bm{x}_{i}/\log(n/2)\}_{i=1}^{n}$ from each copula; formal definitions of these copulas are given in Section 4. One can observe that the (finite) observed sample clouds lie approximately within the theoretical limit set. Hereafter, we implicitly assume that any $\bm{X}$ satisfies the conditions for convergence onto $\mathcal{G}$.
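As a rough numerical illustration of condition (3) and of the scaling $r_{n}=\log(n/2)$, the following sketch uses independent standard Laplace components. This choice is an assumption made for simplicity and is not one of the three copulas in Figure 2; in this case $f_{\bm{X}}(\bm{x})=2^{-d}\exp(-\sum_i|x_i|)$, so the limiting gauge is $g(\bm{x})=\sum_i|x_i|$. All function and variable names are hypothetical.

```python
import numpy as np

def neg_log_density_indep_laplace(x):
    """-log f_X(x) for independent standard Laplace margins, f(x_i) = exp(-|x_i|) / 2."""
    x = np.asarray(x, dtype=float)
    return x.size * np.log(2.0) + np.abs(x).sum()

def gauge_approx(x, t):
    """Finite-t approximation -log f_X(t x) / t to the gauge g(x) in (3)."""
    return neg_log_density_indep_laplace(t * np.asarray(x, dtype=float)) / t

x = np.array([0.3, -0.5])
g_exact = np.abs(x).sum()                 # limiting gauge in this example: the L1 norm
approx = [gauge_approx(x, t) for t in (1e1, 1e3, 1e5)]
errors = [abs(a - g_exact) for a in approx]   # shrinks like d * log(2) / t

# Scaled sample cloud C_n = {x_i / r_n} with r_n = log(n / 2)
rng = np.random.default_rng(1)
n = 10_000
sample = rng.laplace(loc=0.0, scale=1.0, size=(n, 2))
cloud = sample / np.log(n / 2)
frac_inside = np.mean(np.abs(cloud).sum(axis=1) <= 1.0)   # share of points with g <= 1
```

For finite $n$ a small share of scaled points falls outside the limit set, which is consistent with the sample clouds lying only approximately within $\mathcal{G}$.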

Figure 2: Scaled sample clouds of size $n=10{,}000$ from Gaussian (left), Student-t (middle), and logistic (right) copulas on standard Laplace margins. For each panel, the shaded regions and solid red lines give the limit set $\mathcal{G}$ and the unit-level set $\partial\mathcal{G}$, respectively, while the solid blue lines denote the set $\{\bm{w}/\Lambda(\bm{w}):\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}\}$; see Section 2.4 for further details. The black dotted lines denote the unit square.

Nolde (2014) and Nolde and Wadsworth (2022) illustrate that the shape of $\partial\mathcal{G}$, or equivalently $\mathcal{G}$, is directly related to extremal dependence of $\bm{X}$. Specifically, $\partial\mathcal{G}$ links together several representations for multivariate extremes: from $\partial\mathcal{G}$, we can determine all parameters associated with the models described in equations (1) and (2), as well as those proposed by Heffernan and Tawn (2004) and Simpson et al. (2020). Taking, e.g., the ADF, we have that $\lambda(\bm{w})=\|\bm{w}\|_{\infty}\times\mathfrak{r}_{\bm{w}}^{-1}$ for all $\bm{w}\in\mathcal{S}^{d-1}_{+}$, where $\mathfrak{r}_{\bm{w}}:=\max\{\mathfrak{r}\in[0,1]:\mathfrak{r}\mathcal{R}_{\bm{w}}\cap\partial\mathcal{G}\neq\emptyset\}$ is a $\bm{w}$-dependent coefficient used to scale back the set $\mathcal{R}_{\bm{w}}:=\bigotimes_{i=1,\ldots,d}[w_{i}/\|\bm{w}\|_{\infty},\infty]$ to intersect with $\partial\mathcal{G}$. Such parameters allow us to quantify extremal dependence of $\bm{X}$ in an interpretable manner. Furthermore, from a practical perspective, estimates of $\partial\mathcal{G}$ can be used to estimate extreme statistics; for example, joint tail probabilities (Wadsworth and Campbell, 2024), return curves (Murphy-Barltrop et al., 2024b), and return level sets (Papastathopoulos et al., 2024). The unit-level set $\partial\mathcal{G}$ therefore offers a high degree of practical utility for inference of multivariate extremes, and thus its accurate estimation is of particular importance.

Owing to this perspective, recent works have introduced techniques for the estimation of $\partial\mathcal{G}$, on both standard exponential and standard Laplace margins. For the former margin, Simpson and Tawn (2024a) and Majumder et al. (2024b) proposed semi-parametric techniques using generalised additive models (GAMs) and Bézier curves, respectively, to approximate $\partial\mathcal{G}$, while Wadsworth and Campbell (2024) used a parametric-copula-based model. For Laplace margins, Papastathopoulos et al. (2024) proposed a latent Gaussian model to approximate the shape of $\partial\mathcal{G}$, while Murphy-Barltrop et al. (2024a) used GAMs to estimate $\partial\mathcal{G}$ using the model introduced by Mackay and Jonathan (2023). Within these approaches, a wide range of statistical techniques have been considered in both Bayesian and frequentist settings, and it is possible to obtain accurate estimates of $\partial\mathcal{G}$ for a wide range of dependence structures. However, current techniques for modelling and estimating $\partial\mathcal{G}$ have several shortcomings: i) all existing approaches are limited to a low-dimensional setting ($d\leq 3$), ii) the restriction of several approaches to standard exponential margins offers a limited perspective for evaluating joint tail properties (see Figure 1), and iii) one must always specify parametric or semi-parametric forms for quantities related to $\partial\mathcal{G}$, or select a large number of tuning parameters. Therefore, the existing estimation techniques for $\partial\mathcal{G}$ offer limited practical utility, motivating novel developments.

Recent literature combining extreme value theory and deep learning has seen the construction of flexible, computationally-scalable models for univariate extremes (see, e.g., Pasche and Engelke, 2022; Richards and Huser, 2022; Cisneros et al., 2024) and generative models for multivariate and spatial extremes (Boulaguiem et al., 2022; Lafon et al., 2023; Zhang et al., 2023; Majumder et al., 2024a). While Hasan et al. (2022) used neural networks to build flexible models for asymptotically-dependent multivariate extremes, deep learning is yet to be exploited in the construction of models that can capture non-asymptotically dependent data structures. Here, we propose the first deep learning-based approach for modelling $\partial\mathcal{G}$, referred to hereafter as the DeepGauge framework. Our approach uses neural networks to perform inference on multivariate extremes, and represents a significant step towards flexible and robust models that require minimal parametric assumptions. The use of deep learning methods gives the DeepGauge framework a high degree of flexibility and, as we demonstrate in Sections 4 and 5, allows us to capture a wide variety of extremal dependence structures. Furthermore, it can be applied in higher dimensional settings ($d\geq 4$) than existing techniques for estimating $\partial\mathcal{G}$, and also requires fewer modelling assumptions.

The paper is organised as follows. Section 2 outlines the theory underpinning the DeepGauge framework, alongside novel theoretical results pertaining to the unit-level set $\partial\mathcal{G}$ and its estimation. Section 3 outlines our methodology for estimating $\partial\mathcal{G}$ and related quantities, with Section 3.3 detailing our neural network-based representation for geometric extremes and Section 3.5 introducing novel diagnostic tools for validating model fits in high dimensional settings. Section 4 provides a simulation study showcasing the efficacy of our framework for inferring the extremal dependence of random vectors. Section 5 provides an application to the NORA10 hindcast data set of meteorological and oceanographic (metocean) variables in the North Sea that exhibit complex dependence structures. We conclude in Section 6 with a discussion and outlook on future work.

2 Theoretical developments in Geometric Extremes

2.1 Overview of the angular-radial decomposition

To estimate the unit-level set $\partial\mathcal{G}$ and its corresponding gauge function $g(\cdot)$, we first decompose $\bm{X}$ into angular and radial components, and then model the radii conditional on a fixed angle. While one could select one of many radial-angular systems for this decomposition, we follow the recommendation of Murphy-Barltrop et al. (2024a) and define angular and radial components via the Euclidean norm. For any $\bm{X}\in\mathbb{R}^{d}\setminus\bm{0}_{d}$, with $\bm{0}_{d}:=(0,\ldots,0)^{T}$, define $(R,\bm{W})$ by $\bm{X}\mapsto(R,\bm{W}):=(\|\bm{X}\|,\bm{X}/\|\bm{X}\|)$ for $R>0$ and $\bm{W}\in\mathcal{S}^{d-1}$, where $\mathcal{S}^{d-1}:=\{\bm{x}\in\mathbb{R}^{d}:\|\bm{x}\|=1\}$ denotes the unit $(d-1)$-sphere. It follows that $\bm{X}=R\bm{W}$, which implies that $\bm{X}$ is completely determined by the joint behaviour of $R$ and $\bm{W}$.
It is trivial to show that the mapping $t:\mathbb{R}^{d}\setminus\bm{0}_{d}\mapsto\mathbb{R}_{+}\times\mathcal{S}^{d-1}$, where $t(\bm{x}):=(\|\bm{x}\|,\bm{x}/\|\bm{x}\|)$, is bijective; thus, no information is lost through considering $(R,\bm{W})$. Loosely speaking, $R$ is the magnitude of an event, while $\bm{W}$ defines its direction, i.e., in which orthant of $\mathbb{R}^{d}$ the event occurs. Note that for any $d\in\mathbb{N}$ there exist $2^{d}$ possible orthants, as each component can be either positive or negative.
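The decomposition and its inverse can be sketched in a few lines. The function names below are hypothetical and the code is only a minimal illustration of the map $\bm{x}\mapsto(\|\bm{x}\|,\bm{x}/\|\bm{x}\|)$ under the Euclidean norm, applied row-wise to an array of observations.

```python
import numpy as np

def to_radial_angular(x):
    """Map rows of x (none equal to the origin) to radii R > 0 and angles W on the unit sphere."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    r = np.linalg.norm(x, axis=1)            # Euclidean norm of each row
    if np.any(r == 0.0):
        raise ValueError("the decomposition is undefined at the origin")
    return r, x / r[:, None]

def from_radial_angular(r, w):
    """Inverse map: recover X = R * W."""
    return np.asarray(r, dtype=float)[:, None] * np.asarray(w, dtype=float)

x = np.array([[3.0, -4.0, 0.0], [-1.0, -1.0, 2.0]])
r, w = to_radial_angular(x)
x_back = from_radial_angular(r, w)           # round trip returns the original points
orthant_signs = np.sign(w)                   # sign pattern of W indicates the orthant
```

The round trip recovering `x` from `(r, w)` reflects the bijectivity noted above.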

We now recall some theoretical properties of gauge functions and limit sets. The star-shapedness of the limit set $\mathcal{G}$ implies that, for any $\bm{x}\in\mathcal{G}$ and $t\in(0,1]$, we have $t\bm{x}\in\mathcal{G}$. Moreover, if $\bm{0}_{d}\in\mathcal{G}$ and $g(\bm{0}_{d})<1$, we have that the line segment $\{\bm{0}_{d}+t\bm{x}:t\in[0,1]\}\subset\mathcal{G}$ for any $\bm{x}\in\mathcal{G}$. Furthermore, one can show that the componentwise maxima and minima of $\mathcal{G}$ equal $\bm{1}_{d}$ and $-\bm{1}_{d}$ respectively, implying that $\mathcal{G}$ (and, thus, the unit-level set $\partial\mathcal{G}$) must touch all boundaries of the unit hypercube $[-1,1]^{d}$ at least once. We further note that the gauge function $g(\cdot)$ of $\mathcal{G}$ is 1-homogeneous, i.e., $g(t\bm{x})=tg(\bm{x})$ for any $\bm{x}\in\mathbb{R}^{d},t\in\mathbb{R}_{+}$.

Using the radial-angular decomposition of $\bm{X}$ and the star-shaped property of $\mathcal{G}$, we can reformulate the unit-level set as $\partial\mathcal{G}=\left\{r\bm{w}:r>0,\bm{w}\in\mathcal{S}^{d-1},g(r\bm{w})=1\right\}$. Homogeneity of $g(\cdot)$ implies that, for any $\bm{w}\in\mathcal{S}^{d-1}$, the radial value of the corresponding point on the unit-level set must be $1/g(\bm{w})$; hence, $\partial\mathcal{G}=\left\{\bm{w}/g(\bm{w}):\bm{w}\in\mathcal{S}^{d-1}\right\}$. This reformulation has the powerful implication that, to determine $\partial\mathcal{G}$, we need only to evaluate $g(\cdot)$ on $\mathcal{S}^{d-1}$. Illustrations of the radial-angular representations for the sets $\mathcal{G}$ and $\partial\mathcal{G}$ are given in Figure 3.
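A minimal sketch of this reformulation follows: given any gauge function evaluated on the unit sphere, the unit-level set is traced out by the points $\bm{w}/g(\bm{w})$. The gauge used here, $g(\bm{x})=\sum_i|x_i|$, is an illustrative assumption corresponding to independent standard Laplace components and is not the Gaussian example of Figure 3; the function names are hypothetical.

```python
import numpy as np

def gauge_independence(x):
    """Illustrative gauge for independent standard Laplace margins: g(x) = sum_i |x_i|."""
    return np.abs(np.asarray(x, dtype=float)).sum(axis=-1)

def unit_level_set(gauge, w):
    """Points w / g(w) on the unit-level set for angles w (rows on the unit sphere)."""
    w = np.asarray(w, dtype=float)
    return w / gauge(w)[:, None]

# Angles on the unit circle S^1 (d = 2), parameterised by the polar angle
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
w = np.column_stack([np.cos(theta), np.sin(theta)])

boundary = unit_level_set(gauge_independence, w)   # Cartesian coordinates of the boundary
radii = 1.0 / gauge_independence(w)                # radial values 1 / g(w)
```

Plotting `boundary` gives the Cartesian view of the set, while plotting `radii` against `theta` gives the polar view, mirroring the two panels described for Figure 3.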

Figure 3: Gaussian limit set $\mathcal{G}$ (shaded regions) and unit-level set $\partial\mathcal{G}$ (solid red lines) in Cartesian (left) and polar (right) coordinates. The black dotted lines in both plots denote the unit square. For the right panel, we plot the radii of the unit-level set (i.e., $1/g(\bm{w})$) against the standard polar angle, $\cos^{-1}(w_{1})=\sin^{-1}(w_{2})\in[0,2\pi)$ for each $(w_{1},w_{2})\in\mathcal{S}^{1}$.

2.2 Geometric extremes on Laplace margins

As noted in Section 1, we consider data on standard Laplace margins as this permits a more detailed description of joint tail behaviour. However, the theoretical results provided by Nolde and Wadsworth (2022), linking the unit-level set $\partial\mathcal{G}$ to a variety of modelling frameworks, are given only for random vectors on standard exponential margins. The next proposition illustrates that some of the same results also hold for data on Laplace margins.

Proposition 2.1.

Consider a random vector $\bm{X}\in\mathbb{R}^{d}\setminus\bm{0}_{d}$ with standard Laplace margins and gauge function $g(\cdot)$. Let $\bm{X}_{E}$ denote the same vector with unit exponential margins, with gauge function $g_{E}(\cdot)$. We have equality of the gauge functions for positive angles, that is, $g(\bm{w})=g_{E}(\bm{w})$ for all $\bm{w}\in\mathcal{S}^{d-1}_{+}$.

Proof of Proposition 2.1 is provided in Appendix B.1.

Remark 1.

From Proposition 2.1, we immediately see that the sets $\left\{\bm{w}/g_{E}(\bm{w}):\bm{w}\in\mathcal{S}_{+}^{d-1}\right\}$ and $\left\{\bm{w}/g(\bm{w}):\bm{w}\in\mathcal{S}_{+}^{d-1}\right\}$ are identical, implying equality of the unit-level sets on the positive orthant. This has the strong implication that the results proposed by Nolde and Wadsworth (2022), which link the frameworks described in equations (1) and (2), as well as the model proposed by Simpson et al. (2020), to the unit-level set $\partial\mathcal{G}$, are also valid for random vectors with Laplace margins.
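For a concrete, if simple, instance of Proposition 2.1, consider the independence copula. This example is an assumption chosen for illustration: with independent standard Laplace components the gauge is $g(\bm{x})=\sum_i|x_i|$, while with independent standard exponential components it is $g_{E}(\bm{x})=\sum_i x_i$ on the non-negative orthant. The sketch below, with hypothetical function names, checks that the two coincide at randomly drawn positive angles.

```python
import numpy as np

def gauge_laplace_indep(x):
    # Independent standard Laplace margins: g(x) = sum_i |x_i|
    return np.abs(np.asarray(x, dtype=float)).sum(axis=-1)

def gauge_exponential_indep(x):
    # Independent standard exponential margins: g_E(x) = sum_i x_i, for x >= 0 only
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("g_E is only defined on the non-negative orthant")
    return x.sum(axis=-1)

rng = np.random.default_rng(0)
z = np.abs(rng.normal(size=(1000, 3)))                  # directions in the positive orthant
w_pos = z / np.linalg.norm(z, axis=1, keepdims=True)    # normalise onto the unit sphere

same_gauge = np.allclose(gauge_laplace_indep(w_pos), gauge_exponential_indep(w_pos))
```

Angles with a negative component have no counterpart under exponential margins, which is why the equality is stated for positive angles only.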

2.3 Constructing valid unit-level sets

We now consider estimation of the unit-level set $\partial\mathcal{G}$ or, equivalently, the gauge function $g(\cdot)$ on $\mathcal{S}^{d-1}$. This requires estimates of the corresponding limit set, $\mathcal{G}$, to have certain properties; see Section 2.1. In what follows, we show that such properties are easily satisfied via an appropriate estimator for $g(\cdot)$. We begin by noting that $g(\cdot)$ must satisfy the constraint described in the following Proposition 2.2.

Proposition 2.2.

For all $\bm{w} \in \mathcal{S}^{d-1}$, the gauge function $g(\cdot)$ satisfies the constraint that

$$g(\bm{w}) \geq ||\bm{w}||_{\infty},$$

where $||\bm{x}||_{\infty} := \max\{|x_{1}|, \ldots, |x_{d}|\}$ denotes the infinity norm.

Proof.

Given $\bm{w} \in \mathcal{S}^{d-1}$, let $\bm{w}/g(\bm{w}) \in \partial\mathcal{G}$ denote the corresponding point on the unit-level set. Since $\partial\mathcal{G} \subset [-1,1]^{d}$, we must have $\max_{i=1,\dots,d}\{w_{i}\}/g(\bm{w}) = \max_{i=1,\dots,d}\{w_{i}/g(\bm{w})\} \leq 1$, which implies that $\max_{i=1,\dots,d}\{w_{i}\} \leq g(\bm{w})$. Similarly for the vector minima, we obtain $\max_{i=1,\dots,d}\{-w_{i}\} = -\min_{i=1,\dots,d}\{w_{i}\} \leq g(\bm{w})$. Considering each component of $\bm{w}$ gives $g(\bm{w}) \geq \max(w_{i}, -w_{i})$ for all $i = 1, \ldots, d$; hence, $g(\bm{w}) \geq ||\bm{w}||_{\infty}$. ∎
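As a quick numerical illustration of Proposition 2.2, the sketch below takes the gauge function of independent standard Laplace margins, $g(\bm{w}) = ||\bm{w}||_{1}$ (which follows from the joint density $2^{-d}\exp(-\sum_{i}|x_{i}|)$), and checks the bound at random angles. The function names, the dimension $d = 5$, the sample size, and the use of the Euclidean norm to define $\mathcal{S}^{d-1}$ are illustrative assumptions rather than part of the authors' implementation.

```python
import numpy as np

def random_angles(n, d, rng):
    """Draw n points on the unit sphere S^{d-1} (Euclidean norm assumed)."""
    z = rng.standard_normal((n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def gauge_independent_laplace(w):
    """Gauge of independent standard Laplace margins: g(w) = ||w||_1."""
    return np.abs(w).sum(axis=1)

rng = np.random.default_rng(0)
w = random_angles(1000, 5, rng)
g = gauge_independent_laplace(w)
sup_norm = np.abs(w).max(axis=1)

# Proposition 2.2: g(w) >= ||w||_inf at every angle.
constraint_holds = bool(np.all(g >= sup_norm))

# Equivalently, the unit-level points w / g(w) stay inside [-1, 1]^d.
boundary_points = w / g[:, None]
inside_cube = bool(np.all(np.abs(boundary_points) <= 1.0))
```

Both flags evaluate to true here because $||\bm{w}||_{1} \geq ||\bm{w}||_{\infty}$ for every vector; a candidate gauge that violated the bound at some angle would place the corresponding boundary point outside the unit hyper-cube.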

Ignoring this constraint may lead to estimates of $\mathcal{G}$ not contained within $[-1,1]^{d}$. Such estimates lack any theoretical interpretation and must undergo a user-specified rescaling to be of use; see, e.g., Papastathopoulos et al. (2024) or Murphy-Barltrop et al. (2024a). We exploit Proposition 2.2 to ensure that our model does not suffer from this problem and can be designed to always provide valid estimates of limit sets. Now consider any continuous radial function $h : \mathcal{S}^{d-1} \mapsto \mathbb{R}_{+}$. The following proposition holds.

Proposition 2.3.

Suppose $h(\cdot) : \mathcal{S}^{d-1} \mapsto \mathbb{R}_{+}$ satisfies $1/h(\bm{w}) \geq ||\bm{w}||_{\infty}$ for all $\bm{w} \in \mathcal{S}^{d-1}$, and define the set

$$\mathcal{H} := \left\{\bm{x} \in \mathbb{R}^{d} \setminus \{\bm{0}_{d}\} \;\bigg|\; ||\bm{x}|| \leq h(\bm{x}/||\bm{x}||)\right\} \bigcup \bigg\{\bm{0}_{d}\bigg\}.$$

Then $\mathcal{H}$ is star-shaped and satisfies $\mathcal{H} \subset [-1,1]^{d}$. Moreover, $\mathcal{H}$ is compact.

Proof of Proposition 2.3 is provided in Appendix B.2.

Proposition 2.3 implies that we can design models for $\mathcal{G}$ that satisfy (some of) the validity properties of limit sets by starting with any (potentially arbitrary) continuous radial function $h(\cdot)$ that satisfies $1/h(\bm{w}) \geq ||\bm{w}||_{\infty}$; the corresponding set $\mathcal{H}$ will be a subset of $[-1,1]^{d}$, star-shaped, and compact. Whilst $\mathcal{H}$ must be a subset of the unit hyper-cube, it is not guaranteed to intersect with its boundary in each component, or even at all; the same is true of the boundary set $\partial\mathcal{H}$. Since this is a requirement of valid limit and unit-level sets, we propose a rescaling procedure to ensure it is always satisfied for our model.
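To make the role of the constraint in Proposition 2.3 concrete, the following sketch builds a radial function by capping an arbitrary positive function at $1/||\bm{w}||_{\infty}$ and checks where the resulting boundary points $\bm{w}h(\bm{w})$ fall. The function `raw_radial`, the capping device, and the choice $d = 3$ are hypothetical choices made only for illustration; they are not claimed to be the construction used later in the paper.

```python
import numpy as np

def raw_radial(w):
    """Hypothetical positive, continuous radial function on S^{d-1}."""
    return 0.6 + 0.3 * np.cos(3.0 * w[:, 0]) ** 2

def constrained_radial(w):
    """Enforce 1 / h(w) >= ||w||_inf by capping h at 1 / ||w||_inf."""
    cap = 1.0 / np.abs(w).max(axis=1)
    return np.minimum(raw_radial(w), cap)

rng = np.random.default_rng(1)
z = rng.standard_normal((2000, 3))
w = z / np.linalg.norm(z, axis=1, keepdims=True)   # angles on S^2

h = constrained_radial(w)
boundary = w * h[:, None]            # points w * h(w) on the boundary of H

in_cube = bool(np.all(np.abs(boundary) <= 1.0 + 1e-12))
comp_max = boundary.max(axis=0)      # componentwise maxima over the sample
comp_min = boundary.min(axis=0)      # componentwise minima over the sample
```

In this example the boundary lies strictly inside the unit hyper-cube, so `comp_max` and `comp_min` fall short of $1$ and $-1$ in every component. This is the situation the rescaling procedure is designed to correct.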

Observe that the boundary of $\mathcal{H}$ is given by $\partial\mathcal{H} = \left\{\bm{w}h(\bm{w}) : \bm{w} \in \mathcal{S}^{d-1}\right\}$. For each $i = 1, \ldots, d$, we define $b_{i}(w_{i}) := \mathbbm{1}(w_{i} \geq 0)b_{i}^{U} - \mathbbm{1}(w_{i} < 0)b_{i}^{L} > 0$, where

$$b_{i}^{U} := \max\left\{w_{i}h(\bm{w}) \mid \bm{w} \in \mathcal{S}^{d-1}\right\} > 0, \quad \text{and} \quad b_{i}^{L} := \min\left\{w_{i}h(\bm{w}) \mid \bm{w} \in \mathcal{S}^{d-1}\right\} < 0, \tag{4}$$

for all $i = 1, \ldots, d$. Using these scaling functions, we define the rescaled set

$$\widetilde{\partial\mathcal{H}} := \left\{h(\bm{w})\left(\frac{w_{1}}{b_{1}(w_{1})}, \ldots, \frac{w_{d}}{b_{d}(w_{d})}\right) \bigg|\; \bm{w} \in \mathcal{S}^{d-1}\right\},$$

which satisfies the following Proposition 2.4.

Proposition 2.4.

The rescaled set $\widetilde{\partial\mathcal{H}}$ is in one-to-one correspondence with $\partial\mathcal{H}$, satisfies $\widetilde{\partial\mathcal{H}} \subset [-1,1]^{d}$, and has componentwise maxima and minima $\bm{1}_{d}$ and $-\bm{1}_{d}$, respectively.
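The rescaling in (4) and the definition of $\widetilde{\partial\mathcal{H}}$ can be sketched numerically for $d = 2$ by replacing the maximum and minimum over $\mathcal{S}^{1}$ with their values on a dense grid of angles. The radial function below is a hypothetical example chosen so that $1/h(\bm{w}) \geq ||\bm{w}||_{\infty}$ holds, and the grid approximation is an assumption of this sketch, not a statement about how the authors evaluate (4).

```python
import numpy as np

def radial(w):
    """Hypothetical continuous radial function with 1 / h(w) >= ||w||_inf."""
    return 0.5 + 0.25 * w[:, 0] * w[:, 1] + 0.1 * w[:, 0]

# Dense grid of angles on the unit circle S^1 (d = 2).
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
w = np.column_stack([np.cos(theta), np.sin(theta)])

h = radial(w)
boundary = w * h[:, None]            # points w * h(w) on the boundary of H

# Grid approximations to the scaling constants in (4).
b_upper = boundary.max(axis=0)       # b_i^U > 0
b_lower = boundary.min(axis=0)       # b_i^L < 0

# b_i(w_i) = b_i^U if w_i >= 0 and -b_i^L otherwise, so b_i(w_i) > 0.
b = np.where(w >= 0.0, b_upper, -b_lower)

rescaled = boundary / b              # points of the rescaled boundary set
```

On the grid, `rescaled.max(axis=0)` and `rescaled.min(axis=0)` equal $\bm{1}_{2}$ and $-\bm{1}_{2}$, in line with Proposition 2.4, whereas the unscaled `boundary` stays strictly inside the unit square.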

To map $\partial\mathcal{H}$ to $\widetilde{\partial\mathcal{H}}$, we require the transformation described in the following lemma.

Lemma 2.1.

Let $\kappa : \mathcal{S}^{d-1} \mapsto \mathcal{S}^{d-1}$ denote the following mapping:

$$\kappa(\bm{w}) = \left(\frac{w_{1}}{b_{1}(w_{1})}, \ldots, \frac{w_{d}}{b_{d}(w_{d})}\right) \bigg/ \bigg\lVert\left(\frac{w_{1}}{b_{1}(w_{1})}, \ldots, \frac{w_{d}}{b_{d}(w_{d})}\right)\bigg\rVert.$$

Then $\kappa$ is a bijective mapping.

Lemma 2.1 is used to prove Proposition 2.4 in Appendix B.3. Proposition 2.4 permits a new construction of valid gauge functions, denoted by $\tilde{g} : \mathcal{S}^{d-1} \mapsto \mathbb{R}_{+}$, with

$$\tilde{g}(\bm{w}) := 1 \bigg/ \left\| h(\kappa^{-1}(\bm{w}))\left(\frac{\kappa^{-1}(\bm{w})_{1}}{b_{1}(\kappa^{-1}(\bm{w})_{1})}, \ldots, \frac{\kappa^{-1}(\bm{w})_{d}}{b_{d}(\kappa^{-1}(\bm{w})_{d})}\right)\right\|. \tag{5}$$

Note that we term $\tilde{g}(\cdot)$ the rescaled gauge function. Since $h(\cdot)$ is continuous, we have that $\tilde{g}(\cdot)$ is also continuous. Moreover, as we show in the following corollaries, $\tilde{g}(\cdot)$ satisfies the theoretical properties required to produce valid limit sets.

Corollary 2.1.

The rescaled gauge function $\tilde{g}(\cdot)$ satisfies $\tilde{g}(\bm{w}) \geq ||\bm{w}||_{\infty}$ for all $\bm{w} \in \mathcal{S}^{d-1}$. Let $\bm{w}^{u,i} = \operatorname*{arg\,max}_{\bm{w} \in \mathcal{S}^{d-1}}\{w_{i}h(\bm{w})\}$ and $\bm{w}^{l,i} = \operatorname*{arg\,min}_{\bm{w} \in \mathcal{S}^{d-1}}\{w_{i}h(\bm{w})\}$ for $i = 1, \dots, d$. Then $\tilde{g}(\kappa(\bm{w}^{u,i})) = ||\kappa(\bm{w}^{u,i})||_{\infty}$ and $\tilde{g}(\kappa(\bm{w}^{l,i})) = ||\kappa(\bm{w}^{l,i})||_{\infty}$ for each $i = 1, \ldots, d$.

Corollary 2.2.

Let $h(\cdot)$ be any continuous radial function and define the corresponding rescaled gauge function by $\tilde{g}(\cdot)$ as in (5). The set

$$\widetilde{\mathcal{H}} := \left\{\bm{x} \in \mathbb{R}^{d} \setminus \{\bm{0}_{d}\} \;\bigg|\; ||\bm{x}|| \leq \frac{1}{\tilde{g}(\bm{x}/||\bm{x}||)}\right\} \bigcup \bigg\{\bm{0}_{d}\bigg\},$$

is star-shaped, compact, and satisfies $\widetilde{\mathcal{H}} \subset [-1,1]^{d}$. Furthermore, $\widetilde{\mathcal{H}}$ has componentwise maxima and minima $\bm{1}_{d}$ and $-\bm{1}_{d}$, respectively.

Proof of Corollary 2.1 is provided in Appendix B.4. Proof of Corollary 2.2 follows directly from Propositions 2.3 and 2.4, and Corollary 2.1.
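The rescaled gauge function in (5) can be evaluated without inverting $\kappa$ explicitly: substituting $\kappa(\bm{w})$ for the argument gives $\tilde{g}(\kappa(\bm{w})) = 1/\lVert h(\bm{w})(w_{1}/b_{1}(w_{1}), \ldots, w_{d}/b_{d}(w_{d}))\rVert$. The sketch below uses this identity on a grid for $d = 2$ and checks the two statements of Corollary 2.1. The radial function and the grid approximation of (4) are hypothetical choices made for this illustration.

```python
import numpy as np

def radial(w):
    """Hypothetical continuous radial function with 1 / h(w) >= ||w||_inf."""
    return 0.5 + 0.25 * w[:, 0] * w[:, 1] + 0.1 * w[:, 0]

theta = np.linspace(0.0, 2.0 * np.pi, 20001)
w = np.column_stack([np.cos(theta), np.sin(theta)])
h = radial(w)
boundary = w * h[:, None]

# Grid approximations to the scaling constants in (4).
b_upper, b_lower = boundary.max(axis=0), boundary.min(axis=0)
b = np.where(w >= 0.0, b_upper, -b_lower)

scaled = w / b                                   # (w_1/b_1(w_1), ..., w_d/b_d(w_d))
norm_scaled = np.linalg.norm(scaled, axis=1)
kappa_w = scaled / norm_scaled[:, None]          # kappa(w), again an angle on S^1

# (5) evaluated at kappa(w): g~(kappa(w)) = 1 / ||h(w) * scaled||.
g_tilde = 1.0 / (h * norm_scaled)
sup_norm = np.abs(kappa_w).max(axis=1)

# First claim of Corollary 2.1 (small tolerance for floating-point error).
bound_holds = bool(np.all(g_tilde >= sup_norm - 1e-12))

# Second claim: equality at the angles attaining b_i^U and b_i^L.
idx = np.concatenate([boundary.argmax(axis=0), boundary.argmin(axis=0)])
equality_holds = bool(np.allclose(g_tilde[idx], sup_norm[idx]))
```

Evaluating $\tilde{g}$ at the image angles $\kappa(\bm{w})$ rather than at arbitrary angles is a design choice of the sketch: it avoids a numerical inversion of $\kappa$, at the cost of obtaining $\tilde{g}$ on a transformed (non-uniform) grid.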

Remark 2.

Corollary 2.2 also implies that the boundary $\widetilde{\partial\mathcal{H}}$ associated with $\widetilde{\mathcal{H}}$ satisfies all of the required theoretical properties of valid unit-level sets.

For inference of the limit set, we exploit Corollary 2.2 to ensure our estimates of the limit (or unit-level) set have the required properties for validity; see Section 3.2. We note that an alternative rescaling was proposed by Papastathopoulos et al. (2024). However, this was applied post-hoc on an initial estimate of the gauge function via a two-step procedure, whereas our rescaling is performed during inference without the need for an additional step.

2.4 Extended Angular Dependence Function

Consider now any gauge function $g(\cdot)$ with limit and unit-level sets $\mathcal{G}$ and $\partial\mathcal{G}$, respectively. Proposition 2.1 implies that valid estimates of $\partial\mathcal{G}$ can be used to immediately obtain parameter estimates for several existing modelling frameworks, providing information about the extremal dependence structure in the positive orthant of $\mathbb{R}^{d}$. However, as noted in Section 1, our interest lies more generally in understanding the extremal dependence in all orthants of $\mathbb{R}^{d}$. The next proposition shows that, with suitable scaling coefficients, the limit set in any orthant can be obtained by considering a rescaled random vector on the set $\mathcal{S}^{d-1}_{+}$.

Proposition 2.5.

Given $\bm{w} \in \mathcal{S}^{d-1}$, let $\bm{c} := (\varepsilon(w_{1}), \ldots, \varepsilon(w_{d}))^{T}$, where $\varepsilon(x) = 1$ for $x \geq 0$ and $\varepsilon(x) = -1$ otherwise. For $\bm{c}\bm{X} := (c_{1}X_{1}, \ldots, c_{d}X_{d})^{T}$ and $\bm{c}\bm{w} := (c_{1}w_{1}, \ldots, c_{d}w_{d})^{T} \in \mathcal{S}^{d-1}_{+}$, we have $g(\bm{w}) = g_{\bm{c}\bm{X}}(\bm{c}\bm{w})$, where $g_{\bm{c}\bm{X}}(\cdot)$ denotes the gauge function of $\bm{c}\bm{X}$.

Proof.

Letting $f_{\bm{c}\bm{X}}(\cdot)$ denote the joint density function of $\bm{c}\bm{X}$, we have

$$g(\bm{w}) = \lim_{t \to \infty}[-\log f_{\bm{X}}(t\bm{w})]/t = \lim_{t \to \infty}[-\log f_{\bm{c}\bm{X}}(t\bm{c}\bm{w})]/t = g_{\bm{c}\bm{X}}(\bm{c}\bm{w}),$$

as the Jacobian of the transformation $\bm{c}\bm{X} \mapsto \bm{X}$ is equal to $1$. ∎
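The change of variables behind the second equality in the proof of Proposition 2.5 can be written out explicitly. Since $c_{i}^{2} = 1$ for each $i$, the componentwise map $\bm{x} \mapsto \bm{c}\bm{x}$ is its own inverse, and its Jacobian matrix $\operatorname{diag}(\bm{c})$ has determinant of absolute value one. A worked version of the step, using only the notation of the proof, is:

```latex
f_{\bm{c}\bm{X}}(\bm{y})
  = f_{\bm{X}}(\bm{c}\bm{y})\,\bigl\lvert \det \operatorname{diag}(\bm{c}) \bigr\rvert
  = f_{\bm{X}}(\bm{c}\bm{y}),
\qquad \text{so that} \qquad
f_{\bm{c}\bm{X}}(t\,\bm{c}\bm{w})
  = f_{\bm{X}}\bigl(t\,(c_{1}^{2}w_{1}, \ldots, c_{d}^{2}w_{d})^{T}\bigr)
  = f_{\bm{X}}(t\bm{w}).
```

Taking $-\log(\cdot)/t$ of both sides and letting $t \to \infty$ then gives $g_{\bm{c}\bm{X}}(\bm{c}\bm{w}) = g(\bm{w})$.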

Remark 3.

Proposition 2.5 implies that, for any $\bm{c} \in \{-1,1\}^{d}$ that leads to the partitioning $\mathcal{S}^{d-1}_{\bm{c}} := \{\bm{w} \in \mathcal{S}^{d-1} : \bm{c} := (\varepsilon(w_{1}), \ldots, \varepsilon(w_{d}))^{T}\}$ of the unit $(d-1)$-sphere, the sets $\{\bm{w}/g(\bm{w}) : \bm{w} \in \mathcal{S}^{d-1}_{\bm{c}}\}$ and $\{\bm{w}/g_{\bm{c}\bm{X}}(\bm{c}\bm{w}) : \bm{w} \in \mathcal{S}^{d-1}_{\bm{c}}\}$ are equal. Therefore, the unit-level set $\partial\mathcal{G}$ can be obtained by evaluating gauge functions for rescaled vectors on the set $\{\bm{w} \in \mathcal{S}^{d-1} : \min_{i=1,\dots,d}\{w_{i}\} \geq 0\}$.

Propositions 2.1 and 2.5 have implications when using the geometric representation to infer the ADF from (2). As this framework is given for vectors on standard exponential margins, careful treatment is required to define an analogous model for Laplace margins.

Proposition 2.6.

Given any angle $\bm{w} \in \mathcal{S}^{d-1}_{+}$, assume that equation (2) holds for the random vector $\bm{X}_{E}$. Then

$$\Pr\left(\min_{i \in \mathcal{D}}\{X_{i}/w_{i}\} > u\right) = L(e^{u}; \bm{w})e^{-\lambda(\bm{w})u}, \;\; u \to \infty.$$

Proof of Proposition 2.6 is provided in Appendix B.5. Combining Proposition 2.6 and model (2) allows us to assess joint tail behaviour in the positive orthant $\mathbb{R}^{d}_{+}$ through the ADF, $\lambda(\cdot)$. We further extend this model to $\mathbb{R}^{d}$ by considering an extended ADF, denoted by $\Lambda(\cdot)$. Specifically, given any $\bm{w} \in \mathcal{S}^{d-1} \setminus \mathcal{A}$, where $\mathcal{A} := \bigcup_{i=1}^{d}\{\bm{w} \in \mathcal{S}^{d-1} : w_{i} = 0\}$ is the intersection of $\mathcal{S}^{d-1}$ with each axis, we assume that

$$\Pr\left(\min_{i \in \mathcal{D}}\{X_{i}/w_{i}\} > u\right) = \mathcal{L}(e^{u}; \bm{w})e^{-\Lambda(\bm{w})u}, \;\; u \to \infty, \tag{6}$$

where $\mathcal{L}(\cdot; \bm{w})$ is a slowly varying function and $\Lambda(\cdot)$ is defined in Proposition 2.8. Note that the extended ADF corresponds with the copula exponent function introduced by Mackay and Jonathan (2023) for data on uniform margins. The next proposition illustrates that, under mild conditions, the convergence in (6) is always achieved.

Proposition 2.7.

Assume that the conditions of Proposition 2.6 are satisfied for any random vector $\bm{c}\bm{X}$, where $\bm{c} \in \{-1,1\}^{d}$. Then equation (6) holds for any $\bm{w} \in \mathcal{S}^{d-1} \setminus \mathcal{A}$.

Proof of Proposition 2.7 is provided in Appendix B.6. Note that while the assumptions for Proposition 2.7 may seem restrictive, Wadsworth and Tawn (2013) demonstrate through rigorous theoretical treatment that model (2) captures the joint tail structure of $\bm{X}_{E}$ for a wide variety of theoretical examples. Therefore, it is reasonable to assume this framework can capture the tail structure in every orthant.
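For intuition on model (6), consider the simplest case of independent standard Laplace components, where the joint tail probability is available in closed form: each factor equals $\tfrac{1}{2}e^{-u|w_{i}|}$ whatever the sign of $w_{i}$, so that $\mathcal{L}$ is the constant $2^{-d}$ and $\Lambda(\bm{w}) = ||\bm{w}||_{1}$. The sketch below evaluates $-\log\Pr(\cdot)/u$ at increasing thresholds for an angle outside the positive orthant. The independence assumption, the function name, and the chosen angle are illustrative only.

```python
import numpy as np

def log_joint_tail(u, w):
    """log Pr(min_i X_i / w_i > u) for independent standard Laplace X_i, u > 0.

    For w_i > 0 the event is X_i > u * w_i; for w_i < 0 it is X_i < u * w_i.
    Either way the marginal probability is 0.5 * exp(-u * |w_i|).
    """
    w = np.asarray(w, dtype=float)
    return float(np.sum(np.log(0.5) - u * np.abs(w)))

w = np.array([0.6, -0.8])                    # angle on S^1 with w_2 < 0
u_grid = np.array([5.0, 10.0, 20.0, 40.0])

# Empirical decay rate -log Pr(.) / u at each threshold.
rate = np.array([-log_joint_tail(u, w) / u for u in u_grid])

extended_adf = np.abs(w).sum()               # Lambda(w) = ||w||_1 in this example
```

The entries of `rate` decrease towards `extended_adf` as $u$ grows, with the gap $d\log 2/u$ reflecting the slowly varying (here constant) term in (6).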

Remark 4.

In the definition of our model (6), we purposely exclude any angles in $\mathcal{S}^{d-1}$ that intersect the axes, as the model is not well defined there. Take, e.g., $\bm{w} := (1, 0, \ldots, 0)^{T}$; it is not clear whether one should consider the probability $\Pr(X_{1} > u, X_{i} > 0, i = 2, \ldots, d)$ or $\Pr(X_{1} > u, X_{i} < 0, i = 2, \ldots, d)$ under the modelling framework. This results in discontinuities for the extended ADF at the axes; see Figure 2.

On the original exponential margins, Nolde and Wadsworth (2022) show that the unit-level set $\partial\mathcal{G}$ is linked to the ADF; see Section 1. As demonstrated by Proposition 2.8, an analogous relationship holds for the extended ADF.

Proposition 2.8.

Suppose equation (6) holds for any angle $\bm{w} \in \mathcal{S}^{d-1} \setminus \mathcal{A}$. Then $\Lambda(\bm{w}) = ||\bm{w}||_{\infty} \times \tilde{\mathfrak{r}}^{-1}_{\bm{w}}$, where $\tilde{\mathfrak{r}}_{\bm{w}} = \max\{\mathfrak{r} \in [0,1] : \mathfrak{r}\tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \neq \emptyset\}$ and $\tilde{\mathcal{R}}_{\bm{w}} := \bigotimes_{i=1,\ldots,d}\mathcal{U}_{w_{i}}$, with $\mathcal{U}_{w_{i}} := [w_{i}/||\bm{w}||_{\infty}, \infty]$ for $w_{i} > 0$ and $[-\infty, w_{i}/||\bm{w}||_{\infty}]$ for $w_{i} < 0$.

Proof of Proposition 2.8 is provided in Appendix B.7, alongside the illustrative Figure A1. Proposition 2.8 illustrates that the extended ADF can be obtained directly from the unit-level set $\partial\mathcal{G}$. Consequently, the extended ADF is linked to the gauge function, as demonstrated by the following corollary.

Corollary 2.3.

Suppose model (6) holds for all $\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}$. Then, we have that

$$g(\bm{w})\geq\Lambda(\bm{w})\geq\|\bm{w}\|_{\infty}.$$

Proof of Corollary 2.3 is provided in Appendix B.8, with a visualisation provided in Figure 2. The blue lines in each panel of Figure 2 denote the sets $\{\bm{w}/\Lambda(\bm{w}):\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}\}$, corresponding to the intersection points $\{\tilde{\mathfrak{r}}_{\bm{w}}\tilde{\mathcal{R}}_{\bm{w}}\cap\partial\mathcal{G},\ \bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}\}$. Furthermore, from the first and third panels of Figure 2, one can clearly observe the discontinuities of the extended ADF at the axes, as discussed in Remark 4.

3 Inference

3.1 Overview

Here we describe our DeepGauge framework for modelling and estimating limit sets. Section 3.2 describes model assumptions for the conditional radii $R\mid(\bm{W}=\bm{w})$, $\bm{w}\in\mathcal{S}^{d-1}$, through which we obtain estimates of the unit-level set $\partial\mathcal{G}$. Section 3.3 describes our DeepGauge representation of gauge functions using neural networks. Section 3.4 covers estimation of the extended ADF and its use in probability estimation. Section 3.5 concludes with diagnostic tools for assessing goodness-of-fit.

3.2 Modelling assumptions for the conditional radii

Wadsworth and Campbell (2024) demonstrate that for large radial values $r$, we have that $f_{R\mid\bm{W}}(r\mid\bm{w})\propto r^{d-1}\exp\{-rg(\bm{w})\}$, where $f_{R\mid\bm{W}}$ denotes the density function of $R\mid\bm{W}$. This implies that the upper tail of $R\mid(\bm{W}=\bm{w})$ should follow the form of a gamma kernel. Note that this holds for $\bm{X}$ on both exponential and Laplace margins. To accommodate this form, Wadsworth and Campbell (2024) propose the modelling assumption:

$$R\mid(\bm{W}=\bm{w},R>r_{\tau}(\bm{w}))\sim\text{truncGamma}(\alpha,g(\bm{w})),\tag{7}$$

where ‘truncGamma’ denotes a truncated Gamma distribution with shape and rate parameters $\alpha>0$ and $g(\bm{w})>0$, respectively, and $r_{\tau}(\bm{w})$ denotes the $\tau$-quantile of $R\mid(\bm{W}=\bm{w})$ for some $\tau\in(0,1)$ close to 1, that is, $\Pr\{R\leq r_{\tau}(\bm{w})\mid\bm{W}=\bm{w}\}=\tau$ for all $\bm{w}\in\mathcal{S}^{d-1}$. Using form (7), we can use maximum likelihood techniques to estimate the gauge function $g(\cdot)$, which is viewed as a non-stationary rate parameter of the truncated gamma distribution.

Through rigorous theoretical treatment, Wadsworth and Campbell (2024) show that equation (7) is a valid modelling assumption for a wide range of parametric copulas, including the three illustrated in Figure 2. For most examples, the true shape parameter is $\alpha=d$. However, to increase the flexibility of their model, Wadsworth and Campbell (2024) permit estimation of this parameter; the DeepGauge framework follows suit.

3.3 Estimation of the gauge function using neural networks

To overcome the limited flexibility of existing geometric modelling approaches, we model the rescaled gauge function ($\tilde{g}$; see Corollary 2.1) using neural networks. Full inference requires the construction of two models: one for the radial threshold, $r_{\tau}(\bm{w})$ in (7), and one for an (unscaled) gauge function $g(\bm{w})$ satisfying $g(\bm{w})\geq\|\bm{w}\|_{\infty}$ for all $\bm{w}\in\mathcal{S}^{d-1}$ (see Proposition 2.2). With the latter, we can employ the transformation in Corollary 2.1 to produce a model for the rescaled gauge function, $\tilde{g}(\bm{w})$, that provides a valid unit-level set.

We model the radial threshold $r_{\tau}:\mathcal{S}^{d-1}\mapsto\mathbb{R}_{+}$ using a multi-layer perceptron (MLP) with rectified linear unit (ReLU) activation functions, denoted by $m_{\bm{\psi}}:\mathcal{S}^{d-1}\mapsto\mathbb{R}_{+}$ and parameterised by the set $\bm{\psi}$. For brevity, we provide details of the construction in Appendix A.1; see, also, Richards and Huser (2024) for a review of these models and their inference. We similarly represent the (unscaled) gauge function $g(\bm{w})$ using an MLP.
To ensure that $g(\cdot)$ satisfies $g(\bm{w})\geq\|\bm{w}\|_{\infty}$ for all $\bm{w}\in\mathcal{S}^{d-1}$, we let $g(\bm{w})=\text{ReLU}\{m_{\bm{\psi}}(\bm{w})\}+\|\bm{w}\|_{\infty}$, where $\text{ReLU}(\bm{x})=(\max\{x_{1},0\},\max\{x_{2},0\},\dots)^{T}$ for a vector $\bm{x}=(x_{1},x_{2},\dots)^{T}$ of finite (unspecified) length. Note that we share neither parameters nor architectures between the two neural networks determining $r_{\tau}(\bm{w})$ and $g(\bm{w})$.
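The construction $g(\bm{w})=\text{ReLU}\{m_{\bm{\psi}}(\bm{w})\}+\|\bm{w}\|_{\infty}$ can be illustrated with a small NumPy sketch. The network widths, the random initialisation, the helper names (`init_mlp`, `mlp`, `gauge`) and the use of the Euclidean norm to normalise the angles are all assumptions made for illustration; the architecture actually used is described in Appendix A.1.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(d, widths, rng):
    """Random weights for an MLP mapping R^d -> R (hypothetical helper)."""
    sizes = [d, *widths, 1]
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, w):
    """ReLU hidden layers followed by a linear output; w has shape (n, d)."""
    h = w
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[:, 0]

def gauge(params, w):
    """g(w) = ReLU{m(w)} + ||w||_inf, so g(w) >= ||w||_inf by construction."""
    return relu(mlp(params, w)) + np.max(np.abs(w), axis=1)

rng = np.random.default_rng(0)
params = init_mlp(d=3, widths=(16, 16), rng=rng)
w = rng.normal(size=(5, 3))
w /= np.linalg.norm(w, axis=1, keepdims=True)  # Euclidean normalisation (assumed)
g = gauge(params, w)
```

Because the ReLU term is non-negative, the lower bound holds for any parameter values, including untrained ones, so no constraint needs to be enforced during optimisation.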

Transforming $g(\cdot)$ to $\tilde{g}(\cdot)$ requires evaluating all scaling factors $b_{i}^{U}$ and $b_{i}^{L}$, $i=1,\dots,d$, in (4). We do so numerically using a sample of angles, denoted by $\mathcal{W}$, that provides a dense coverage of the $(d-1)$-sphere. In practice, we simulate $|\mathcal{W}|=10^{6}$ points using the rejection sampling algorithm of Neumann (1951).
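As a rough illustration of how such a set of angles might be generated, the sketch below uses one standard rejection scheme: propose points uniformly in the cube $[-1,1]^d$, keep those inside the unit ball, and project them onto the sphere. The Euclidean norm and the function name `sample_sphere_rejection` are assumptions, and this may differ in detail from the implementation used for the paper. The acceptance rate decays quickly with $d$, so this version is only practical in moderate dimensions.

```python
import numpy as np

def sample_sphere_rejection(n, d, rng):
    """Draw n points uniformly on the Euclidean unit sphere in R^d by rejection."""
    out = []
    remaining = n
    while remaining > 0:
        # Propose in the cube [-1, 1]^d and keep proposals inside the unit ball.
        z = rng.uniform(-1.0, 1.0, size=(2 * remaining + 16, d))
        norms = np.linalg.norm(z, axis=1)
        keep = (norms <= 1.0) & (norms > 1e-12)
        # Project accepted proposals radially onto the sphere.
        accepted = z[keep] / norms[keep, None]
        out.append(accepted[:remaining])
        remaining -= min(remaining, accepted.shape[0])
    return np.concatenate(out, axis=0)

W = sample_sphere_rejection(1000, 3, np.random.default_rng(1))
```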

We fit (or train) the neural network $r_{\tau}(\bm{w})$ via minimisation of some loss function, denoted by $\ell(r,r_{\tau})$. Consider the set $\{(r_{j},\bm{w}_{j})\}_{j=1}^{n}$, where $r_{j}$ and $\bm{w}_{j}$ are observations of $R$ and $\bm{W}$, respectively. Optimal estimates of the neural network parameters, denoted $\widehat{\bm{\psi}}$, can be found by solving the minimisation problem

$$\widehat{\bm{\psi}}\in\operatorname*{arg\,min}_{\bm{\psi}}\frac{1}{n}\sum^{n}_{j=1}\ell(r_{j},r_{\tau}(\bm{w}_{j})),\tag{8}$$

where we have suppressed the dependency of $r_{\tau}(\cdot)$ on $\bm{\psi}$. As $r_{\tau}(\bm{w})$ denotes the $\tau$-quantile of $R\mid(\bm{W}=\bm{w})$, the most appropriate choice of loss function is the tilted loss, given by $\ell(r,r_{\tau})=\rho_{\tau}(r-r_{\tau})$ for $\rho_{\tau}(z):=z(\tau-\mathbbm{1}\{z<0\})$ (Koenker et al., 2017). We can also define a suitable loss function and minimisation problem to train the deep representation of the rescaled gauge function, $\tilde{g}(\bm{w})$. This model can be described as a conditional density estimation network (see, e.g., Rothfuss et al., 2019), where the loss function is the negative log-likelihood associated with the truncated gamma model defined in (7); note that this depends on both the exceedance threshold $r_{\tau}(\bm{w}_{j})$ and the shape parameter, $\alpha>0$. To fit the gauge function model, we replace $\ell(\cdot,\cdot)$ in (8) with

$$\begin{aligned}\ell\{r_{j},\tilde{g}(\bm{w}_{j}),\alpha;r_{\tau}(\bm{w}_{j})\}=-\mathbbm{1}\{r_{j}>r_{\tau}(\bm{w}_{j})\}\big[&\alpha\log\{\tilde{g}(\bm{w}_{j})\}+(\alpha-1)\log(r_{j})-r_{j}\tilde{g}(\bm{w}_{j})\\&-\log\{\Gamma(\alpha)\}-\log\{Q(\alpha,\tilde{g}(\bm{w}_{j})r_{\tau}(\bm{w}_{j}))\}\big],\end{aligned}\tag{9}$$

where $Q(\alpha,z)=\Gamma(\alpha,z)/\Gamma(\alpha)$ for $\Gamma(\alpha,z)=\int^{\infty}_{z}t^{\alpha-1}\exp(-t)\,dt$, and $\tilde{g}(\cdot)$ is the rescaled gauge function described above. Note that, through an abuse of notation, we do not list $\alpha$ separately from $\bm{\psi}$; this parameter is estimated concurrently with the parameters that comprise the neural network.
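To make the loss in (9) concrete, the following sketch evaluates it with SciPy, taking $Q$ to be the regularised upper incomplete gamma function $\Gamma(\alpha,z)/\Gamma(\alpha)$, which is what `scipy.special.gammaincc` computes. The function name `truncated_gamma_nll` and the vectorised interface are assumptions for illustration; in practice the gauge values would come from the neural network rather than being passed in as fixed numbers.

```python
import numpy as np
from scipy.special import gammaln, gammaincc

def truncated_gamma_nll(r, g, alpha, r_tau):
    """Per-observation loss (9): minus the truncated gamma log-density for
    exceedances r > r_tau, and zero for non-exceedances."""
    r, g, r_tau = (np.asarray(v, dtype=float) for v in (r, g, r_tau))
    log_density = (alpha * np.log(g) + (alpha - 1.0) * np.log(r) - r * g
                   - gammaln(alpha)
                   - np.log(gammaincc(alpha, g * r_tau)))  # log Q(alpha, g * r_tau)
    return -np.where(r > r_tau, log_density, 0.0)

# Averaging over all n observations gives the objective in (8).
r_obs = np.array([3.0, 1.0, 5.5])
loss = np.mean(truncated_gamma_nll(r_obs, g=1.2, alpha=2.5, r_tau=2.0))
```

Observations below the threshold contribute zero, so only exceedances inform the gauge function and $\alpha$, while the average is still taken over all $n$ observations as in (8).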

Full inference for our framework is performed in a two-stage fashion. We first train a model for $r_{\tau}(\bm{w})$ and derive its estimate, $\hat{r}_{\tau}(\bm{w})$. This estimate is then used in a subsequent training step for the rescaled gauge function $\tilde{g}(\cdot)$ and $\alpha$, by replacing $r_{\tau}(\bm{w})$ in (9) with its estimated counterpart. For details of the algorithms and regularisation techniques used for training of both models, as well as practical advice for pre-training (Goodfellow et al., 2016) of the models using initial estimates for $r_{\tau}(\cdot)$, see Appendix A.2.
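The first stage relies on the tilted loss $\rho_{\tau}$. A minimal sketch is given below, with a toy check that minimising the average loss over a constant threshold recovers the empirical $\tau$-quantile. The grid search stands in for neural network training and is purely illustrative; the simulated exponential radii are likewise an assumption.

```python
import numpy as np

def tilted_loss(r, r_tau, tau):
    """Average of rho_tau(z) = z * (tau - 1{z < 0}) with z = r - r_tau, as in (8)."""
    z = np.asarray(r, dtype=float) - np.asarray(r_tau, dtype=float)
    return np.mean(z * (tau - (z < 0).astype(float)))

# Toy check: the best constant threshold is close to the empirical tau-quantile.
rng = np.random.default_rng(0)
r = rng.exponential(size=5000)
tau = 0.9
candidates = np.linspace(0.0, 6.0, 601)
losses = [tilted_loss(r, c, tau) for c in candidates]
best = candidates[int(np.argmin(losses))]
```

The asymmetry of the loss is what targets a quantile: under-prediction of $r$ is penalised with weight $\tau$ and over-prediction with weight $1-\tau$.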

3.4 Estimating the extended ADF

We now adapt the approach of Simpson and Tawn (2024a), developed for estimation of the classical ADF, to permit estimation of our extended ADF, $\Lambda(\cdot)$. First, given a sample of angles $\{\bm{w}_{j}\}_{j=1}^{n}$, we define the corresponding point estimates for the unit-level set by $\{\tilde{\bm{x}}_{j}\}^{n}_{j=1}$, where $\tilde{\bm{x}}_{j}:=(\tilde{x}_{j,1},\ldots,\tilde{x}_{j,d})^{T}=\bm{w}_{j}/\tilde{g}(\bm{w}_{j})\in\widetilde{\partial\mathcal{G}}$ for $j=1,\ldots,n$.
Then, for any angle $\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}$ at which we wish to estimate $\Lambda(\bm{w})$, we consider the sample $\{\tilde{\bm{x}}_{j}\}^{n}_{j=1}$ to be candidates for the intersection of $\widetilde{\partial\mathcal{G}}$ and the scaled-back set $\tilde{\mathcal{R}}_{\bm{w}}$ (see Proposition 2.8). The corresponding scaling coefficient $\tilde{\mathfrak{r}}_{\bm{w}}$ must satisfy $\tilde{\mathfrak{r}}_{\bm{w}}\geq\tilde{\mathfrak{r}}_{j}$ for all $j=1,\dots,n$, where $\tilde{\mathfrak{r}}_{j}:=\|\bm{w}\|_{\infty}\min_{i}\{\tilde{x}_{j,i}/w_{i}\}$; hence, we approximate $\tilde{\mathfrak{r}}_{\bm{w}}$ as $\tilde{\mathfrak{r}}_{\bm{w}}\approx\max_{j=1,\dots,n}\{\tilde{\mathfrak{r}}_{j}\}$. Recalling Proposition 2.8, it follows that an estimate $\hat{\Lambda}(\bm{w})$ of the extended ADF at $\bm{w}$ is $\hat{\Lambda}(\bm{w})=\|\bm{w}\|_{\infty}/\max_{j}\{\tilde{\mathfrak{r}}_{j}\}$; see Simpson and Tawn (2024a) for further details. Furthermore, since the estimated unit-level set $\widetilde{\partial\mathcal{G}}$ satisfies all of the required theoretical properties for $\partial\mathcal{G}$, the resulting estimate $\hat{\Lambda}(\cdot)$ must also satisfy the properties of the extended ADF.
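The estimator $\hat{\Lambda}(\bm{w})=\|\bm{w}\|_{\infty}/\max_{j}\{\tilde{\mathfrak{r}}_{j}\}$ amounts to a few array operations. The sketch below applies it to a toy unit-level set generated from the gauge $g(\bm{x})=\|\bm{x}\|_{1}$ in $d=2$; this toy gauge, the angular grid and the function name `extended_adf` are assumptions chosen only because the answer can be checked by hand. For this gauge, working through Proposition 2.8 gives $\Lambda(\bm{w})=\|\bm{w}\|_{1}$.

```python
import numpy as np

def extended_adf(w, x_tilde):
    """Estimate Lambda(w) from points x_tilde (shape (n, d)) on the unit-level set."""
    w = np.asarray(w, dtype=float)
    if np.any(w == 0.0):
        raise ValueError("all components of w must be non-zero")
    w_inf = np.max(np.abs(w))
    # Scaling coefficient of each candidate point: ||w||_inf * min_i x_{j,i} / w_i.
    r_j = w_inf * np.min(x_tilde / w, axis=1)
    return w_inf / np.max(r_j)

# Toy unit-level set {x : ||x||_1 = 1}, i.e. x_j = w_j / g(w_j) with g = L1 norm.
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
angles = np.column_stack([np.cos(theta), np.sin(theta)])
x_tilde = angles / np.sum(np.abs(angles), axis=1, keepdims=True)

lam_pos = extended_adf([0.6, 0.8], x_tilde)    # should be close to 0.6 + 0.8 = 1.4
lam_mixed = extended_adf([0.6, -0.8], x_tilde) # same value by symmetry of the toy set
```

Points lying in a different orthant from $\bm{w}$ produce a negative minimum and are therefore never selected by the outer maximum, which is why no explicit orthant filtering is needed.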

Estimates $\hat{\Lambda}(\bm{w})$, $\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}$, can be used to obtain probability estimates for a wide variety of joint tail regions of $\bm{X}$. The estimation scheme outlined below avoids the need to sample from, or model, the distribution of the angles $\bm{W}$, as has been considered in other probability estimation schemes using the geometric representation (Papastathopoulos et al., 2024; Wadsworth and Campbell, 2024). This feature is practically advantageous as estimation of the angular distribution alongside the limit set introduces additional modelling uncertainty into our framework, and many existing techniques for non-parametric density estimation are only applicable in low dimensional settings (Ruzgas et al., 2021).

To begin, let $\bm{x}\in\mathbb{R}^{d}$ be such that $r:=\|\bm{x}\|$ is large and define the corresponding angle $\bm{w}:=\bm{x}/r$. Further define the univariate structure variable $T_{\bm{w}}:=\min_{i=1,\dots,d}\{X_{i}/w_{i}\}$, and let $u$ denote a quantile of $T_{\bm{w}}$ satisfying $q:=\Pr(T_{\bm{w}}\leq u)<\Pr(T_{\bm{w}}\leq r)$, with $q$ close to 1 and $u>0$ large. Equation (6) implies that

\begin{align}
\Pr(\text{sgn}(x_{i})X_{i}>\text{sgn}(x_{i})x_{i},\ i=1,\ldots,d)&=\Pr(T_{\bm{w}}>r)\tag{10}\\
&=\Pr(T_{\bm{w}}>r\mid T_{\bm{w}}>u)\Pr(T_{\bm{w}}>u)\approx\exp\{-\hat{\Lambda}(\bm{w})(r-u)\}(1-q).\tag{11}
\end{align}
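The approximation in (10) and (11) can be sketched as follows. The Euclidean norm, the empirical quantile used for $u$, and the function name `joint_tail_probability` are assumptions of this illustration. The toy check uses independent unit exponential variables, for which $\Pr(T_{\bm{w}}>t)=\exp(-t\sum_i w_i)$ holds exactly when all $w_i>0$, so the rate $\sum_i w_i$ is plugged in directly in place of a fitted $\hat{\Lambda}(\bm{w})$.

```python
import numpy as np

def joint_tail_probability(x, X, lam_hat, q=0.95):
    """Approximate Pr(sgn(x_i) X_i > sgn(x_i) x_i for all i) via (10)-(11).

    x       : target point, shape (d,), with no zero components
    X       : observations, shape (n, d)
    lam_hat : estimate of the extended ADF at w = x / ||x||
    q       : non-exceedance probability defining the threshold u
    """
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)          # Euclidean norm: an assumption of this sketch
    w = x / r
    T = np.min(X / w, axis=1)      # structure variable T_w = min_i X_i / w_i
    u = np.quantile(T, q)          # empirical q-quantile of T_w
    if not 0.0 < u < r:
        raise ValueError("need 0 < u < r; adjust q or choose a more extreme x")
    return np.exp(-lam_hat * (r - u)) * (1.0 - q)

# Toy check with independent unit exponential margins.
rng = np.random.default_rng(1)
X = rng.exponential(size=(200_000, 2))
x = np.array([3.0, 4.0])
lam = np.sum(x / np.linalg.norm(x))   # exact rate for this toy example
p_hat = joint_tail_probability(x, X, lam)
p_true = np.exp(-np.sum(x))
```

Only the threshold $u$ is estimated empirically; the extrapolation beyond $u$ is governed entirely by $\hat{\Lambda}(\bm{w})$, so no model for the angular distribution is required.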

Figure 4 illustrates examples of joint tail regions in the two dimensional setting whose probabilities can be estimated using the framework described in equation (6). Observe that these regions are more general than those targeted by many inference procedures for multivariate extremes, which focus only on the joint survivor or cumulative distribution function (e.g., Ledford and Tawn, 1996; Ramos and Ledford, 2009; Cooley et al., 2019).

Figure 4: Example probability regions that can be evaluated using the model described in equation (6), with data simulated from a Gaussian copula on Laplace margins. The blue, green, red, and orange regions denote the sets $[5,\infty)\times[5,\infty)$, $(-\infty,-1]\times[5,\infty)$, $(-\infty,-7]\times(-\infty,-0.5]$, and $[4,\infty)\times(-\infty,-6]$, respectively.

3.5 Assessing goodness-of-fit for DeepGauge model fits

We now discuss several diagnostic tools for assessing goodness-of-fit under the DeepGauge modelling framework, which we utilise for the case study in Section 5. The first three tools correspond to quantile-quantile (QQ) plots on a unit exponential scale. Comparing quantiles on this scale is common within extreme value analyses, as it allows one to assess how well a given model captures the most extreme observations (see, e.g., Coles, 2001; Heffernan and Tawn, 2001). The latter two tools described below are for visualisation of unit-level sets and the extended ADF in low ($d\leq 3$) and high ($d>3$) dimensional settings.

Truncated gamma QQ plot:

We consider the diagnostic proposed by Wadsworth and Campbell (2024) to assess the validity of the modelling assumption described in equation (7). Here, the fitted truncated gamma model is used to transform all of the threshold-exceeding observations of $R\mid R>r_{\tau}(\bm{w})$ to a unit exponential scale, and we compare the observed and theoretical quantiles via a QQ plot. This provides a global diagnostic for the model fit over the entire angular domain.

Extended ADF diagnostic:

We adapt the diagnostic proposed by Murphy-Barltrop et al. (2024b) to assess goodness-of-fit with respect to the estimated extended ADF. Note that equation (6) can be reformulated as follows: given any angle $\bm{w}\in\mathcal{S}^{d-1}\setminus\mathcal{A}$, we have that $T^{u}_{\bm{w}}\sim\text{Exp}(\Lambda(\bm{w}))$ as $u\to\infty$, for exceedances $T^{u}_{\bm{w}}:=(T_{\bm{w}}-u)\mid(T_{\bm{w}}>u)$ with $T_{\bm{w}}$ defined in Section 3.4. Therefore, for sufficiently large $u$, the random variable $E:=-\log[1-(1-\exp\{-\Lambda(\bm{w})T^{u}_{\bm{w}}\})]=\Lambda(\bm{w})T^{u}_{\bm{w}}$ follows a unit exponential distribution, independent of the choice of $\bm{w}$.
Murphy-Barltrop et al. (2024b) exploit this fact and compute min-projection exceedances over the angular observations, before transforming these exceedances to the standard exponential scale. One can then evaluate the alignment of the exponential quantiles on a QQ plot. This results in a global diagnostic and allows us to test the validity of the modelling assumption described in equation (6). The corresponding algorithm for computing the diagnostic is given in Appendix B.9.
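The transformation $E=\Lambda(\bm{w})T^{u}_{\bm{w}}$ can be illustrated for a single fixed angle. This is a simplified sketch and not the algorithm of Appendix B.9, which works over the angular observations; the function name `adf_qq_points`, the plotting positions $(k-0.5)/m$ and the toy exponential data are assumptions.

```python
import numpy as np

def adf_qq_points(X, w, lam_hat, q=0.95):
    """QQ points on the unit exponential scale for exceedances of T_w above u."""
    w = np.asarray(w, dtype=float)
    T = np.min(X / w, axis=1)                  # structure variable T_w
    u = np.quantile(T, q)                      # empirical threshold
    empirical = np.sort(lam_hat * (T[T > u] - u))
    m = empirical.size
    # Unit exponential quantiles at plotting positions (k - 0.5) / m.
    theoretical = -np.log(1.0 - (np.arange(1, m + 1) - 0.5) / m)
    return theoretical, empirical

# Toy data: independent unit exponentials, for which the rate at w is sum(w).
rng = np.random.default_rng(2)
X = rng.exponential(size=(100_000, 2))
w = np.array([0.6, 0.8])
theoretical, empirical = adf_qq_points(X, w, lam_hat=np.sum(w))
```

Plotting `empirical` against `theoretical` and checking alignment with the diagonal gives the visual diagnostic; systematic curvature would suggest that $\hat{\Lambda}(\bm{w})$ or the threshold $u$ is poorly chosen.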

Return level sets and probabilities:

Using our fitted model, we can obtain estimates of return level sets, which provide a summary of the joint extremal behaviour of random vectors and are used extensively in practice for design sensitivity analysis (see, e.g., Mackay and Haselsteiner, 2021; Papastathopoulos et al., 2024; Simpson and Tawn, 2024b). For a probability $p$, a return level set is defined as any region $\mathcal{B}\subset\mathbb{R}^{d}$ satisfying $\Pr(\bm{X}\in\mathcal{B})=p$. When $\mathcal{B}$ is centred at the origin $\bm{0}_{d}$, it can be obtained directly from the modelling framework outlined in Section 3.2; for each angle $\bm{w}\in\mathcal{S}^{d-1}$, set

$$r_{p}(\bm{w})=F^{-1}_{\Gamma}\left[\left(\frac{p-\tau}{1-\tau}\right)\times\bar{F}_{\Gamma}\left(r_{\tau}(\bm{w});\alpha,g(\bm{w})\right)+F_{\Gamma}\left(r_{\tau}(\bm{w});\alpha,g(\bm{w})\right);\alpha,g(\bm{w})\right],$$

where $F_{\Gamma}(\cdot)$ denotes the gamma distribution function. This corresponds to the estimated $p$-quantile of the conditional distribution $R\mid(\bm{W}=\bm{w})$ under model (7). Using these radial quantiles, define the Cartesian set

$$\mathcal{B}_{p}:=\left\{\bm{x}\in\mathbb{R}^{d}\setminus\{\bm{0}_{d}\}:\|\bm{x}\|\leq r_{p}\left(\bm{x}/\|\bm{x}\|\right)\right\}\bigcup\big\{\bm{0}_{d}\big\}.$$
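
The expression for $r_{p}(\bm{w})$ above can be recovered in two steps. The following sketch rests on assumptions that are not restated in this section: that model (7) takes $R\mid(\bm{W}=\bm{w},R>r_{\tau}(\bm{w}))$ to follow a gamma distribution with shape $\alpha$ and rate $g(\bm{w})$ truncated below at $r_{\tau}(\bm{w})$, that $r_{\tau}(\bm{w})$ is the $\tau$-quantile of $R\mid(\bm{W}=\bm{w})$, that $\bar{F}_{\Gamma}=1-F_{\Gamma}$, and that $p>\tau$.

```latex
\begin{align*}
\Pr(R \le r \mid \bm{W}=\bm{w})
  &= \tau + (1-\tau)\,\Pr\{R \le r \mid R > r_{\tau}(\bm{w}),\, \bm{W}=\bm{w}\} \\
  &= \tau + (1-\tau)\,
     \frac{F_{\Gamma}(r;\alpha,g(\bm{w})) - F_{\Gamma}(r_{\tau}(\bm{w});\alpha,g(\bm{w}))}
          {\bar{F}_{\Gamma}(r_{\tau}(\bm{w});\alpha,g(\bm{w}))},
  \qquad r > r_{\tau}(\bm{w}).
\end{align*}
% Setting the left-hand side equal to p and solving for F_Gamma(r) gives
\begin{equation*}
F_{\Gamma}(r_{p}(\bm{w});\alpha,g(\bm{w}))
  = \left(\frac{p-\tau}{1-\tau}\right)\bar{F}_{\Gamma}(r_{\tau}(\bm{w});\alpha,g(\bm{w}))
  + F_{\Gamma}(r_{\tau}(\bm{w});\alpha,g(\bm{w})).
\end{equation*}
```

Applying $F^{-1}_{\Gamma}(\,\cdot\,;\alpha,g(\bm{w}))$ to both sides gives the stated radial quantile.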

Assuming unbiased estimation, the law of total probability implies that $\Pr(\bm{X}\in\mathcal{B}_{p})=p$ (Papastathopoulos et al., 2024). To assess the accuracy of return level set estimates, we propose the following diagnostic. First, observe that for any observation $\bm{x}_{j}\neq\bm{0}_{d}$, we have $\bm{x}_{j}\in\mathcal{B}_{p}\iff\|\bm{x}_{j}\|\leq r_{p}(\bm{x}_{j}/\|\bm{x}_{j}\|)$.
Consequently, for a sample of non-zero observations $\{\bm{x}_{j}\}^{n}_{j=1}$, an empirical estimate $\hat{p}$ of $\Pr(\bm{X}\in\mathcal{B}_{p})$ is $\hat{p}:=(1/n)\sum_{j=1}^{n}\mathbbm{1}(\|\bm{x}_{j}\|\leq r_{p}(\bm{x}_{j}/\|\bm{x}_{j}\|))$. Uncertainty intervals for $\hat{p}$ can be obtained by bootstrapping the data sample. Our diagnostic is then to plot the pairs $(-\log(1-p),-\log(1-\hat{p}))$, alongside tolerance bounds, for a subset of increasing probabilities close to 1. This approach for assessing goodness-of-fit provides a multivariate extension of ‘return level plots’; see, e.g., Coles (2001).
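
The return level set diagnostic can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: `r_p` is a hypothetical callable standing in for the fitted radial quantile, the norm is assumed to be Euclidean, and the toy points and the constant radius of 2 are invented for the example.

```python
import math


def euclidean_norm(x):
    """Euclidean norm of a point given as a sequence of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def empirical_coverage(points, r_p):
    """Proportion of non-zero points x with ||x|| <= r_p(x / ||x||).

    `r_p` is a hypothetical callable returning the estimated radial
    p-quantile for a unit-norm angle w (a stand-in for the fitted model).
    """
    inside = 0
    for x in points:
        radius = euclidean_norm(x)
        w = [xi / radius for xi in x]  # angular component of x
        if radius <= r_p(w):
            inside += 1
    return inside / len(points)


def diagnostic_pair(p, p_hat):
    """Map (p, p_hat) to the plotted pair (-log(1 - p), -log(1 - p_hat))."""
    return (-math.log(1.0 - p), -math.log(1.0 - p_hat))


# Toy illustration (invented data): a constant radial quantile of 2 in d = 2.
toy_points = [(1.0, 0.0), (0.0, -1.5), (3.0, 4.0), (1.0, 1.0)]
p_hat = empirical_coverage(toy_points, lambda w: 2.0)
```

Note that the plotted pair is undefined when $\hat{p}=1$, so in a sketch like this the largest probabilities considered would need to leave at least one observation outside the estimated set.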

Three-dimensional unit-level and extended ADF sets:

For $d\leq 3$, we can plot the scaled sample clouds $\{\bm{x}_{j}/\log(n/2):j=1,\ldots,n\}$ against the estimated unit-level and extended ADF sets; see, e.g., Figure 2. For large enough $n$, we would expect the scaled sample clouds to lie approximately within the interiors of the estimated unit-level and extended ADF sets. Plotting scaled observations against the estimated unit-level sets has also been used for validation by, e.g., Majumder et al. (2024b) and Papastathopoulos et al. (2024). In addition to being a visual indicator of how well the estimated shapes capture the complex extremal dependence features of data, one can also use these plots to verify that the estimated unit-level sets satisfy all of the theoretical properties discussed in Section 1.

Bivariate unit-level set slices:

For $d\geq 4$, we cannot visualise unit-level sets. Furthermore, interpretation is challenging for $d=3$ unless one can freely alter the perspective angle, e.g., using computational software. To account for this shortcoming, we propose considering bivariate slices of the estimated unit-level sets. Specifically, given indices $(\mathfrak{i},\mathfrak{j})$ with $1\leq\mathfrak{i}<\mathfrak{j}\leq d$, consider the set of points

$$\widetilde{\partial\mathcal{G}}_{\mathfrak{i},\mathfrak{j}}:=\left\{(w_{\mathfrak{i}},w_{\mathfrak{j}})/\tilde{g}(\bm{w}):\bm{w}\in\mathcal{S}^{d-1},\,w_{k}=0,\,k\in\mathcal{D}\setminus\{\mathfrak{i},\mathfrak{j}\}\right\}\subset[-1,1]^{2}.$$

It is important to note that $\widetilde{\partial\mathcal{G}}_{\mathfrak{i},\mathfrak{j}}$ is not the bivariate unit-level set for the vector $(X_{\mathfrak{i}},X_{\mathfrak{j}})$; rather, $\widetilde{\partial\mathcal{G}}_{\mathfrak{i},\mathfrak{j}}$ is a bivariate projection from the subset of $\widetilde{\partial\mathcal{G}}$ for which all angles indexed by $\mathcal{D}\setminus\{\mathfrak{i},\mathfrak{j}\}$ are equal to 0. This projection is illustrated in Figure 5 for a three-dimensional unit-level set with $(\mathfrak{i},\mathfrak{j})=(1,2)$.

Figure 5: Pedagogical example of a bivariate slice from a three-dimensional unit-level set.

In practice, we observe very few angles in the region $\{\bm{w}\in\mathcal{S}^{d-1}:w_{k}=0,\,k\in\mathcal{D}\setminus\{\mathfrak{i},\mathfrak{j}\}\}$; however, we will typically observe a significant number of angular observations that lie close to this region. Moreover, note that for any observation $\bm{x}$ with corresponding angular observation $\bm{w}$, we have $w_{k}=0\iff x_{k}/\log(n/2)=0$ for any $k\in\mathcal{D}$. Therefore, we plot, alongside $\widetilde{\partial\mathcal{G}}_{\mathfrak{i},\mathfrak{j}}$, all observations of the scaled bivariate sample clouds for which $\|\bm{x}_{-\{\mathfrak{i},\mathfrak{j}\}}\|/\log(n/2)\leq\epsilon$, where $\bm{x}_{-\{\mathfrak{i},\mathfrak{j}\}}$ denotes the observation with its $\mathfrak{i}$-th and $\mathfrak{j}$-th components removed and $\epsilon>0$ denotes some small value. For sufficiently small $\epsilon$ and large enough $n$, one would expect the scaled bivariate sample cloud to lie approximately within the interior of the estimated bivariate slice.
Selection of $\epsilon$ is considered in Section 5.
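
The construction of a bivariate slice and its validation points can be sketched as follows. This is an illustration under stated assumptions: `g_tilde` is a hypothetical callable standing in for the estimated rescaled gauge function, indices are zero-based, the norm on the removed components is taken to be Euclidean, and the sample size must exceed 2 so that $\log(n/2)>0$.

```python
import math


def slice_boundary(g_tilde, d, i, j, n_angles=360):
    """Points (w_i, w_j) / g_tilde(w) for unit vectors w supported on axes i and j.

    `g_tilde` is a hypothetical stand-in for the estimated rescaled gauge.
    """
    boundary = []
    for k in range(n_angles):
        phi = 2.0 * math.pi * k / n_angles
        w = [0.0] * d  # all components outside {i, j} are fixed at zero
        w[i], w[j] = math.cos(phi), math.sin(phi)
        g = g_tilde(w)
        boundary.append((w[i] / g, w[j] / g))
    return boundary


def slice_points(sample, i, j, eps):
    """Scaled pairs (x_i, x_j) / log(n/2) whose remaining components are near zero."""
    n = len(sample)
    scale = math.log(n / 2.0)
    kept = []
    for x in sample:
        rest = [x[k] for k in range(len(x)) if k not in (i, j)]
        if math.sqrt(sum(v * v for v in rest)) / scale <= eps:
            kept.append((x[i] / scale, x[j] / scale))
    return kept
```

Plotting the output of `slice_points` against that of `slice_boundary` gives the kind of comparison described above; a larger `eps` retains more validation points at the cost of including observations further from the slice.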

4 Simulation study

4.1 Overview

We conduct two simulation studies to investigate the efficacy of the DeepGauge framework for estimating the gauge function of three known copulas: Gaussian, Student-t, and logistic. Details of the copulas and their theoretical gauge functions are provided in Section 4.2.

Results for both studies are presented in Section 4.3. Efficacy in both studies is quantified using the validation diagnostics described in Section 4.2. The first study considers, for a fixed quantile level $\tau$ and architecture, the effect of varying $d$ and $n$ on the accuracy of DeepGauge estimates of the gauge functions and exceedance probabilities; the second study considers, for a fixed dimension $d$ and sample size $n$, the effect of hyper-parameter choice. In both studies, we perform $100$ experiments for every model specification. That is, for a single choice of $n$, $d$, copula, and hyper-parameter configuration, we simulate 100 data sets and apply the methodology, separately, to each. Performance metrics are then presented as the median and $95\%$ confidence intervals over all experiments. All models for both $r_{\tau}(\cdot)$ and $\tilde{g}(\cdot)$ are trained over 500 iterations with a mini-batch size of 1024, and the ADF is estimated with $u$ in (11) taken to be the $q=0.9995$ empirical quantile of the structure variable $T_{\bm{w}}$. Early stopping is used with a patience of $\Delta=5$; see Appendix A.2 for details.
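
To make the training and reporting conventions concrete, the sketch below shows a generic patience-based stopping rule and a summary over repeated experiments. Both are assumptions for illustration: the exact stopping criterion is the one given in Appendix A.2, and we assume here that the reported intervals are empirical 2.5% and 97.5% quantiles across experiments.

```python
import statistics


def early_stopping_epoch(val_losses, patience=5):
    """Index of the epoch at which training stops under a patience rule.

    Training halts once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1


def summarise(metric_values):
    """Median and empirical 2.5% / 97.5% quantiles across experiments."""
    # n=40 yields 39 cut points at multiples of 2.5%.
    cuts = statistics.quantiles(metric_values, n=40, method="inclusive")
    return statistics.median(metric_values), cuts[0], cuts[-1]
```

For example, `summarise` would be applied to the 100 ISE (or MALE) values obtained for a single model specification.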

4.2 Models and performance metrics

We consider three copulas with known gauge functions and limit sets: Gaussian, Student-t, and logistic. Their theoretical unit-level sets and gauge functions are provided by Papastathopoulos et al. (2024). Recall that we consider a random vector $\bm{X}$ with standard Laplace margins. If $\bm{X}$ also has a $d$-variate Gaussian copula with positive-definite precision matrix $Q\in\mathbb{R}^{d\times d}$, its gauge function is $g(\bm{x})=\left\{\text{sgn}(\bm{x})|\bm{x}|^{1/2}\right\}^{T}Q\left\{\text{sgn}(\bm{x})|\bm{x}|^{1/2}\right\}$, where sgn, the absolute value, and the square root are applied component-wise. If $\bm{X}$ follows a $d$-variate Student-t copula with $Q$ as before and degrees of freedom $\nu>0$, its gauge function is $g(\bm{x})=-\nu^{-1}\sum^{d}_{i=1}|x_{i}|+(1+d\nu^{-1})\max_{i=1,\dots,d}|x_{i}|$.
Finally, let $\bm{X}$ follow a logistic copula, i.e., a Gumbel copula with dependence parameter $1/\theta$ for $\theta\in(0,1]$. In this case, the form of the theoretical gauge function is cumbersome; see Appendix A.4 of Papastathopoulos et al. (2024). For the Gaussian and Student-t copulas, we randomly generate, for each $d$, a valid precision matrix. However, we impose that the corresponding correlation matrices are ordered with respect to $d$. That is, if we consider two copulas with dimensions $d_{1}$ and $d_{2}$ satisfying $d_{1}<d_{2}$, then the $d_{1}\times d_{1}$ correlation matrix is a submatrix of the $d_{2}\times d_{2}$ correlation matrix. Note that the generated correlation matrices are kept fixed throughout all studies, and we further set $\nu=1$ and $\theta=0.3$.
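
The Gaussian and Student-t gauge functions can be evaluated directly, as in the following sketch. It is illustrative only: the Gaussian form uses the exponent $1/2$ in both factors of the quadratic form, which makes the gauge homogeneous of order one, and the precision matrix in the usage note is an invented example rather than one used in the study.

```python
import math


def sign(v):
    """Sign of a real number: -1, 0, or 1."""
    return (v > 0) - (v < 0)


def gaussian_gauge(x, Q):
    """Quadratic form s^T Q s with s = sgn(x) * |x|^(1/2); Q is a list of rows."""
    s = [sign(v) * math.sqrt(abs(v)) for v in x]
    d = len(x)
    return sum(s[i] * Q[i][j] * s[j] for i in range(d) for j in range(d))


def student_t_gauge(x, nu):
    """-(1/nu) * sum_i |x_i| + (1 + d/nu) * max_i |x_i|."""
    a = [abs(v) for v in x]
    return -sum(a) / nu + (1.0 + len(x) / nu) * max(a)
```

For instance, with the precision matrix `[[4/3, -2/3], [-2/3, 4/3]]` (the inverse of a bivariate correlation matrix with correlation 0.5), `gaussian_gauge([1.0, 1.0], Q)` evaluates to 4/3, and scaling the argument by 3 scales the gauge by 3.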

We simulate from these copulas and apply the methodology described in Section 3 to estimate the gauge function on $\mathcal{S}^{d-1}$. To quantify the accuracy of the estimated unit-level set, we measure the integrated squared error (ISE) of the form

$$\mbox{ISE}=\int_{\mathcal{S}^{d-1}}\{1/g_{0}(\bm{w})-1/\tilde{g}(\bm{w})\}^{2}\,\mathrm{d}\bm{w},\qquad(12)$$

where $g_{0}$ and $\tilde{g}$ are, respectively, the theoretical and estimated (rescaled) gauge functions. We evaluate the integral in (12) using Monte-Carlo methods; specifically, we use the estimator $\widehat{\mbox{ISE}}=(A_{d}/|\mathcal{W}|)\sum_{\bm{w}\in\mathcal{W}}\{1/g_{0}(\bm{w})-1/\tilde{g}(\bm{w})\}^{2}$, where $A_{d}=2\pi^{d/2}/\Gamma(d/2)$ is the surface area of $\mathcal{S}^{d-1}$, $\mathcal{W}$ denotes the set of points described in Section 3.3, and $\Gamma(\cdot)$ is the gamma function.
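
A minimal sketch of the Monte-Carlo ISE estimator is given below. It assumes that the evaluation set $\mathcal{W}$ can be replaced by uniform draws on $\mathcal{S}^{d-1}$ (the set actually used is the one described in Section 3.3), and `g0` and `g_tilde` are hypothetical callables for the theoretical and estimated gauge functions.

```python
import math
import random


def sphere_surface_area(d):
    """Surface area A_d = 2 * pi^(d/2) / Gamma(d/2) of the unit sphere S^(d-1)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def uniform_on_sphere(d, size, rng):
    """Uniform draws on S^(d-1) obtained by normalising standard Gaussian vectors."""
    points = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        points.append([v / norm for v in z])
    return points


def ise_estimate(g0, g_tilde, angles, d):
    """(A_d / |W|) * sum over w in W of {1/g0(w) - 1/g_tilde(w)}^2."""
    total = sum((1.0 / g0(w) - 1.0 / g_tilde(w)) ** 2 for w in angles)
    return sphere_surface_area(d) * total / len(angles)
```

The factor $A_{d}$ converts the sample mean of the squared differences into an estimate of the surface integral; for $d=2$ and $d=3$ it equals $2\pi$ and $4\pi$, respectively.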

We validate estimates of the extended angular dependence function (ADF) by evaluating, for each considered copula, four joint probabilities. The first two are exceedance probabilities of the form $\Pr\{X_{i}>u,\,i=1,\dots,d\}$, with $u$ taken to be both the $0.99$ and $0.999$ standard Laplace quantiles. For the joint lower tail, we also evaluate $\Pr\{X_{i}<u,\,i=1,\dots,d\}$, with $u$ taken as the $0.01$ and $0.001$ standard Laplace quantiles. One of the strengths of the DeepGauge framework is that it is not limited to estimation of probabilities on hyper-cubes. To show this, we also estimate

$$\Pr\{u_{1,i}<X_{i}<u_{2,i},\,i=1,\dots,d\},$$

where $u_{1,i}=u_{1}$, $u_{2,i}=\infty$ if $i$ is odd and, otherwise, $u_{1,i}=-\infty$, $u_{2,i}=u_{2}$, where $u_{1}$ corresponds to the $0.999$ standard Laplace quantile. For $u_{2}$, we consider two cases: $u_{2}$ as either the $0.2$ or $0.4$ standard Laplace quantile. We consider estimation of the latter two probabilities for only the Gaussian and Student-t copulas, as their true values can be evaluated using standard computational software. This is not the case for the logistic copula, and we found that Monte-Carlo methods perform poorly. To quantify efficacy, we provide the mean absolute log error (MALE), taking the average over all probabilities. The order of magnitude of the probabilities ranges from approximately $10^{-3}$ to $10^{-37}$.
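
The thresholds and error metric above can be sketched as follows. The standard Laplace quantile function is exact; the form of the MALE is an assumption (the mean of $|\log\hat{p}-\log p|$ with natural logarithms), and `box_bounds` is a hypothetical helper that simply encodes the alternating bounds of the non-hyper-cube region.

```python
import math


def laplace_quantile(q):
    """Quantile function of the standard Laplace distribution (location 0, scale 1)."""
    if q < 0.5:
        return math.log(2.0 * q)
    return -math.log(2.0 * (1.0 - q))


def box_bounds(d, u1, u2):
    """Per-component (lower, upper) bounds: (u1, inf) for odd i, (-inf, u2) for even i."""
    return [(u1, math.inf) if i % 2 == 1 else (-math.inf, u2) for i in range(1, d + 1)]


def male(true_probs, est_probs):
    """Mean absolute log error, assumed to be the mean of |log(est) - log(true)|."""
    errors = [abs(math.log(e) - math.log(t)) for t, e in zip(true_probs, est_probs)]
    return sum(errors) / len(errors)
```

Working on the log scale is what makes a single summary meaningful when the target probabilities span many orders of magnitude.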

4.3 Results

We first consider a setting with the quantile level fixed to $\tau=0.75$ and the architecture for $g(\cdot)$ (equivalently, $\tilde{g}$) taken to be a neural network with $N=3$ hidden layers of width 64 (see (A.1) for details), but allow the sample size $n$ and dimension $d$ to vary: we consider $d=3$, $d=5$, and $d=8$, as well as $n=10{,}000$, $n=50{,}000$, $n=100{,}000$, and $n=250{,}000$. For $r_{\tau}(\cdot)$, we use a neural network with $N=3$ hidden layers of width $32$ and keep this fixed throughout. Results for this setting are provided in Figure A2 of Appendix C. As expected, we generally observe decreasing ISE and MALE with increasing sample size $n$ across all dimensions and copulas.

We then consider a setting with the sample size $n$ and dimension $d$ fixed to investigate how the choice of hyper-parameters impacts the model fits. We fix $n=100{,}000$ and $d=5$ or $d=8$, but, across two scenarios, we allow the quantile level and architecture of $g(\cdot)$ to vary. For the first scenario, we consider a sequence of quantile levels $\tau\in\{0.1,0.3,\dots,0.9\}$ and take $g(\cdot)$ to be a neural network with $N=3$ hidden layers and $32$ nodes per layer. For the second scenario, we fix $\tau=0.75$ and vary the architecture for $g(\cdot)$. We consider eight neural networks with $N=1$ to $N=4$ hidden layers, and with consistent width across layers; we take this to be either 16 or 64. Results for the first and second scenarios are presented in Figure 6 and Table A1, respectively.
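
The eight architectures of the second scenario, and a rough measure of their complexity, can be enumerated as below. The parameter count assumes a plain fully connected network from $\mathbb{R}^{d}$ to $\mathbb{R}$ with weights and biases in every layer; the architecture actually used is the one specified in (A.1), so the counts are indicative only.

```python
from itertools import product


def mlp_parameter_count(d, n_hidden, width):
    """Weights and biases in a fully connected network R^d -> R with equal-width hidden layers."""
    first = d * width + width                          # input layer to first hidden layer
    middle = (n_hidden - 1) * (width * width + width)  # hidden-to-hidden layers
    last = width + 1                                   # last hidden layer to scalar output
    return first + middle + last


# Second scenario: N in {1, 2, 3, 4} hidden layers, each of width 16 or 64.
architectures = list(product([1, 2, 3, 4], [16, 64]))
counts = {(n, w): mlp_parameter_count(5, n, w) for n, w in architectures}
```

Under these assumptions, and with $d=5$, the smallest configuration has 113 parameters and the largest has 12,929, which gives a sense of the range of model complexity being compared.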

Figure 6 illustrates the simulation results for varying quantile level $\tau$. We find that the optimal value of $\tau$ (in terms of minimising the performance metrics) differs across copulas and dimensions. For the Gaussian and Student-t copulas, we find that higher values of $\tau$ are preferable when estimating the unit-level set, as the ISE is reduced; the converse holds for the logistic copula. When considering the MALE, we find that the Gaussian and Student-t copulas tend to favour a lower value of $\tau$, particularly for the high-dimensional case where $d=8$. Conversely, for the logistic copula, the MALE is minimised with the largest value of $\tau$. We note that the logistic copula has the quickest rate of convergence to the truncated gamma model described in (7), which may explain why, in terms of the ISE, this copula benefits from a smaller $\tau$ (i.e., more data are available for inference).

Table A1 in Appendix C presents the simulation results for a fixed quantile level $\tau=0.75$, but with varying architecture for $g(\cdot)$. We find that, for the Gaussian and Student-t copulas, the ISE is minimised when using the more complicated architectures, e.g., 3–4 layers of width 64; the converse holds for the logistic copula. Note that the logistic copula is determined by a single parameter, and so its corresponding gauge function has a much simpler shape than those of the other copulas (see Figure 2 for the case when $d=2$). Hence, it is unsurprising that the logistic copula favours a simpler neural network when considering the ISE. For the MALE, the results are less clear: all three copulas generally favour an architecture with fewer parameters (and, hence, a larger effective sample size), except the Gaussian copula with $d=5$. In applications, the shape of the gauge function that arises from the data generating process is likely to be quite complex. Figure 6 and Table A1 suggest that, if the goal is to accurately estimate the unit-level set $\partial\mathcal{G}$, then one should consider using a more complicated neural network architecture for $g(\cdot)$. However, a more conservative approach should be taken if we wish to accurately estimate tail probabilities; reducing the quantile level $\tau$ and using a simpler neural network architecture will increase the effective sample size used in inference for model (7), and thus improve estimation of tail probabilities.

Figure 6: Boxplots of the estimated ISE and MALE for the quantile-level, $\tau$, simulation study described in Section 4.2. Here the sample size is $n=100{,}000$, the architecture for the rescaled gauge function has $N=3$ hidden layers, each of width 32, and the considered dimensions are $d=5$ and $d=8$.

Generally, we find some variation in model fits across different choices of architecture and quantile level $\tau$, but the results are broadly similar. From this study, we can conclude that there is no one-size-fits-all approach to fully optimising the hyper-parameters of the DeepGauge framework. In practice, where the true gauge function and underlying data generating process are unknown, we advocate that practitioners fit a collection of DeepGauge models with various quantile levels and architectures, and choose the best model fit using the goodness-of-fit metrics detailed in Section 3.5.

5 Analysis of the NORA10 Metocean Data

5.1 Overview

Figure 7: Locations used to study severe ocean events. Locations where we study the joint extremal dependence of ws, hs, and mslp are circled in blue. Locations for studying the joint extremal dependence in hs are highlighted in black.

We demonstrate the efficacy of the DeepGauge framework by modelling extremal dependence of metocean variables associated with severe ocean events. Such events typically occur when multiple metocean variables are simultaneously extreme, and pose a risk to offshore and coastal structures, such as wind farms and oil platforms (Shooter et al., 2021, 2022). Robust analysis of joint extremes is therefore crucial in this setting for informed decision making.

Our study uses the NORA10 hindcast gridded data set (NOrwegian ReAnalysis 10km; Reistad et al., 2011), which provides three-hourly wave fields over the Norwegian Sea, the North Sea, and the Barents Sea, at a spatial resolution of 10 km. We focus on an area between the British Isles and Iceland for the period between September 1957 and December 2009 inclusive, which amounts to $n=152{,}917$ observations. These data have also previously been analysed by Shooter et al. (2022). We run two analyses to illustrate the efficacy of the DeepGauge framework for dimensions $d=3$ and $d=5$. For both analyses, hyper-parameters (e.g., model architecture and quantile level $\tau$) were optimised with respect to the goodness-of-fit diagnostics described in Section 3.5. As our interest is in modelling extremal dependence and not marginal extremes, the data are transformed a priori to standard Laplace margins using a standard rank transform, applied independently at each spatial location and for each variable. All study locations are plotted in Figure 7.
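
A rank transform to standard Laplace margins can be sketched as follows. The details are assumptions, since the exact variant used for these data is not specified here: ranks are scaled by $n+1$ so that the resulting uniform scores lie strictly inside $(0,1)$, and ties are broken by order of appearance.

```python
import math


def laplace_quantile(q):
    """Quantile function of the standard Laplace distribution (location 0, scale 1)."""
    if q < 0.5:
        return math.log(2.0 * q)
    return -math.log(2.0 * (1.0 - q))


def rank_to_laplace(values):
    """Map observations of one variable to standard Laplace margins via ranks.

    Uses u = rank / (n + 1); ties are broken by order of appearance.
    Both choices are assumptions made for this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    out = [0.0] * n
    for rank, k in enumerate(order, start=1):
        out[k] = laplace_quantile(rank / (n + 1))
    return out
```

In a setting such as ours, a function like `rank_to_laplace` would be applied separately to the series for each variable at each location, so that only the dependence structure, and not the marginal behaviour, carries through to the model.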

First, we consider the joint behaviour of wind speed (ws; m/s), significant wave height (hs; m), and mean sea level pressure (mslp; hPa) separately at three locations (01, 46, and 85, from south to north), outlined in blue in Figure 7. These three variables are associated with severe ocean events (Ewans and Jonathan, 2014), and we aim to model their joint tail behaviour separately at each of the three locations. Of the $d=3$ variables, ws and hs exhibit positive dependence, while the other two pairs of variables exhibit negative dependence; see Figure 9. Given the complex dependence structure of these data, it is likely that standard parametric extremal dependence models will perform poorly here; hence, these data provide the perfect candidate for illustrating the flexibility of our DeepGauge model.

Our second analysis models the joint tail behaviour of significant wave height (hs) across five locations (16, 33, 58, 72, and 92; highlighted in black, Figure 7) along a transect that runs approximately south-west to north-east, representing storm fronts that move along that direction. Simultaneous extremes at multiple locations pose higher risks than extremes at individual locations, and so quantification of joint extreme risk across these $d=5$ locations represents an important area for investigation.

For the $d=3$ and $d=5$ cases, we use $\tau=0.9$ and $\tau=0.4$, respectively. The neural networks for $r_{\tau}(\cdot)$ and $\tilde{g}(\cdot)$ in the $d=3$ setting have identical architectures: three hidden layers, each with 64 neurons. For $d=5$, both neural networks have two hidden layers; the hidden layers of the networks for $r_{\tau}(\cdot)$ and $\tilde{g}(\cdot)$ have widths 16 and 64, respectively.

5.2 Results

5.2.1 Joint distribution of hs, ws, and mslp: $d=3$

Figure 8: The estimated unit-level set (left) and the extended ADF (right) for hs, ws, and mslp at location 01 of the metocean data. All data are plotted on standard Laplace margins.

Here we present results for the three metocean variables observed at location 01; results for the remaining two locations are presented in Appendix C. Figure 8 plots the estimates of the three-dimensional unit-level set and the extended ADF for the three variables, with the sample clouds transformed to standard Laplace margins. We can see that the estimated unit-level sets capture well the shape and joint tail behaviour of the sample cloud. We also plot bivariate unit-level set slices for each of the three pairs of variables in Figure 9. For these, we set $\epsilon=0.01$ (see Section 3.5), resulting in a reasonable amount of validation data for each unit-level set slice. We observe that the estimated slices closely match the shapes of the corresponding bivariate sample clouds, again suggesting good model fits to these data. Corresponding plots for the remaining two locations are provided in Figures A6–A9.

Goodness-of-fit is further verified using the truncated gamma QQ plot, extended ADF diagnostic, and return level set probabilities, as described in Section 3.5. The plots, provided in Figure A3 of Appendix C, illustrate strong agreement between model and empirical quantiles across all three diagnostics. The goodness-of-fit analyses for the remaining two locations also indicate good model fits over all diagnostics (Figures A4–A5 of Appendix C). For all three locations, we also provide animated three-dimensional plots for the unit-level sets and extended ADFs in the Supplementary Material.

Figure 9: Scaled bivariate sample clouds with $\epsilon=0.01$ for pairwise combinations of ws, hs, and mslp at location 01, transformed onto common Laplace margins; the red shapes outline the corresponding estimated bivariate unit-level set slices.

5.2.2 Joint distribution of hs along transect: $d=5$

For the $d=5$ joint distribution of significant wave height hs, we cannot visualise the unit-level set estimate. However, there are 10 possible bivariate unit-level slices corresponding to different pairs of locations; three of these are illustrated in Figure 10. In this case, we set $\epsilon=0.05$ to ensure a reasonable amount of validation data in each slice, accounting for the fact that angular observations are more sparse in higher dimensions (relative to $d=3$). As before, we observe that the estimated slices closely match the shapes of the corresponding bivariate sample clouds, indicating that the DeepGauge model can capture extremal dependence of hs across all five locations. Goodness-of-fit diagnostics, as illustrated in Figure A10 of Appendix C, show generally good agreement between model and empirical quantiles, albeit with some small divergences at extremely high quantiles (for the truncated gamma QQ plot).

Figure 10: Scaled bivariate sample clouds with $\epsilon = 0.05$ for pairwise combinations of hs at three locations, transformed onto common Laplace margins; the red shapes outline the corresponding estimated bivariate unit-level set slices.

The DeepGauge framework provides excellent model fits for our metocean data, in both $d = 3$ and $d = 5$ dimensions. The estimated unit-level sets facilitate inference of the entire extremal dependence structure of the data, not only dependence in a single orthant of $\mathbb{R}^d$, without the need for restrictive parametric models or low-order summary measures (see Section 1). Whilst we have omitted this part of the study for brevity, we note that estimates of our novel extended ADF permit estimation of joint tail probabilities on arbitrary hyper-cubes; our accompanying diagnostics for the ADF estimates suggest high accuracy of such estimates. Finally, we remark that our favourable results for the case where $d = 5$ are particularly encouraging. Flexible models for extremal dependence are typically limited to $d \leq 3$, but we have shown that this limitation does not hold for the DeepGauge framework.

6 Discussion

We introduce novel theoretical contributions for the geometric representation of multivariate extremes, which include extensions of the Wadsworth and Tawn (2013) model and a demonstration that many of the results introduced by Nolde and Wadsworth (2022) can be extended to data on Laplace margins. We further introduce the DeepGauge modelling framework and, through rigorous theoretical treatment and in contrast to all existing models, we prove that our framework gives estimates that satisfy all of the required theoretical properties of unit-level sets. In practice, this results in consistent inference of extremal dependence and provides a means to estimate joint tail properties; we demonstrate this by evaluating the performance of the DeepGauge framework on simulated and observed data in Sections 4 and 5, respectively. Unlike the majority of existing models for multivariate extremes, the DeepGauge framework is not limited to asymptotically dependent data nor low dimension $d \leq 3$. Thus, our approach represents a significant step towards flexible, non-parametric, and scalable models for multivariate extremes.

We acknowledge that there is no theoretical guarantee that our proposed estimator for $\partial\mathcal{G}$ will converge to the true unit-level set. One could, for instance, try to prove consistency of the rescaled gauge function estimator, $\tilde{g}(\cdot)$, for all angles in the set $\mathcal{S}^{d-1}$. However, theoretical results of this nature usually necessitate strict assumptions, which themselves can be difficult to verify. In practice, however, one can only ever look at diagnostics obtained from the data; we have therefore opted for a more practical treatment of our proposed estimator, and leave proofs of theoretical convergence for future work.

One noticeable omission from the DeepGauge framework is a model for the angular density $f_{\bm{W}}(\bm{w})$, $\bm{w} \in \mathcal{S}^{d-1}$; see Section 3.4 for further discussion. Several existing approaches propose non-parametric techniques for this estimation, albeit in low dimensional ($d \leq 3$) settings (Papastathopoulos et al., 2024; Murphy-Barltrop et al., 2024a; Simpson and Tawn, 2024b). Models for the angular density offer the advantage of simulation from the model described in equation (7), from which one can obtain probability estimates in joint tail regions of any form, i.e., more general regions than those illustrated in Figure 4; see Wadsworth and Campbell (2024) for instance. The estimation of, and simulation from, the angular density function $f_{\bm{W}}(\bm{w})$ therefore represents an important area for future work in the context of multivariate extremes.

Finally, given an estimate of $\partial\mathcal{G}$ for some $d \geq 3$, we remark that our modelling framework provides no means of obtaining lower dimensional unit-level sets, e.g., the unit-level set of $(X_{\mathfrak{i}}, X_{\mathfrak{j}})$ for indices $(\mathfrak{i}, \mathfrak{j})$ with $1 \leq \mathfrak{i} < \mathfrak{j} \leq d$. This is because we only estimate the gauge function $g(\cdot)$ over $\mathcal{S}^{d-1}$, and obtaining a lower-dimensional gauge function, such as $g_{(X_{\mathfrak{i}}, X_{\mathfrak{j}})}(\cdot)$, would require us to minimise $g(\cdot)$ over all components in $\mathcal{D} \setminus \{\mathfrak{i}, \mathfrak{j}\}$ (Nolde and Wadsworth, 2022); this is only possible if we can evaluate $g(\cdot)$ on the whole of $\mathbb{R}^d$. Future work could investigate techniques for extracting lower dimensional unit-level sets from higher dimensional estimates, which would avoid the need for refitting.

Acknowledgements

Reetam Majumder was supported by grants from the United States Geological Survey’s National Climate Adaptation Science Center (G24AC00197), and the National Science Foundation (DMS2152887). The authors would like to thank Phil Jonathan for providing the data, Ryan Campbell for access to code, and Lambert de Monte for helpful discussions.

Appendix

Appendix A Neural network construction and inference

A.1 Neural network representation

We here describe the construction of $r_\tau(\cdot)$ and $g(\cdot)$ using multi-layer perceptrons. Let $r_\tau(\bm{w}) := \exp\{m_{\bm{\psi}}(\bm{w})\}$, where $m_{\bm{\psi}}(\bm{w})$ is a neural network, parameterised by $\bm{\psi}$, that comprises $N \in \mathbb{N}$ hidden layers. Then $m_{\bm{\psi}}$ is a composition of the form $m_{\bm{\psi}}(\cdot) = m_{\bm{\psi}}^{(N+1)} \circ \dots \circ \mathbf{m}^{(1)}_{\bm{\psi}}(\cdot)$, where, for $j = 1, \dots, N$, the output from the $j$-th hidden layer, $\mathbf{m}^{(j)}_{\bm{\psi}} : \mathbb{R}^{n_{j-1}} \mapsto \mathbb{R}^{n_j}$, can be written as

$$\bm{w}^{(j)} := \mathbf{m}^{(j)}(\bm{w}^{(j-1)}) = \text{ReLU}\left(\mathbf{M}^{(j)} \bm{w}^{(j-1)} + \mathbf{b}^{(j)}\right), \tag{A.1}$$

where $\bm{w}^{(0)} := \bm{w}$ is the input angles with dimension $n_0 = d$ and the final layer is $m^{(N+1)}(\bm{w}^{(N)}) = \mathbf{M}^{(N+1)} \bm{w}^{(N)} + \mathbf{b}^{(N+1)}$, with scalar output ($n_{N+1} = 1$). The ReLU function evaluates the component-wise maxima of its input and a vector of zeroes (of suitable length), i.e., $\text{ReLU}(\bm{x}) = (\max\{x_1, 0\}, \max\{x_2, 0\}, \dots)^T$, where $\bm{x} = (x_1, x_2, \dots)^T$ is a vector of finite (unspecified) length.
Each layer $j$, for $j = 1, \dots, N+1$, is parameterised by an estimable weight matrix $\mathbf{M}^{(j)} \in \mathbb{R}^{n_j \times n_{j-1}}$ and bias vector $\mathbf{b}^{(j)} \in \mathbb{R}^{n_j}$. Thus, the neural network is fully parameterised by the collection of parameters $\bm{\psi} = \{\{\mathbf{M}^{(j)}, \mathbf{b}^{(j)}\}_{j=1}^{N+1}\}$. The dimension $n_j$ of the output of layer $\mathbf{m}^{(j)}$, as well as the “depth” $N$ of $m_{\bm{\psi}}(\cdot)$, are referred to as the neural network’s architecture and they are tunable hyper-parameters.
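As an illustration only, the forward pass defined by (A.1) can be sketched in a few lines of NumPy. This is not the implementation used in the paper (which relies on the R interface to keras); the layer widths, the random initialisation scheme, and the names `init_params`, `m_psi`, and `r_tau` are assumptions made for the example, as is the use of the Euclidean norm to place the input on the unit sphere.

```python
import numpy as np

def relu(x):
    # Component-wise maximum of the input and zero.
    return np.maximum(x, 0.0)

def init_params(layer_sizes, rng):
    # layer_sizes = [n_0, n_1, ..., n_N, n_{N+1}] with n_0 = d and n_{N+1} = 1.
    # The scaled Gaussian initialisation is an illustrative assumption.
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        M = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((M, b))
    return params

def m_psi(w, params):
    # N hidden layers with ReLU activations, as in (A.1) ...
    h = np.asarray(w, dtype=float)
    for M, b in params[:-1]:
        h = relu(M @ h + b)
    # ... followed by a final affine layer with scalar output.
    M_out, b_out = params[-1]
    return (M_out @ h + b_out)[0]

def r_tau(w, params):
    # The exponential keeps the radial threshold strictly positive.
    return np.exp(m_psi(w, params))

rng = np.random.default_rng(0)
d = 3
params = init_params([d, 16, 16, 1], rng)  # N = 2 hidden layers of width 16
w = rng.normal(size=d)
w /= np.linalg.norm(w)                     # an angle on the unit sphere
value = r_tau(w, params)
```

Whatever values the parameters take, the exponential output guarantees a positive radial threshold, which is why the network models the logarithm of $r_\tau(\cdot)$ rather than $r_\tau(\cdot)$ itself.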

A.2 Training and pre-training

The minimisation in (8) is performed using the Adaptive Moment Estimation (Adam) algorithm (Kingma and Ba, 2014), which is a variant of mini-batch stochastic gradient descent; we use the default settings of the hyper-parameters for Adam. We utilise the R interface to keras (Allaire and Chollet, 2021). Prior to training, we uniformly-at-random partition the data into $80\%$ training and $20\%$ validation, with this partition consistent across the training stage for both $r_\tau(\bm{w})$ and $\tilde{g}(\bm{w})$. To mitigate overfitting of the neural network model, we train with checkpoints and early-stopping (Prechelt, 2002). Neural network models are optimised for a finite number of pre-specified iterations, say $M$, by minimising the loss function evaluated on the training data. The validation data are not used to optimise the neural network parameters. Instead, at every iteration of the training algorithm, we evaluate the loss function on the validation data. Then, the “best fitting” model is determined to be that which minimises the loss function, evaluated on the validation data, across all epochs. Early-stopping ensures that the training scheme does not necessarily run for all $M$ iterations; training stops early if the validation loss has not decreased in the last $\Delta \in \mathbb{N}$ iterations. We also add $L_1$ and $L_2$ penalties to the estimable neural network parameters, $\bm{\psi}$, to provide further model regularisation; throughout we adopt these penalties with equal shrinkage weight of $1 \times 10^{-4}$.
For further details on fitting and regularising deep neural network models, see Goodfellow et al. (2016).
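The checkpoint and early-stopping rule described above can be sketched as follows. This is a minimal illustration rather than the keras callbacks used in practice: the function name `select_checkpoint`, the argument `patience` (playing the role of $\Delta$), and the validation losses are all hypothetical.

```python
def select_checkpoint(val_losses, patience):
    """Return (best_iter, last_iter) for a sequence of validation losses.

    best_iter is the iteration whose parameters would be kept as the
    checkpoint; last_iter is the iteration at which training halts.
    """
    best_loss = float("inf")
    best_iter = -1
    for it, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: store a checkpoint.
            best_loss, best_iter = loss, it
        elif it - best_iter >= patience:
            # No improvement in the last `patience` iterations: stop early.
            return best_iter, it
    # The iteration budget was exhausted without triggering early stopping.
    return best_iter, len(val_losses) - 1

# Hypothetical validation losses; the final value is never reached.
losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.60]
best_iter, last_iter = select_checkpoint(losses, patience=3)
```

In this toy sequence the checkpoint from iteration 2 is retained and training halts at iteration 5, so a later improvement goes unseen; this is the usual trade-off when choosing $\Delta$.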

Typically, without prior knowledge of the function that one seeks to approximate, the estimable parameter set, $\bm{\psi}$, of a neural network model is randomly initialised prior to training; thus when training our neural network model for the exceedance threshold $r_\tau(\cdot)$ we always choose to take this approach, as we have no prior knowledge of its structure. However, for training of the neural network that comprises the gauge function $g(\cdot)$, we can exploit some of the theoretical structure of gauge functions to perform pre-training, which is a popular technique used in applications of neural networks to reduce computation times. Pre-training uses parameter estimates of a neural network trained for one task as initial parameter estimates for a neural network model designed for a different task (with equivalent architecture; see Goodfellow et al., 2016, Ch. 8). Following Corollary 2.2, we can use an estimate of the radial exceedance threshold, say $\hat{r}_\tau(\bm{w})$, to obtain an initial estimate for the gauge function, say $\hat{g}_\tau(\bm{w})$; this is achieved by replacing the radial function $h(\cdot)$ in Corollary 2.2 with $\hat{r}_\tau(\cdot)$ and deriving the corresponding rescaled gauge function. In the two dimensional setting, Wadsworth and Campbell (2024) use a similar idea to get an initial estimate for the unit-level set $\partial\mathcal{G}$.
The authors also show that, for $\tau$ close to 1, the rescaled quantile set is a valid approximation of $\partial\mathcal{G}$. We use this idea to pre-train the neural network of the gauge function model $\tilde{g}(\cdot)$; we perform an initial optimisation of its weights subject to minimisation of the squared-error loss between the initial gauge function $\hat{g}_\tau(\bm{w})$ and the model output, $\tilde{g}(\bm{w})$. For $\alpha$, we always set its initial estimate to $d$, as this is the theoretical value attained by gauge functions for many popular copula models (Wadsworth and Campbell, 2024). In unreported experiments, we found that this pre-training helps to mitigate numerical instability during training and improves accuracy of gauge function estimation.

With the initial pre-trained weights, we estimate the rescaled gauge function, from which we obtain estimates of the unit-level set $\partial\mathcal{G}$. Selection of tuning parameters is discussed in Section 4 of the main text.

Appendix B Additional proofs

B.1 Proof of Proposition 2.1

Proof.

First, observe that for any $X_E \sim \text{Exp}(1)$, we have that the variable given by

$$X := \begin{cases} \log(1 - e^{-X_E}) + \log(2), & X_E \leq \log(2), \\ X_E - \log(2), & X_E > \log(2), \end{cases}$$

is standard Laplace distributed, where the $\log(2)$ here denotes the median of $X_E$. Given $\bm{w} \in \mathcal{S}^{d-1}_+$ we have, by definition, $g_E(\bm{w}) = \lim_{t \to \infty}(-\log f_{\bm{X}_E}(t\bm{w})/t)$, where $f_{\bm{X}_E}(\cdot)$ denotes the continuous joint density function for $\bm{X}_E$. For any large $t$ satisfying $t > \max_{i=1,\dots,d}\{\log(2)/w_i\} > 0$, we have that

$$\frac{-\log f_{\bm{X}_E}(t\bm{w})}{t} = \frac{-\log f_{\bm{X}}(t\bm{w} - \log 2)}{t},$$

where $f_{\bm{X}}(\cdot)$ is the continuous joint density function for $\bm{X}$ and the Jacobian of the transformation $\bm{X} \mapsto \bm{X}_E$ equals $1$ for the considered coordinates. For a fixed $\bm{w}$, set $t^* := \|t\bm{w} - \log 2\| = t\|\bm{w} - \log 2/t\|$ and $\bm{w}^* := (t\bm{w} - \log 2)/t^* = (\bm{w} - \log 2/t)/\|\bm{w} - \log 2/t\|$. We then observe that $t^* \sim t$ and $\bm{w}^* \sim \bm{w}$ as $t \to \infty$. Therefore, it follows that

$$\frac{-\log f_{\bm{X}}(t\bm{w} - \log 2)}{t} = \frac{-\log f_{\bm{X}}(t^*\bm{w}^*)}{t} \sim \frac{-\log f_{\bm{X}}(t\bm{w})}{t}.$$

Taking the limit as $t \to \infty$, the result follows. ∎
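The first step of the proof, that the piecewise map sends a standard exponential variable to a standard Laplace variable, can be checked numerically by comparing quantiles. The sketch below is only a sanity check under our own naming (`exp_to_laplace` and `laplace_quantile` are hypothetical helpers); it uses the Exp(1) quantile function $-\log(1-u)$ and the standard Laplace quantile function, both of which are standard results.

```python
import numpy as np

def exp_to_laplace(x_e):
    # Piecewise map from the proof, applied element-wise to a 1-D array.
    x_e = np.asarray(x_e, dtype=float)
    lower = x_e <= np.log(2.0)
    out = np.empty_like(x_e)
    out[lower] = np.log1p(-np.exp(-x_e[lower])) + np.log(2.0)
    out[~lower] = x_e[~lower] - np.log(2.0)
    return out

def laplace_quantile(u):
    # Quantile function of the standard Laplace distribution.
    u = np.asarray(u, dtype=float)
    return np.where(u <= 0.5, np.log(2.0 * u), -np.log(2.0 * (1.0 - u)))

# Probability levels, mapped to Exp(1) quantiles and then transformed.
u = np.linspace(0.001, 0.999, 999)
x_e = -np.log1p(-u)
x = exp_to_laplace(x_e)

# Largest discrepancy from the Laplace quantiles at the same levels.
max_abs_error = np.max(np.abs(x - laplace_quantile(u)))
```

Agreement of the two sets of quantiles up to floating-point error is consistent with the claim; above the median the map is a pure shift by $\log(2)$, which is why the Jacobian in the proof equals one.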

B.2 Proof of Proposition 2.3

Proof.

To prove the star-shaped property, it suffices to show that for any $\bm{x} \in \mathcal{H}$, we have $t\bm{x} \in \mathcal{H}$ for all $t \in [0,1]$. This is trivial for $\bm{x} = \bm{0}_d$. Taking $\bm{x} \in \mathcal{H} \setminus \{\bm{0}_d\}$ and $t \in (0,1]$ (the case $t = 0$ being trivial), we see that

$$\|t\bm{x}\| = t\|\bm{x}\| \leq t\,h(\bm{x}/\|\bm{x}\|) \leq h(\bm{x}/\|\bm{x}\|) = h(t\bm{x}/\|t\bm{x}\|),$$

and thus $t\bm{x} \in \mathcal{H}$. Moreover, given $\bm{x} \in \mathcal{H} \setminus \{\bm{0}_d\}$, we have

$$\|\bm{x}\| \leq h(\bm{x}/\|\bm{x}\|) \leq \frac{1}{\big\|\bm{x}/\|\bm{x}\|\big\|_\infty} = \frac{\|\bm{x}\|}{\|\bm{x}\|_\infty},$$

so $\|\bm{x}\|_\infty \leq 1$, implying $-1 \leq x_i \leq 1$ for all $i = 1, \ldots, d$; thus, $\mathcal{H} \subset [-1,1]^d$. This also implies that $\mathcal{H}$ is bounded; thus, to prove $\mathcal{H}$ is compact, we just need to show that $\mathcal{H}$ is closed, owing to the Heine–Borel theorem (see, e.g., Hayes, 1956). Let $(\bm{x}_n)_{n \in \mathbb{N}}$ denote a sequence in $\mathcal{H}$ such that $\bm{x}_n \to \bm{x}$ as $n \to \infty$, i.e., $\|\bm{x}_n - \bm{x}\| \to 0$. We must show that $\bm{x} \in \mathcal{H}$. If $\bm{x} = \bm{0}_d$, the proof is trivial. Thus, we assume $\bm{x} \neq \bm{0}_d$. Given $\epsilon > 0$, there exists $N_1 \in \mathbb{N}$ such that $\|\bm{x}_n - \bm{x}\| < \epsilon$ for all $n \geq N_1$.
Furthermore, since $h(\cdot)$ is continuous, there must exist $N_2 \in \mathbb{N}$ such that $|h(\bm{x}_n/\|\bm{x}_n\|) - h(\bm{x}/\|\bm{x}\|)| < \epsilon$ for all $n \geq N_2$. Setting $N := \max\{N_1, N_2\}$, we have that, for all $n \geq N$,

$$\|\bm{x}\| = \|\bm{x} - \bm{x}_n + \bm{x}_n\| \leq \|\bm{x} - \bm{x}_n\| + \|\bm{x}_n\| < \epsilon + h(\bm{x}_n/\|\bm{x}_n\|) < 2\epsilon + h(\bm{x}/\|\bm{x}\|).$$

Taking the limit as $\epsilon \to 0$, we have $\bm{x} \in \mathcal{H}$; thus, $\mathcal{H}$ is compact. ∎
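The two properties established in this proof can be illustrated numerically for one particular radial function. In the sketch below we take $\mathcal{H}$ to be the set of points $\bm{x}$ with $\|\bm{x}\| \leq h(\bm{x}/\|\bm{x}\|)$ together with the origin, as used in the proof; the specific choice of $h(\cdot)$, the Euclidean norm, and the helper name `in_set` are assumptions made purely for the example. The chosen $h(\cdot)$ is continuous, positive, and bounded above by $1/\|\bm{w}\|_\infty$.

```python
import numpy as np

def h(w):
    # Hypothetical radial function satisfying h(w) <= 1 / ||w||_inf.
    w = np.asarray(w, dtype=float)
    return 1.0 / (np.max(np.abs(w)) * (1.0 + np.abs(np.prod(w))))

def in_set(x, tol=1e-12):
    # Membership test for H; the tolerance absorbs rounding error.
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    if r == 0.0:
        return True
    return r <= h(x / r) + tol

rng = np.random.default_rng(1)
d = 3
star_ok, box_ok = True, True
for _ in range(2000):
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)                 # random direction on the sphere
    x = rng.uniform(0.0, 1.0) * h(w) * w   # a point of H along direction w
    # H should be contained in the cube [-1, 1]^d.
    box_ok &= bool(np.max(np.abs(x)) <= 1.0 + 1e-12)
    # Star-shapedness: every rescaling t * x with t in [0, 1] stays in H.
    for t in np.linspace(0.0, 1.0, 11):
        star_ok &= bool(in_set(t * x))
```

Such a check cannot replace the proof, but it is a convenient way to catch a candidate radial function that violates the upper bound.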

B.3 Proof of Proposition 2.4

To prove Proposition 2.4, we require Lemma 2.1, which we first prove below.

Proof of Lemma 2.1.

To prove bijectivity, it suffices to show that κ𝜅\kappaitalic_κ is injective and surjective.

Injectivity: suppose κ(𝒘)=κ(𝒘)𝜅𝒘𝜅superscript𝒘\kappa(\bm{w})=\kappa(\bm{w}^{*})italic_κ ( bold_italic_w ) = italic_κ ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) for 𝒘,𝒘𝒮d1𝒘superscript𝒘superscript𝒮𝑑1\bm{w},\bm{w}^{*}\in\mathcal{S}^{d-1}bold_italic_w , bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT, and define the constant

c:=(w1b1(w1),,wdbd(wd))(w1b1(w1),,wdbd(wd))+.assign𝑐delimited-∥∥subscript𝑤1subscript𝑏1subscript𝑤1subscript𝑤𝑑subscript𝑏𝑑subscript𝑤𝑑delimited-∥∥subscriptsuperscript𝑤1subscript𝑏1subscriptsuperscript𝑤1subscriptsuperscript𝑤𝑑subscript𝑏𝑑subscriptsuperscript𝑤𝑑subscriptc:=\frac{\Big{\lVert}\left(\frac{w_{1}}{b_{1}(w_{1})},\ldots,\frac{w_{d}}{b_{d% }(w_{d})}\right)\Big{\rVert}}{\Big{\lVert}\left(\frac{w^{*}_{1}}{b_{1}(w^{*}_{% 1})},\ldots,\frac{w^{*}_{d}}{b_{d}(w^{*}_{d})}\right)\Big{\rVert}}\in\mathbb{R% }_{+}.italic_c := divide start_ARG ∥ ( divide start_ARG italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) end_ARG , … , divide start_ARG italic_w start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) end_ARG ) ∥ end_ARG start_ARG ∥ ( divide start_ARG italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) end_ARG , … , divide start_ARG italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) end_ARG ) ∥ end_ARG ∈ blackboard_R start_POSTSUBSCRIPT + end_POSTSUBSCRIPT .

If κ(𝒘)=κ(𝒘)𝜅𝒘𝜅superscript𝒘\kappa(\bm{w})=\kappa(\bm{w}^{*})italic_κ ( bold_italic_w ) = italic_κ ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ), then for each i=1,,d𝑖1𝑑i=1,\ldots,ditalic_i = 1 , … , italic_d, we have wi=cwisubscript𝑤𝑖𝑐superscriptsubscript𝑤𝑖w_{i}=cw_{i}^{*}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_c italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Furthermore, since 𝒘,𝒘𝒮d1𝒘superscript𝒘superscript𝒮𝑑1\bm{w},\bm{w}^{*}\in\mathcal{S}^{d-1}bold_italic_w , bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ caligraphic_S start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT, we see that c=c𝒘=c𝒘=𝒘=1𝑐𝑐normsuperscript𝒘norm𝑐superscript𝒘norm𝒘1c=c||\bm{w}^{*}||=||c\bm{w}^{*}||=||\bm{w}||=1italic_c = italic_c | | bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | | = | | italic_c bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | | = | | bold_italic_w | | = 1; thus, 𝒘=𝒘𝒘superscript𝒘\bm{w}=\bm{w}^{*}bold_italic_w = bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Surjectivity: let $\bm{w} \in \mathcal{S}^{d-1}$ and, without loss of generality, assume that $w_d \neq 0$. Setting

$$\kappa^{-1}(\bm{w}) := \left( \frac{a(\bm{w})\, b_1(w_1)\, w_1}{b_d(w_d)\,|w_d|}, \ldots, \frac{a(\bm{w})\, b_{d-1}(w_{d-1})\, w_{d-1}}{b_d(w_d)\,|w_d|},\; a(\bm{w})\,\text{sgn}(w_d) \right),$$

where $a(\bm{w}) := 1 \Big/ \sqrt{1 + \sum_{j=1}^{d-1} \left( b_j(w_j) w_j / b_d(w_d) w_d \right)^2}$, it is straightforward to verify that $\kappa(\kappa^{-1}(\bm{w})) = \bm{w}$, completing the proof. ∎
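The bijectivity argument can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that each $b_i(\cdot)$ takes one positive constant for non-negative arguments and another for negative arguments, and the constants `B_UPPER` and `B_LOWER`, as well as the helper names, are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical positive scaling constants (assumptions, not values from the paper):
# b_i(w_i) = B_UPPER[i] if w_i >= 0, else B_LOWER[i].
B_UPPER = [1.5, 0.8, 2.0]
B_LOWER = [0.7, 1.2, 0.9]


def b(i, w_i):
    """Sign-dependent scaling b_i(w_i)."""
    return B_UPPER[i] if w_i >= 0 else B_LOWER[i]


def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def kappa(w):
    """Divide each component by b_i(w_i), then renormalise onto the unit sphere."""
    v = [w_i / b(i, w_i) for i, w_i in enumerate(w)]
    n = norm(v)
    return [x / n for x in v]


def kappa_inv(w):
    """Closed-form inverse from the surjectivity step; requires w[-1] != 0."""
    d = len(w)
    b_d = b(d - 1, w[-1])
    ratios = [b(j, w[j]) * w[j] / (b_d * w[-1]) for j in range(d - 1)]
    a = 1.0 / math.sqrt(1.0 + sum(r * r for r in ratios))
    head = [a * b(j, w[j]) * w[j] / (b_d * abs(w[-1])) for j in range(d - 1)]
    return head + [math.copysign(a, w[-1])]


w = [0.6, -0.48, 0.64]      # a point on the unit sphere (d = 3)
v = kappa_inv(w)            # candidate pre-image of w
round_trip = kappa(v)       # should recover w up to rounding
```

Under these assumptions, `v` has unit Euclidean norm and `round_trip` agrees with `w` up to floating-point error, mirroring the identity $\kappa(\kappa^{-1}(\bm{w})) = \bm{w}$ used in the proof.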

Proof of Proposition 2.4.

First, observe that we can rewrite $\widetilde{\partial\mathcal{H}}$ as

$$\widetilde{\partial\mathcal{H}} = \left\{ \bm{w}\, h(\kappa^{-1}(\bm{w})) \left\lVert \left( \frac{\kappa^{-1}(\bm{w})_1}{b_1(\kappa^{-1}(\bm{w})_1)}, \ldots, \frac{\kappa^{-1}(\bm{w})_d}{b_d(\kappa^{-1}(\bm{w})_d)} \right) \right\rVert \;\bigg|\; \bm{w} \in \mathcal{S}^{d-1} \right\},$$

where $\kappa^{-1}(\bm{w})_i$ denotes the $i$-th component of $\kappa^{-1}(\bm{w})$ for each $i = 1, \ldots, d$. Since $\kappa(\cdot)$ is bijective, there exists a unique, bijective mapping between $\partial\mathcal{H}$ and $\widetilde{\partial\mathcal{H}}$; thus, the sets are in one-to-one correspondence. Furthermore, by definition, we have $-1 \leq w_i h(\bm{w}) / b_i(w_i) \leq 1$ for each $i = 1, \ldots, d$ and $\bm{w} \in \mathcal{S}^{d-1}$; thus $\widetilde{\partial\mathcal{H}} \subset [-1,1]^d$.
Finally, considering $i = 1, \dots, d$ and setting $\bm{w}^{u,i} = \operatorname*{arg\,max}_{\bm{w} \in \mathcal{S}^{d-1}} \{ w_i h(\bm{w}) \}$ and $\bm{w}^{l,i} = \operatorname*{arg\,min}_{\bm{w} \in \mathcal{S}^{d-1}} \{ w_i h(\bm{w}) \}$, we have $w^{u,i}_i h(\bm{w}^{u,i}) / b_i(w^{u,i}_i) = 1$ and $w^{l,i}_i h(\bm{w}^{l,i}) / b_i(w^{l,i}_i) = -1$, implying the componentwise maxima and minima equal $\bm{1}_d$ and $-\bm{1}_d$, respectively. ∎

B.4 Proof of Corollary 2.1

Proof.

By definition, we have that $\bm{w} / \tilde{g}(\bm{w}) \in \widetilde{\partial\mathcal{H}} \subset [-1,1]^d$ for any $\bm{w} \in \mathcal{S}^{d-1}$. Applying similar reasoning to the proof of Proposition 2.2, we immediately see that $\tilde{g}(\bm{w}) \geq \|\bm{w}\|_\infty$. Furthermore, taking any $i = 1, \dots, d$, we have

$$\begin{aligned}
\|\kappa(\bm{w}^{u,i})\|_\infty &= \left\lVert \frac{\left( \frac{w^{u,i}_1}{b_1(w^{u,i}_1)}, \ldots, \frac{w^{u,i}_d}{b_d(w^{u,i}_d)} \right)}{\left\lVert \left( \frac{w^{u,i}_1}{b_1(w^{u,i}_1)}, \ldots, \frac{w^{u,i}_d}{b_d(w^{u,i}_d)} \right) \right\rVert} \right\rVert_\infty = \frac{\left\lVert h(\bm{w}^{u,i}) \left( \frac{w^{u,i}_1}{b_1(w^{u,i}_1)}, \ldots, \frac{w^{u,i}_d}{b_d(w^{u,i}_d)} \right) \right\rVert_\infty}{\left\lVert h(\bm{w}^{u,i}) \left( \frac{w^{u,i}_1}{b_1(w^{u,i}_1)}, \ldots, \frac{w^{u,i}_d}{b_d(w^{u,i}_d)} \right) \right\rVert} \\
&= \frac{1}{\left\lVert h(\bm{w}^{u,i}) \left( \frac{w^{u,i}_1}{b_1(w^{u,i}_1)}, \ldots, \frac{w^{u,i}_d}{b_d(w^{u,i}_d)} \right) \right\rVert} = \tilde{g}(\kappa(\bm{w}^{u,i})).
\end{aligned}$$

The same reasoning holds for $\bm{w}^{l,i}$ with $i = 1, \ldots, d$. ∎

B.5 Proof of Proposition 2.6

Proof.

Given any large $u$ satisfying $u > \max_{i=1,\dots,d} \{ \log(2) / w_i \} > 0$, and applying similar reasoning as in the proof of Proposition 2.1 (Appendix B.1), we have that

$$\begin{aligned}
\Pr\left( \min_{i \in \mathcal{D}} \{ X_{E,i} / w_i \} > u \right) &= \Pr\left( X_{E,i} > w_i u,\; i = 1, \ldots, d \right) \\
&= \Pr\left( X_i > w_i u - \log 2,\; i = 1, \ldots, d \right) \\
&\sim \Pr\left( X_i > w_i u,\; i = 1, \ldots, d \right) \\
&= \Pr\left( \min_{i \in \mathcal{D}} \{ X_i / w_i \} > u \right).
\end{aligned}$$

Taking $u \to \infty$, we have

$$\Pr\left( \min_{i \in \mathcal{D}} \{ X_i / w_i \} > u \right) = \Pr\left( \min_{i \in \mathcal{D}} \{ X_{E,i} / w_i \} > u \right) = L(e^u; \bm{w})\, e^{-\lambda(\bm{w}) u},$$

proving the statement. ∎
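The second equality in the display above reflects the fact that, on the positive half-line, moving from standard Laplace to standard exponential margins amounts to a shift by $\log 2$. The sketch below illustrates this numerically. It assumes that $\bm{X}$ has standard Laplace margins and that $\bm{X}_E$ is obtained from $\bm{X}$ through the probability integral transform; the function names are hypothetical.

```python
import math


def laplace_sf(x):
    """Survival function Pr(X > x) of the standard Laplace distribution."""
    return 0.5 * math.exp(-x) if x >= 0 else 1.0 - 0.5 * math.exp(x)


def to_exponential(x):
    """Map a standard Laplace value to standard exponential margins
    via the probability integral transform."""
    return -math.log(laplace_sf(x))


# For x > 0 the transform reduces to x + log 2, so each entry of `shifts`
# equals log 2 up to rounding error.
shifts = [to_exponential(x) - x for x in (0.1, 1.0, 5.0)]
```

The condition $u > \max_i \{\log(2)/w_i\}$ guarantees $w_i u - \log 2 > 0$ for every $i$, so only this positive branch of the transform is involved.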

B.6 Proof of Proposition 2.7

Proof.

Given $\bm{w} \in \mathcal{S}^{d-1} \setminus \mathcal{A}$, set $\bm{c} := \text{sgn}(\bm{w}) = (\text{sgn}(w_1), \ldots, \text{sgn}(w_d))^T$, where $\text{sgn}(\cdot)$ denotes the regular signum function. Then, we have that

$$\Pr\left( \min_{i \in \mathcal{D}} \{ X_i / w_i \} > u \right) = \Pr\left( \min_{i \in \mathcal{D}} \{ c_i X_i / c_i w_i \} > u \right) \to L_{\bm{c}\bm{X}}(e^u; \bm{c}\bm{w})\, e^{-\lambda_{\bm{c}\bm{X}}(\bm{c}\bm{w}) u},$$

as $u \to \infty$, where $L_{\bm{c}\bm{X}}(\cdot; \bm{w})$ and $\lambda_{\bm{c}\bm{X}}(\cdot)$ denote the slowly varying function and ADF for $\bm{c}\bm{X}$ as in equation (2), respectively. The result follows with $\mathcal{L}(\cdot; \bm{w}) = L_{\bm{c}\bm{X}}(\cdot; \bm{c}\bm{w})$ and $\Lambda(\bm{w}) = \lambda_{\bm{c}\bm{X}}(\bm{c}\bm{w})$. ∎

B.7 Proof of Proposition 2.8

Proof.

Let $\bm{c} = \text{sgn}(\bm{w}) \in \{-1, 1\}^d$, and define $\bm{c}\bm{X}$ and $\bm{c}\bm{w} \in \mathcal{S}^{d-1}_+$ as in Proposition 2.5, noting that $\|\bm{c}\bm{w}\|_\infty = \|\bm{w}\|_\infty$. Further, let $\partial\mathcal{G}_{\bm{c}\bm{X}}$ and $g_{\bm{c}\bm{X}}(\cdot)$ denote the unit-level set and gauge function, respectively, for $\bm{c}\bm{X}$. For any $\mathfrak{r} \in \{ \mathfrak{r} \in [0,1] : \mathfrak{r}\mathcal{R}_{\bm{c}\bm{w}} \cap \partial\mathcal{G}_{\bm{c}\bm{X}} \neq \emptyset \}$, there must exist some $j \in \{1, \dots, d\}$ and $\bm{w}^* \in \mathcal{S}^{d-1}_+$ such that

$$\mathfrak{r} \frac{c_j w_j}{\|\bm{c}\bm{w}\|_\infty} = \frac{w^*_j}{g_{\bm{c}\bm{X}}(\bm{w}^*)}.$$

By Proposition 2.5, $g_{\bm{c}\bm{X}}(\bm{w}^*) = g(\bm{w}^* / \bm{c})$, with $\bm{w}^* / \bm{c} := (w^*_1 / c_1, \ldots, w^*_d / c_d) \in \mathcal{S}^{d-1} \setminus \mathcal{A}$ and $\text{sgn}(\bm{w}) = \text{sgn}(\bm{w}^* / \bm{c})$. Thus, we have

$$\mathfrak{r} \frac{w_j}{\|\bm{w}\|_\infty} = \frac{w^*_j / c_j}{g(\bm{w}^* / \bm{c})},$$

implying $\mathfrak{r} \in \{ \mathfrak{r} \in [0,1] : \mathfrak{r}\tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \neq \emptyset \}$. Analogous reasoning shows that

$$\mathfrak{r} \in \{ \mathfrak{r} \in [0,1] : \mathfrak{r}\tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \neq \emptyset \} \Rightarrow \mathfrak{r} \in \{ \mathfrak{r} \in [0,1] : \mathfrak{r}\mathcal{R}_{\bm{c}\bm{w}} \cap \partial\mathcal{G}_{\bm{c}\bm{X}} \neq \emptyset \},$$

giving $\{ \mathfrak{r} \in [0,1] : \mathfrak{r}\tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \neq \emptyset \} = \{ \mathfrak{r} \in [0,1] : \mathfrak{r}\mathcal{R}_{\bm{c}\bm{w}} \cap \partial\mathcal{G}_{\bm{c}\bm{X}} \neq \emptyset \}$ and $\tilde{\mathfrak{r}}_{\bm{w}} = \max\{ \mathfrak{r} \in [0,1] : \mathfrak{r}\tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \neq \emptyset \} = \mathfrak{r}_{\bm{c}\bm{w}}$. Applying Proposition 3.3 of Nolde and Wadsworth (2022), alongside Proposition 2.1, we see that

$$\Lambda(\bm{w}) = \lambda_{\bm{c}\bm{X}}(\bm{c}\bm{w}) = \|\bm{c}\bm{w}\|_\infty \times \mathfrak{r}_{\bm{c}\bm{w}}^{-1} = \|\bm{w}\|_\infty \times \tilde{\mathfrak{r}}_{\bm{w}}^{-1},$$

completing the proof.

An illustration of Proposition 2.8 is given in Figure A1.

B.8 Proof of Corollary 2.3

Proof.

Let $\bm{w} / g(\bm{w}) \in \partial\mathcal{G}$ denote coordinates on the unit-level set. Recalling the definitions of $\tilde{\mathfrak{r}}_{\bm{w}}$ and $\tilde{\mathcal{R}}_{\bm{w}}$, we must have that $\tilde{\mathfrak{r}}_{\bm{w}} |w_i| / \|\bm{w}\|_\infty \geq |w_i| / g(\bm{w})$ for all $i = 1, \ldots, d$; to see this, consider the two possibilities $\bm{w} / g(\bm{w}) \in \{ \tilde{\mathfrak{r}}_{\bm{w}} \tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \}$ and $\bm{w} / g(\bm{w}) \notin \{ \tilde{\mathfrak{r}}_{\bm{w}} \tilde{\mathcal{R}}_{\bm{w}} \cap \partial\mathcal{G} \}$ in turn. Consequently,

$$\frac{\tilde{\mathfrak{r}}_{\bm{w}}}{\|\bm{w}\|_\infty} \geq \min_{i=1,\dots,d} \left\{ \frac{w_i}{g(\bm{w})\, w_i} \right\} = \frac{1}{g(\bm{w})}.$$

Recalling from Proposition 2.8 that $\|\bm{w}\|_\infty \Lambda(\bm{w})^{-1} = \tilde{\mathfrak{r}}_{\bm{w}}$, the first inequality follows. The second inequality follows directly from Proposition 2.8.

B.9 Algorithm for computing the extended ADF diagnostic

Consider a sample $\{\bm{x}_j\}_{j=1}^n$ of independent copies of $\bm{X}$, with $\{(r_j, \bm{w}_j)\}_{j=1}^n$ denoting the corresponding radial-angular observations, and let $q$ denote a quantile level close to 1. The extended ADF diagnostic is computed using Algorithm 1.
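As an illustration of the opening steps of Algorithm 1 only (the min-projections $t_{\bm{w}_j}^k$ and their empirical $q$-quantile for each observed angle), a minimal Python sketch is given below. The function names are hypothetical, the quantile uses linear interpolation between order statistics as an assumed convention, the angles are assumed to have no zero components, and the toy data are invented purely for demonstration.

```python
def empirical_quantile(values, q):
    """Empirical q-quantile with linear interpolation between order statistics
    (an assumed convention; the algorithm only calls a generic `quantile`)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def angular_thresholds(x, w, q):
    """For each angle w[j], compute t_k = min_i x[k][i] / w[j][i] over the
    whole sample and return the empirical q-quantile of (t_1, ..., t_n).
    Assumes every component of every angle is non-zero."""
    thresholds = []
    for w_j in w:
        t = [min(x_ki / w_ji for x_ki, w_ji in zip(x_k, w_j)) for x_k in x]
        thresholds.append(empirical_quantile(t, q))
    return thresholds


# Toy data (hypothetical): three observations in d = 2 with unit-norm angles.
x = [[3.0, 4.0], [1.0, 1.0], [0.6, 0.8]]
w = [[0.6, 0.8], [2 ** -0.5, 2 ** -0.5], [0.6, 0.8]]
u_hat = angular_thresholds(x, w, q=0.5)
```

The nested computation costs $O(n^2 d)$ operations, matching the double loop over $j$ and $k$ in the algorithm; in practice $q$ would be taken close to 1 rather than the value 0.5 used for this toy example.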

Algorithm 1 Computing the extended ADF diagnostic.
$b\leftarrow 1$
for $j\leftarrow 1$ to $n$ do
     for $k\leftarrow 1$ to $n$ do
         $t_{\bm{w}_{j}}^{k}\leftarrow\min_{i=1,\ldots,d}\{x_{k,i}/w_{j,i}\}$
     end for
     $\mathbf{t}_{\bm{w}_{j}}\leftarrow(t_{\bm{w}_{j}}^{1},\ldots,t_{\bm{w}_{j}}^{n})$
     $\hat{u}_{\bm{w}_{j}}\leftarrow\texttt{quantile}(\mathbf{t}_{\bm{w}_{j}},q)$
     if $r_{j}>\hat{u}_{\bm{w}_{j}}$ then
         $e_{b}\leftarrow\hat{\Lambda}(\bm{w}_{j})(r_{j}-\hat{u}_{\bm{w}_{j}})$
         $b\leftarrow b+1$
     end if
end for

Note that $r_{j}=t_{\bm{w}_{j}}^{j}$; hence, the radial component can be considered an observation of the min-projection for the corresponding angle. Since the counter $b$ is incremented after each assignment, the algorithm stores $b-1$ values. We compare the resulting quantiles, $(e_{1},\ldots,e_{b-1})^{T}$, to the corresponding theoretical quantiles using a QQ plot.
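A minimal NumPy sketch of Algorithm 1, under stated assumptions, is given below. The function names `extended_adf_excesses` and `exponential_plotting_quantiles`, and the callable `lambda_hat` standing in for the fitted estimate $\hat{\Lambda}$, are hypothetical and not part of the original implementation. The use of standard exponential reference quantiles and of a Euclidean radius in the toy example are likewise assumptions. The sketch also assumes that no angle has a component exactly equal to zero, so that every ratio $x_{k,i}/w_{j,i}$ is well defined.

```python
import numpy as np


def extended_adf_excesses(x, r, w, lambda_hat, q):
    """Scaled excesses from Algorithm 1, returned in increasing order.

    x          : (n, d) array of observations x_j
    r          : (n,) array of radii r_j
    w          : (n, d) array of angles w_j (no component exactly zero)
    lambda_hat : hypothetical callable mapping an (n, d) array of angles
                 to (n,) estimates of the extended ADF
    q          : quantile level in (0, 1), close to 1
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    lam = np.asarray(lambda_hat(w), dtype=float)

    excesses = []
    for j in range(x.shape[0]):
        # Min-projection of every observation x_k onto the angle w_j.
        t_wj = np.min(x / w[j], axis=1)
        # Empirical q-quantile of the projected sample.
        u_hat = np.quantile(t_wj, q)
        # Keep only radii exceeding the threshold for their own angle.
        if r[j] > u_hat:
            excesses.append(lam[j] * (r[j] - u_hat))
    return np.sort(np.asarray(excesses))


def exponential_plotting_quantiles(m):
    """Standard exponential quantiles at levels i / (m + 1), i = 1, ..., m.

    Assumption: the reference distribution for the QQ plot is Exp(1).
    """
    levels = np.arange(1, m + 1) / (m + 1)
    return -np.log1p(-levels)


# Toy data on a single ray, with a placeholder ADF estimate equal to 1.
x_demo = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
r_demo = np.linalg.norm(x_demo, axis=1)  # Euclidean radius (assumed)
w_demo = x_demo / r_demo[:, None]
e_demo = extended_adf_excesses(
    x_demo, r_demo, w_demo, lambda w: np.ones(len(w)), q=0.5
)
```

Looping over $j$ keeps memory at $O(nd)$ rather than materialising the full $n\times n\times d$ array of ratios, although the total cost remains $O(n^{2}d)$ operations. A QQ plot would then pair the sorted output `e` with `exponential_plotting_quantiles(len(e))`.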

Appendix C Supplementary figures and tables

Figure A1: Plot illustrating the result described in Proposition 2.8. Here, the red line denotes the unit-level set $\partial\mathcal{G}$ of a Gaussian copula with $\rho=-0.5$. Moreover, the blue and cyan points, and regions, illustrate the rescaling to obtain $\tilde{r}_{\bm{w}}$ for the angle $\bm{w}=(0.8,-\sqrt{1-0.8^{2}})$. Setting $\bm{c}=\text{sgn}(\bm{w})=(1,-1)$, the green dotted line denotes the unit-level set $\partial\mathcal{G}$ of $\bm{c}\bm{X}$, with the corresponding scaling procedure for the angle $\bm{c}\bm{w}=(0.8,\sqrt{1-0.8^{2}})$ illustrated by the orange points and regions. One can observe that the rescaling procedures are analogous in both orthants.
Figure A2: Boxplots of the estimated ISE and MALE for the simulation study with varying sample size, $n$, and dimension, $d$, described in Section 4.2. Here the radial quantile level is $\tau=0.75$ and the architecture for the rescaled gauge function has $N=3$ hidden layers (each with width 64).
Table A1: Estimates of the median ($2.5\%$, $97.5\%$ percentiles) of the ISE and MALE for the simulation study of the architecture for the gauge function $g(\cdot)$, described in Section 4.2. Here the sample size is $n=100{,}000$, the radial quantile level is $\tau=0.75$, and the considered dimensions are $d=5$ and $d=8$.

| Copula | $g(\cdot)$ hidden-layer widths | ISE ($d=5$) | MALE ($d=5$) | ISE ($d=8$) | MALE ($d=8$) |
|---|---|---|---|---|---|
| Gaussian | (16) | 0.44 (0.27, 1.87) | 4.58 (2.85, 9.03) | 2.52 (0.80, 5.67) | 14.4 (6.48, 20.8) |
| | (16,16) | 0.16 (0.13, 0.23) | 1.43 (0.50, 2.81) | 0.28 (0.14, 0.93) | 3.12 (1.02, 8.28) |
| | (16,16,16) | 0.14 (0.11, 0.19) | 1.11 (0.39, 2.27) | 0.19 (0.11, 0.62) | 3.87 (1.71, 6.83) |
| | (16,16,16,16) | 0.14 (0.11, 0.22) | 1.10 (0.35, 2.52) | 0.23 (0.11, 0.61) | 3.61 (1.19, 7.20) |
| | (64) | 0.19 (0.16, 0.25) | 1.88 (0.98, 3.02) | 0.33 (0.22, 0.49) | 2.44 (1.27, 4.49) |
| | (64,64) | 0.14 (0.11, 0.20) | 0.91 (0.28, 1.87) | 0.16 (0.11, 0.24) | 4.52 (2.31, 7.13) |
| | (64,64,64) | 0.13 (0.10, 0.17) | 0.88 (0.36, 1.80) | 0.11 (0.08, 0.19) | 5.94 (3.58, 9.14) |
| | (64,64,64,64) | 0.12 (0.10, 0.18) | 0.97 (0.37, 2.04) | 0.11 (0.08, 0.17) | 6.44 (3.13, 8.83) |
| Logistic | (16) | 0.49 (0.40, 0.58) | 0.35 (0.28, 0.45) | 0.36 (0.26, 0.53) | 0.53 (0.36, 0.66) |
| | (16,16) | 0.67 (0.53, 0.91) | 0.33 (0.24, 0.44) | 0.52 (0.38, 0.69) | 0.54 (0.39, 0.69) |
| | (16,16,16) | 0.78 (0.62, 1.04) | 0.32 (0.16, 0.44) | 0.62 (0.50, 0.81) | 0.54 (0.40, 0.66) |
| | (16,16,16,16) | 0.85 (0.65, 1.03) | 0.32 (0.21, 0.45) | 0.66 (0.52, 0.90) | 0.54 (0.40, 0.69) |
| | (64) | 0.48 (0.39, 0.64) | 0.34 (0.28, 0.45) | 0.37 (0.31, 0.54) | 0.56 (0.43, 0.68) |
| | (64,64) | 0.69 (0.56, 1.01) | 0.35 (0.29, 0.47) | 0.57 (0.45, 0.78) | 0.54 (0.42, 0.67) |
| | (64,64,64) | 0.84 (0.66, 1.07) | 0.35 (0.28, 0.46) | 0.66 (0.53, 0.87) | 0.55 (0.36, 0.68) |
| | (64,64,64,64) | 0.91 (0.73, 1.07) | 0.35 (0.23, 0.45) | 0.78 (0.62, 0.98) | 0.54 (0.39, 0.67) |
| Student-t | (16) | 12.0 (6.92, 15.0) | 1.73 (1.59, 1.97) | 25.0 (17.7, 28.8) | 3.74 (2.60, 5.40) |
| | (16,16) | 4.39 (0.97, 7.51) | 1.75 (1.58, 2.38) | 16.2 (11.3, 21.1) | 2.97 (2.36, 3.97) |
| | (16,16,16) | 0.62 (0.25, 3.21) | 2.09 (1.70, 2.72) | 9.44 (4.05, 15.3) | 3.06 (2.40, 4.70) |
| | (16,16,16,16) | 0.43 (0.20, 2.07) | 2.24 (1.77, 2.78) | 6.42 (2.90, 13.1) | 3.22 (2.43, 5.98) |
| | (64) | 2.43 (1.33, 4.41) | 1.76 (1.61, 2.12) | 6.76 (4.36, 13.98) | 3.13 (2.30, 4.83) |
| | (64,64) | 0.30 (0.15, 0.64) | 2.07 (1.64, 2.86) | 1.20 (0.42, 3.83) | 4.71 (2.80, 6.28) |
| | (64,64,64) | 0.26 (0.15, 0.47) | 2.26 (1.76, 2.80) | 0.43 (0.25, 0.68) | 6.28 (4.40, 9.00) |
| | (64,64,64,64) | 0.26 (0.18, 0.45) | 2.30 (1.87, 2.98) | 0.40 (0.25, 0.70) | 6.42 (4.13, 9.47) |
Figure A3: Goodness-of-fit diagnostics for location 01: Truncated gamma QQ plot (left), the extended ADF diagnostic (centre), and return level probability estimates (right) for the DeepGauge model fitted to hs, ws, and mslp.
Figure A4: Goodness-of-fit diagnostics for location 46: Truncated gamma QQ plot (left), the extended ADF diagnostic (centre), and return level probability estimates (right) for the DeepGauge model fitted to hs, ws, and mslp.
Figure A5: Goodness-of-fit diagnostics for location 85: Truncated gamma QQ plot (left), the extended ADF diagnostic (centre), and return level probability estimates (right) for the DeepGauge model fitted to hs, ws, and mslp.
Figure A6: The estimated unit-level set (left) and the extended ADF (right) for hs, ws, and mslp at location 46 of the metocean data. Data for the three variables are plotted on standard Laplace margins.
Figure A7: The estimated unit-level set (left) and the extended ADF (right) for hs, ws, and mslp at location 85 of the metocean data. Data for the three variables are plotted on standard Laplace margins.
Figure A8: Pairwise distributions of ws, hs, and mslp from the metocean data at location 46, transformed into common Laplace margins; the red shapes outline the corresponding estimated bivariate unit-level set slices.
Figure A9: Pairwise distributions of ws, hs, and mslp from the metocean data at location 85, transformed into common Laplace margins; the red shapes outline the corresponding estimated bivariate unit-level set slices.
Figure A10: Goodness-of-fit diagnostics for hs: Truncated gamma QQ plot (left), the extended ADF diagnostic (centre), and return level probability estimates (right) for the DeepGauge model fitted to wave height data at 5 locations.

References

  • Allaire and Chollet, (2021) Allaire, J. and Chollet, F. (2021). keras: R Interface to ’Keras’. R package version 2.7.0.
  • Balkema and Nolde, (2010) Balkema, G. and Nolde, N. (2010). Asymptotic independence for unimodal densities. Advances in Applied Probability, 42:411–432.
  • Boulaguiem et al., (2022) Boulaguiem, Y., Zscheischler, J., Vignotto, E., van der Wiel, K., and Engelke, S. (2022). Modeling and simulating spatial extremes by combining extreme value theory with generative adversarial networks. Environmental Data Science, 1:e5.
  • Cisneros et al., (2024) Cisneros, D., Richards, J., Dahal, A., Lombardo, L., and Huser, R. (2024). Deep graphical regression for jointly moderate and extreme Australian wildfires. Spatial Statistics, 59:100811.
  • Coles, (2001) Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer London.
  • Coles et al., (1999) Coles, S., Heffernan, J., and Tawn, J. (1999). Dependence measures for multivariate extremes. Extremes, 2:339–365.
  • Cooley et al., (2019) Cooley, D., Thibaud, E., Castillo, F., and Wehner, M. F. (2019). A nonparametric method for producing isolines of bivariate exceedance probabilities. Extremes, 22:373–390.
  • Davis et al., (1988) Davis, R. A., Mulrow, E., and Resnick, S. I. (1988). Almost sure limit sets of random samples in $\mathbb{R}^{d}$. Advances in Applied Probability, 20:573–599.
  • Dawkins and Stephenson, (2018) Dawkins, L. C. and Stephenson, D. B. (2018). Quantification of extremal dependence in spatial natural hazard footprints: independence of windstorm gust speeds and its impact on aggregate losses. Natural Hazards and Earth System Sciences, 18:2933–2949.
  • Einmahl and Segers, (2009) Einmahl, J. H. and Segers, J. (2009). Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution. Annals of Statistics, 37:2953–2989.
  • Ewans and Jonathan, (2014) Ewans, K. and Jonathan, P. (2014). Recent advances in the analysis of extreme metocean events. In Offshore Technology Conference Asia, pages 3009–3019. OTC.
  • Fisher, (1969) Fisher, L. (1969). Limiting Sets and Convex Hulls of Samples from Product Measures. The Annals of Mathematical Statistics, 40:1824–1832.
  • Goodfellow et al., (2016) Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. MIT press.
  • Hasan et al., (2022) Hasan, A., Elkhalil, K., Ng, Y., Pereira, J. M., Farsiu, S., Blanchet, J., and Tarokh, V. (2022). Modeling extremes with $d$-max-decreasing neural networks. In Uncertainty in Artificial Intelligence, pages 759–768. PMLR.
  • Hayes, (1956) Hayes, C. A. (1956). The Heine-Borel theorem. The American Mathematical Monthly, 63:180.
  • Heffernan and Tawn, (2001) Heffernan, J. E. and Tawn, J. A. (2001). Extreme value analysis of a large designed experiment: a case study in bulk carrier safety. Extremes, 4:359–378.
  • Heffernan and Tawn, (2004) Heffernan, J. E. and Tawn, J. A. (2004). A conditional approach for multivariate extreme values. Journal of the Royal Statistical Society. Series B: Methodology, 66:497–546.
  • Huser et al., (2024) Huser, R., Opitz, T., and Wadsworth, J. (2024). Modeling of spatial extremes in environmental data science: Time to move away from max-stable processes. arXiv, 2401.17430.
  • Keef et al., (2013) Keef, C., Papastathopoulos, I., and Tawn, J. A. (2013). Estimation of the conditional distribution of a multivariate variable given that one of its components is large: Additional constraints for the Heffernan and Tawn model. Journal of Multivariate Analysis, 115:396–404.
  • Kingma and Ba, (2014) Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kinoshita and Resnick, (1991) Kinoshita, K. and Resnick, S. I. (1991). Convergence of Scaled Random Samples in $\mathbb{R}^{d}$. The Annals of Probability, 19:1640–1663.
  • Koenker et al., (2017) Koenker, R., Chernozhukov, V., He, X., and Peng, L. (2017). Handbook of Quantile Regression. Chapman and Hall/CRC.
  • Lafon et al., (2023) Lafon, N., Naveau, P., and Fablet, R. (2023). A VAE approach to sample multivariate extremes. arXiv preprint arXiv:2306.10987.
  • Ledford and Tawn, (1996) Ledford, A. W. and Tawn, J. A. (1996). Statistics for near independence in multivariate extreme values. Biometrika, 83:169–187.
  • Ledford and Tawn, (1997) Ledford, A. W. and Tawn, J. A. (1997). Modelling dependence within joint tail regions. Journal of the Royal Statistical Society. Series B: Methodology, 59:475–499.
  • Mackay and Haselsteiner, (2021) Mackay, E. and Haselsteiner, A. F. (2021). Marginal and total exceedance probabilities of environmental contours. Marine Structures, 75:1–24.
  • Mackay and Jonathan, (2023) Mackay, E. and Jonathan, P. (2023). Modelling multivariate extremes through angular-radial decomposition of the density function. arXiv, 2310.12711.
  • Majumder et al., (2024a) Majumder, R., Reich, B. J., and Shaby, B. A. (2024a). Modeling extremal streamflow using deep learning approximations and a flexible spatial process. The Annals of Applied Statistics, 18(2):1519–1542.
  • Majumder et al., (2024b) Majumder, R., Shaby, B. A., Reich, B. J., and Cooley, D. (2024b). Semiparametric estimation of the shape of the limiting bivariate point cloud. arXiv, 2306.13257.
  • Murphy-Barltrop et al., (2024a) Murphy-Barltrop, C. J. R., Mackay, E., and Jonathan, P. (2024a). Inference for bivariate extremes via a semi-parametric angular-radial model. arXiv, 2401.07259.
  • Murphy-Barltrop and Wadsworth, (2024) Murphy-Barltrop, C. J. R. and Wadsworth, J. L. (2024). Modelling non-stationarity in asymptotically independent extremes. arXiv, 2203.05860.
  • Murphy-Barltrop et al., (2023) Murphy-Barltrop, C. J. R., Wadsworth, J. L., and Eastoe, E. F. (2023). New estimation methods for extremal bivariate return curves. Environmetrics, e2797:1–22.
  • Murphy-Barltrop et al., (2024b) Murphy-Barltrop, C. J. R., Wadsworth, J. L., and Eastoe, E. F. (2024b). Improving estimation for asymptotically independent bivariate extremes via global estimators for the angular dependence function. arXiv, 2303.13237.
  • Neumann, (1951) Neumann, J. V. (1951). Various techniques used in connection with random digits. NBS Applied Mathematics Series, 12:36–38.
  • Nolde, (2014) Nolde, N. (2014). Geometric interpretation of the residual dependence coefficient. Journal of Multivariate Analysis, 123:85–95.
  • Nolde and Wadsworth, (2022) Nolde, N. and Wadsworth, J. L. (2022). Linking representations for multivariate extremes via a limit set. Advances in Applied Probability, 54:688–717.
  • Opitz, (2016) Opitz, T. (2016). Modeling asymptotically independent spatial extremes based on Laplace random fields. Spatial Statistics, 16:1–18.
  • Papastathopoulos et al., (2024) Papastathopoulos, I., de Monte, L., Campbell, R., and Rue, H. (2024). Statistical inference for radially-stable generalized Pareto distributions and return level-sets in geometric extremes. arXiv, 2310.06130.
  • Pasche and Engelke, (2022) Pasche, O. C. and Engelke, S. (2022). Neural networks for extreme quantile regression with an application to forecasting of flood risk. arXiv preprint arXiv:2208.07590.
  • Prechelt, (2002) Prechelt, L. (2002). Early stopping - but when? In Neural Networks: Tricks of the Trade, pages 55–69. Springer.
  • Ramos and Ledford, (2009) Ramos, A. and Ledford, A. (2009). A new class of models for bivariate joint tails. Journal of the Royal Statistical Society. Series B: Methodology, 71:219–241.
  • Reistad et al., (2011) Reistad, M., Breivik, Ø., Haakenstad, H., Aarnes, O. J., Furevik, B. R., and Bidlot, J.-R. (2011). A high-resolution hindcast of wind and waves for the North Sea, the Norwegian Sea, and the Barents Sea. Journal of Geophysical Research, 116:C05019.
  • Resnick, (2002) Resnick, S. (2002). Hidden Regular Variation, Second Order Regular Variation and Asymptotic Independence. Extremes, 5:303–336.
  • Richards and Huser, (2022) Richards, J. and Huser, R. (2022). Regression modelling of spatiotemporal extreme US wildfires via partially-interpretable neural networks. arXiv preprint arXiv:2208.07581.
  • Richards and Huser, (2024) Richards, J. and Huser, R. (2024). Extreme Quantile Regression with Deep Learning. In de Carvalho, M., Huser, R., Naveau, P., and Reich, B. J., editors, Handbook on Statistics of Extremes. Chapman & Hall / CRC.
  • Rootzén and Tajvidi, (2006) Rootzén, H. and Tajvidi, N. (2006). Multivariate generalized Pareto distributions. Bernoulli, 12:917–930.
  • Rothfuss et al., (2019) Rothfuss, J., Ferreira, F., Walther, S., and Ulrich, M. (2019). Conditional density estimation with neural networks: Best practices and benchmarks. arXiv preprint arXiv:1903.00954.
  • Ruzgas et al., (2021) Ruzgas, T., Lukauskas, M., and Čepkauskas, G. (2021). Nonparametric Multivariate Density Estimation: Case Study of Cauchy Mixture Model. Mathematics, 9:2717.
  • Shooter et al., (2021) Shooter, R., Ross, E., Ribal, A., Young, I. R., and Jonathan, P. (2021). Spatial dependence of extreme seas in the North East Atlantic from satellite altimeter measurements. Environmetrics, 32:1–15.
  • Shooter et al., (2022) Shooter, R., Ross, E., Ribal, A., Young, I. R., and Jonathan, P. (2022). Multivariate spatial conditional extremes for extreme ocean environments. Ocean Engineering, 247:110647.
  • Simpson and Tawn, (2024a) Simpson, E. S. and Tawn, J. A. (2024a). Estimating the limiting shape of bivariate scaled sample clouds: with additional benefits of self-consistent inference for existing extremal dependence properties. arXiv, 2207.02626.
  • Simpson and Tawn, (2024b) Simpson, E. S. and Tawn, J. A. (2024b). Inference for new environmental contours using extreme value analysis. Journal of Agricultural, Biological and Environmental Statistics.
  • Simpson et al., (2020) Simpson, E. S., Wadsworth, J. L., and Tawn, J. A. (2020). Determining the dependence structure of multivariate extremes. Biometrika, 107:513–532.
  • Tawn, (1990) Tawn, J. A. (1990). Modelling multivariate extreme value distributions. Biometrika, 77:245–253.
  • Wadsworth and Campbell, (2024) Wadsworth, J. L. and Campbell, R. (2024). Statistical inference for multivariate extremes via a geometric approach. Journal of the Royal Statistical Society Series B: Methodology, to appear.
  • Wadsworth and Tawn, (2013) Wadsworth, J. L. and Tawn, J. A. (2013). A new representation for multivariate tail probabilities. Bernoulli, 19:2689–2714.
  • Zhang et al., (2023) Zhang, L., Ma, X., Wikle, C. K., and Huser, R. (2023). Flexible and efficient spatial extremes emulation via variational autoencoders. arXiv preprint arXiv:2307.08079.