Regularized estimation of Monge-Kantorovich quantiles for spherical data

Bernard Bercu, Jérémie Bigot and Gauthier Thurin
Université de Bordeaux
Institut de Mathématiques de Bordeaux et CNRS (UMR 5251)

Abstract.

Tools from optimal transport (OT) theory have recently been used to define a notion of quantile function for directional data. In practice, regularization is mandatory for applications that require out-of-sample estimates. To this end, we introduce a regularized estimator built from entropic optimal transport, by extending the definition of the entropic map to the spherical setting. We propose a stochastic algorithm to directly solve a continuous OT problem between the uniform distribution and a target distribution, by expanding Kantorovich potentials in the basis of spherical harmonics. In addition, we define the directional Monge-Kantorovich depth, a companion concept for OT-based quantiles. We show that it benefits from desirable properties related to Liu-Zuo-Serfling axioms for the statistical analysis of directional data. Building on our regularized estimators, we illustrate the benefits of our methodology for data analysis.

The authors gratefully acknowledge financial support from the Agence Nationale de la Recherche (MaSDOL grant ANR-19-CE23-0017).

Keywords: Spherical data; Directional statistics; Monge-Kantorovich quantiles; Entropic Optimal Transport; Spherical harmonics; Fast Fourier transform; Stochastic optimisation.

AMS classifications: 62H12, 62G20, 62L20.

1. Introduction

1.1. Quantiles for directional data using optimal transport

In various situations, data naturally correspond to directions that are modeled as observations belonging to the circle or the unit $d$ -sphere ${\mathbb{S}}^{d-1}$ for $d\geq 2$ . Such observations, referred to as directional data, can be found in various applications including wildfires [2], gene expressions [23], or cosmology [55] to name but a few. Directional statistics [54, 47, 48, 68] is the field that brings together the corresponding models, methods and applications for statistical inference.

In this paper, we focus on the concept of quantiles for directional data. For real random variables, the notion of quantile is a well established statistical concept thanks to the canonical ordering of observations on the real line. Beyond the setting of distributions with rotational symmetry [46], the absence of a canonical ordering on ${\mathbb{S}}^{d-1}$ makes the definition of quantiles for directional data more involved.

A recent line of research in nonparametric statistics [12, 36, 37, 35] deals with the use of the theory of optimal transport (OT) to define Monge-Kantorovich (MK) quantiles for multivariate data, which are also referred to as center-outward quantiles. Desirable properties of MK quantiles include ancillarity, distribution-freeness of associated ranks, and consistency with the univariate setting [36], together with connections to Tukey’s celebrated notion of statistical depth [12] and Mahalanobis distance [34]. The concept of MK quantiles has also proven to be fruitful in many applications, including statistical testing [32, 39, 63, 76, 77], regression [10, 19], risk measurement [4], and Lorenz maps [24, 38].

This approach has recently been applied in [37] to obtain a new notion of center-outward quantiles for directional data. Starting from independent and identically distributed ( $i.i.d.$ ) directional data $X_{1},\cdots,X_{n}$ sampled from a target measure $\nu$ , the main idea in [37] is to define an empirical quantile function $\mathbf{Q}_{n}$ as an OT map, from $\mu_{{\mathbb{S}}^{d-1}}$ the uniform probability distribution on ${\mathbb{S}}^{d-1}$ towards the empirical measure $\widehat{\nu}_{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ , with a transport cost equal to the squared Riemannian distance. In this manner, if $U$ denotes a random variable with distribution $\mu_{{\mathbb{S}}^{d-1}}$ , the random vector $\mathbf{Q}_{n}(U)$ follows the empirical distribution of the observations, which is consistent with the standard univariate quantile function. Many statistical properties of these directional quantiles and related notions of ranks, signs and MANOVA are then investigated in [37]. Various numerical experiments reported in [37] also illustrate the benefits of OT to compute relevant quantiles for directional data.

1.2. Main contributions

To compute the estimator $\mathbf{Q}_{n}$ , it is proposed in [37] to first approximate the uniform measure $\mu_{{\mathbb{S}}^{d-1}}$ by a “regular” grid of $n$ points over the unit sphere ${\mathbb{S}}^{d-1}$ . Then, the empirical quantile function is defined through the discrete OT problem between this $n$ -point grid and $\widehat{\nu}_{n}$ . However, the computational cost of finding a numerical solution to a discrete OT problem is known to scale cubically in the number of observations [18]. Moreover, being a matching between two discrete distributions, the resulting quantile function does not provide out-of-sample estimates, which are desirable in many statistical applications.

To circumvent these issues, we propose to adapt the stochastic algorithm developed in [7], which deals with multivariate data in ${\mathbb{R}}^{d}$ that are not assumed to belong to the unit $d$ -sphere. The method in [7] benefits from entropic regularization [16], which is well known for its computational advantages [18]. By expanding Kantorovich dual potentials in their series of Fourier coefficients, each iteration of the stochastic algorithm in [7] reduces to the use of the Fast Fourier Transform (FFT). On the sphere ${\mathbb{S}^{2}}$ , dual potentials can be parameterized via spherical harmonic coefficients instead, spherical harmonics being the analog of the Fourier basis for square-integrable functions on ${\mathbb{S}^{2}}$ . In this manner, using a sequence of random variables $X_{1},\ldots,X_{n}$ sampled from a target distribution $\nu$ supported on ${\mathbb{S}^{2}}$ , we construct a stochastic algorithm in the space of spherical harmonic coefficients. In practice, this algorithm depends on the choice of a grid of points in ${\mathbb{S}^{2}}$ of size $\mathcal{O}\left(p^{2}\right)$ to implement an FFT on ${\mathbb{S}^{2}}$ [84]. The computational cost at each iteration is thus of order $\mathcal{O}\left(p^{2}\log^{2}(p)\right)$ [45].

Furthermore, we also derive an estimator of the quantile function for spherical data that is smoother than $\mathbf{Q}_{n}$ . This regularized quantile function is a spherical counterpart of the entropic map [69], following classical results on entropic optimal transport in the Euclidean space ${\mathbb{R}}^{d}$ .

Finally, we introduce the new notion of MK directional statistical depth, in accordance with the Euclidean MK depth [12]. We discuss its properties relative to traditional Liu-Zuo-Serfling axioms for the statistical analysis of directional data. Moreover, we study statistical applications built from it, and provide a comparison with other estimators, to better highlight the potential of entropic regularization for spherical quantile estimation and out-of-sample estimation.

1.3. Organization of the paper

In Section 2, we discuss related works on alternative definitions and estimators of quantiles for directional data. The definitions of OT-based directional distribution and quantile functions on ${\mathbb{S}^{2}}$ are given in Section 3. Our approach to obtain regularized estimators of MK quantiles on ${\mathbb{S}^{2}}$ is detailed in Section 4. A study of the MK directional statistical depth is proposed in Section 5 and illustrated with simulated data. Numerical experiments are reported in Section 6 to highlight the benefits of entropic regularization. Concluding remarks and discussions on this work are proposed in Section 7. Finally, mathematical details and proofs are deferred to technical appendices at the end of the paper.

2. Related works

2.1. Directional quantiles and depth

Very much related to multivariate quantiles is the idea of statistical depth and the center-outward ordering of data, with a long history dating back to Tukey’s work [81]. The issue of picturing data, in Tukey’s terminology, has since attracted considerable attention, which prevents us from providing an exhaustive survey; we refer instead to [50, 3, 75, 60]. The first notion of directional depth function was introduced in [78], followed by the work of [51] that developed three different approaches. The properties of the latter have been studied and applied for inference in [71, 1]. The required computational effort led the authors of [46] to build the angular Mahalanobis depth, and, doing so, they provided the first concept of directional quantiles. Despite appealing properties, the obtained contours are constrained to be rotationally symmetric, motivating the elliptic counterpart from [41]. Still, the elliptic assumption is a strong one as discussed in [37]. Facing either this lack of adaptiveness or the computational burdens of previous references, distance-based depths were proposed in [66], even though not explicitly related to the notion of quantiles. These directional depth functions can be applied, for instance, in data analysis and inference [51, 78, 1, 44], classification [66, 51, 21, 22, 61, 44] or clustering [65]. In recent years, two concepts of multivariate quantiles have emerged in ${\mathbb{R}}^{d}$ , namely the spatial quantiles [11] and the center-outward ones [12, 36], both gathering most of the commonly sought-after properties. More importantly here, these promising ideas have successfully been extended to directional data [44, 37], improving on the lack of adaptiveness of Mahalanobis quantiles [46], with desirable asymptotic results inherited from the formalism of quantiles.

To put it in a nutshell, existing concepts of directional quantiles include Mahalanobis quantiles, spatial quantiles and Monge-Kantorovich ones. On the one hand, a statistical depth associated with a notion of quantiles can benefit from the best of both worlds, that is adaptivity to the underlying geometry and consistency of empirical versions, as argued for instance in [44]. On the other hand, in comparison with other directional quantiles, Monge-Kantorovich ones present an additional descriptive power inherited from the fact that $\mathbf{Q}(U)\sim\nu$ . A direct consequence is that $\mathbf{Q}$ must contain all the available information, which is appealing for the purpose of summing up unknown features of multivariate data. Even more, these concepts provide a curvilinear coordinate system within the support of the distribution of interest $\nu$ [37], which is a promising way to render the available information.

2.2. Regularized center-outward quantiles

In [37], the estimation of directional quantiles amounts to a discrete OT problem, between a regular grid and the observed sample, in a similar fashion to what is done for Euclidean data in [12]. However, such an estimator is piecewise constant, restricted to taking its values in the set of observations. As in the Euclidean setting ${\mathbb{R}}^{d}$ , this raises issues in a number of situations that require smoothness of the quantile function, for instance to compute volumes of quantile regions [4]. There, regularization naturally enters the picture, either after the estimation of unregularized OT [36, 4], or with entropic OT (EOT) [7] building on the entropic map [69]. Facing the same issue in the present directional context, the entropic map is extended here to the non-Euclidean setting, similarly to what is done in [17] for general costs in ${\mathbb{R}}^{d}$ . To the best of our knowledge, this appears to be new, although it benefits from an explicit formulation that is tractable in linear time, given the dual potentials solving EOT. This belongs to the line of work estimating OT maps on manifolds, see $e.g.$ [28, 14, 67].

3. Directional distribution and quantile functions on the 2-sphere based on optimal transport

In this section, we introduce the main definitions of OT-based distribution and quantile functions for spherical data, beginning with notation related to spherical harmonics and differentiation on ${\mathbb{S}^{2}}$ .

3.1. Context

The unit $2$ -sphere is defined by ${\mathbb{S}^{2}}=\{x\in{\mathbb{R}}^{3}:\|x\|=1\}.$ The points $x\in{\mathbb{S}^{2}}$ can be written in spherical coordinates, with longitude $\phi\in[-\pi,\pi]$ and colatitude $\theta\in[0,\pi]$ , as

(3.1)

x=\Phi(\theta,\phi):=(\cos\phi\sin\theta,\sin\phi\sin\theta,\cos\theta).

On the $2$ -sphere, the geodesic distance is $d(x,y)=\arccos(\langle x,y\rangle),$ and the squared Riemannian distance is

(3.2)

c(x,y)=\frac{1}{2}d(x,y)^{2}.
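As a minimal numerical sketch of the parameterization (3.1) and of the cost (3.2), one may write the following. The function names are illustrative assumptions, and only NumPy is used.

```python
import numpy as np

def sph_to_cart(theta, phi):
    """Map colatitude theta in [0, pi] and longitude phi in [-pi, pi] to S^2, as in (3.1)."""
    theta, phi = np.asarray(theta, float), np.asarray(phi, float)
    return np.stack([np.cos(phi) * np.sin(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(theta)], axis=-1)

def geodesic_distance(x, y):
    """d(x, y) = arccos(<x, y>); the clipping guards against round-off outside [-1, 1]."""
    inner = np.sum(np.asarray(x) * np.asarray(y), axis=-1)
    return np.arccos(np.clip(inner, -1.0, 1.0))

def cost(x, y):
    """Cost (3.2): half the squared geodesic distance."""
    return 0.5 * geodesic_distance(x, y) ** 2

north = sph_to_cart(0.0, 0.0)
equator_pt = sph_to_cart(np.pi / 2, 0.0)
south = sph_to_cart(np.pi, 0.0)
print(geodesic_distance(north, equator_pt), cost(north, south))
```

The largest value of the cost is attained at antipodal points, where it equals $\pi^{2}/2$ .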

Both $d$ and $c$ are continuous and bounded as $d(x,y)\in[0,\pi]$ . Moreover, $({\mathbb{S}^{2}},d)$ is a separable complete metric space, with Borel algebra $\mathcal{B}^{2}$ . The surface measure $\sigma_{\mathbb{S}^{2}}$ on ${\mathbb{S}^{2}}$ is given by

\int_{\mathbb{S}^{2}}f(x)d\sigma_{\mathbb{S}^{2}}(x)=\int_{0}^{\pi}\int_{-\pi}^{\pi}f(\Phi(\theta,\phi))\sin\theta\,d\phi\,d\theta,

and the uniform measure on ${\mathbb{S}^{2}}$ writes $\mu_{\mathbb{S}^{2}}=\frac{1}{4\pi}\sigma_{\mathbb{S}^{2}}$ . The space of all equivalence classes of square integrable functions on ${\mathbb{S}^{2}}$ is denoted by $L^{2}({\mathbb{S}^{2}})$ . We define the spherical harmonic function of degree $l\in\mathbb{N}^{*}$ and order $m\in\{-l,\cdots,l\}$ by

Y_{l}^{m}(x)=Y_{l}^{m}(\Phi(\theta,\phi))=\sqrt{\frac{2l+1}{4\pi}\frac{(l-m)!}{(l+m)!}}P_{l}^{m}(\cos\theta)e^{im\phi},

where the associated Legendre functions $P_{l}^{m}:[-1,1]\rightarrow{\mathbb{R}}$ verify, for $l\in\mathbb{N}^{*}$ and $m\geq 0$ ,

P_{l}^{m}(t)=\frac{(-1)^{m}}{2^{l}l!}(1-t^{2})^{m/2}\frac{d^{l+m}(t^{2}-1)^{l}}{dt^{l+m}}\quad\text{and}\quad P_{l}^{-m}(t)=(-1)^{m}\frac{(l-m)!}{(l+m)!}P_{l}^{m}(t).

Importantly, the spherical harmonics form an orthonormal basis of $L^{2}({\mathbb{S}^{2}})$ , so that every function $f\in L^{2}({\mathbb{S}^{2}})$ is uniquely decomposed, for $x\in{\mathbb{S}^{2}}$ , as

(3.3)

f(x)=\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\bar{f}_{l}^{m}Y_{l}^{m}(x),

where the sequence of spherical harmonic coefficients $\bar{f}=(\bar{f}_{l}^{m})$ verifies

(3.4)

\bar{f}_{l}^{m}=\frac{1}{4\pi}\int_{\mathbb{S}^{2}}f(x)\overline{Y_{l}^{m}(x)}d\sigma_{\mathbb{S}^{2}}(x).

We refer to [13] for an introduction to Fourier analysis on the sphere.
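The surface-measure formula and the orthonormality of the harmonics of lowest degree can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a product quadrature (Gauss-Legendre in $\cos\theta$ , uniform in $\phi$ ) chosen here for convenience, and the harmonics of degree $0$ and $1$ are written out by hand from the formulas above.

```python
import numpy as np

def sphere_quadrature(n_theta=16, n_phi=32):
    """Product rule for the surface measure: Gauss-Legendre in t = cos(theta),
    uniform grid in phi. Returns colatitudes, longitudes and weights summing to 4*pi."""
    t, w_t = np.polynomial.legendre.leggauss(n_theta)
    phi = -np.pi + 2 * np.pi * np.arange(n_phi) / n_phi
    theta_grid, phi_grid = np.meshgrid(np.arccos(t), phi, indexing="ij")
    weights = np.outer(w_t, np.full(n_phi, 2 * np.pi / n_phi))
    return theta_grid.ravel(), phi_grid.ravel(), weights.ravel()

theta, phi, w = sphere_quadrature()

# Harmonics of degree 0 and 1, using P_0^0(t) = 1, P_1^0(t) = t
# and P_1^1(t) = -sqrt(1 - t^2) from the Legendre formula above.
Y00 = np.full(theta.shape, np.sqrt(1 / (4 * np.pi)), dtype=complex)
Y10 = np.sqrt(3 / (4 * np.pi)) * np.cos(theta) + 0j
Y11 = -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi)
basis = np.stack([Y00, Y10, Y11])

total_area = w.sum()                      # approximates the area of the sphere
gram = (basis * w) @ basis.conj().T       # approximates the L^2 inner products
print(total_area, np.round(gram.real, 6))
```

Up to round-off, the Gram matrix is the identity and the total mass is $4\pi$ , in agreement with $\mu_{\mathbb{S}^{2}}=\frac{1}{4\pi}\sigma_{\mathbb{S}^{2}}$ being a probability measure.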

Below, we introduce some notation from differential geometry [79], that one can find for instance in [25]. At any point $x\in{\mathbb{S}^{2}}$ , the tangent space is $\mathcal{T}_{x}{\mathbb{S}^{2}}=\{y\in{\mathbb{R}}^{3}:\langle x,y\rangle=0\}$ , and the associated orthogonal projection $\rho_{x}:{\mathbb{R}}^{3}\rightarrow\mathcal{T}_{x}{\mathbb{S}^{2}}$ verifies

(3.5)

\rho_{x}\xi=(I-xx^{T})\xi=\xi-\langle\xi,x\rangle x.

The exponential map at $x\in{\mathbb{S}^{2}}$ , $\text{Exp}_{x}:\mathcal{T}_{x}{\mathbb{S}^{2}}\rightarrow{\mathbb{S}^{2}}$ , has the explicit form

\text{Exp}_{x}(v)=\cos(\|v\|)x+\sin(\|v\|)\frac{v}{\|v\|},

and its inverse $\text{Log}_{x}$ writes

\text{Log}_{x}(z)=\frac{d(x,z)}{\sqrt{1-\langle x,z\rangle^{2}}}\rho_{x}z=d(x,z)\frac{\rho_{x}(z-x)}{\|\rho_{x}(z-x)\|}.
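The projection (3.5) and the two maps above translate directly into code. The following sketch uses illustrative function names; the handling of the degenerate cases ( $v=0$ , $z=x$ ) is an assumption made for numerical convenience, and the logarithm is not defined at the antipodal point $z=-x$ .

```python
import numpy as np

def project_tangent(x, xi):
    """rho_x xi = xi - <xi, x> x, the orthogonal projection (3.5)."""
    return xi - np.dot(xi, x) * x

def exp_map(x, v):
    """Exp_x(v) for v in the tangent space at x; returns x when v = 0."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x.copy()
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v

def log_map(x, z):
    """Log_x(z) for z not antipodal to x; returns 0 when z = x."""
    inner = np.clip(np.dot(x, z), -1.0, 1.0)
    proj = project_tangent(x, z)
    norm_proj = np.linalg.norm(proj)
    if norm_proj < 1e-12:
        return np.zeros(3)
    return np.arccos(inner) * proj / norm_proj

x = np.array([0.0, 0.0, 1.0])
z = np.array([1.0, 2.0, 2.0]) / 3.0
v = log_map(x, z)          # tangent vector at x pointing towards z
z_back = exp_map(x, v)     # recovers z
print(v, z_back)
```

The norm of $\text{Log}_{x}(z)$ equals the geodesic distance $d(x,z)$ , so that the cost (3.2) also reads $c(x,z)=\frac{1}{2}\|\text{Log}_{x}(z)\|^{2}$ .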

We take the extrinsic viewpoint for the manifold ${\mathbb{S}^{2}}$ embedded in ${\mathbb{R}}^{3}$ , so that the Riemannian gradient is given by orthogonally projecting Euclidean derivatives onto the tangent space. For a smooth function $f:{\mathbb{S}^{2}}\rightarrow{\mathbb{R}}$ , its Riemannian gradient $\nabla f(x)$ at $x$ is thus defined by

(3.6)

\nabla f(x)=\rho_{x}Df(x),\hskip 28.45274pt\mbox{where}\hskip 28.45274ptDf(x)=\Big{(}\frac{\partial f(x)}{\partial x_{i}}\Big{)}_{i\in\{1,2,3\}}.

The Riemannian Hessian $\nabla^{2}f(x):\mathcal{T}_{x}{\mathbb{S}^{2}}\rightarrow\mathcal{T}_{x}{\mathbb{S}^{2}}$ at $x$ is defined by the same token,

(3.7)

\nabla^{2}f(x)=\rho_{x}\Big{[}D^{2}f(x)-\langle Df(x),x\rangle I\Big{]}\hskip 28.45274pt\mbox{with}\hskip 28.45274ptD^{2}f(x)=\Big{(}\frac{\partial^{2}f(x)}{\partial x_{i}\partial x_{j}}\Big{)}_{i,j\in\{1,2,3\}}.
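To illustrate (3.6), the sketch below projects a finite-difference Euclidean gradient onto the tangent space. The test function $f(x)=c(x,p)$ for a fixed point $p$ , the finite-difference step and the function names are assumptions chosen for illustration; for this $f$ , the Riemannian gradient is known in closed form to be $-\text{Log}_{x}(p)$ .

```python
import numpy as np

def euclidean_gradient_fd(f, x, h=1e-6):
    """Central finite-difference approximation of Df(x) in R^3."""
    grad = np.zeros(3)
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def riemannian_gradient(f, x):
    """Formula (3.6): project the Euclidean gradient onto the tangent space at x."""
    Df = euclidean_gradient_fd(f, x)
    return Df - np.dot(Df, x) * x

p = np.array([0.0, 0.0, 1.0])
# Smooth extension of c(., p) to a neighbourhood of the sphere; the projected
# gradient does not depend on the extension chosen.
f = lambda y: 0.5 * np.arccos(np.clip(np.dot(y, p), -1.0, 1.0)) ** 2

x = np.array([1.0, 2.0, 2.0]) / 3.0
grad = riemannian_gradient(f, x)

# Closed form -Log_x(p) for comparison.
proj = p - np.dot(p, x) * x
closed_form = -np.arccos(np.dot(x, p)) * proj / np.linalg.norm(proj)
print(grad, closed_form)
```

In particular, moving from $x$ along $-\nabla f(x)$ through the exponential map leads exactly to $p$ .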

3.2. Main definitions

Here, we fix the definitions of directional distribution and quantile functions as introduced in [37]. Given $\mu$ and $\nu$ two probability measures supported on ${\mathbb{S}^{2}}$ , we say that $T$ pushes forward $\mu$ to $\nu$ , denoted by $T_{\#}\mu=\nu$ , if, for each measurable $B\in\mathcal{B}^{2}$ ,

\nu(B)=\mu(T^{-1}(B)).

Then, for the quadratic cost $c$ defined in (3.2), Monge’s formulation of the OT problem between $\mu$ and $\nu$ writes

(3.8)

\min\limits_{T:T_{\#}\mu=\nu}\mathbb{E}_{X\sim\mu}\left[c(X,T(X))\right].

The solution of (3.8) is referred to as a Monge map, while $\mu$ and $\nu$ are called the reference and target measures, respectively. Before considering the existence of Monge maps, we recall the key definition of $c$ -transforms as stated in [57], that is equivalent to the formulation of [37][Definition 2].

Definition 3.1.

Given a function $\psi:{\mathbb{S}^{2}}\rightarrow{\mathbb{R}}$ , its $c$ -transform is defined by

\psi^{c}(y)=\inf_{x\in{\mathbb{S}^{2}}}\{c(x,y)-\psi(x)\}.

Then, $\psi$ is said to be $c$ -concave when $\psi^{cc}:=(\psi^{c})^{c}=\psi$ .
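On a finite set of points, the $c$ -transform reduces to a row-wise minimum over the cost matrix. The sketch below is a discrete illustration only: the random point set and the random potential are hypothetical, and the infimum over ${\mathbb{S}^{2}}$ is replaced by a minimum over the sample. It checks two standard facts, namely $\psi^{cc}\geq\psi$ and $\psi^{ccc}=\psi^{c}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # hypothetical finite subset of S^2

# Cost matrix C[i, j] = c(x_i, x_j) = d(x_i, x_j)^2 / 2.
C = 0.5 * np.arccos(np.clip(pts @ pts.T, -1.0, 1.0)) ** 2

def c_transform(psi, C):
    """psi^c(y_j) = min_i { c(x_i, y_j) - psi(x_i) } over the finite point set."""
    return np.min(C - psi[:, None], axis=0)

psi = rng.normal(size=200)            # arbitrary potential, not c-concave in general
psi_c = c_transform(psi, C)
psi_cc = c_transform(psi_c, C)        # C is symmetric, so the same routine applies
psi_ccc = c_transform(psi_cc, C)
print(np.min(psi_cc - psi), np.max(np.abs(psi_ccc - psi_c)))
```

The function $\psi^{cc}$ is thus the smallest $c$ -concave function above $\psi$ on the point set, and $\psi$ is $c$ -concave exactly when the first printed gap vanishes everywhere.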

The proper summary of [37][Proposition 1] highlights that $c$ -concavity is related to optimality in Monge’s OT problem (3.8). For our continuous and bounded cost $c$ , a sufficient assumption is that the reference measure belongs to $\mathbf{B}_{2}$ , the family of $\sigma_{\mathbb{S}^{2}}$ -absolutely continuous distributions with densities bounded away from $0$ and $\infty$ , see [57][Theorem 9], which is the case for the uniform measure $\mu_{\mathbb{S}^{2}}$ . This enables the definition of directional distribution and quantile functions, first introduced in [37].

Definition 3.2.

The directional MK quantile function of the arbitrary probability measure $\nu$ is the $\mu_{\mathbb{S}^{2}}$ - $a.s.$ unique map $\mathbf{Q}:{\mathbb{S}^{2}}\rightarrow{\mathbb{S}^{2}}$ such that $\mathbf{Q}_{\#}\mu_{\mathbb{S}^{2}}=\nu$ and there exists a $c$ -concave differentiable mapping $\psi:{\mathbb{S}^{2}}\rightarrow{\mathbb{R}}$ such that, $\sigma_{\mathbb{S}^{2}}$ - $a.e.$ ,

\mathbf{Q}(x)=\text{Exp}_{x}(-\nabla\psi(x)).

In addition, the directional MK distribution function of $\nu$ is given by

\mathbf{F}(x)=\text{Exp}_{x}(-\nabla\psi^{c}(x)).

As soon as $\nu$ belongs to $\mathbf{B}_{2}$ , [57][Corollary 10] ensures that $\mathbf{F}=\mathbf{Q}^{-1}$ almost everywhere, whereas $\mathbf{Q}^{-1}$ might not exist if $\nu$ is not absolutely continuous. Compared to [37], our definition begins with $\mathbf{Q}$ instead of $\mathbf{F}$ , and it does not require the absolute continuity for $\nu$ . This follows developments from [32] for measures supported in ${\mathbb{R}}^{d}$ .

Remark 3.1 (Regularity).

The regularity of OT maps on the sphere is a delicate subject that has inspired a number of works, including [83, 52, 20, 53]. Firstly, any $c$ -concave potential $\psi$ is twice differentiable almost everywhere [15][Proposition 3.14]. For further regularity, an appropriate requirement is that the underlying measures are smooth and belong to $\mathbf{B}_{2}$ . In particular, if, at a minimum, $\nu$ has density $f\in C^{1,1}({\mathbb{S}^{2}})$ with respect to $\sigma_{\mathbb{S}^{2}}$ , then the MK quantile function $\mathbf{Q}$ belongs to $C^{2,\beta}({\mathbb{S}^{2}})$ for all $\beta\in\,]0,1[$ , see [53] for more details.

Remark 3.2 (Gradient mappings).

A gradient mapping is built from a $c$ -concave potential $\psi$ , through $x\mapsto\text{Exp}_{x}(-\nabla\psi(x))$ . A statistical model based on convex combinations of such maps was introduced in [73], and further used for directional data in [74]. With the viewpoint of [12, 36, 37], this amounts to a barycenter model for MK quantile functions.

In view of the proof of [57][Theorem 9], $\psi$ and its $c$ -transform $\psi^{c}$ in Definition 3.2 maximize the dual version of Kantorovich’s problem, in the sense that

(3.9)

(\psi,\psi^{c})\in\operatornamewithlimits{argmax}\limits_{(u,v)\in\text{Lip}_{c}}\int_{\mathbb{S}^{2}}u(x)d\mu_{\mathbb{S}^{2}}(x)+\int_{\mathbb{S}^{2}}v(y)d\nu(y),

with

\text{Lip}_{c}=\left\{u,v:{\mathbb{S}^{2}}\rightarrow{\mathbb{R}}\text{ continuous };u(x)+v(y)\leq c(x,y)\right\}.

The semi-dual version of (3.9) refers to the optimisation over a single dual variable $u$ , (resp. $v$ ), the other being taken as $u^{c}$ , (resp. $v^{c}$ ). Our proposal is to build upon the semi-dual problem to tackle the issue of finding regularized estimators for $\mathbf{F}$ and $\mathbf{Q}$ .

Before that, we recall the definitions of directional quantile contours and regions from [37], that simplify in dimension $d=3$ . A central point in ${\mathbb{S}^{2}}$ must be chosen for the uniform distribution $\mu_{\mathbb{S}^{2}}$ , in view of defining nested regions with $\mu_{\mathbb{S}^{2}}$ -content $\tau\in[0,1]$ . In our Riemannian framework, a well-suited notion of central point is the Fréchet median

(3.10)

\theta_{M}=\mathop{\rm arg\;min}\limits_{z\in{\mathbb{S}^{2}}}\mathbb{E}_{Z\sim\nu}[d(Z,z)],

that can be computed with the package geomstats [59], in Python. Then, the spherical cap with $\mu_{\mathbb{S}^{2}}$ probability $\tau\in[0,1]$ centered at $\mathbf{F}(\theta_{M})$ is

\mathbb{C}^{U}_{\tau}=\left\{x\in{\mathbb{S}^{2}}:\langle x,\mathbf{F}(\theta_{M})\rangle\geq 1-2\tau\right\},

with boundary $\mathcal{C}^{U}_{\tau}=\left\{x\in{\mathbb{S}^{2}}:\langle x,\mathbf{F}(\theta_{M})\rangle=1-2\tau\right\}$ a parallel of order $\tau$ . This defines a rotated version of the usual latitude-longitude coordinate system (3.1), with respect to the pole $\mathbf{F}(\theta_{M})$ , as follows. Any $x\in{\mathbb{S}^{2}}$ decomposes into

(3.11)

x=\langle x,\mathbf{F}(\theta_{M})\rangle\mathbf{F}(\theta_{M})+\sqrt{1-\langle x,\mathbf{F}(\theta_{M})\rangle^{2}}\mathbf{S}_{\mathbf{F}(\theta_{M})}(x),

where $\langle x,\mathbf{F}(\theta_{M})\rangle$ is a latitude, constant over the parallel $\mathcal{C}^{U}_{\tau}$ , while the directional sign

(3.12)

\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)=\frac{x-\langle x,\mathbf{F}(\theta_{M})\rangle\mathbf{F}(\theta_{M})}{\|x-\langle x,\mathbf{F}(\theta_{M})\rangle\mathbf{F}(\theta_{M})\|}

is a longitude, with the convention $\mathbf{0}/0=\mathbf{0}$ for $x=\pm\mathbf{F}(\theta_{M})$ . The unit vector $\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)$ takes values on the rotated equator $\mathcal{C}^{U}_{1/2}$ , and thus allows one to characterize meridians crossing $s\in\mathcal{C}^{U}_{1/2}$ through $\mathcal{M}_{s}^{U}=\{x\in{\mathbb{S}^{2}}:\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)=s\}$ . For ease of understanding, we take the example of $\mathbf{F}(\theta_{M})=(0,0,1)^{T}$ : taking $x\in\mathcal{C}_{\tau}^{U}$ such that $\langle x,\mathbf{F}(\theta_{M})\rangle=1-2\tau$ is equivalent to $x_{3}=1-2\tau$ . Thus, we retrieve that for a fixed $\tau$ , quantile contours $\mathcal{C}_{\tau}^{U}$ indeed have a fixed latitude in the classical longitude-latitude system (3.1). In fact, one can use contours of constant latitude in the system (3.1) to discretize a reference quantile contour oriented towards $\mathbf{F}(\theta_{M})$ , by choosing the appropriate rotation matrix $\mathbf{O}$ that sends $(0,0,1)^{T}$ towards $\mathbf{F}(\theta_{M})$ . Numerically, it can be computed using Rodrigues’ rotation formula, for instance.
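A possible implementation of this discretization is sketched below. The function names, the number of points per contour and the numerical value chosen for the pole $\mathbf{F}(\theta_{M})$ are hypothetical; the rotation is obtained from Rodrigues' formula, with the antipodal case treated separately.

```python
import numpy as np

def rotation_to_pole(pole):
    """Rotation O with O @ (0, 0, 1) = pole, via Rodrigues' formula."""
    e3 = np.array([0.0, 0.0, 1.0])
    cos_angle = float(np.dot(e3, pole))
    if np.isclose(cos_angle, -1.0):       # antipodal pole: half-turn about the first axis
        return np.diag([1.0, -1.0, -1.0])
    v = np.cross(e3, pole)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + cos_angle)

def reference_parallel(tau, pole, n_points=100):
    """Discretize C_tau^U: the circle x_3 = 1 - 2*tau, rotated towards the pole."""
    height = 1.0 - 2.0 * tau
    phi = np.linspace(-np.pi, np.pi, n_points, endpoint=False)
    radius = np.sqrt(1.0 - height ** 2)
    circle = np.stack([radius * np.cos(phi), radius * np.sin(phi),
                       np.full(n_points, height)], axis=1)
    return circle @ rotation_to_pole(pole).T

def directional_sign(x, pole):
    """Directional sign (3.12), with the convention 0/0 = 0 at x = +/- pole."""
    residual = x - np.dot(x, pole) * pole
    norm = np.linalg.norm(residual)
    return residual / norm if norm > 1e-12 else np.zeros(3)

pole = np.array([1.0, 2.0, 2.0]) / 3.0    # hypothetical value of F(theta_M)
contour = reference_parallel(0.25, pole)  # parallel of order tau = 0.25
sign = directional_sign(contour[0], pole)
print(contour[:2], sign)
```

Every point of the discretized contour has inner product $1-2\tau$ with the pole, and its directional sign is a unit vector orthogonal to the pole.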

The image by $\mathbf{Q}$ of the parallel / meridian system (3.11) provides curvilinear parallels $\mathbf{Q}(\mathcal{C}^{U}_{\tau})$ and curvilinear meridians $\mathbf{Q}(\mathcal{M}_{s}^{U})$ adapted to the geometry of the support of $\nu$ , giving rise to suitable directional concepts of quantile contours and signs [37]. Intuitively, a change in coordinates in a data-adaptive fashion must retain all the available information, in a simpler form amenable to be summed up.

Definition 3.3 (Quantile contours, regions and signs).

Let $\nu\in\mathbf{B}_{2}$ , with directional quantile function $\mathbf{Q}$ . Then,

•

the quantile contour of order $\tau\in[0,1]$ is $\mathcal{C}_{\tau}=\mathbf{Q}(\mathcal{C}^{U}_{\tau})$ ,
•

the quantile region of order $\tau\in[0,1]$ is $\mathbb{C}_{\tau}=\mathbf{Q}(\mathbb{C}^{U}_{\tau})$ ,
•

the sign curve associated with $s\in\mathcal{C}^{U}_{1/2}$ is $\mathbf{Q}(\mathcal{M}_{s}^{U})$ .

Since $\mathbf{Q}$ is a push-forward mapping, $\mathbf{Q}_{\#}\mu_{\mathbb{S}^{2}}=\nu$ , the $\nu$ -probability content of $\mathbb{C}_{\tau}$ is $\tau$ . Moreover, the quantile contours of $\nu\in\mathbf{B}_{2}$ are continuous and the quantile regions are closed, connected and nested, as stated in [37]. Invariance properties, that were shown in [37], are gathered in Appendix A, for the sake of completeness. There, it is also mentioned that, from [74][Lemma 1], the convex combination between $c$ -concave functions is itself $c$ -concave.

4. Regularized estimation of Monge-Kantorovich quantiles on the 2-sphere

4.1. Entropic OT on the 2-sphere

We now introduce an algorithm based on the spherical Fourier transform to solve the regularized Kantorovich problem on the $2$ -sphere. In the Euclidean case, Kantorovich’s problem is known to be easier to solve than Monge’s problem [18]. Even more so, since the founding work of [16], adding an entropic regularization term to (3.9) has been a cornerstone for the development of OT-based methods in statistics and machine learning. In [31], rewriting the dual objective function to be optimized made it possible to introduce provably convergent stochastic algorithms, see also [6]. For arbitrary measures (not only discrete ones), the dual variables can no longer be viewed as finite-dimensional vectors. Therefore, one requires the use of nonparametric families of dual functions, as proposed in [31] with reproducing kernel Hilbert spaces or in [72] with deep neural networks. In our previous work [7], we suggested the use of Fourier series in the specific context of center-outward quantiles to take advantage of the knowledge of the reference measure. The resulting algorithm directly targets the continuous OT problem between the continuous reference measure and the underlying $\nu$ , instead of the discrete or semi-discrete OT problem towards the empirical measure $\widehat{\nu}_{n}$ . The idea that we are introducing in this paper for spherical distributions is in the same spirit. We mention that several algorithms exist in order to solve the unregularized OT problem on the sphere, see [40] and the references therein. Also, the network simplex and Sinkhorn algorithms only depend on the cost matrix, so that they can be adapted trivially to any space. While these discrete solvers require the storage of the cost matrix, of size $n^{2}$ for two samples of size $n$ , stochastic algorithms are designed to avoid it.

In the context of spherical quantiles, one must solve an OT problem where the reference measure is $\mu_{\mathbb{S}^{2}}$ , the uniform probability measure on the sphere. Here, we consider the formulation from [31, 30] for OT between $\mu_{\mathbb{S}^{2}}$ and $\nu$ , regularized by relative entropy with respect to the product measure $\mu_{\mathbb{S}^{2}}\otimes\nu$ . The semi-dual version of EOT between $\mu_{\mathbb{S}^{2}}$ and $\nu$ from [30][Proposition 12], for $\varepsilon>0$ a regularization parameter, writes

(4.1)

\max_{u\in L^{\infty}({\mathbb{S}^{2}})}\int_{{\mathbb{S}^{2}}}u(x)d\mu_{\mathbb{S}^{2}}(x)+\int_{{\mathbb{S}^{2}}}u^{c,\varepsilon}(y)d\nu(y),

with $u^{c,\varepsilon}$ the smooth conjugate of $u$ defined by

(4.2)

u^{c,\varepsilon}(y)=-\varepsilon\log\left(\int_{{\mathbb{S}^{2}}}\exp\Big{(}\frac{u(x)-c(x,y)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x)\right).
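Numerically, (4.2) is a log-sum-exp over a discretization of $\mu_{\mathbb{S}^{2}}$ . The sketch below is an illustration under assumptions that are not part of the method described here: the integral is approximated by a product quadrature (Gauss-Legendre in $\cos\theta$ , uniform in $\phi$ ), the potential $u(x)=x_{3}$ is hypothetical, and the function names are illustrative.

```python
import numpy as np

def sphere_grid(n_theta=32, n_phi=64):
    """Quadrature nodes on S^2 with weights for the uniform probability measure."""
    t, w_t = np.polynomial.legendre.leggauss(n_theta)
    phi = -np.pi + 2 * np.pi * np.arange(n_phi) / n_phi
    T, P = np.meshgrid(t, phi, indexing="ij")
    s = np.sqrt(1.0 - T ** 2)
    nodes = np.stack([s * np.cos(P), s * np.sin(P), T], axis=-1).reshape(-1, 3)
    weights = np.outer(w_t, np.full(n_phi, 0.5 / n_phi)).ravel()
    return nodes, weights

def smooth_conjugate(u_vals, nodes, weights, y, eps):
    """Approximate u^{c,eps}(y) from (4.2) with a stabilized log-sum-exp."""
    cost = 0.5 * np.arccos(np.clip(nodes @ y, -1.0, 1.0)) ** 2
    a = (u_vals - cost) / eps
    a_max = a.max()                       # subtracting the max avoids overflow
    return -eps * (a_max + np.log(np.sum(weights * np.exp(a - a_max))))

nodes, weights = sphere_grid()
eps = 0.5
u_vals = nodes[:, 2]                      # hypothetical potential u(x) = x_3
y = np.array([1.0, 2.0, 2.0]) / 3.0

value = smooth_conjugate(u_vals, nodes, weights, y, eps)
shifted = smooth_conjugate(u_vals + 1.0, nodes, weights, y, eps)
print(value, shifted)
```

Adding a constant to $u$ subtracts the same constant from $u^{c,\varepsilon}$ , which is why solutions of (4.1) are only defined up to additive constants.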

The smooth conjugate is the entropic counterpart of Definition 3.1. For bounded costs, the problem (4.1) admits a solution in $L^{\infty}$ , unique up to additive constants [30][Theorem 7]. To enforce uniqueness, we impose that $\int_{{\mathbb{S}^{2}}}u(x)d\mu_{\mathbb{S}^{2}}(x)=0$ , so that the optimisation problem (4.1) becomes

\max_{u\in L^{\infty}({\mathbb{S}^{2}})}\int_{{\mathbb{S}^{2}}}u^{c,\varepsilon}(y)d\nu(y).

It is well-known, see $e.g.$ [64, 31], that $\mathbf{u}_{\varepsilon}$ is solution of (4.1) if and only if

(4.3)

\mathbf{u}_{\varepsilon}=((\mathbf{u}_{\varepsilon})^{c,\varepsilon})^{c,\varepsilon}.

Note that a Lipschitz continuous function on ${\mathbb{S}^{2}}$ equals its spherical Fourier series (3.3) pointwise [58][Theorem 5.26]. One can find in [57][Lemma 2] that the true unregularized potentials are Lipschitz, because ${\mathbb{S}^{2}}$ has a finite diameter $|{\mathbb{S}^{2}}|=\pi$ . Furthermore, the same holds in the regularized case $\varepsilon>0$ , from the optimality condition (4.3), see Proposition 12 from [26][Appendix B] or [64][Lemma 3.1]. Consequently, we propose to parameterize the dual variable in (4.1) by its spherical harmonic coefficients. For a given $\varepsilon>0$ , we consider the optimal sequence of coefficients $\bar{\mathbf{u}}_{\varepsilon}$ defined as the solution of the following stochastic convex minimisation problem

(4.4)

\bar{\mathbf{u}}_{\varepsilon}=\mathop{\mathrm{argmin}}_{\bar{\mathbf{u}}\in\ell_{1}}H_{\varepsilon}(\bar{\mathbf{u}})\hskip 28.45274pt\mbox{with}\hskip 28.45274ptH_{\varepsilon}(\bar{\mathbf{u}})=\mathbb{E}\left[h_{\varepsilon}(\bar{\mathbf{u}},X)\right]

where $X$ is a random vector with distribution $\nu$ , $\bar{\mathbf{u}}=(\bar{\mathbf{u}}_{l}^{m})_{l\geq 1}$ and

h_{\varepsilon}(\bar{\mathbf{u}},x)=-u^{c,\varepsilon}(x)\quad\mbox{with}\quad u(z)=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\mathbf{u}}_{l}^{m}Y_{l}^{m}(z).

Note that the spherical harmonic coefficient $\bar{\mathbf{u}}_{0}^{0}$ equals $0$ because of the identifiability condition $\int_{{\mathbb{S}^{2}}}u(x)d\mu_{\mathbb{S}^{2}}(x)=0$ .

We shall now discuss the equivalence between (4.4) and the original problem (4.1). On the $2$ -sphere, the series of spherical harmonics of a continuously differentiable function is uniformly convergent, see [43][p.259] or [42]. To obtain the stronger result that the sequence of spherical harmonics belongs to $\ell_{1}$ , the function needs to be twice continuously differentiable, [42][Theorem 2]. Regarding the unregularized Kantorovich potentials, such differentiability requires smoothness of the measures involved, as highlighted in Remark 3.1. A sufficient condition when the reference measure is $\mu_{\mathbb{S}^{2}}$ is that the density of $\nu$ is differentiable and bounded above and below by positive constants, see [53][Corollary 6.2]. It appears that this property holds for the regularized potential solving (4.1), without any continuity condition for $\nu$ .

Proposition 4.1.

Let $\mathbf{u}_{\varepsilon}$ be a solution of (4.1). Then, $\mathbf{u}_{\varepsilon}$ is twice continuously differentiable, and, as a byproduct, its series of spherical harmonics belongs to $\ell_{1}$ .

Consequently, the problem (4.4) is equivalent to the original one (4.1). The main virtue of this parameterization is that partial derivatives of $h_{\varepsilon}$ with respect to the parameters $\bar{\mathbf{u}}_{l}^{m}$ can be derived easily, which is appealing in view of a stochastic gradient scheme. Here, the objective $h_{\varepsilon}(\cdot,x)$ is of the same mathematical nature as in [7], so that it is differentiable and the following property holds.

Proposition 4.2.

For every $x\in{\mathbb{S}^{2}}$ , the function $h_{\varepsilon}(\cdot,x):\ell_{1}\rightarrow\mathbb{R}$ is Fréchet differentiable and its differential $D_{\bar{\mathbf{u}}}h_{\varepsilon}(\bar{\mathbf{u}},x)$ belongs to the dual Banach space $(\ell_{\infty},\|\cdot\|_{\ell_{\infty}})$ where $\|\bar{\mathbf{u}}\|_{\ell_{\infty}}=\sup_{l,m}|\bar{\mathbf{u}}_{l}^{m}|.$ The components of $D_{\bar{\mathbf{u}}}h_{\varepsilon}(\bar{\mathbf{u}},x)$ are the partial derivatives

(4.5)

\frac{\partial h_{\varepsilon}(\bar{\mathbf{u}},x)}{\partial\bar{\mathbf{u}}_{l}^{m}}=\frac{1}{4\pi}\int_{\mathbb{S}^{2}}g_{\bar{\mathbf{u}},x}(z)Y_{l}^{m}(z)d\sigma_{\mathbb{S}^{2}}(z),

that are the spherical harmonics coefficients of the function

(4.6)

g_{\bar{\mathbf{u}},x}(z)=\frac{\exp\left(\frac{u(z)-c(z,x)}{\varepsilon}\right)}{\int\exp\left(\frac{u(y)-c(y,x)}{\varepsilon}\right)d\mu_{\mathbb{S}^{2}}(y)}\quad\mbox{with}\quad u(z)=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\mathbf{u}}_{l}^{m}Y_{l}^{m}(z).
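As an illustration of (4.6), the function $g_{\bar{\mathbf{u}},x}$ is a softmax of $(u-c(\cdot,x))/\varepsilon$ with respect to $\mu_{\mathbb{S}^{2}}$. The following minimal sketch evaluates it on quadrature nodes. It assumes that the cost is $c=d^{2}/2$ (consistent with Remark 4.3), that $u$ is available through its values at the nodes, and that the nodes and weights approximate $\mu_{\mathbb{S}^{2}}$; the names `geodesic_cost` and `g_density` are hypothetical and the Monte Carlo nodes only stand in for the grid used in practice.

```python
import numpy as np

def geodesic_cost(z, x):
    """Assumed cost c(z, x) = d(z, x)^2 / 2 on the unit sphere, for rows z_k of z."""
    cosines = np.clip(z @ x, -1.0, 1.0)
    return 0.5 * np.arccos(cosines) ** 2

def g_density(u_vals, nodes, weights, x, eps):
    """Values of g_{u,x} from (4.6) at the quadrature nodes.

    u_vals  : (K,) values u(z_k) of the potential at the nodes
    nodes   : (K, 3) unit vectors z_k
    weights : (K,) quadrature weights approximating mu_{S^2}, summing to 1
    """
    logits = (u_vals - geodesic_cost(nodes, x)) / eps
    logits -= logits.max()                # stabilise the exponentials
    num = np.exp(logits)
    return num / np.sum(weights * num)    # the denominator approximates the integral in (4.6)

rng = np.random.default_rng(0)
nodes = rng.normal(size=(2000, 3))
nodes /= np.linalg.norm(nodes, axis=1, keepdims=True)   # Monte Carlo nodes, uniform on S^2
weights = np.full(len(nodes), 1.0 / len(nodes))
u_vals = 0.3 * nodes[:, 2]                               # toy potential u(z) = 0.3 z_3
x = np.array([0.0, 0.0, 1.0])
g = g_density(u_vals, nodes, weights, x, eps=0.1)
```

By construction, `g` integrates to one against the quadrature weights, which is the discrete analog of the normalization in (4.6).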

For $(X_{n})$ a sequence of independent random vectors with distribution $\nu$ , we consider the stochastic algorithm in the Banach space $(\ell_{1},\|\cdot\|_{\ell_{1}})$ defined, for all $n\geq 0$ , by

(4.7)

\widehat{u}_{n+1}=\widehat{u}_{n}-\gamma_{n}WD_{\widehat{u}}h_{\varepsilon}(\widehat{u}_{n},X_{n+1})

where $\gamma_{n}=\gamma n^{-\alpha}$ is a decreasing sequence of positive numbers with $1/2<\alpha<1$ and $\gamma>0$ . Because $\ell_{1}$ differs from its dual space $\ell_{\infty}$ , the linear operator $W$ is defined by

\left\{\begin{array}{ccc}W:(\ell_{\infty},\|\cdot\|_{\ell_{\infty}})&\to&(\ell_{1},\|\cdot\|_{\ell_{1}})\\ \bar{v}=(\bar{v}_{l}^{m})&\mapsto&\bar{w}\odot\bar{v}=(\bar{w}_{l}^{m}\bar{v}_{l}^{m})\end{array}\right.

where $\bar{w}=(\bar{w}_{l}^{m})$ is a deterministic sequence of positive weights satisfying the condition

(4.8)

\|\bar{w}\|_{\ell_{1}}=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{w}_{l}^{m}<+\infty.

In all the experiments carried out in this paper, we use the sequence $\bar{w}_{l}^{m}=(l^{2}+m^{2})^{-1}$ .
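To fix ideas, one iteration of (4.7) truncated at a maximal degree can be sketched as follows. This is only an illustration under stated assumptions: the storage layout `w[l, m + lmax]` is hypothetical and not meant to match any library, the value of the weight at $l=m=0$ (where $(l^{2}+m^{2})^{-1}$ is undefined) is set to $1$ by convention of this sketch, the step-size sequence is started at $n=1$, and the gradient is a placeholder rather than the spherical harmonics coefficients of (4.6).

```python
import numpy as np

def weight_array(lmax):
    """Weights w_l^m = 1 / (l^2 + m^2), stored as w[l, m + lmax]; entries with |m| > l stay 0."""
    w = np.zeros((lmax + 1, 2 * lmax + 1))
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            # l = m = 0 would divide by zero; using 1 there is an assumption of this sketch
            w[l, m + lmax] = 1.0 if l == 0 else 1.0 / (l**2 + m**2)
    return w

def sgd_step(u_hat, grad, n, w, gamma=1.0, alpha=0.75):
    """One iteration of (4.7): u_{n+1} = u_n - gamma_n * (w * grad), with gamma_n = gamma * n^(-alpha)."""
    if n < 1:
        raise ValueError("this sketch starts the step-size sequence at n = 1")
    return u_hat - gamma * n ** (-alpha) * w * grad

lmax = 4
w = weight_array(lmax)
u0 = np.zeros_like(w)
grad = np.ones_like(w)              # placeholder gradient, not computed from data
u1 = sgd_step(u0, grad, n=1, w=w)
```

The componentwise product with `w` plays the role of the operator $W$, damping the updates of high-frequency coefficients.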

For a given regularization parameter $\varepsilon>0$, a regularized estimator of the optimal potential $\mathbf{u}_{\varepsilon}(x)=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\mathbf{u}}_{\varepsilon,l}^{m}Y_{l}^{m}(x)$ is naturally given by

(4.9)

\widehat{\mathbf{u}}_{\varepsilon,n}(x)=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\widehat{u}_{n,l}^{m}Y_{l}^{m}(x).

From a practical point of view, the stochastic sequence (4.7) must be discretized. To do so, we consider a grid of $p^{2}$ points on the $2$-sphere and the associated spherical harmonics coefficients. We emphasize that this discretization takes place in the space of frequencies, in order to take advantage of the implicit interpolation in this space. The Python library pyshtools [84] implements spherical harmonics transforms and reconstructions. Our numerical procedure builds upon it, since each iteration of (4.7) requires the spherical harmonics coefficients of the function $g_{\widehat{u},y}(x)$ in (4.6), which relies on $u(x)$ reconstructed from $\widehat{u}$ by the inverse Fourier transform on the $2$-sphere. With the fast routines of pyshtools, the computational cost of each iteration of (4.7) is of order $\mathcal{O}\left(p^{2}\log^{2}(p)\right)$ [84]. Finally, the estimator $\widehat{\mathbf{u}}_{\varepsilon,n}(x)$ in (4.9) can be quickly evaluated at any $x$ with the Python library sphericart [9].

Remark 4.1.

To study the convergence of the stochastic algorithm (4.7), we could adapt the theoretical results in [7] to our setting. Importantly, the consistency results obtained in [7] remain valid in the present spherical context, because they depend neither on the choice of the orthonormal basis nor on the cost $c$. However, such a study is beyond the scope of this paper.

4.2. Regularized distribution and quantile functions

We now introduce the regularized counterpart of Definition 3.2. When dealing with measures supported on ${\mathbb{R}}^{d}$, one can use the entropic map, defined as the barycentric projection of the entropic optimal plan, see $e.g.$ [69, 70]. At first sight, this requires a notion of average on the sphere, as done in [28] from the unregularized empirical OT plan. Nonetheless, this map is alternatively characterized by analogy with Brenier's theorem [69][Proposition 2], whose building block is the gradient of a Kantorovich potential. Based on such differentiation, entropic maps were introduced in [17] for OT problems involving general convex costs in ${\mathbb{R}}^{d}$. Because this enforces the structure of optimality, we pursue this idea in our non-Euclidean setting. Note that EOT has already been considered in the specific setting of MK quantiles in ${\mathbb{R}}^{d}$ [7, 56, 10], both for smoothing and computational purposes. Our numerical experiments in Section 6 flesh out the empirical benefits and shortcomings when varying $\varepsilon$.

Definition 4.1.

Let $\nu$ be an arbitrary probability measure supported on ${\mathbb{S}^{2}}$ , and $\mathbf{u}_{\varepsilon}:{\mathbb{S}^{2}}\rightarrow{\mathbb{R}}$ be a solution of (4.1) between $\mu_{\mathbb{S}^{2}}$ and $\nu$ . Then, the regularized distribution function of $\nu$ is given by

(4.10)

\mathbf{F}_{\varepsilon}(z)=\text{Exp}_{z}(-\nabla\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)),

and the regularized quantile function of $\nu$ is

(4.11)

\mathbf{Q}_{\varepsilon}(x)=\text{Exp}_{x}(-\nabla\mathbf{u}_{\varepsilon}(x)).
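The maps (4.10) and (4.11) only involve the Riemannian exponential and a Riemannian gradient. A minimal sketch is given below, assuming that the Riemannian gradient is obtained from a Euclidean gradient by the usual tangential projection on the embedded sphere (which is how we read (3.6) here); the function names are hypothetical.

```python
import numpy as np

def exp_map(x, v):
    """Riemannian exponential on the unit sphere: Exp_x(v) for v tangent at x."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x.copy()
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v

def tangent_projection(x, grad):
    """Project a Euclidean gradient onto the tangent space at x (assumed form of (3.6))."""
    return grad - np.dot(grad, x) * x

def quantile_from_gradient(x, euclid_grad_u):
    """Q_eps(x) = Exp_x(-grad u_eps(x)), with the Riemannian gradient obtained by projection."""
    return exp_map(x, -tangent_projection(x, euclid_grad_u))

x = np.array([0.0, 0.0, 1.0])
euclid_grad = np.array([-np.pi / 2, 0.0, 5.0])   # toy gradient; its normal part is discarded
q = quantile_from_gradient(x, euclid_grad)
```

In this toy example the tangential part of the gradient has length $\pi/2$, so the North Pole is sent to a point on the equator.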

This requires the differentiation of entropic Kantorovich potentials. For a given regularization parameter $\varepsilon>0$ , partial derivatives can be retrieved by

\frac{\partial\mathbf{u}_{\varepsilon}(x)}{\partial x_{i}}=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\mathbf{u}}_{l}^{m}\frac{\partial Y_{l}^{m}(x)}{\partial x_{i}},

where the package sphericart [9] allows an easy computation of $\frac{\partial Y_{l}^{m}(x)}{\partial x_{i}}$. The Riemannian gradient $\nabla\mathbf{u}_{\varepsilon}$ follows using (3.6). Because this may lead to numerical instabilities, we suggest instead making use of the first-order conditions (4.3). With discrete counterparts in practice, changing a couple of potentials $(u,v)$ to $(v^{c,\varepsilon},u^{c,\varepsilon})$ improves the value of the objective. While (4.2) depends on the measure $\mu_{\mathbb{S}^{2}}$, its symmetric version (4.3) depends on $\nu$ instead. Notably, Sinkhorn's algorithm amounts to alternating such smooth conjugates [30][Proposition 10]. The next proposition gives a generalized entropic map on the sphere ${\mathbb{S}^{2}}$.

Proposition 4.3.

Denote by

(4.12)

g_{\varepsilon}(x,z)=\frac{-d(x,z)}{\sqrt{1-\langle x,z\rangle^{2}}}\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}.

Then, the Euclidean partial derivatives of $\mathbf{u}_{\varepsilon}$ admit the closed-form expression

(4.13)

\partial_{x_{i}}\mathbf{u}_{\varepsilon}(x)=\int z_{i}g_{\varepsilon}(x,z)d\nu(z).

Similarly, the Euclidean gradient of $\mathbf{u}_{\varepsilon}^{c,\varepsilon}$ verifies

(4.14)

\partial_{z_{i}}\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)=\int x_{i}g_{\varepsilon}(x,z)d\mu_{\mathbb{S}^{2}}(x).

Combining Proposition 4.3 with (3.6) yields the following corollary, recalling that

(x,z)\mapsto\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}

is the density of the optimal entropic plan with respect to $\mu_{\mathbb{S}^{2}}\otimes\nu$, see [64].

Corollary 4.1.

The regularized distribution and quantile functions of $\nu$ on ${\mathbb{S}^{2}}$ admit closed-form expressions through

\mathbf{Q}_{\varepsilon}(x)=\text{Exp}_{x}\int\text{Log}_{x}(z)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\nu(z),

and

\mathbf{F}_{\varepsilon}(z)=\text{Exp}_{z}\int\text{Log}_{z}(x)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x).

Remark 4.2.

It should be noted that $\mathbf{F}_{\varepsilon}$, $resp.$ $\mathbf{Q}_{\varepsilon}$, no longer pushes $\nu$ forward to $\mu_{\mathbb{S}^{2}}$, $resp.$ $\mu_{\mathbb{S}^{2}}$ forward to $\nu$. However, they are expected to be close to their unregularized counterparts for small values of $\varepsilon>0$, as studied for the quadratic cost in ${\mathbb{R}}^{d}$ in [33, 69, 80]. The limit $\varepsilon\rightarrow 0$ has been considered outside the Euclidean setting [5, 8, 64], although not directly for the generalized entropic map itself. In particular, along some sequence $(\varepsilon_{k})$ such that $\lim_{k\rightarrow+\infty}\varepsilon_{k}=0$, [64][Proposition 3.2] gives the uniform convergence of the potentials $(\mathbf{u}_{\varepsilon_{k}},\mathbf{u}_{\varepsilon_{k}}^{c,\varepsilon_{k}})$ on compact subsets of ${\mathbb{S}^{2}}$ towards $(\psi,\psi^{c})$ solving (3.9).

From Corollary 4.1, $\mathbf{F}_{\varepsilon}$ and $\mathbf{Q}_{\varepsilon}$ can be seen as weighted averages in the tangent space. We argue that this is beneficial in practice, because of the regularity it induces. Indeed, second-order derivatives are given hereafter, which entails the continuity of $\mathbf{F}_{\varepsilon}$ and $\mathbf{Q}_{\varepsilon}$.

Proposition 4.4.

The potential $\mathbf{u}_{\varepsilon}$ is twice-differentiable everywhere, and

\frac{\partial^{2}\mathbf{u}_{\varepsilon}}{\partial_{x_{i}}\partial_{x_{j}}}(x)=\int\tilde{c}_{ij}(x,z)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\nu(z)+\frac{\partial_{x_{i}}\mathbf{u}_{\varepsilon}(x)\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x)}{\varepsilon},

where

\tilde{c}_{ij}(x,z)=\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}-\frac{\partial_{x_{i}}c(x,z)\partial_{x_{j}}c(x,z)}{\varepsilon}.

Besides, the same holds for $\mathbf{u}_{\varepsilon}^{c,\varepsilon}$ and

\frac{\partial^{2}\mathbf{u}_{\varepsilon}^{c,\varepsilon}}{\partial_{z_{i}}\partial_{z_{j}}}(z)=\int\overline{c}_{ij}(x,z)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x)+\frac{\partial_{z_{i}}\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)\partial_{z_{j}}\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon},

with

\overline{c}_{ij}(x,z)=\frac{\partial^{2}c(x,z)}{\partial_{z_{i}}\partial_{z_{j}}}-\frac{\partial_{z_{i}}c(x,z)\partial_{z_{j}}c(x,z)}{\varepsilon}.

Remark 4.3.

One can find $e.g.$ in [25] that products of first-order derivatives of the cost $c$ are given by

\partial_{x_{i}}c(x,z)\partial_{x_{j}}c(x,z)=\frac{d(x,z)^{2}}{1-\langle x,z\rangle^{2}}z_{i}z_{j},

and that second-order derivatives write

\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}=d(x,z)\frac{\langle x,z\rangle}{\sqrt{1-\langle x,z\rangle^{2}}}\Big{(}\mathds{1}_{i=j}-\frac{1}{1-\langle x,z\rangle^{2}}z_{i}z_{j}\Big{)}+\frac{1}{1-\langle x,z\rangle^{2}}z_{i}z_{j}.
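The first-order formula of Remark 4.3 can be checked numerically. The sketch below assumes the cost $c(x,z)=d(x,z)^{2}/2$ with $d(x,z)=\arccos\langle x,z\rangle$, extended to a neighbourhood of the sphere through the Euclidean inner product, so that $\partial_{x_{i}}c(x,z)=-d(x,z)z_{i}/\sqrt{1-\langle x,z\rangle^{2}}$ as in (4.12) and (4.13). Only first-order derivatives are checked; the second-order expression is not verified here.

```python
import numpy as np

def cost(x, z):
    """c(x, z) = arccos(<x, z>)^2 / 2, evaluated through the Euclidean inner product."""
    return 0.5 * np.arccos(np.dot(x, z)) ** 2

def cost_gradient(x, z):
    """Closed form: d_x c(x, z) = -d(x, z) / sqrt(1 - <x, z>^2) * z."""
    t = np.dot(x, z)
    return -np.arccos(t) / np.sqrt(1.0 - t**2) * z

x = np.array([0.0, 0.6, 0.8])
z = np.array([1.0, 2.0, 2.0]) / 3.0
h = 1e-6
# central finite differences along the three Euclidean coordinates
fd = np.array([(cost(x + h * e, z) - cost(x - h * e, z)) / (2 * h) for e in np.eye(3)])
closed = cost_gradient(x, z)
t = np.dot(x, z)
product_formula = np.arccos(t) ** 2 / (1.0 - t**2) * np.outer(z, z)
```

Here `np.outer(closed, closed)` reproduces the matrix of products $\partial_{x_{i}}c\,\partial_{x_{j}}c$ stated in the remark.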

4.3. Regularized empirical distribution and quantile functions

Suppose that the estimator $\widehat{\mathbf{u}}_{\varepsilon,n}$, defined in (4.9), has been computed using the stochastic algorithm (4.7) from $i.i.d.$ observations $X_{1},\cdots,X_{n}$ sampled from $\nu$ supported on ${\mathbb{S}^{2}}$. To obtain a regularized quantile function, the empirical counterpart of Corollary 4.1 would involve integrals with respect to $\mu_{\mathbb{S}^{2}}$ to compute the smooth conjugate of $\widehat{\mathbf{u}}_{\varepsilon,n}$. To circumvent this issue, we consider a random sample $U_{1},\cdots,U_{N}$ uniformly drawn on ${\mathbb{S}^{2}}$, and we define the following estimator (as an approximation of $\widehat{\mathbf{u}}_{\varepsilon,n}^{c,\varepsilon}$)

(4.15)

\widehat{\mathbf{u}}_{N,n}^{c,\varepsilon}(z)=-\varepsilon\log\frac{1}{N}\sum_{i=1}^{N}\exp\Big{(}\frac{\widehat{\mathbf{u}}_{\varepsilon,n}(U_{i})-c(U_{i},z)}{\varepsilon}\Big{)}.
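The Monte Carlo smooth conjugate (4.15) is a log-mean-exp, which is best evaluated in a numerically stable way for small $\varepsilon$. The following sketch is one possible implementation, assuming $c=d^{2}/2$ and that the values $\widehat{\mathbf{u}}_{\varepsilon,n}(U_{i})$ have already been computed; the toy potential and the function names are hypothetical.

```python
import numpy as np

def sample_uniform_sphere(n, rng):
    """Uniform sample on S^2 obtained by normalising standard Gaussian vectors."""
    g = rng.normal(size=(n, 3))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def cost_matrix(a, b):
    """Pairwise c(a_i, b_j) = d(a_i, b_j)^2 / 2 for unit vectors stored as rows."""
    return 0.5 * np.arccos(np.clip(a @ b.T, -1.0, 1.0)) ** 2

def smooth_conjugate(u_at_U, U, z, eps):
    """(4.15): -eps * log( (1/N) sum_i exp((u(U_i) - c(U_i, z_j)) / eps) ) for each row z_j."""
    a = (u_at_U[:, None] - cost_matrix(U, z)) / eps       # shape (N, M)
    a_max = a.max(axis=0)                                  # shift to avoid overflow/underflow
    return -eps * (a_max + np.log(np.mean(np.exp(a - a_max), axis=0)))

rng = np.random.default_rng(0)
U = sample_uniform_sphere(500, rng)
z = sample_uniform_sphere(5, rng)
u_at_U = 0.2 * U[:, 0]                                     # toy values standing in for the estimated potential
v = smooth_conjugate(u_at_U, U, z, eps=0.5)
```

Adding a constant to the potential shifts its smooth conjugate by the opposite constant, which gives a quick sanity check of any implementation.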

Then, thanks to (4.3), we remark that,

(4.16)

\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}=\frac{\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)-c(x,z)}{\varepsilon}\Big{)}}{\int\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}^{c,\varepsilon}(y)-c(x,y)}{\varepsilon}\Big{)}d\nu(y)}.

Hence, plugging (4.15) into (4.16), we propose the following estimator, for the regularized quantile function $\mathbf{Q}_{\varepsilon}$ defined in Corollary 4.1,

(4.17)

\hat{\mathbf{Q}}_{N,n}^{\varepsilon}(x)=\text{Exp}_{x}\Big{(}\sum_{i=1}^{n}\hat{g}_{N,n}^{\varepsilon}(x,X_{i})\text{Log}_{x}(X_{i})\Big{)},

where

\hat{g}_{N,n}^{\varepsilon}(x,z)=\frac{\exp\Big{(}\frac{\widehat{\mathbf{u}}_{N,n}^{c,\varepsilon}(z)-c(x,z)}{\varepsilon}\Big{)}}{\sum_{j=1}^{n}\exp\Big{(}\frac{\widehat{\mathbf{u}}_{N,n}^{c,\varepsilon}(X_{j})-c(x,X_{j})}{\varepsilon}\Big{)}}.

By the same token, an estimator of $\mathbf{F}_{\varepsilon}$ is given by

(4.18)

\hat{\mathbf{F}}_{N,n}^{\varepsilon}(z)=\text{Exp}_{z}\Big{(}\sum_{i=1}^{N}\tilde{g}_{N,n}^{\varepsilon}(U_{i},z)\text{Log}_{z}(U_{i})\Big{)},

with

\tilde{g}_{N,n}^{\varepsilon}(x,z)=\frac{\exp\Big{(}\frac{\widehat{\mathbf{u}}_{\varepsilon,n}(x)-c(x,z)}{\varepsilon}\Big{)}}{\sum_{j=1}^{N}\exp\Big{(}\frac{\widehat{\mathbf{u}}_{\varepsilon,n}(U_{j})-c(U_{j},z)}{\varepsilon}\Big{)}}.
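Both estimators (4.17) and (4.18) share the same structure: a softmax-weighted average of logarithm maps in the tangent space, followed by the exponential map. A minimal sketch of this common structure is given below, assuming $c=d^{2}/2$ and that the relevant potential has already been evaluated at the sample points; the single function `entropic_map` and its argument names are hypothetical.

```python
import numpy as np

def log_map(p, S):
    """Log_p(s_i) for each row s_i of S: tangent vector at p of length d(p, s_i)."""
    t = np.clip(S @ p, -1.0, 1.0)
    tangent = S - t[:, None] * p
    norms = np.linalg.norm(tangent, axis=1)
    # s_i = p gives the zero vector; s_i = -p has no unique logarithm and is mapped to zero here
    scale = np.divide(np.arccos(t), norms, out=np.zeros_like(norms), where=norms > 1e-12)
    return scale[:, None] * tangent

def exp_map(p, v):
    """Exp_p(v) = cos(|v|) p + sin(|v|) v / |v| on the unit sphere."""
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def entropic_map(p, S, phi_at_S, eps):
    """Exp_p( sum_i w_i Log_p(s_i) ), with w_i proportional to exp((phi(s_i) - c(p, s_i)) / eps)."""
    cost = 0.5 * np.arccos(np.clip(S @ p, -1.0, 1.0)) ** 2
    logits = (phi_at_S - cost) / eps
    w = np.exp(logits - logits.max())      # stabilised softmax weights
    w /= w.sum()
    return exp_map(p, w @ log_map(p, S))

# (4.17): p = x, S = (X_i), phi_at_S = values of the conjugate potential (4.15) at the X_i.
# (4.18): p = z, S = (U_i), phi_at_S = values of the estimated potential (4.9) at the U_i.
p = np.array([0.0, 0.0, 1.0])
S = np.array([[1.0, 0.0, 0.0]])
q = entropic_map(p, S, phi_at_S=np.zeros(1), eps=0.1)   # a single atom is returned as is
```

With a single atom the weight equals one and the map returns that atom; with two symmetric atoms and equal potentials the tangent vectors cancel and the map returns `p`.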

Note that the empirical version of unregularized MK quantiles proposed in [37] relies on discrete OT, yielding a bijection between a reference grid of points in ${\mathbb{S}^{2}}$ and the samples. This is beneficial for statistical testing, where distribution-freeness of the ranks is highly desirable. In contrast, regularization yields, even empirically, smooth maps that are not constrained to take values in the set of observed data, which is crucial for the descriptive analysis of Section 5.

Besides, the estimation of contours in [37] requires solving two different discrete OT problems. The first one estimates the central point $\mathbf{F}(\theta_{M})$, whereas the second involves a grid oriented towards the estimate of $\mathbf{F}(\theta_{M})$, to render MK contours. In contrast, with our algorithm targeting continuous OT, there is no need to solve two different OT problems, as the estimate $\widehat{\mathbf{u}}_{\varepsilon,n}$ yields both $\hat{\mathbf{F}}_{N,n}^{\varepsilon}$ and $\hat{\mathbf{Q}}_{N,n}^{\varepsilon}$, and a fortiori $\hat{\mathbf{F}}_{N,n}^{\varepsilon}(\theta_{M})$.

5. Depth-based data analysis

This section is dedicated to the study of a companion concept of directional MK quantiles, the MK statistical depth. We state a directional definition and discuss its properties. After that, we introduce descriptive tools in the spirit of the ones presented in [50] in the Euclidean setting.

For the sake of completeness, we first study the Euclidean setting ${\mathbb{R}}^{d}$ before the directional one, that is of particular interest for us. Indeed, the results that we derive below do not appear as such in the literature, at least to the best of our knowledge.

5.1. Euclidean setting

We begin with the main definitions taken from [12]. Our chosen reference measure, denoted by $U_{d}$, is the distribution of the random vector $R\Phi$, for $R$ and $\Phi$ independently and uniformly drawn from $[0,1]$ and from the unit hypersphere $\mathbb{S}^{d-1}=\{\varphi\in{\mathbb{R}}^{d}:\|\varphi\|=1\}$, respectively, as originally proposed in [12, 36] to define MK quantiles in ${\mathbb{R}}^{d}$. Note that the MK distribution function, the inverse of the MK quantile function, might not exist, $e.g.$ if $\nu$ is discrete. Following [32], this is tackled with the Legendre-Fenchel dual of a convex function $\psi$, given by $\psi^{*}(x)=\sup_{u\in{\mathbb{R}}^{d}}\{\langle x,u\rangle-\psi(u)\}$.

Definition 5.1.

Let $\nu$ be an arbitrary probability measure on ${\mathbb{R}}^{d}$ . Its MK quantile function is the unique $\mathbf{Q}=\nabla\psi$ for some convex $\psi:{\mathbb{R}}^{d}\rightarrow{\mathbb{R}}$ such that $\mathbf{Q}_{\#}U_{d}=\nu$ . Then,

(1)

The MK $\alpha$ -quantile contour is the image by $\mathbf{Q}$ of the hypersphere

\mathcal{S}(\alpha)=\{u\in{\mathbb{R}}^{d}:\|u\|=\alpha\}.

(2)

The sign curve associated to $u\in\mathbb{B}(0,1)$ is the image by $\mathbf{Q}$ of the radius

L_{u}=\{t\frac{u}{\|u\|}:t\in[0,1]\}.

(3)

The MK depth of $x\in{\mathbb{R}}^{d}$ is the Tukey depth [81] of $\nabla\psi^{*}(x)$ with respect to $U_{d}$,

D_{\nu}(x)=D_{U_{d}}^{Tukey}\Big{(}\nabla\psi^{*}(x)\Big{)}.

The Liu-Zuo-Serfling axioms [49, 85] describe desirable properties for depth concepts. The MK depth relaxes some of them, to reach more relevant contours [12]. Firstly, MK depth corresponds to Tukey depth for elliptical families [12]. Moreover, it benefits from invariance properties [32][Lemmas A.7, A.8] with respect to scaling (multiplication by a positive constant), translations, and orthogonal transformations (multiplication by an orthogonal matrix). Note that affine invariance does not hold. Another axiom is the linear monotonicity relative to the deepest points, that is, $D_{\nu}(x)\leq D_{\nu}((1-t)x_{0}+tx)$ for all $t\in[0,1]$ if $x_{0}$ is a deepest point. This is not fulfilled by the MK depth [12], although it verifies a similar property along sign curves, as we shall see now. Proofs of the following results are deferred to the Appendix.

Proposition 5.1 (Curvilinear monotonicity relative to the deepest points).

Assume that $\nu$ is continuous. The MK depth is monotonically decreasing along sign curves, that is, for each $u\in\mathbb{B}(0,1)$ and $t\in[0,1]$ ,

D_{\nu}(\mathbf{Q}(u))\leq D_{\nu}(\mathbf{Q}(tu)).

This corresponds to the classical linear monotonicity under distributions with straight sign curves, including spherical families due to the particular form of the MK quantile function in this setting, taken from [12].

Corollary 5.1.

For spherically symmetric distributions, sign curves are straight lines, and the MK depth verifies linear monotonicity relative to the deepest point. For any $x$ in the support of $\nu$ , for $x_{0}=\mathbb{E}(X)$ the deepest point of $\nu$ ,

(5.1)

\forall t\in[0,1],D_{\nu}(x)\leq D_{\nu}((1-t)x_{0}+tx).

We now turn to the properties of the directional MK depth.

5.2. Directional setting

Using the same ideas, one can define the MK depth on the sphere through any statistical depth with respect to the uniform $\mu_{\mathbb{S}^{2}}$ oriented towards $\mathbf{F}(\theta_{M})$ , and the simplest is surely to consider the proximity with $\mathbf{F}(\theta_{M})$ .

Definition 5.2.

Let $\nu$ be an arbitrary probability measure on ${\mathbb{S}^{2}}$ , with directional distribution function $\mathbf{F}$ . The directional MK depth of $x\in{\mathbb{S}^{2}}$ is defined by

D_{\nu}(x)=1-d(\mathbf{F}(x),\mathbf{F}(\theta_{M}))/\pi.
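Definition 5.2 translates directly into code once the images $\mathbf{F}(x)$ and $\mathbf{F}(\theta_{M})$ are available. The sketch below is a minimal illustration; the function name is hypothetical, and the identity map is used in place of $\mathbf{F}$ in the toy example, which corresponds to $\nu=\mu_{\mathbb{S}^{2}}$.

```python
import numpy as np

def directional_mk_depth(F_x, F_center):
    """D(x) = 1 - d(F(x), F(theta_M)) / pi, from the images F(x) (rows) and F(theta_M)."""
    cosines = np.clip(np.atleast_2d(F_x) @ F_center, -1.0, 1.0)
    return 1.0 - np.arccos(cosines) / np.pi

center = np.array([0.0, 0.0, 1.0])
pts = np.array([[0.0, 0.0, 1.0],    # the center itself
                [0.0, 0.0, -1.0],   # its antipode
                [1.0, 0.0, 0.0]])   # a point on the equator
depths = directional_mk_depth(pts, center)   # identity map used in place of F
```

In practice, `F_x` and `F_center` would be obtained from the regularized estimator (4.18).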

Regarding the ${\mathbb{S}^{2}}$-adapted versions of the Liu-Zuo-Serfling axioms [49, 85], the directional MK depth behaves like its Euclidean counterpart. We begin with the four classical properties that are direct spherical counterparts of the Euclidean axioms, see $e.g.$ [46]. The affine invariance is replaced on ${\mathbb{S}^{2}}$ by rotational invariance, which holds true from [37], see also Proposition A.1. Moreover, it is straightforward that $D_{\nu}$ attains its maximum at the center $\mathbf{F}(\theta_{M})$, and that it vanishes at $-\mathbf{F}(\theta_{M})$, the spherical counterpart of infinity. Finally, monotonicity along great circles is not fulfilled, but it is replaced in the same data-adaptive fashion as in ${\mathbb{R}}^{d}$.

Proposition 5.2 (Curvilinear monotonicity relative to the deepest points).

Assume that $\nu$ is continuous. The directional MK depth is monotonically decreasing along sign curves. For each $x\in{\mathbb{S}^{2}}$ and $t\in[\langle x,\mathbf{F}(\theta_{M})\rangle,1]$, let $x_{t}\in\mathcal{M}_{s}^{U}$, for $s=\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)$, such that

(5.2)

x_{t}=t\mathbf{F}(\theta_{M})+\sqrt{1-t^{2}}s.

Then,

D_{\nu}(\mathbf{Q}(x))\leq D_{\nu}(\mathbf{Q}(x_{t})).

Explicit formulations for rotationally invariant distributions are given in [37], and recalled in Appendix B. In particular, they show that MK quantile contours coincide with Mahalanobis ones, [46], for such distributions, so that the following is straightforward.

Corollary 5.2.

For rotationally invariant distributions, sign curves are great circles. Thus, the MK depth verifies linear monotonicity along great circles, relative to the deepest point.

Other desirable axioms have been put forward recently in [62, 61], namely the upper semi-continuity and the non-rigidity of central regions. Upper semi-continuity holds as soon as the MK distribution function $\mathbf{F}$ is continuous, thus at least for $\nu\in\mathbf{B}_{2}$. Moreover, when $\nu$ is arbitrary, taking the regularized directional MK depth built from $\mathbf{F}_{\varepsilon}$, for $\varepsilon>0$, imposes continuity, which may motivate such a regularized estimator. Lastly, the non-rigidity of central regions states that quantile regions are not restricted to be spherical caps, which is readily true for the MK depth. In fact, its adaptivity to the underlying support is one of its main features, and it can be seen as a stronger non-rigidity axiom, requiring that $\mathbf{Q}(U)\sim\nu$ as soon as $U\sim\mu$. Furthermore, Proposition 5.2 and Corollary 5.2 shed some light on the non-verified axiom of monotonicity along great circles. Our results suggest that the directional MK depth relaxes these axioms when necessary, $e.g.$ for complex distributions such as mixtures, whereas the axioms are fulfilled for distributions for which they are useful, in particular for rotationally invariant ones.

5.3. Descriptive tools

The seminal paper [50] gathers descriptive tools based on data depths. Monge-Kantorovich analogs already exist for data in ${\mathbb{R}}^{d}$ , and we shall now extend some of them to the directional setting. We stress that the ability of our regularized estimator to interpolate between data points is crucial for $(i)$ smooth contours in practice and $(ii)$ computing volumes of quantile regions.

5.3.1. Representative plots

Firstly, [50] study representative plots for bivariate data, and the MK analog is given by the descriptive plots from [36, 37], the latter with the added information of sign curves. Figure 1 illustrates it on a Tangent von Mises-Fisher distribution [29], and on a mixture of two von Mises-Fisher distributions, with the help of our empirical regularized quantile function. One can observe that the shapes of the distributions are well recovered.

Figure 1. Regularized quantile contours of levels $\{0.1,0.25,0.5,0.75,0.9\}$ and associated sign curves, with $\varepsilon=10^{-1}$.

5.3.2. Scale or dispersion

Hereafter, we present a graphical tool to describe the amount of dispersion, called the scale curve in [50] and whose MK analog has been introduced in [4] for Euclidean data. We mention that, in [50], this tool is also used to compare the variance of vector-valued estimators. Put simply, given any level $\alpha\in[0,1]$ , we consider the volumes $V(\alpha)$ of MK quantile regions. Plotting such volumes with respect to $\alpha\in[0,1]$ yields a scale curve [50]. The faster it grows, the greater the dispersion. Thus, if the scale curve of $\nu_{1}$ is consistently above the one of $\nu_{2}$ , then $\nu_{1}$ is more spread out than $\nu_{2}$ . On ${\mathbb{S}^{2}}$ , the volume is bounded, so we consider the normalized $\mu_{\mathbb{S}^{2}}$ instead of $\sigma_{\mathbb{S}^{2}}$ . Define

V(\alpha)=\int_{\mathcal{C}_{\alpha}}d\mu_{\mathbb{S}^{2}}(x)=\int_{\mathbb{S}^{2}}\mathds{1}_{\{x\in\mathcal{C}_{\alpha}\}}d\mu_{\mathbb{S}^{2}}(x)=\int_{\mathbb{S}^{2}}\mathds{1}_{\{\langle\mathbf{F}(x),\mathbf{F}(\theta_{M})\rangle\geq 1-2\alpha\}}d\mu_{\mathbb{S}^{2}}(x).

This can be estimated with a sample $U_{1},\cdots,U_{N}$ from $\mu_{\mathbb{S}^{2}}$ , by the proportion

V_{\varepsilon,n}(\alpha)=\frac{1}{N}\sum_{i=1}^{N}\mathds{1}_{\{\langle\hat{\mathbf{F}}_{N,n}^{\varepsilon}(U_{i}),\mathbf{F}(\theta_{M})\rangle\geq 1-2\alpha\}}.
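The Monte Carlo estimate of the scale curve reduces to counting how many mapped points fall in a spherical cap. A minimal sketch follows, where the identity map stands in for the estimated distribution function (that is, $\nu=\mu_{\mathbb{S}^{2}}$, for which $V(\alpha)=\alpha$); the function name and the toy setup are hypothetical.

```python
import numpy as np

def scale_curve(F_at_U, F_center, alphas):
    """Proportion of points U_i with <F(U_i), F(theta_M)> >= 1 - 2 * alpha, for each alpha."""
    inner = F_at_U @ F_center
    return np.array([np.mean(inner >= 1.0 - 2.0 * a) for a in alphas])

rng = np.random.default_rng(1)
U = rng.normal(size=(20000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)    # uniform sample on S^2
center = np.array([0.0, 0.0, 1.0])
alphas = np.linspace(0.0, 1.0, 11)
V = scale_curve(U, center, alphas)               # identity map used in place of the estimated F
```

The threshold $1-2\alpha$ comes from the fact that a spherical cap of $\mu_{\mathbb{S}^{2}}$-mass $\alpha$ centered at $\mathbf{F}(\theta_{M})$ consists of the points whose inner product with the center is at least $1-2\alpha$.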

On the left-hand side of Figure 2, we draw the scale curves of von Mises-Fisher distributions with varying concentration parameter $\kappa\in\{1,2,5,15\}$, which controls the dispersion of samples. The curves capture well that the lower the value of $\kappa$, the more spread out the underlying distribution.

Besides, univariate order statistics (and equivalently, quantiles) are fundamental to analyse the presence of outliers. In our spherical setting, the scale curve conveniently summarizes this type of information, as illustrated on the right-hand side of Figure 2. We consider $n=500$ observations coming from three identical von Mises-Fisher distributions with concentration parameter $\kappa=15$ and mean $(0,1,0)^{T}$, each contaminated with a certain number $N\in\{5,20,50\}$ of outliers located near the North Pole $(0,0,1)^{T}$. It appears that the dispersion of quantile regions up to the order $\alpha\approx 0.8$ is identical, whereas the dispersion increases with the number of outliers for peripheral quantile regions, which is precisely the expected behavior.

6. Numerical experiments

Our numerical experiments first qualitatively compare our regularized estimator of the MK quantile function with other existing notions of spherical quantiles. After that, we study the influence of the regularization strength $\varepsilon>0$ on the quantitative criterion of MSE with known ground truth, and on the smoothness of regularized quantile contours.

6.1. Other concepts of quantiles

In Figure 3, we display a visual comparison of existing concepts of quantiles on the sphere, focusing on mixtures of von Mises-Fisher distributions. It can be observed from Figure 3 that Mahalanobis quantile regions [46] are concentric spherical caps, whereas spatial quantiles [44] and our regularized MK quantiles can exhibit more complex shapes that better fit the geometry of the data. To that extent, spatial and regularized MK quantiles, the latter obtained through our entropically regularized estimator $\mathbf{Q}_{\varepsilon}$, are both more satisfactory. For the spatial quantiles, our naive implementation forces them to belong to the data samples. For each notion of quantiles, $100$ points are drawn along each contour and linked by straight lines. We emphasize that spatial quantiles are not indexed by their probability content, as opposed to the MK ones [37]. Because entropic MK quantiles interpolate between data points, contours cross the void between mixture components. A careful inspection shows that the number of points per contour within this void is much lower than in the high-density areas. This illustrates how the variation of mass, that is, the underlying geometry, is captured by our regularized estimator.

6.2. Estimation of OT maps

In Figure 4, we study the influence of the regularization parameter on the estimation of known quantile contours, that were described in [37] and that are recalled in Appendix B for the sake of completeness. The mean-squared error from uniform samples $(x_{i})\subset{\mathbb{S}^{2}}$ is

\mathcal{R}_{n}(\widehat{Q})=\frac{1}{n}\sum_{i=1}^{n}c(Q(x_{i}),\widehat{Q}(x_{i})),

for $\widehat{Q}$ denoting either our regularized MK quantile estimator or the unregularized one proposed in [37]. Samples of size $n=500$ are drawn from the uniform $\mu_{\mathbb{S}^{2}}$ and from a von Mises-Fisher distribution of location $(0,0,1)^{T}$ and concentration $\kappa=10$. For several values of $\varepsilon$, the regularized $\mathbf{Q}_{\varepsilon}$ is estimated, and the unregularized estimator $\widehat{\mathbf{Q}}_{0}$ from [37] is computed. Each resulting estimator is compared to the ground truth by $\mathcal{R}_{n}(\widehat{Q})$, computed on the uniform sample $(x_{i})\subset{\mathbb{S}^{2}}$ of size $n$. Repeating this experiment $50$ times, we obtain a boxplot of MSE values for each value of the regularization parameter $\varepsilon$. The results are reported in Figure 4, where $\varepsilon=0$ refers to the MSE of $\widehat{\mathbf{Q}}_{0}$. The dashed horizontal line indicates the median MSE of $\widehat{\mathbf{Q}}_{0}$. It can be observed that entropic regularization can significantly improve the estimation of the quantile map compared with the unregularized estimator, in particular for values around $\varepsilon\approx 0.09$.
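For concreteness, the criterion $\mathcal{R}_{n}$ can be computed as in the following sketch, assuming the cost $c=d^{2}/2$ and that the true and estimated quantile maps have already been evaluated at the points $x_{i}$ (stored as rows of unit vectors); the function name and toy inputs are hypothetical.

```python
import numpy as np

def transport_risk(Q_true, Q_hat):
    """R_n = (1/n) sum_i c(Q(x_i), Qhat(x_i)) with c = d^2 / 2, rows being unit vectors."""
    cosines = np.clip(np.sum(Q_true * Q_hat, axis=1), -1.0, 1.0)
    return np.mean(0.5 * np.arccos(cosines) ** 2)

A = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
risk_same = transport_risk(A, A)        # identical maps
risk_antipodal = transport_risk(A, -A)  # every point sent to its antipode
```

The risk vanishes for identical maps and reaches its maximal value $\pi^{2}/2$ when every estimate is antipodal to the truth.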

6.3. Qualitative effect of regularization

In Figure 5, we visually compare regularized and unregularized MK spherical quantile contours of orders $24.4\%,48.8\%,75.6\%$ on the same von Mises-Fisher distribution as in Figure 4. Such uncommon probability contents are inherent to empirical unregularized contours, because the number of contours as well as their size depend on the sample size, which is fixed here at $n=2001$. Ground-truth contours deduced from (B.3) are presented together with unregularized and regularized ones, for $\varepsilon\in\{0.01,0.05,1\}$. Each contour contains $100$ points, linked by straight lines. For $\varepsilon=0.01$, contours overfit the finite-sample data, causing errors since ground-truth contours are smoother. For $\varepsilon=1$, contours are smoother, but the bias between $\widehat{\mathbf{Q}}_{\varepsilon}$ and the underlying ground truth is too large. For the well-chosen $\varepsilon=0.05$, the trade-off between regularity and low bias yields the best estimation. This sheds some light on the behavior of regularization. The lower the $\varepsilon$, the more $\widehat{\mathbf{Q}}_{\varepsilon}$ adapts to the finite-sample data and its irregularities. Larger values of $\varepsilon$ induce smoother contours, as a byproduct of a greater regularity of $\widehat{\mathbf{Q}}_{\varepsilon}$. This emphasizes the need to calibrate the regularization strength.

7. Concluding remarks

The major limitation of discrete OT for quantile estimation is that it results in a matching between samples, instead of a function $Q$ able to provide out-of-sample estimates $Q(x)$. In the present paper, we showed that regularizing by entropy can be used as an alternative, particularly when the focus is on the quantile contours or the volumes of quantile regions. Still, we emphasize that the entropic regularization loses the distribution-freeness of associated ranks, compared to solving the discrete-discrete OT problem as in [37]. Because it is crucial for rank-based statistical testing, the choice of using OT or EOT shall depend on the considered task.

Our regularized quantile function is an entropic map, which has been generalized here outside of the Euclidean setting, building on the particular structure of the $2$-sphere. Our numerical scheme leverages the existence of spherical Fourier series to construct a stochastic gradient descent that solves continuous OT in the limit of the iterations. This is particularly useful when the number of observations $n$ is large and prevents the storage of the cost matrix. Numerical experiments revealed the ability of entropically regularized quantiles to improve the mean-squared error when the ground truth is known, and showed the potential of this approach for the analysis of spherical data.

Appendix

Appendix A Invariance properties

Invariance properties of empirical versions of $\mathbf{F}$ and $\mathbf{Q}$ were shown in [37]. The same holds in fact for the population counterparts, with the same argument: the transport problem (3.8) inherits invariance from the Riemannian distance.

Proposition A.1.

In dimension $d$ , let $\mathbf{O}$ be a $d\times d$ orthogonal matrix and let $\nu\in\mathbf{B}_{2}$ . Denote by $\mathbf{O}_{\#}\nu$ the distribution of $\mathbf{O}Z$ if $Z\sim\nu$ , and by $\mathbf{F}_{Z},\mathbf{Q}_{Z}$ , ( $resp.$ $\mathbf{F}_{\mathbf{O}Z},\mathbf{Q}_{\mathbf{O}Z}$ ), the distribution and quantile functions of $\nu$ , ( $resp.$ $\mathbf{O}_{\#}\nu$ ). Then,

\mathbf{F}_{\mathbf{O}Z}(\mathbf{O}z)=\mathbf{O}\mathbf{F}_{Z}(z),

and

\mathbf{Q}_{\mathbf{O}Z}(\mathbf{O}z)=\mathbf{O}\mathbf{Q}_{Z}(z).

Proof.

Note that the Kantorovich problem, equivalent to (3.8) when $\nu\in\mathbf{B}_{2}$ , minimizes

\int_{\mathbb{S}^{2}}\int_{\mathbb{S}^{2}}c(x,y)d\pi(x,y),

over the set of joint probabilities $\pi$ supported on ${\mathbb{S}^{2}}\times{\mathbb{S}^{2}}$ with marginals $\mu_{\mathbb{S}^{2}},\nu$ , see $e.g.$ [37]. Because $c(\mathbf{O}x,\mathbf{O}y)=c(x,y)$ , the transport problem between $\mu_{\mathbb{S}^{2}}$ and $\nu$ is equivalent to the one between $\mathbf{O}_{\#}\mu_{\mathbb{S}^{2}}$ and $\mathbf{O}_{\#}\nu$ , where $\mathbf{O}_{\#}\mu_{\mathbb{S}^{2}}=\mu_{\mathbb{S}^{2}}$ by rotation invariance of the uniform distribution. Going back to Monge’s problem (3.8), it immediately follows that the Monge map $T$ such that $T_{\#}\mu_{\mathbb{S}^{2}}=\nu$ satisfies

\mathbf{O}T(z)=T_{\mathbf{O}Z}(\mathbf{O}z).

Up to interchanging the reference and the target measures, the result follows. ∎

The following corollary is straightforward.

Corollary A.1.

For any $\tau\in[0,1]$ , $\mathbf{O}\mathcal{C}_{\tau}=\mathbf{Q}_{\mathbf{O}Z}(\mathbf{O}\mathcal{C}^{U}_{\tau})$ .

From [74][Lemma 1] or [27][Theorem 3.2], the convex combination between $c$ -concave functions is itself $c$ -concave, giving rise to the following immediate consequence. We also refer to [15][Section 5].

Lemma A.1 (Interpolation).

Let $\nu_{1},\nu_{2}$ be directional probability distributions with given Kantorovich potentials $\psi_{1},\psi_{2}$ and MK quantile functions $\mathbf{Q}_{1},\mathbf{Q}_{2}$ , respectively. For any $t\in[0,1]$ , let $\psi_{t}=t\psi_{1}+(1-t)\psi_{2}$ . Then, the interpolation

\mathbf{Q}_{t}(x)=\text{Exp}_{x}(-\nabla\psi_{t}(x))

is the directional MK quantile function of the distribution $\nu_{t}$ defined by $\nu_{t}={\mathbf{Q}_{t}}_{\#}\mu_{\mathbb{S}^{2}}$ .

Appendix B Explicit forms

Closed-form expressions of $\mathbf{F}$ for rotationally invariant distributions were given in [37] and simplify in dimension $d=3$ , which allows one to deduce the inverse map $\mathbf{Q}$ . Let $Z\sim\nu$ be such a random vector with axis $\pm\theta_{M}$ , and assume that $\nu$ has density

z\in{\mathbb{S}^{2}}\mapsto c_{f}f(z^{T}\theta_{M}),

for $f$ some positive angular function and $c_{f}$ a normalizing constant. For $r\in[-1,1]$ , denote by

F_{f}(r)=\int_{-1}^{r}f(s)ds/\int_{-1}^{1}f(s)ds

the distribution function of $Z^{T}\theta_{M}$ and by $Q_{f}=F_{f}^{-1}$ its quantile function. Then, letting $F_{f}^{*}(r)=2F_{f}(r)-1$ , the directional distribution function of $Z$ writes

(B.1)

\mathbf{F}(z)=F_{f}^{*}(z^{T}\theta_{M})\theta_{M}+\sqrt{1-F_{f}^{*}(z^{T}\theta_{M})^{2}}S_{\theta_{M}}(z).

For instance, taking $f(s)=\exp(\kappa s)$ corresponds to the von Mises-Fisher distribution with location parameter $\theta_{M}$ and concentration parameter $\kappa\in{\mathbb{R}}_{+}$ . Crucially, the transport (B.1) reduces to univariate transport along the axis $\theta_{M}=\mathbf{F}(\theta_{M})$ . If $\theta_{M}=(0,0,1)^{T}$ , which can be assumed up to some rotation thanks to Proposition A.1 and Corollary A.1, this corresponds to changing the latitude $w.r.t.$ the usual coordinate system (3.1). Indeed, as soon as $\theta_{M}=(0,0,1)^{T}$ ,

\mathbf{F}(z)=F_{f}^{*}(z_{3})\theta_{M}+\sqrt{1-F_{f}^{*}(z_{3})^{2}}\frac{(z_{1},z_{2},0)^{T}}{\|(z_{1},z_{2},0)^{T}\|}.

The third coordinate is changed to $F_{f}^{*}(z_{3})$ , and the other coordinates are adapted to the constraint $\mathbf{F}(z)\in{\mathbb{S}^{2}}$ . This rewrites, in accordance with (3.1),

(B.2)

\mathbf{F}(z)=\mathbf{F}(\Phi(\theta,\phi))=\Phi(\overline{\theta},\phi)\quad\mbox{for}\quad\overline{\theta}=\arccos\Big{(}F_{f}^{*}(z_{3})\Big{)}.

Consequently, to get the inverse map $\mathbf{Q}=\mathbf{F}^{-1}$ , it suffices to change the pseudo latitude of $\mathbf{F}(z)\in\mathcal{C}_{\tau}^{U}$ $w.r.t.$ the axis $\pm\theta_{M}$ . If $x=\mathbf{F}(z)$ , then $x_{3}=F_{f}^{*}(z_{3})$ and $z_{3}=(F_{f}^{*})^{-1}(x_{3})$ , that is

(B.3)

\mathbf{Q}(x)=\mathbf{Q}(\Phi(\theta,\phi))=\Phi(\tilde{\theta},\phi)\quad\mbox{for}\quad\tilde{\theta}=\arccos\Big{(}(F_{f}^{*})^{-1}(x_{3})\Big{)}.

As highlighted in [37], this shows that MK quantile contours coincide with Mahalanobis ones from [46] under the rotationally symmetric model.
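As an illustration of (B.2) and (B.3), the following minimal sketch implements $\mathbf{F}$ and $\mathbf{Q}$ for the von Mises-Fisher case $f(s)=\exp(\kappa s)$ with $\theta_{M}=(0,0,1)^{T}$ , and checks that $\mathbf{Q}\circ\mathbf{F}$ is the identity at one point. The closed form of $F_{f}$ used below follows from integrating $\exp(\kappa s)$ over $[-1,r]$ ; the value of `KAPPA`, the test point and all function names are assumptions made for the example, not part of the original text.

```python
import numpy as np

KAPPA = 5.0  # hypothetical concentration parameter (assumption for this sketch)

def F_star(r, kappa=KAPPA):
    """F_f^*(r) = 2 F_f(r) - 1 for f(s) = exp(kappa * s) on [-1, 1]."""
    F = (np.exp(kappa * r) - np.exp(-kappa)) / (np.exp(kappa) - np.exp(-kappa))
    return 2.0 * F - 1.0

def F_star_inv(x3, kappa=KAPPA):
    """Inverse of F_star, obtained by solving F_star(r) = x3 for r."""
    p = (x3 + 1.0) / 2.0
    return np.log(np.exp(-kappa) + p * (np.exp(kappa) - np.exp(-kappa))) / kappa

def _move_along_axis(z, new_third):
    """Set the third coordinate to new_third and rescale (z1, z2) to stay on the sphere."""
    z = np.asarray(z, dtype=float)
    horiz = z[:2] / np.linalg.norm(z[:2])  # undefined at the poles +/- theta_M
    return np.append(np.sqrt(1.0 - new_third**2) * horiz, new_third)

def F_map(z, kappa=KAPPA):
    """Directional distribution function for theta_M = (0, 0, 1)^T."""
    return _move_along_axis(z, F_star(z[2], kappa))

def Q_map(x, kappa=KAPPA):
    """Directional quantile function, inverse of F_map."""
    return _move_along_axis(x, F_star_inv(x[2], kappa))

z = np.array([0.6, 0.0, 0.8])  # a point of the unit sphere
x = F_map(z)
z_back = Q_map(x)
```

Only the latitude changes under both maps, while the longitude (the direction of $(z_{1},z_{2})$ ) is preserved, which is the univariate structure described above.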

Appendix C Proofs : Entropic maps

C.1. Proof of Proposition 4.3

Denoting $v=\mathbf{u}_{\varepsilon}^{c,\varepsilon}$ , rewriting (4.3) gives

\mathbf{u}_{\varepsilon}(x)=-\varepsilon\log\int\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}d\nu(z).

By the chain rule,

(C.1)

\partial_{x_{i}}\mathbf{u}_{\varepsilon}(x)=-\varepsilon\frac{\partial_{x_{i}}J(x)}{J(x)}\quad\mbox{for}\quad J(x)=\int\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}d\nu(z).

We now turn to the differentiation of $J$ . As shown in [64][Lemma 2.1],

\inf_{x\in{\mathbb{S}^{2}}}\{c(x,z)-\mathbf{u}_{\varepsilon}(x)\}\leq v(z)\leq\int c(x,z)d\mu_{\mathbb{S}^{2}}(x).

By boundedness of ${\mathbb{S}^{2}}$ , $v$ is bounded and so is the integrand in (C.1). As we deal with probability measures, this justifies differentiating under the integral sign, that is

\partial_{x_{i}}J(x)=\int\partial_{x_{i}}\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}d\nu(z),

which yields

(C.2)

\partial_{x_{i}}J(x)=\int-\frac{\partial_{x_{i}}c(x,z)}{\varepsilon}\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}d\nu(z).

Fix $z\in{\mathbb{S}^{2}}$ . Since $c(x,z)=d(x,z)^{2}/2$ with $d(x,z)=\arccos\langle x,z\rangle$ ,

(C.3)

\partial_{x_{i}}c(x,z)=d(x,z)\partial_{x_{i}}d(x,z)\quad\mbox{and}\quad\partial_{x_{i}}d(x,z)=\frac{-1}{\sqrt{1-\langle x,z\rangle^{2}}}z_{i}.

Combining (C.2) with (C.3),

(C.4)

\partial_{x_{i}}J(x)=\int\frac{z_{i}}{\varepsilon}\frac{d(x,z)}{\sqrt{1-\langle x,z\rangle^{2}}}\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}d\nu(z).

Plugging (C.4) in (C.1) gives (4.14), where $g_{\varepsilon}$ defined in (4.12) shows up because, by properties of $\exp$ ,

(C.5)

\exp\Big{(}\frac{v(z)-c(x,z)+\mathbf{u}_{\varepsilon}(x)}{\varepsilon}\Big{)}=\frac{\exp\Big{(}\frac{v(z)-c(x,z)}{\varepsilon}\Big{)}}{\int\exp\Big{(}\frac{v(y)-c(x,y)}{\varepsilon}\Big{)}d\nu(y)}.

Using also that,

(C.6)

\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}=\frac{\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)}{\varepsilon}\Big{)}}{\int\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(y)-c(z,y)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(y)},

the same arguments on $\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)=-\varepsilon\log\int\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x)$ yield (4.13). ∎
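The derivative obtained by plugging (C.4) in (C.1) can be sanity-checked by finite differences. The sketch below is only an illustration under stated assumptions: $\nu$ is replaced by a hypothetical discrete measure with uniform weights, the values `v` standing for $v(z_{j})$ are arbitrary rather than an actual dual potential, and the cost is taken as $c=d^{2}/2$ with $d(x,z)=\arccos\langle x,z\rangle$ , which is the form consistent with (C.3). Partial derivatives are Euclidean, so $x$ is perturbed in ${\mathbb{R}}^{3}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.5  # hypothetical regularization parameter

# Hypothetical discrete target: m atoms z_j with uniform weights. Atoms are kept
# in the upper hemisphere so that x0 below stays away from their antipodes,
# where the cost is not differentiable (a convenience for this check only).
m = 20
Z = rng.normal(size=(m, 3))
Z[:, 2] = np.abs(Z[:, 2])
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
w = np.full(m, 1.0 / m)
v = rng.normal(size=m)  # arbitrary values standing in for v(z_j)

def cost(x):
    """c(x, z_j) = arccos(<x, z_j>)^2 / 2 for all atoms z_j."""
    return 0.5 * np.arccos(Z @ x) ** 2

def u_eps(x):
    """u_eps(x) = -eps * log J(x), with J a weighted sum over the atoms."""
    return -EPS * np.log(np.sum(w * np.exp((v - cost(x)) / EPS)))

def grad_u_eps(x):
    """Euclidean partial derivatives of u_eps from (C.1) and (C.4)."""
    s = Z @ x
    d = np.arccos(s)
    weights = w * np.exp((v - cost(x)) / EPS)
    J = weights.sum()
    dJ = (Z * (d / np.sqrt(1.0 - s**2) * weights)[:, None]).sum(axis=0) / EPS
    return -EPS * dJ / J

x0 = np.array([0.6, 0.0, 0.8])
h = 1e-5
fd = np.array([(u_eps(x0 + h * e) - u_eps(x0 - h * e)) / (2 * h) for e in np.eye(3)])
grad = grad_u_eps(x0)
```

Comparing `grad` with the central differences `fd` gives agreement up to discretization error, which supports the signs appearing in (C.2) to (C.4).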

C.2. Proof of Corollary 4.1

Using the Euclidean derivatives obtained in Proposition 4.3,

\nabla\mathbf{u}_{\varepsilon}(x)=\rho_{x}\int zg_{\varepsilon}(x,z)d\nu(z)\quad\mbox{and}\quad\nabla\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)=\rho_{z}\int xg_{\varepsilon}(x,z)d\mu_{\mathbb{S}^{2}}(x).

But this is equivalent to

\nabla\mathbf{u}_{\varepsilon}(x)=\int-\frac{d(x,z)}{\sqrt{1-\langle x,z\rangle^{2}}}\rho_{x}(z)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\nu(z),

and

\nabla\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)=\int-\frac{d(x,z)}{\sqrt{1-\langle x,z\rangle^{2}}}\rho_{z}(x)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x).

Here, one recognizes the explicit form of $\text{Log}_{x}=\text{Exp}_{x}^{-1}$ , the inverse of the exponential map, see for instance [25], which gives

\nabla\mathbf{u}_{\varepsilon}(x)=-\int\text{Log}_{x}(z)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\nu(z),

and

\nabla\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)=-\int\text{Log}_{z}(x)\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}d\mu_{\mathbb{S}^{2}}(x).

∎
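To make the last identification concrete, here is a minimal sketch of the logarithm and exponential maps on the unit sphere, written in the form used in the proof. It assumes that $\rho_{x}$ denotes the orthogonal projection onto the tangent space at $x$ , that is $\rho_{x}(z)=z-\langle x,z\rangle x$ ; this reading of the notation and the function names are assumptions of the example.

```python
import numpy as np

def proj_tangent(x, z):
    """Assumed form of rho_x(z): orthogonal projection of z onto the tangent space at x."""
    return z - np.dot(x, z) * x

def log_map(x, z):
    """Log_x(z) = d(x, z) / sqrt(1 - <x, z>^2) * rho_x(z), for z different from +/- x."""
    s = np.clip(np.dot(x, z), -1.0, 1.0)
    d = np.arccos(s)
    return d / np.sqrt(1.0 - s**2) * proj_tangent(x, z)

def exp_map(x, v):
    """Exp_x(v) = cos(|v|) x + sin(|v|) v / |v| for a tangent vector v at x."""
    n = np.linalg.norm(v)
    if n == 0.0:
        return x
    return np.cos(n) * x + np.sin(n) * v / n

x = np.array([0.0, 0.0, 1.0])
z = np.array([0.6, 0.0, 0.8])
v = log_map(x, z)          # tangent vector at x pointing towards z
z_back = exp_map(x, v)     # should recover z
```

The tangent vector `v` has norm $d(x,z)$ and is orthogonal to $x$ , so that the factor $d(x,z)/\sqrt{1-\langle x,z\rangle^{2}}$ in the integrands above is exactly what turns $\rho_{x}(z)$ into $\text{Log}_{x}(z)$ .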

C.3. Proof of Proposition 4.4

The same computation can be found in [30][Lemma 3], except that, at the very end of our proof, one can recognize the partial derivatives of $\mathbf{u}_{\varepsilon}$ , that is (4.13), in the result. First of all, $g_{\varepsilon}$ is bounded by [64][Lemma 2.1] and the compactness of ${\mathbb{S}^{2}}$ . Thus, one can differentiate under the integral sign in (4.13), and

(C.7)

\frac{\partial^{2}\mathbf{u}_{\varepsilon}}{\partial_{x_{i}}\partial_{x_{j}}}(x)=\int\partial_{x_{j}}z_{i}g_{\varepsilon}(x,z)d\nu(z).

In view of using classical rules of differentiation, note that $z_{i}g_{\varepsilon}(x,z)=(\partial_{x_{i}}c(x,z))G(x,z)$ , for

G(x,z)=\exp\Big{(}\frac{\mathbf{u}_{\varepsilon}(x)-c(x,z)+\mathbf{u}_{\varepsilon}^{c,\varepsilon}(z)}{\varepsilon}\Big{)}.

Besides,

\partial_{x_{j}}G(x,z)=\frac{1}{\varepsilon}G(x,z)\partial_{x_{j}}(\mathbf{u}_{\varepsilon}(x)-c(x,z)).

As a byproduct,

(C.8)

\partial_{x_{j}}z_{i}g_{\varepsilon}(x,z)=\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}G(x,z)+\partial_{x_{i}}c(x,z)\frac{1}{\varepsilon}G(x,z)\Big{(}\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x)-\partial_{x_{j}}c(x,z)\Big{)},

(C.9)

\partial_{x_{j}}z_{i}g_{\varepsilon}(x,z)=G(x,z)\Big{(}\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}+\partial_{x_{i}}c(x,z)\frac{\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x)-\partial_{x_{j}}c(x,z)}{\varepsilon}\Big{)}.

Plugging (C.9) in (C.7) and using that $z_{i}g_{\varepsilon}(x,z)=(\partial_{x_{i}}c(x,z))G(x,z)$ when rearranging,

\frac{\partial^{2}\mathbf{u}_{\varepsilon}}{\partial_{x_{i}}\partial_{x_{j}}}(x)=\int\Big{[}G(x,z)\Big{(}\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}-\frac{\partial_{x_{i}}c(x,z)\partial_{x_{j}}c(x,z)}{\varepsilon}\Big{)}+\frac{\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x)}{\varepsilon}z_{i}g_{\varepsilon}(x,z)\Big{]}d\nu(z)

=\int G(x,z)\Big{(}\frac{\partial^{2}c(x,z)}{\partial_{x_{i}}\partial_{x_{j}}}-\frac{\partial_{x_{i}}c(x,z)\partial_{x_{j}}c(x,z)}{\varepsilon}\Big{)}d\nu(z)+\frac{1}{\varepsilon}\partial_{x_{i}}\mathbf{u}_{\varepsilon}(x)\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x),

where we also used the explicit derivatives of $\mathbf{u}_{\varepsilon}$ from (4.13). The Hessian of $\mathbf{u}_{\varepsilon}^{c,\varepsilon}$ follows by symmetry.

C.4. Proof of Proposition 4.1

From Proposition 4.4, $\mathbf{u}_{\varepsilon}$ is twice differentiable everywhere, which gives the continuity of $\mathbf{Q}_{\varepsilon}$ and of $\partial_{x_{i}}\mathbf{u}_{\varepsilon}$ . In the expression of the second-order partial derivatives given in (4.4), the term $\frac{1}{\varepsilon}\partial_{x_{i}}\mathbf{u}_{\varepsilon}(x)\partial_{x_{j}}\mathbf{u}_{\varepsilon}(x)$ is thus continuous. The remaining term takes the form of a parameter-dependent integral, whose integrand is continuous and bounded. Thus, the result follows by a direct application of the theorem for continuity under the integral sign, and by using the property that the sequence of spherical harmonic coefficients belongs to $\ell_{1}$ for functions that are twice continuously differentiable, see [42][Theorem 2].

Appendix D Proofs : Directional MK depth

D.1. Proof of Proposition 5.1

Recall that Tukey’s depth satisfies linear monotonicity relative to the deepest points [85]. As the origin is the deepest point for $U_{d}$ , this writes, for any $t\in[0,1]$ ,

(D.1)

D_{U_{d}}^{Tukey}(u)\leq D_{U_{d}}^{Tukey}(tu).

From Definition 5.1, $D_{\nu}(\mathbf{Q}(u))=D_{U_{d}}^{Tukey}(\nabla\psi^{*}\circ\mathbf{Q}(u))$ . By continuity of $\nu$ , $\nabla\psi^{*}\circ\mathbf{Q}(u)=u$ $a.e.$ , see [32] or [82][Theorem 2.12 and Corollary 2.3]. Thus the result follows, with (D.1).

D.2. Proof of Corollary 5.1

Let $X$ be a random vector associated with a spherically symmetric distribution, for which $\mathbb{E}(X)$ and the deepest point shall coincide. From [12], the MK distribution function of $X$ is known. By inverting it, we get its quantile function

\mathbf{Q}(u)=\frac{u}{\|u\|}G^{-1}(\|u\|)+\mathbb{E}(X),

where $G$ is the univariate distribution function of the radial part $\|X-\mathbb{E}(X)\|$ . Because $\|X-\mathbb{E}(X)\|\geq 0$ $a.s.$ and $G^{-1}$ is increasing, $G^{-1}(t\|u\|)/G^{-1}(\|u\|)\in[0,1]$ and

\mathbf{Q}(tu)=\frac{u}{\|u\|}G^{-1}(t\|u\|)+\mathbb{E}(X)=\frac{G^{-1}(t\|u\|)}{G^{-1}(\|u\|)}\Big{(}\mathbf{Q}(u)-\mathbb{E}(X)\Big{)}+\mathbb{E}(X).

This rewrites, for $\delta_{t}=G^{-1}(t\|u\|)/G^{-1}(\|u\|)$ , as $\mathbf{Q}(tu)=\delta_{t}\mathbf{Q}(u)+(1-\delta_{t})\mathbb{E}(X)$ . Besides, $\delta_{t}$ takes all values between $0$ and $1$ for $t\in[0,1]$ . This, combined with Proposition 5.1, implies

\forall u\in\mathbb{B}(0,1),\ \forall\delta\in[0,1],\quad D_{\nu}(\mathbf{Q}(u))\leq D_{\nu}(\delta\mathbf{Q}(u)+(1-\delta)\mathbb{E}(X)).

But any $x$ in the support of $\nu$ writes $\mathbf{Q}(u)$ for $u=\mathbf{F}(x)$ , which gives (5.1).
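The key identity $\mathbf{Q}(tu)=\delta_{t}\mathbf{Q}(u)+(1-\delta_{t})\mathbb{E}(X)$ used in this proof can be checked numerically. The sketch below is a toy instance in dimension two: the radial quantile function `G_inv` (that of a unit-rate exponential law) and the value of `MEAN` are hypothetical choices made only for illustration.

```python
import numpy as np

MEAN = np.array([1.0, 2.0])  # hypothetical value of E(X)

def G_inv(r):
    """Hypothetical radial quantile function (exponential radial part), r in [0, 1)."""
    return -np.log1p(-r)

def Q(u):
    """Quantile function of a spherically symmetric law, for u != 0 in the unit ball."""
    r = np.linalg.norm(u)
    return u / r * G_inv(r) + MEAN

u = np.array([0.3, 0.4])
t = 0.5
delta_t = G_inv(t * np.linalg.norm(u)) / G_inv(np.linalg.norm(u))

lhs = Q(t * u)                                   # Q(tu)
rhs = delta_t * Q(u) + (1.0 - delta_t) * MEAN    # convex combination with E(X)
```

Here `lhs` and `rhs` coincide, and `delta_t` lies in $[0,1]$ because `G_inv` is nonnegative and increasing, as required by the argument.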

D.3. Proof of Proposition 5.2

Fix $x\in{\mathbb{S}^{2}}$ . From the decomposition (3.11), $s=\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)$ is the directional sign associated to $x$ . For $t\in[-1,1]$ , let $x_{t}\in\mathcal{M}_{s}^{U}$ be a parameterization of the reference sign curve associated to $s$ , as in (5.2). Immediately, one may note that $\langle x_{t},\mathbf{F}(\theta_{M})\rangle=t$ , and $x_{t}=x$ for $t=\langle x,\mathbf{F}(\theta_{M})\rangle$ . Besides, $D_{\nu}(\mathbf{Q}(x_{t}))=1-d(x_{t},\mathbf{F}(\theta_{M}))/\pi=1-\arccos(t)/\pi$ , so, as soon as $t\geq\langle x,\mathbf{F}(\theta_{M})\rangle$ ,

D_{\nu}(\mathbf{Q}(x_{t}))\geq D_{\nu}(\mathbf{Q}(x)).
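The monotonicity used here can be visualized with a short numerical sketch. It evaluates $1-d(x_{t},\mathbf{F}(\theta_{M}))/\pi$ along a curve of the form $x_{t}=t\mathbf{F}(\theta_{M})+\sqrt{1-t^{2}}\,s$ , the parametrization written in the proof of Corollary 5.2. The numerical values chosen for the pole $\mathbf{F}(\theta_{M})$ and the sign $s$ are hypothetical.

```python
import numpy as np

pole = np.array([0.0, 0.0, 1.0])  # hypothetical value of F(theta_M)
sign = np.array([1.0, 0.0, 0.0])  # hypothetical directional sign, orthogonal to the pole

def sign_curve(t):
    """Point x_t of the reference sign curve, with <x_t, pole> = t."""
    return t * pole + np.sqrt(1.0 - t**2) * sign

def depth(x_t):
    """Depth expressed through the geodesic distance to the pole."""
    d = np.arccos(np.clip(x_t @ pole, -1.0, 1.0))
    return 1.0 - d / np.pi

ts = np.linspace(-1.0, 1.0, 201)
depths = np.array([depth(sign_curve(t)) for t in ts])
```

The array `depths` increases from $0$ at the antipode of the pole to $1$ at the pole itself, which is the inequality displayed above.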

D.4. Proof of Corollary 5.2

For any $x\in{\mathbb{S}^{2}}$ and $t\in[\langle x,\mathbf{F}(\theta_{M})\rangle,1]$ , consider a parametrization of the sign curve associated to $x$ as $x_{t}=t\mathbf{F}(\theta_{M})+\sqrt{1-t^{2}}\mathbf{S}_{\mathbf{F}(\theta_{M})}(x)$ . For the explicit $\mathbf{F}$ for rotationally invariant distributions given in (B.1), $\mathbf{F}(\theta_{M})=\theta_{M}$ , thus $x_{t}=t\theta_{M}+\sqrt{1-t^{2}}\mathbf{S}_{\theta_{M}}(x)$ , and sign curves are great circles. ∎

References

[1] C. Agostinelli and M. Romanazzi, Nonparametric analysis of directional data based on data depth, Environmental and ecological statistics, 20 (2013), pp. 253–270.
[2] J. Ameijeiras-Alonso and R. M. Crujeiras, Directional statistics for wildfires, in Applied directional statistics, Chapman and Hall/CRC, 2018, pp. 203–226.
[3] V. Barnett, The ordering of multivariate data, Journal of the Royal Statistical Society: Series A (General), 139 (1976), pp. 318–344.
[4] J. Beirlant, S. Buitendag, E. del Barrio, M. Hallin, and F. Kamper, Center-outward quantiles and the measurement of multivariate risk, Insurance: Mathematics and Economics, 95 (2020), pp. 79–100.
[5] J.-D. Benamou, W. L. Ijzerman, and G. Rukhaia, An entropic optimal transport numerical approach to the reflector problem, Methods and Applications of Analysis, (2020).
[6] B. Bercu and J. Bigot, Asymptotic distribution and convergence rates of stochastic algorithms for entropic optimal transportation between probability measures, The Annals of Statistics, 49 (2021), pp. 968 – 987.
[7] B. Bercu, J. Bigot, and G. Thurin, Stochastic optimal transport in banach spaces for regularized estimation of multivariate quantiles. arXiv, 2023.
[8] E. Bernton, P. Ghosal, and M. Nutz, Entropic optimal transport : Geometry and large deviations, Duke Mathematical Journal, 171 (2022), pp. 3363–3400.
[9] F. Bigi, G. Fraux, N. J. Browning, and M. Ceriotti, Fast evaluation of spherical harmonics with sphericart, The Journal of Chemical Physics, 159 (2023).
[10] G. Carlier, V. Chernozhukov, G. De Bie, and A. Galichon, Vector quantile regression and optimal transport, from theory to numerics, Empirical Economics, 62 (2022), pp. 35–62.
[11] P. Chaudhuri, On a geometric notion of quantiles for multivariate data, Journal of the American statistical association, 91 (1996), pp. 862–872.
[12] V. Chernozhukov, A. Galichon, M. Hallin, and M. Henry, Monge–Kantorovich depth, quantiles, ranks and signs, The Annals of Statistics, 45 (2017), pp. 223 – 256.
[13] G. S. Chirikjian and A. B. Kyatkin, Engineering applications of noncommutative harmonic analysis: with emphasis on rotation and motion groups, CRC press, 2000.
[14] S. Cohen, B. Amos, and Y. Lipman, Riemannian convex potential maps, in International Conference on Machine Learning, PMLR, 2021, pp. 2028–2038.
[15] D. Cordero-Erausquin, R. J. McCann, and M. Schmuckenschläger, A riemannian interpolation inequality à la borell, brascamp and lieb, Inventiones mathematicae, 146 (2001), pp. 219–257.
[16] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems, 26 (2013).
[17] M. Cuturi, M. Klein, and P. Ablin, Monge, Bregman and occam: Interpretable optimal transport in high-dimensions with feature-sparse maps, vol. 202 of Proceedings of Machine Learning Research, 2023, pp. 6671–6682.
[18] M. Cuturi and G. Peyré, Computational optimal transport, Foundations and Trends® in Machine Learning, 11 (2019), pp. 355–607.
[19] E. del Barrio, A. González Sanz, and M. Hallin, Nonparametric multiple-output center-outward quantile regression, Journal of the American Statistical Association, (2024), pp. 1–43.
[20] P. Delanoë and G. Loeper, Gradient estimates for potentials of invertible gradient–mappings on the sphere, Calculus of Variations and Partial Differential Equations, 26 (2006), pp. 297–311.
[21] H. Demni, A. Messaoud, and G. C. Porzio, The cosine depth distribution classifier for directional data, Applications in Statistical Computing: From Music Data Analysis to Industrial Quality Improvement, (2019), pp. 49–60.
[22] H. Demni and G. C. Porzio, Directional dd-classifiers under non-rotational symmetry, in 2021 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2021, pp. 1–6.
[23] J.-L. Dortet-Bernadet and N. Wicker, Model-based clustering on the unit sphere with an illustration using gene expression profiles, Biostatistics, 9 (2008), pp. 66–80.
[24] Y. Fan, M. Henry, B. Pass, and J. A. Rivero, Lorenz map, inequality ordering and curves based on multidimensional rearrangements. arXiv, 2022.
[25] O. Ferreira, A. Iusem, and S. Németh, Concepts and techniques of optimization on the sphere, Top, 22 (2014), pp. 1148–1170.
[26] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, and G. Peyré, Interpolating between optimal transport and mmd using sinkhorn divergences, in The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 2681–2690.
[27] A. Figalli, Y.-H. Kim, and R. J. McCann, When is multidimensional screening a convex program?, Journal of Economic Theory, 146 (2011), pp. 454–478.
[28] M. Frungillo, Discrete approximation of optimal transport on compact spaces, arXiv preprint arXiv:2401.14538, (2024).
[29] E. García-Portugués, D. Paindaveine, and T. Verdebout, On optimal tests for rotational symmetry against new classes of hyperspherical distributions, Journal of the American Statistical Association, 115 (2020), pp. 1873–1887.
[30] A. Genevay, Entropy-regularized Optimal Transport for Machine Learning, PhD thesis, Université Paris sciences et lettres, 2019.
[31] A. Genevay, M. Cuturi, G. Peyré, and F. Bach, Stochastic Optimization for Large-scale Optimal Transport, Advances in neural information processing systems, 29 (2016).
[32] P. Ghosal and B. Sen, Multivariate ranks and quantiles using optimal transport: Consistency, rates, and nonparametric testing, The Annals of Statistics, 50 (2022), pp. 1012–1037.
[33] Z. Goldfeld, K. Kato, G. Rioux, and R. Sadhu, Limit theorems for entropic optimal transport maps and sinkhorn divergence, Electronic Journal of Statistics, 18 (2024), pp. 980–1041.
[34] M. Hallin, From mahalanobis to bregman via monge and kantorovich: Towards a “general generalized distance”, Sankhya B, 80 (2018), pp. 135–146.
[35] M. Hallin, Measure Transportation and Statistical Decision Theory, Annual Review of Statistics and Its Application, 9 (2022), pp. 401–424.
[36] M. Hallin, E. del Barrio, J. Cuesta-Albertos, and C. Matrán, Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach, The Annals of Statistics, 49 (2021), pp. 1139 – 1165.
[37] M. Hallin, H. Liu, and T. Verdebout, Nonparametric measure-transportation-based methods for directional data, Journal of the Royal Statistical Society Series B: Statistical Methodology, (2024).
[38] M. Hallin and G. Mordant, Center-outward multiple-output lorenz curves and gini indices a measure transportation approach, working papers ecares, ULB – Universite Libre de Bruxelles, 2022.
[39] M. Hallin, D. Vecchia, and H. Liu, Rank-based testing for semiparametric var models: A measure transportation approach, Bernoulli, 29 (2023).
[40] B. F. Hamfeldt and A. G. Turnquist, A convergence framework for optimal transport on the sphere, Numerische Mathematik, 151 (2022), pp. 627–657.
[41] K. Hauch and C. Redenbach, Quantiles and depth for directional data from elliptically symmetric distributions, arXiv preprint arXiv:2210.06098, (2022).
[42] H. Kalf, On the expansion of a function in terms of spherical harmonics in arbitrary dimensions, Bulletin of the Belgian Mathematical Society-Simon Stevin, 2 (1995), pp. 361–380.
[43] O. D. Kellogg, Foundations of potential theory, vol. 31, Springer Science & Business Media, 2012.
[44] D. Konen and D. Paindaveine, Spatial quantiles on the hypersphere, The Annals of Statistics, 51 (2023), pp. 2221–2245.
[45] S. Kunis and D. Potts, Fast spherical fourier algorithms, Journal of Computational and Applied Mathematics, 161 (2003), pp. 75–98.
[46] C. Ley, C. Sabbah, and T. Verdebout, A new concept of quantiles for directional data and the angular mahalanobis depth, Electronic Journal of Statistics [electronic only], 8 (2014).
[47] C. Ley and T. Verdebout, Modern directional statistics, CRC Press, 2017.
[48] C. Ley and T. Verdebout, Applied directional statistics: modern methods and case studies, CRC Press, 2018.
[49] R. Y. Liu, On a notion of data depth based on random simplices, The Annals of Statistics, (1990), pp. 405–414.
[50] R. Y. Liu, J. M. Parelius, and K. Singh, Multivariate analysis by data depth: descriptive statistics, graphics and inference,(with discussion and a rejoinder by liu and singh), The annals of statistics, 27 (1999), pp. 783–858.
[51] R. Y. Liu and K. Singh, Ordering directional data: concepts of data depth on circles and spheres, The Annals of Statistics, 20 (1992), pp. 1468–1484.
[52] G. Loeper, Regularity of optimal maps on the sphere: The quadratic cost and the reflector antenna, Archive for rational mechanics and analysis, 199 (2011), pp. 269–289.
[53] G. Loeper and C. Villani, Regularity of optimal transport in curved geometry: The nonfocal case, Duke Mathematical Journal, 151 (2010), pp. 431 – 485.
[54] K. V. Mardia and P. E. Jupp, Directional statistics, vol. 2, Wiley Online Library, 2000.
[55] D. Marinucci, D. Pietrobon, A. Balbi, P. Baldi, P. Cabella, G. Kerkyacharian, P. Natoli, D. Picard, and N. Vittorio, Spherical needlets for cosmic microwave background data analysis, Monthly Notices of the Royal Astronomical Society, 383 (2008), pp. 539–545.
[56] S. B. Masud, M. Werenski, J. M. Murphy, and S. Aeron, Multivariate soft rank via entropic optimal transport: sample efficiency and generative modeling, Journal of Machine Learning Research, 24 (2023), pp. 1–65.
[57] R. J. McCann, Polar factorization of maps on riemannian manifolds, Geometric & Functional Analysis GAFA, 11 (2001), pp. 589–608.
[58] V. Michel, Lectures on constructive approximation: Fourier, spline, and wavelet methods on the real line, the sphere, and the ball, Springer Science & Business Media, 2012.
[59] N. Miolane, N. Guigui, A. L. Brigant, J. Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, H. Zaatiti, H. Hajri, Y. Cabanes, T. Gerald, P. Chauchat, C. Shewmake, D. Brooks, B. Kainz, C. Donnat, S. Holmes, and X. Pennec, Geomstats: A python package for riemannian geometry in machine learning, Journal of Machine Learning Research, 21 (2020), pp. 1–9.
[60] K. Mosler, Depth statistics, Robustness and complex data structures: Festschrift in Honour of Ursula Gather, (2013), pp. 17–34.
[61] S. Nagy, H. Demni, D. Buttarazzi, and G. C. Porzio, Theory of angular depth for classification of directional data, Advances in Data Analysis and Classification, (2023).
[62] S. Nagy and P. Laketa, Theoretical properties of angular halfspace depth, arXiv preprint arXiv:2402.08285, (2024).
[63] Z. Niu and B. B. Bhattacharya, Distribution-free joint independence testing and robust independent component analysis using optimal transport. arXiv, 2022.
[64] M. Nutz and J. Wiesel, Entropic optimal transport: convergence of potentials, Probability Theory and Related Fields, 184 (2022), pp. 401–424.
[65] G. Pandolfo and A. D’ambrosio, Clustering directional data through depth functions, Computational Statistics, 38 (2023), pp. 1487–1506.
[66] G. Pandolfo, D. Paindaveine, and G. C. Porzio, Distance-based depths for directional data, Canadian Journal of Statistics, 46 (2018), pp. 593–609.
[67] M. Pegoraro, S. Vedula, A. A. Rosenberg, I. Tallini, E. Rodolà, and A. Bronstein, Vector quantile regression on manifolds, in ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023.
[68] A. Pewsey and E. García-Portugués, Recent advances in directional statistics, Test, 30 (2021), pp. 1–58.
[69] A.-A. Pooladian and J. Niles-Weed, Entropic estimation of optimal transport maps, 2021.
[70] P. Rigollet and A. J. Stromme, On the sample complexity of entropic optimal transport. arXiv, 2022.
[71] P. J. Rousseeuw and A. Struyf, Characterizing angular symmetry and regression symmetry, Journal of Statistical Planning and Inference, 122 (2004), pp. 161–173.
[72] V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel, Large-Scale Optimal Transport and Mapping Estimation, in ICLR 2018 - International Conference on Learning Representations, 2018, pp. 1–15.
[73] T. Sei, Gradient modeling for multivariate quantitative data, Annals of the Institute of Statistical Mathematics, 63 (2011), pp. 675–688.
[74] T. Sei, A jacobian inequality for gradient maps on the sphere and its application to directional statistics, Communications in Statistics-Theory and Methods, 42 (2013), pp. 2525–2542.
[75] R. Serfling, Depth functions in nonparametric multivariate inference, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 72 (2006), p. 1.
[76] H. Shi, M. Drton, M. Hallin, and F. Han, Center-outward sign- and rank-based quadrant, spearman, and kendall tests for multivariate independence., working papers ecares, ULB – Universite Libre de Bruxelles, 2021.
[77] H. Shi, M. Drton, and F. Han, Distribution-free consistent independence tests via center-outward ranks and signs, Journal of the American Statistical Association, 117 (2022), pp. 395–410.
[78] C. G. Small, Measures of centrality for multivariate and directional distributions, Canadian Journal of Statistics, 15 (1987), pp. 31–39.
[79] S. Sommer, T. Fletcher, and X. Pennec, Introduction to differential and riemannian geometry, in Riemannian Geometric Statistics in Medical Image Analysis, Elsevier, 2020, pp. 3–37.
[80] A. Stromme, Sampling from a schrödinger bridge, in International Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 4058–4067.
[81] J. W. Tukey, Mathematics and the picturing of data, Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), 2 (1975), pp. 523–531.
[82] C. Villani, Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics, American Mathematical Society, 2003.
[83] G. T. von Nessi, On the regularity of optimal transportation potentials on round spheres, Acta applicandae mathematicae, 123 (2013), pp. 239–259.
[84] M. A. Wieczorek and M. Meschede, Shtools: Tools for working with spherical harmonics, Geochemistry, Geophysics, Geosystems, 19 (2018), pp. 2574–2592.
[85] Y. Zuo and R. Serfling, General notions of statistical depth function, The Annals of Statistics, (2000), pp. 461–482.