
Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging

Tobías I. Liaudat1,2,3, Matthijs Mars2, Matthew A. Price2, Marcelo Pereyra4,5, Marta M. Betcke1 and Jason D. McEwen2,6
1Department of Computer Science, University College London (UCL), London WC1E 6BT, UK
2Mullard Space Science Laboratory (MSSL), University College London (UCL), Holmbury St Mary, Dorking, Surrey RH5 6NT, UK
3IRFU, CEA, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France
4School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, Scotland, UK
5Maxwell Institute for Mathematical Sciences, Bayes Centre, 47 Potterrow, Edinburgh, Scotland, UK
6Alan Turing Institute, Euston Road, London NW1 2DB, UK
(Accepted —. Received —; in original form —; 2023)
Abstract

Next-generation radio interferometers like the Square Kilometre Array have the potential to unlock scientific discoveries thanks to their unprecedented angular resolution and sensitivity. One key to unlocking their potential resides in handling the deluge and complexity of incoming data. This challenge requires building radio interferometric (RI) imaging methods that can cope with the massive data sizes and provide high-quality image reconstructions with uncertainty quantification (UQ). This work proposes a method coined QuantifAI to address UQ in RI imaging with data-driven (learned) priors for high-dimensional settings. Our model, rooted in the Bayesian framework, uses a physically motivated model for the likelihood. The model exploits a data-driven convex prior potential, which can encode complex information learned implicitly from simulations and guarantee the log-concavity of the posterior. We leverage probability concentration phenomena of high-dimensional log-concave posteriors to obtain information about the posterior, avoiding MCMC sampling techniques. We rely on convex optimisation methods to compute the MAP estimate, which is known to be faster and to scale better with dimension than MCMC strategies. QuantifAI allows us to compute local credible intervals and perform hypothesis testing of structure on the reconstructed image. We propose a novel fast method to compute pixel-wise uncertainties at different scales, which uses 3 and 6 orders of magnitude fewer likelihood evaluations than other UQ methods, namely computing the length of the local credible intervals and Monte Carlo posterior sampling, respectively. We demonstrate our method by reconstructing RI images in a simulated setting and carrying out fast and scalable UQ, which we validate with MCMC sampling. Our method shows improved image quality and more meaningful uncertainties than the benchmark method based on a sparsity-promoting prior. 
QuantifAI’s source code is available from github.com/astro-informatics/QuantifAI.

keywords:
Machine Learning – Algorithms – Data methods

1 Introduction

Radio astronomy plays a crucial role in expanding our understanding of the Universe, offering a unique perspective on astrophysical and cosmological phenomena. Among the transformative tools in an astronomer’s toolkit, radio interferometric (RI) imaging stands out as an indispensable technique. Aperture synthesis and radio interferometry (Thompson et al., 2017) allow us to achieve high angular resolutions, providing immense power to resolve objects. Furthermore, radio-frequency signals are only weakly attenuated by our atmosphere, allowing for observations from the Earth’s surface. The unparalleled angular resolution, the high sensitivity and the variety of phenomena emitting in the radio regime make RI an ideal tool to help us better understand our Universe.

The advent of the Square Kilometre Array (SKA, Dewdney et al., 2009) heralds a new era in radio astronomy (Braun et al., 2015), with science goals spanning from the epoch of reionisation and fast radio bursts to galaxy evolution and dark energy. SKA’s vast collecting area and sensitivity promise a leap forward in our observational capabilities, opening doors to new discoveries. However, this transformative potential comes with the formidable computational challenge of processing and making sense of the unprecedented volume of SKA-generated data. Developing and implementing algorithms that can efficiently handle SKA’s data deluge is a challenge in itself. In addition, achieving the high reconstruction performance required to unlock SKA’s full potential is a significant obstacle for SKA’s data processing.

The aperture synthesis techniques in RI probe the sky by acquiring specific Fourier measurements, which results in incomplete coverage of the Fourier domain of the sky image of interest. This incomplete Fourier coverage makes the problem of estimating the underlying sky image, known as RI imaging, an ill-posed inverse problem, which is further complicated by observational noise. Given the uncertainties involved in the RI imaging problem, a way to quantify the uncertainty in the image reconstructions becomes essential. To make scientifically sound inferences and informed decisions, we need the ability to quantify these uncertainties rigorously. This motivates the development of uncertainty quantification (UQ) methods tailored to the complexities of radio interferometric data, where scalability, i.e., the computational complexity with respect to the amount of data processed, and performance play a central role. We need to ensure that our reconstructions are not only insightful but also trustworthy.

In a nutshell, we want to develop RI imaging methods that deliver precise reconstructions, provide uncertainty quantification and are highly scalable. Most existing methods tackle only some of these three requirements. The widely used CLEAN algorithm (Högbom, 1974) built its success on scalability and fast inference. CLEAN and its extensions (Cornwell, 2008; Offringa et al., 2014; Offringa & Smirnov, 2017) have been used continuously in many RI imaging pipelines since their inception. Despite its limited imaging quality and the reconstruction artefacts it introduces compared to other approaches, CLEAN stands out due to its scalability. More recent approaches leverage compressed sensing theory, relying on sparse priors (often in wavelet representations) and convex optimisation techniques (Wiaux et al., 2009; McEwen & Wiaux, 2011; Carrillo et al., 2012, 2014; Dabbech, A. et al., 2015; Dabbech et al., 2018; Pratley et al., 2017). These methods have been shown to improve the reconstruction quality at the expense of increased computational complexity. Considerable work has been directed to parallelisation and acceleration efforts for sparsity-based methods (Onose et al., 2016; Pratley et al., 2019b, a; Thouvenin et al., 2022a, b).

The deep learning revolution has introduced a powerful way to encode complex image priors in neural networks, which may be used to solve complex high-dimensional inverse problems. This data-driven or learned paradigm has gained much traction across imaging problems, including RI imaging (Allam, 2016; Terris et al., 2022; Aghabiglou et al., 2023; Mars et al., 2023). Learned methods can improve the reconstruction quality with respect to handcrafted priors, such as sparsity-based wavelet priors, as well as provide acceleration (Terris et al., 2022; Mars et al., 2023, 2024; Aghabiglou et al., 2024) with respect to convex optimisation-based methods.

Unfortunately, none of the RI imaging methods mentioned above, whether learned, sparsity-based or CLEAN-based, provides UQ tools for a given model. Cai et al. (2018a, b) proposed methods for Bayesian UQ in RI imaging problems. Cai et al. (2018a) leverage proximal MCMC methods (Pereyra, 2016) to support sparsity-promoting priors. The proposed method allows them to reconstruct the image and provide UQ by sampling the posterior probability distribution. The drawback of the method is the high computational cost common to all MCMC sampling techniques. The companion paper, Cai et al. (2018b), overcomes the need for posterior sampling with maximum-a-posteriori (MAP) based UQ (Pereyra, 2017) relying on convex optimisation techniques. This second method (Cai et al., 2018b) provides a significant speed-up with respect to the sampling-based method (Cai et al., 2018a), but its reconstruction quality is limited by the sparsity-promoting priors it relies on.

Other RI imaging methods have addressed UQ for their reconstructions. Dia et al. (2023) proposed to use score-based generative models as priors (Song et al., 2020; Ho et al., 2020), which, by employing the convolved likelihood approximation (Remy et al., 2023; Adam et al., 2022), are able to sample from the posterior. However, the method is computationally costly, and the sampling relies on MCMC methods. The Bayesian RI imaging method comrade (Tiede, 2022) was developed for very-long-baseline interferometry (VLBI), aiming to image black holes and active galactic nuclei. The comrade method relies on MCMC sampling methods such as nested sampling (Ashton et al., 2022) and Hamiltonian Monte Carlo (Xu et al., 2020) to sample from the posterior. Posterior sampling based on MCMC methods can be sufficient for VLBI but cannot yet cope with the dimensions of SKA-like images.

An alternative Bayesian RI imaging algorithm is the resolve method and its upgrades (Arras et al., 2018, 2021, 2022; Roth et al., 2023; Knollmüller et al., 2023), introduced initially in Junklewitz, H. et al. (2016), which can produce uncertainty maps from approximate posterior samples. The model is based on Gaussian random fields with different analytical priors used to promote certain aspects of the reconstruction, such as positivity and spatial and temporal correlations. One novelty of the method is to approximate the true posterior probability distribution with variational inference and, therefore, to be able to sample from the approximate distribution without resorting to MCMC methods, which do not scale well to very high dimensions. The approximate posterior is a multivariate Gaussian distribution whose parameters are learnt by minimising a Kullback-Leibler (KL) divergence between the true posterior and the model, i.e., by maximising the information overlap between the two. Given that the full covariance matrix scales with the square of the number of pixels, the authors exploit an approximation in the vicinity of the mean estimate. The method is based on the Metric Gaussian Variational Inference (MGVI) approach from Knollmüller & Enßlin (2019). resolve has been used for reconstructions with VLBI observations of M87* (Arras et al., 2022) and Sagittarius A* (Knollmüller et al., 2023) from the Event Horizon Telescope (EHT) (Event Horizon Telescope Collaboration et al., 2019a, b), as well as observations of Cygnus A from the Very Large Array (VLA) (Arras et al., 2018, 2021; Roth et al., 2023).

In this article, we delve into the forefront of RI imaging and propose a method coined QuantifAI, based on learned convex priors, that delivers high-quality reconstructions with uncertainty quantification while being highly scalable. The method relies on the mathematically principled Bayesian framework to characterise the uncertainties through the posterior distribution. By restricting our model to log-concave posteriors, we can exploit recent MAP-based UQ techniques (Pereyra, 2017), providing scalable optimisation-based UQ. We build upon recent advances in neural-network-based convex regularisers (Goujon et al., 2023b), allowing us to improve the reconstruction quality and obtain more meaningful uncertainties with respect to Cai et al. (2018b). On top of hypothesis tests of structure on the reconstructed image, we propose a novel fast method to estimate pixel-wise uncertainties as a function of scale.

The remainder of this article is organised as follows. In Section 2, we start by reviewing RI imaging and techniques for solving the resulting inverse problem. Section 3 describes QuantifAI, the proposed method, and the RI image reconstruction algorithm. In Section 4, we introduce the core of our scalable UQ and the different UQ techniques it allows. The experimental results, including the performance of QuantifAI reconstruction and its UQ techniques, are presented in Section 5. Section 6 presents experimental results using more realistic observations based on simulated ungridded visibility patterns from the MeerKAT radio telescope. In Section 7, we discuss some limitations and possible extensions of the proposed methodology. We provide concluding remarks and present some future perspectives in Section 8.

2 Radio interferometric imaging

In this section, we start by reviewing the RI imaging inverse problem and discuss approaches to tackle it, including sparsity-based regularisation, the CLEAN method, and learned approaches. We then introduce the required elements of the Bayesian framework, such as MAP estimation and the proximal MCMC sampling algorithms that will later be used for validation.

2.1 Radio interferometry

The interferometric measurement equation for a radio telescope (Thompson et al., 2017) in the monochromatic setting relates our observations, represented by the visibility function $\mathcal{Y}$, to the sky brightness $\mathcal{X}$, which we want to reconstruct,

$$\mathcal{Y}(u,v,w)=\iint\frac{\mathcal{X}(l,m)\,\mathcal{A}(l,m)}{\sqrt{1-l^{2}-m^{2}}}\exp\left[-2\pi\mathrm{i}\,w\left(\sqrt{1-l^{2}-m^{2}}-1\right)\right]\exp\left[-2\pi\mathrm{i}\left(lu+mv\right)\right]\mathrm{d}l\,\mathrm{d}m\,, \tag{1}$$

where $\mathbf{u}=(u,v,w)$ are the interferometer baseline coordinates, expressed in units of the observation wavelength, $\mathbf{l}=(l,m,n)$ are direction cosine sky coordinates restricted to the unit sphere, so that $n=\sqrt{1-l^{2}-m^{2}}$, and $\mathcal{A}$ includes direction-dependent effects (DDEs) such as the primary beam of the dishes. This general model allows us to consider different DDEs through $\mathcal{A}$ and non-coplanar effects through the exponential term in $w$. These effects become considerable for wide fields of view and long baselines. There exists a rich body of literature incorporating such effects, e.g., Smirnov, O. M. (2011a, b, c, d); Thompson et al. (2017), and there are scalable algorithms that take them into account, e.g., Pratley et al. (2019b).
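To get a feel for when the $w$-term matters, the short Python sketch below evaluates its phase, $-2\pi w\left(\sqrt{1-l^{2}-m^{2}}-1\right)$, for two configurations. The values of $w$ (in wavelengths) and of the direction cosines are hypothetical choices for illustration, not taken from this work. For small $l$ and $m$ the phase is close to $\pi w\,(l^{2}+m^{2})$, so it grows with the baseline length and with the square of the field of view.

```python
import numpy as np

def w_term_phase(w, l, m):
    """Phase (radians) of the w-term exp[-2 pi i w (sqrt(1 - l^2 - m^2) - 1)] in Eq. 1.

    w is in wavelengths; l and m are direction cosines from the phase centre.
    """
    n = np.sqrt(1.0 - l**2 - m**2)
    return -2.0 * np.pi * w * (n - 1.0)

# Illustrative (hypothetical) configurations.
narrow = w_term_phase(w=100.0, l=1e-3, m=1e-3)  # short w, small field of view
wide = w_term_phase(w=1e4, l=0.05, m=0.05)      # long w, wide field of view

# Small-angle check: the phase is close to pi * w * (l^2 + m^2).
approx = np.pi * 100.0 * (1e-3**2 + 1e-3**2)
print(f"narrow: {narrow:.2e} rad (approx. {approx:.2e}), wide: {wide:.1f} rad")
```

A phase well below one radian, as in the first case, is what justifies dropping the $w$-term; in the second case it wraps many times and cannot be ignored.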

In this article, for the sake of simplicity but without loss of generality, we assume the coplanar setting, where the antennas are located in the same $w$ plane. We also assume that we observe a small field of view such that $1-l^{2}-m^{2}\approx 1$. Consequently, $\exp\left[-2\pi\mathrm{i}\,w\left(\sqrt{1-l^{2}-m^{2}}-1\right)\right]\approx 1$ and the factor $\sqrt{1-l^{2}-m^{2}}$ in the denominator is also approximately one, so that Equation 1 reduces to

$$\mathcal{Y}(u,v)\approx\iint\mathcal{X}(l,m)\,\mathcal{A}(l,m)\exp\left[-2\pi\mathrm{i}\left(lu+mv\right)\right]\mathrm{d}l\,\mathrm{d}m\,. \tag{2}$$

From Equation 2, we see the remarkable result that, under these approximations, the visibilities are samples of the Fourier transform of the beam-weighted sky, $\mathcal{Y}(u,v)=\mathcal{F}(\mathcal{A}\mathcal{X})(u,v)$, where $\mathcal{F}$ denotes the two-dimensional Fourier transform.
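As a quick numerical check of this Fourier relation, the sketch below discretises Equation 2 on a small pixel grid and evaluates it by direct summation. It assumes unit pixel spacing for $(l,m)$, so that $(u,v)$ are in cycles per pixel, and a trivial beam $\mathcal{A}=1$; the function name `visibilities_direct` is hypothetical. At the regular grid frequencies $u=p/N_l$, $v=q/N_m$ the direct sum coincides with NumPy's `np.fft.fft2`, whereas for arbitrary $(u,v)$ points, as produced by real interferometers, a direct sum or a non-uniform FFT is needed.

```python
import numpy as np

def visibilities_direct(sky, beam, u, v):
    """Direct (slow) evaluation of the discretised Eq. 2 at arbitrary (u, v).

    Pixel indices play the role of (l, m) with unit spacing (an assumption
    made for illustration), so u and v are in cycles per pixel.
    """
    n_l, n_m = sky.shape
    l = np.arange(n_l)[:, None]  # column of l indices
    m = np.arange(n_m)[None, :]  # row of m indices
    weighted = sky * beam        # A(l, m) X(l, m)
    out = np.empty(len(u), dtype=complex)
    for k, (uk, vk) in enumerate(zip(u, v)):
        out[k] = np.sum(weighted * np.exp(-2j * np.pi * (l * uk + m * vk)))
    return out

rng = np.random.default_rng(0)
sky = rng.random((8, 8))
beam = np.ones_like(sky)  # no direction-dependent effects

# On the regular grid u = p / 8, v = q / 8 the direct sum equals the 2D FFT.
p, q = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
vis = visibilities_direct(sky, beam, p.ravel() / 8, q.ravel() / 8)
print(np.allclose(vis, np.fft.fft2(sky * beam).ravel()))  # True
```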

To further simplify the problem, we avoid using the continuous $\mathcal{Y}$ and $\mathcal{X}$ and work with their discrete counterparts, $\bm{y}$ and $\bm{x}$, respectively. The observational model we study for our RI imaging problem reads

$$\bm{y}=\mathbf{\Phi}\bm{x}+\bm{n}\,, \tag{3}$$

where $\bm{y}\in\mathbb{C}^{M}$ are the $M$ observed complex visibilities, $\bm{x}\in\mathbb{R}^{N}$ is the discrete sky brightness sampled on an $N$-point grid, and $\mathbf{\Phi}\in\mathbb{C}^{M\times N}$ is the linear measurement operator that models the acquisition process. Without loss of generality, the observational and instrumental noise $\bm{n}\in\mathbb{C}^{M}$ is assumed to be independent and identically distributed (iid) white Gaussian noise with zero mean and standard deviation $\sigma$. If the noise is not white, we can apply a noise-whitening matrix to both the visibilities $\bm{y}$ and the operator $\mathbf{\Phi}$ such that the white noise assumption holds.
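A toy version of Equation 3 helps fix ideas. The sketch below assumes that all visibilities fall exactly on the cells of an FFT grid, so that $\mathbf{\Phi}$ reduces to a unitary 2D FFT followed by a selection of the sampled $uv$ cells; real RI operators also involve gridding kernels, weights and DDEs. The class name `MaskedFourierOperator`, the grid size, the coverage fraction and the noise convention $\mathbb{E}|n_i|^{2}=\sigma^{2}$ are illustrative assumptions. The final dot-product test checks that the adjoint is consistent with the forward operator, using the real inner product because $\bm{x}$ is real.

```python
import numpy as np

class MaskedFourierOperator:
    """Toy measurement operator Phi = S F for gridded visibilities.

    F is the unitary 2D DFT and S keeps only the sampled uv cells (mask).
    """

    def __init__(self, mask):
        self.mask = np.asarray(mask, dtype=bool)  # True where a uv cell is sampled
        self.shape = self.mask.shape

    def forward(self, x):
        """Phi x: real image of N pixels -> M complex visibilities."""
        return np.fft.fft2(x, norm="ortho")[self.mask]

    def adjoint(self, v):
        """Phi^dagger v: zero-fill unsampled cells, inverse FFT, keep the real part."""
        grid = np.zeros(self.shape, dtype=complex)
        grid[self.mask] = v
        return np.real(np.fft.ifft2(grid, norm="ortho"))

rng = np.random.default_rng(1)
n_side = 32
mask = rng.random((n_side, n_side)) < 0.3  # roughly 30% uv coverage (arbitrary)
phi = MaskedFourierOperator(mask)

x_true = rng.random((n_side, n_side))
sigma = 0.05
M = int(mask.sum())
# Complex white Gaussian noise with E|n_i|^2 = sigma^2 (one possible convention).
noise = sigma / np.sqrt(2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = phi.forward(x_true) + noise  # Eq. 3: y = Phi x + n

# Dot-product test: Re<Phi x, v> must equal <x, Phi^dagger v> for real x.
v = rng.standard_normal(M) + 1j * rng.standard_normal(M)
lhs = np.real(np.vdot(phi.forward(x_true), v))
rhs = np.sum(x_true * phi.adjoint(v))
print(np.isclose(lhs, rhs))  # True
```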

Each pair of antennas provides us with one visibility at a given time, which is a noisy Fourier component of the intensity image. Using an array of $n$ radio antennas therefore allows us to sample $\binom{n}{2}=(n^{2}-n)/2$ points in the $uv$-plane (or Fourier plane) at each time. The distribution of these points depends on the configuration of the radio antenna array. If different time intervals are considered, the Earth’s rotation can be exploited to increase the number of $uv$ points. The $uv$ coverage is incomplete in all practical cases, and the measurements are noisy. Therefore, recovering $\bm{x}$ from $\bm{y}$ by inverting $\mathbf{\Phi}$ is an ill-posed problem. Combined with the large number of measurements, this makes RI imaging a challenging inverse problem.
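The baseline count and the effect of Earth rotation can be illustrated with a few lines of Python. The sketch assumes a planar array with random antenna positions and mimics Earth rotation by rigidly rotating the baselines, which is exact only for a source at the celestial pole (where $uv$ tracks are circles). The function names, the 64-antenna layout and the 0.21 m wavelength are hypothetical choices, and the conjugate points $(-u,-v)$, which follow from the sky brightness being real, are not listed.

```python
import itertools
import numpy as np

def baselines(antenna_xy):
    """All antenna-pair baselines b_j - b_i (i < j) of a planar array.

    antenna_xy: (n, 2) array of antenna positions in metres (illustrative).
    """
    pairs = itertools.combinations(range(len(antenna_xy)), 2)
    return np.array([antenna_xy[j] - antenna_xy[i] for i, j in pairs])

def uv_points(antenna_xy, wavelength, hour_angles):
    """Toy uv tracks in wavelengths: rotate the baselines by each hour angle
    (a simplification, exact only for a source at the celestial pole)."""
    b = baselines(antenna_xy) / wavelength
    tracks = []
    for h in hour_angles:
        rot = np.array([[np.cos(h), -np.sin(h)], [np.sin(h), np.cos(h)]])
        tracks.append(b @ rot.T)
    return np.concatenate(tracks)

rng = np.random.default_rng(2)
ants = rng.uniform(-500.0, 500.0, size=(64, 2))  # 64 antennas, random layout
print(len(baselines(ants)))                      # 64 * 63 / 2 = 2016
uv = uv_points(ants, wavelength=0.21, hour_angles=np.linspace(0, np.pi / 4, 10))
print(uv.shape)                                  # (20160, 2)
```

Ten time steps multiply the number of $uv$ samples by ten without adding antennas, which is the Earth-rotation synthesis described above.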

The most basic reconstruction of $\bm{x}$ is often referred to as the naturally weighted dirty image, which we denote by $\hat{\bm{x}}_{\text{dirty}}$ and simply call the dirty image, as calibration weights are not taken into account in this work. This estimate is obtained by applying the adjoint of $\mathbf{\Phi}$ to the visibilities, i.e., $\hat{\bm{x}}_{\text{dirty}}=\mathbf{\Phi}^{\dagger}\bm{y}$. To obtain a higher-fidelity solution to the RI imaging inverse problem, we must regularise the problem by incorporating some prior information about the desired solution $\bm{x}$. A broad range of methods can be characterised by the type of prior information used to regularise the inverse problem and the algorithm used to compute the reconstructed image $\hat{\bm{x}}$.
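Under the simplifying assumption of gridded visibilities (a unitary FFT followed by $uv$ sampling, assumed here for illustration only), the sketch below computes the dirty image $\mathbf{\Phi}^{\dagger}\bm{y}$ of two point sources. It then verifies that, without noise, the dirty image equals the sky circularly convolved with the dirty beam, i.e. the PSF obtained by applying $\mathbf{\Phi}^{\dagger}\mathbf{\Phi}$ to a unit impulse. The helper names `forward` and `adjoint` are hypothetical.

```python
import numpy as np

def forward(x, mask):
    """Toy Phi: unitary 2D FFT followed by uv sampling (illustrative assumption)."""
    return np.fft.fft2(x, norm="ortho")[mask]

def adjoint(y, mask):
    """Phi^dagger: zero-fill the unsampled uv cells, inverse FFT, keep real part."""
    grid = np.zeros(mask.shape, dtype=complex)
    grid[mask] = y
    return np.real(np.fft.ifft2(grid, norm="ortho"))

rng = np.random.default_rng(3)
mask = rng.random((32, 32)) < 0.3
x = np.zeros((32, 32))
x[10, 12], x[20, 5] = 1.0, 0.5   # two point sources

y = forward(x, mask)             # noise-free visibilities
x_dirty = adjoint(y, mask)       # naturally weighted dirty image

# Without noise, the dirty image is the sky circularly convolved with the
# dirty beam (PSF), here the real part of the inverse FFT of the mask.
psf = np.real(np.fft.ifft2(mask.astype(float)))
conv = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(psf)))
print(np.allclose(x_dirty, conv))  # True
```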

2.2 Sparsity-based regularisation

The last two decades have brought a great number of RI imaging methods based on sparse representations. The prior information exploited is that the solution $\bm{x}$ is known to be sparsely represented in some basis or dictionary. These are often built from multi-scale wavelets, either as a single wavelet basis or as a dictionary constructed from a collection of wavelets (Mallat, 2008). We can represent our image $\bm{x}$ in a dictionary $\mathbf{\Psi}\in\mathbb{C}^{N\times L}$,

$$\bm{x}=\mathbf{\Psi}\bm{a}=\sum_{i=1}^{L}\mathbf{\Psi}_{i}a_{i}\,, \tag{4}$$

where $\bm{a}\in\mathbb{C}^{L}$ is a vector of coefficients of $\bm{x}$ weighting the corresponding dictionary atoms $\mathbf{\Psi}_{i}$ of $\mathbf{\Psi}$. The assumption made to regularise the inverse problem is that $\bm{a}$ is sparse or compressible, meaning that most of the coefficients are zero-valued or near zero, respectively. An array $\bm{a}$ is called $k$-sparse if it has only $k$ non-zero elements, which can be written as $\|\bm{a}\|_{0}=k$, where $\|\cdot\|_{0}$ denotes the $\ell_{0}$ pseudonorm.
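To make sparsity and $\|\cdot\|_{0}$ concrete, the sketch below applies an orthonormal 1D Haar wavelet transform, a simple stand-in for the multi-scale wavelet dictionaries mentioned above, to a piecewise-constant signal. The signal has 64 non-zero samples in the Dirac basis but only 4 non-zero Haar coefficients. The function names and the test signal are illustrative assumptions; for an orthonormal basis, the analysis operator $\mathbf{\Psi}^{\dagger}$ is simply the inverse of $\mathbf{\Psi}$, so the coefficients are $\bm{a}=\mathbf{\Psi}^{\dagger}\bm{x}$.

```python
import numpy as np

def haar_1d(signal, levels):
    """Orthonormal multi-level 1D Haar analysis, a = Psi^dagger x.

    The signal length must be divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # coarser approximation
    return np.concatenate([approx] + details[::-1])  # coarsest scales first

def l0(a, tol=1e-12):
    """Number of non-zero entries, i.e. the l0 pseudonorm ||a||_0."""
    return int(np.sum(np.abs(a) > tol))

# A piecewise-constant signal: dense in the Dirac basis, sparse in Haar.
x = np.concatenate([np.full(24, 3.0), np.full(40, -1.0)])  # 64 samples
a = haar_1d(x, levels=6)
print(l0(x), l0(a))  # 64 4
```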

Sparsity should ideally be enforced through the $\ell_{0}$ pseudonorm, which is non-convex. Consequently, its convex relaxation, the $\ell_{1}$ norm, is used instead, which is also sparsity-promoting. The regularised inverse problem is then formulated as an optimisation problem whose solution is taken as the reconstructed image, so that it can be tackled with an optimisation algorithm. The optimisation objective comprises two competing terms: (i) a data-fidelity term $f(\cdot)$ that promotes consistency with the observed visibilities and depends on the statistics of the noise $\bm{n}$; and (ii) a regularisation term $r(\cdot)$ that encodes our prior knowledge of $\bm{x}$. The optimisation problem reads

$$\hat{\bm{x}}=\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}} f(\bm{x})+\sum_{k}\lambda_{k}r_{k}(\bm{x})\,, \tag{5}$$

where we are using a sum of regularisation terms $r_{k}$, each with its corresponding regularisation strength parameter $\lambda_{k}$. Substituting into Equation 5 the RI data-fidelity term, which follows from the Gaussian noise model of Equation 3, and a sparsity-promoting regularisation term in an overcomplete wavelet dictionary $\mathbf{\Psi}$, we obtain

$$\hat{\bm{x}}=\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}}\frac{1}{2\sigma^{2}}\left\|\bm{y}-\mathbf{\Phi}\bm{x}\right\|_{2}^{2}+\lambda\big\|\mathbf{\Psi}^{\dagger}\bm{x}\big\|_{1}\,, \tag{6}$$

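where $\sigma$ is the noise standard deviation, $\lambda$ is the regularisation strength, and $\mathbf{\Psi}^{\dagger}$ denotes the adjoint of the dictionary, so that the $\ell_{1}$ norm acts on the analysis coefficients $\mathbf{\Psi}^{\dagger}\bm{x}$.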

The previous formulation is referred to as unconstrained. Other works consider the constrained formulation, which minimises the $\ell_{1}$ term subject to the hard constraint $\|\bm{y}-\mathbf{\Phi}\bm{x}\|_{2}\le\epsilon$, where the radius $\epsilon$ of the $\ell_{2}$ ball is related to the noise level $\sigma$ (Carrillo et al., 2012; Pratley et al., 2017). In this article, we focus on the unconstrained formulation as it has a natural Bayesian interpretation. Obtaining the solution of Equation 6 involves solving a convex optimisation problem whose objective is the sum of a differentiable and a non-differentiable term. Proximal algorithms (Parikh & Boyd, 2014) are well suited to tackle such optimisation problems. Recent developments have produced a wide collection of proximal optimisation algorithms, such as the forward-backward (FB) algorithm (Combettes & Pesquet, 2009), the FISTA algorithm (Beck & Teboulle, 2009), the alternating direction method of multipliers (ADMM, Boyd et al., 2011), and the primal-dual forward-backward algorithm (Chambolle & Pock, 2011; Condat, 2013), to mention a few. A rich literature exists exploiting the aforementioned concepts to tackle the RI imaging problem (Wiaux et al., 2009; McEwen & Wiaux, 2011; Carrillo et al., 2012, 2014; Onose et al., 2016; Pratley et al., 2017, 2019b, 2019a; Pratley & McEwen, 2019; Cai et al., 2018b). For example, the Sparsity Averaging Reweighted Analysis (SARA) family of methods (Carrillo et al., 2012) uses an overcomplete dictionary composed of a concatenation of the Dirac basis and the first eight Daubechies wavelets (Daubechies, 1992) and has shown good performance in RI imaging.
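The forward-backward iteration for Equation 6 alternates a gradient step on the data-fidelity term with the proximal operator of the $\ell_{1}$ term, which is component-wise soft thresholding. The sketch below is a minimal illustration rather than a production solver: it assumes a toy gridded operator (a unitary FFT followed by $uv$ sampling, so that $\|\mathbf{\Phi}\|\le 1$), replaces the wavelet dictionary by the Dirac basis ($\mathbf{\Psi}=\mathbf{I}$), and uses the fixed step $\gamma=\sigma^{2}$, which satisfies $\gamma\le 1/L$ for the gradient's Lipschitz constant $L=\|\mathbf{\Phi}\|^{2}/\sigma^{2}$. The function names, the value of $\lambda$ and the test data are hypothetical. With this step size the objective is non-increasing, which the returned history lets you check.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: component-wise soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def forward_backward(y, mask, sigma, lam, n_iter=200):
    """Forward-backward (ISTA) iterations for Eq. 6 with Psi = identity."""
    def phi(x):  # toy Phi: unitary FFT + uv sampling, so ||Phi|| <= 1
        return np.fft.fft2(x, norm="ortho")[mask]

    def phi_adj(v):  # Phi^dagger: zero-fill, inverse FFT, real part
        grid = np.zeros(mask.shape, dtype=complex)
        grid[mask] = v
        return np.real(np.fft.ifft2(grid, norm="ortho"))

    def objective(x):  # value of the functional minimised in Eq. 6
        return np.sum(np.abs(y - phi(x)) ** 2) / (2 * sigma**2) + lam * np.sum(np.abs(x))

    gamma = sigma**2  # step size; gamma <= 1/L with L = ||Phi||^2 / sigma^2
    x = np.zeros(mask.shape)
    history = [objective(x)]
    for _ in range(n_iter):
        grad = -phi_adj(y - phi(x)) / sigma**2             # forward (gradient) step
        x = soft_threshold(x - gamma * grad, gamma * lam)  # backward (proximal) step
        history.append(objective(x))
    return x, np.array(history)

rng = np.random.default_rng(4)
mask = rng.random((32, 32)) < 0.4
x_true = np.zeros((32, 32))
x_true[rng.integers(0, 32, 15), rng.integers(0, 32, 15)] = 1.0  # sparse sky
sigma = 0.01
M = int(mask.sum())
y = np.fft.fft2(x_true, norm="ortho")[mask] + sigma / np.sqrt(2) * (
    rng.standard_normal(M) + 1j * rng.standard_normal(M)
)

x_hat, hist = forward_backward(y, mask, sigma, lam=50.0)
print(f"objective: {hist[0]:.3e} -> {hist[-1]:.3e}")
```

Swapping the Dirac basis for a wavelet dictionary changes only the proximal step; for a non-orthogonal $\mathbf{\Psi}^{\dagger}$ that step no longer has a closed form and is usually computed with a sub-iteration or handled by a primal-dual method.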

2.3 CLEAN

A precursor of modern RI image reconstruction methods, the CLEAN algorithm (Högbom, 1974) is a highly successful RI imaging method and is still being used (Event Horizon Telescope Collaboration et al., 2019a, b) despite various shortcomings. The CLEAN algorithm is a non-linear iterative method that assumes a sparse sky model. CLEAN iteratively removes the contribution of the brightest source convolved with the instrument’s point spread function (PSF), or dirty beam. This method can be interpreted as a matching pursuit algorithm (Wiaux et al., 2009), or as an $\ell_{0}$ regularisation in a basis of Dirac spikes. Several extensions of CLEAN have been developed over time (Bhatnagar & Cornwell, 2004; Cornwell, 2008; Stewart et al., 2011; Offringa et al., 2014; Offringa & Smirnov, 2017), achieving better reconstruction performance. See Rau et al. (2009) for a review of CLEAN-based algorithms.
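The core Högbom loop is short. The sketch below is a minimal version under simplifying assumptions: the dirty image is the circular convolution of the sky with a PSF whose unit peak sits at pixel $(0,0)$, so shifting the PSF to any pixel is an `np.roll`. The loop gain, threshold and iteration cap are illustrative defaults, the toy coverage is random, and practical implementations additionally restore the components with a clean beam and add back the residual. By construction, the CLEAN components convolved with the PSF plus the final residual reproduce the dirty image exactly.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """Minimal Hogbom CLEAN on a periodic grid.

    Assumes psf has unit peak at pixel (0, 0) and that dirty = sky (*) psf
    (circular convolution), so np.roll shifts the PSF to any pixel.
    """
    residual = dirty.copy()
    components = np.zeros_like(dirty)
    for _ in range(max_iter):
        peak = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        amp = residual[peak]
        if abs(amp) < threshold:
            break  # residual peak below threshold: stop
        components[peak] += gain * amp                            # record a CLEAN component
        residual -= gain * amp * np.roll(psf, peak, axis=(0, 1))  # subtract the shifted PSF
    return components, residual

def circ_conv(a, b):
    """Circular 2D convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

rng = np.random.default_rng(5)
mask = rng.random((32, 32)) < 0.5                # toy uv coverage
psf = np.real(np.fft.ifft2(mask.astype(float)))  # dirty beam of the toy coverage
psf /= psf[0, 0]                                 # unit peak at (0, 0)

sky = np.zeros((32, 32))
sky[10, 12], sky[20, 5] = 1.0, 0.5               # two point sources
dirty = circ_conv(sky, psf)

comps, res = hogbom_clean(dirty, psf)
print(np.unravel_index(np.argmax(comps), comps.shape))  # brightest component
```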

Besides its very early introduction, the success of CLEAN resides in its scalability. However, CLEAN has been shown to produce artefacts when the underlying sky is not well described by point sources, limiting CLEAN’s image quality and justifying the need for more advanced techniques, e.g., the sparsity-based methods of Section 2.2. CLEAN often requires manual intervention, making its use less practical. Furthermore, CLEAN and its extensions do not provide meaningful uncertainty quantification of their reconstructions.

2.4 Learned approaches

The advent of deep learning models has affected many imaging applications, and RI imaging is no exception. Handcrafted models and priors are limited in the information they can capture or represent compared with recent, more expressive neural networks. Learned or data-driven methods can encode complex information present in the data used for their training, e.g., astrophysical simulations. In general, these approaches produce reconstructions with improved quality, a computational speed-up, or both, which makes them very relevant to the RI imaging reconstruction problem. However, open issues remain regarding the robustness of learned methods to data distribution shifts (Hendrycks et al., 2020) and the lack of scalable methods to quantify the uncertainty of their reconstructions.

Allam (2016) proposed a learned method for RI imaging based on convolutional neural networks originally considered for super-resolution (Dong et al., 2016). The approach consists of learning to post-process dirty images, with variants for both known and unknown PSFs. More recently, Gheller & Vazza (2021) proposed to use a convolutional denoising autoencoder to learn to post-process radio images, e.g., the dirty image or CLEAN’s output. Connor et al. (2022) proposed a residual deep neural network (DNN) coined POLISH that works as a learned post-processing and super-resolution network. The DNN is based on the architecture proposed in Yu et al. (2018) and takes as input dirty images at different wavelengths and resolutions. POLISH outputs a clean image at a higher resolution for each wavelength and shows better reconstruction quality than CLEAN. The proposed method has been applied to simulations from the upcoming Deep Synoptic Array-2000 (Hallinan et al., 2019) and real data from the Very Large Array (VLA, Perley et al., 2011).

The Plug-and-Play (PnP) framework (Venkatakrishnan et al., 2013) provides a way to incorporate a deep learning model into a modern optimisation algorithm. The central idea is to replace the proximal operator of the regularisation term with a denoising deep neural network. Ryu et al. (2019) studied conditions for the convergence of PnP algorithms. Pesquet et al. (2021) proposed a new term for the denoiser’s training loss that enforces the firm nonexpansiveness of the (usually deep-learning-based) denoiser. This training procedure makes the denoiser suitable for PnP algorithms with theoretical convergence guarantees. The PnP framework with a firmly nonexpansive deep-learning-based denoiser has been applied to the RI imaging problem in Terris et al. (2022), where the approach has been called AIRI for Artificial Intelligence for Regularisation in RI imaging. The approach achieved similar or better performance than competing prior-based approaches whilst providing significant acceleration potential. The AIRI method was later validated on observations from the Australian Square Kilometre Array Pathfinder (ASKAP, Wilber et al., 2023).
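A plug-and-play forward-backward iteration only changes the backward step: instead of the proximal operator of a handcrafted regulariser, a denoiser $D$ is applied, $\bm{x}\leftarrow D\big(\bm{x}-\gamma\nabla f(\bm{x})\big)$. In the sketch below, a trained network is replaced by a hypothetical stand-in, a circular Gaussian smoother applied in the Fourier domain; because its frequency response is real, symmetric and lies in $(0,1]$, it is firmly nonexpansive, the property that Pesquet et al. (2021) enforce during training. The toy gridded measurement operator, the step size and all names are assumptions for illustration; this is not the AIRI implementation.

```python
import numpy as np

def gaussian_denoiser(x, width=1.0):
    """Hypothetical stand-in denoiser: circular Gaussian smoothing (std `width`
    pixels) in the Fourier domain. Its real, symmetric frequency response lies
    in (0, 1], so the operator is firmly nonexpansive."""
    fx = np.fft.fftfreq(x.shape[0])[:, None]
    fy = np.fft.fftfreq(x.shape[1])[None, :]
    response = np.exp(-2.0 * (np.pi * width) ** 2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * response))

def pnp_forward_backward(y, mask, sigma, denoiser, n_iter=100):
    """PnP forward-backward: the proximal step of Eq. 6 is replaced by a denoiser."""
    def phi(x):  # toy Phi: unitary FFT + uv sampling
        return np.fft.fft2(x, norm="ortho")[mask]

    def phi_adj(v):  # Phi^dagger: zero-fill, inverse FFT, real part
        grid = np.zeros(mask.shape, dtype=complex)
        grid[mask] = v
        return np.real(np.fft.ifft2(grid, norm="ortho"))

    gamma = sigma**2    # step size 1/L for this toy operator
    x = phi_adj(y)      # start from the dirty image
    for _ in range(n_iter):
        grad = -phi_adj(y - phi(x)) / sigma**2  # gradient of the data-fidelity term
        x = denoiser(x - gamma * grad)          # denoiser replaces prox_{gamma r}
    return x

rng = np.random.default_rng(6)
yy, xx = np.mgrid[0:32, 0:32]
x_true = np.exp(-((xx - 16) ** 2 + (yy - 14) ** 2) / 20.0)  # smooth extended source
mask = rng.random((32, 32)) < 0.3
sigma = 0.01
M = int(mask.sum())
y = np.fft.fft2(x_true, norm="ortho")[mask] + sigma / np.sqrt(2) * (
    rng.standard_normal(M) + 1j * rng.standard_normal(M)
)
x_pnp = pnp_forward_backward(y, mask, sigma, gaussian_denoiser)
print(np.linalg.norm(x_pnp - x_true) / np.linalg.norm(x_true))  # relative error
```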

Two learned approaches for an interferometric imager named Segmented Planar Imaging Detector for Electro-Optical Reconnaissance (SPIDER) were proposed by Mars et al. (2023). The first approach consists of a learned post-processing step applied to the dirty reconstruction, based on a convolutional U-Net architecture (Ronneberger et al., 2015). The second approach consists of a learned multiscale iterative method coined GU-Net, which incorporates the measurement operator to include measurement information at the different steps and scales of the method. GU-Net is more efficient than standard unrolling methods due to its multi-scale nature. The numerical results show improved reconstruction quality and faster convergence than proximal optimisation-based methods. In subsequent work (Mars et al., 2024), GU-Net was applied to the RI imaging problem. Variations of the $uv$-coverage are handled by training the neural network on a broad distribution of simulated $uv$-coverages and subsequently fine-tuning the network for a specific sampling distribution.

Aghabiglou et al. (2023) recently proposed a series of DNNs that combine notions of PnP algorithms and unrolled optimisation methods (Adler & Öktem, 2018; Monga et al., 2019). Each DNN is trained to transform a back-projected residual into an image residual, thus ideally improving the reconstruction from the previous iteration. The results show a significant speed-up with respect to AIRI and SARA-based methods while maintaining similar reconstruction quality. Other recent approaches based on deep neural networks include: Wang et al. (2023), who proposed a denoising diffusion probabilistic model conditioned on the visibilities and the dirty reconstruction; and Schmidt et al. (2022), who proposed a convolutional neural network based on residual blocks that aims to inpaint the measurements, that is, to recover the entire $uv$ plane from an incomplete coverage.

2.5 Bayesian framework

Bayesian inference provides a principled statistical framework to solve the inverse problem in Equation 3 with statistical guarantees. This framework builds upon Bayes’ famous theorem,

$$\underbrace{p(\bm{x}|\bm{y})}_{\text{Posterior}} = \frac{\overbrace{p(\bm{y}|\bm{x})}^{\text{Likelihood}}\;\overbrace{p(\bm{x})}^{\text{Prior}}}{\underbrace{\int_{\mathbb{R}^{N}} p(\bm{y}|\bm{x})\,p(\bm{x})\,\mathrm{d}\bm{x}}_{\text{Bayesian evidence}}}\,. \tag{7}$$

Bayes' theorem relates the posterior distribution to the likelihood and prior, which are the main constituents of a Bayesian model. The likelihood is associated with the data-fidelity term and depends on the observational model and the noise statistics. The prior models expected properties of the solution $\bm{x}$, for example smoothness or piecewise regularity. This prior knowledge regularises the estimation problem.

The denominator, commonly known as the Bayesian evidence, does not depend on $\bm{x}$ since that variable is marginalised out; it describes the likelihood of the observed data under the modelling assumptions. The Bayesian evidence is crucial for Bayesian model comparison (Robert, 2007), which provides a consistent way to compare models. Such high-dimensional integrals can be estimated effectively by, for example, nested sampling techniques (Skilling, 2006; Ashton et al., 2022) or the recently introduced learned harmonic mean estimator (McEwen et al., 2021; Spurio Mancini et al., 2022; Polanska et al., 2023). Recent developments have focused on nested sampling to compute the model evidence in high-dimensional imaging problems with handcrafted sparsity-based priors (Cai et al., 2022) and deep learning-based priors (McEwen et al., 2023). Carrying out model selection is out of the scope of this work.
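To make Equation 7 concrete, the following sketch evaluates it on a grid for a hypothetical one-dimensional conjugate model, chosen purely for illustration: prior $x \sim \mathcal{N}(0, s^2)$ and likelihood $y \mid x \sim \mathcal{N}(x, \sigma^2)$, for which the evidence is known in closed form, $p(y) = \mathcal{N}(y; 0, s^2 + \sigma^2)$. The evidence integral is approximated by a Riemann sum and then used to normalise the posterior. Grid quadrature is only viable in very low dimension; for images with $N$ pixels this is precisely why dedicated estimators such as those cited above are needed.

```python
import numpy as np

def gauss_pdf(z, mean, var):
    """Density of N(mean, var) evaluated at z (broadcasts over arrays)."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Hypothetical 1D conjugate model: x ~ N(0, s2), y | x ~ N(x, sigma2)
s2, sigma2, y_obs = 2.0, 0.5, 1.3

x = np.linspace(-15.0, 15.0, 200_001)      # quadrature grid over x
dx = x[1] - x[0]
likelihood = gauss_pdf(y_obs, x, sigma2)   # p(y|x) as a function of x
prior = gauss_pdf(x, 0.0, s2)              # p(x)

# Bayesian evidence: integral of likelihood * prior over x (Riemann sum)
evidence = np.sum(likelihood * prior) * dx
posterior = likelihood * prior / evidence  # Equation 7 on the grid

# Closed-form references for this conjugate model
evidence_exact = gauss_pdf(y_obs, 0.0, s2 + sigma2)
post_mean_exact = s2 / (s2 + sigma2) * y_obs
post_mean_grid = np.sum(x * posterior) * dx

print(f"evidence: grid={evidence:.6f}, exact={evidence_exact:.6f}")
print(f"posterior mean: grid={post_mean_grid:.6f}, exact={post_mean_exact:.6f}")
```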

Under the Bayesian framework, the posterior distribution $p(\bm{x}|\bm{y})$ assigns a probability to each possible solution $\bm{x}$ given some observations $\bm{y}$ and a model $\mathcal{M}$ consisting of the likelihood and prior terms. In imaging settings, exploring the information contained in the posterior distribution is not trivial due to its high dimensionality. The posterior distribution is generally characterised by samples computed with MCMC methods, and efficiently sampling from high-dimensional posterior distributions is an active research topic; see, e.g., Klatzer et al. (2023). Once $p(\bm{x}|\bm{y})$ is defined, a reconstruction method can be seen as a point estimator of the posterior that provides $\hat{\bm{x}}$. There are several choices of point estimator (Robert, 2007; Arridge et al., 2019), each with advantages and drawbacks. Some examples are a sample from the posterior, $\hat{\bm{x}} \sim p(\bm{x}|\bm{y})$, the maximum-a-posteriori (MAP) estimator, $\hat{\bm{x}} = \operatorname*{argmax}_{\bm{x}} p(\bm{x}|\bm{y})$, or the posterior mean, $\hat{\bm{x}} = \mathbb{E}[\bm{x}|\bm{y}]$.

The posterior also provides consistent ways of quantifying the uncertainty of the chosen point estimate or reconstruction (Robert, 2007). For example, one way to represent uncertainty is to compute the posterior standard deviation. Pixels with a higher standard deviation are less constrained by the data and the prior, allowing for larger fluctuations.
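As a toy illustration of these point estimators and of the posterior standard deviation, the sketch below discretises a hypothetical scalar posterior with a Gaussian likelihood ($\Phi = 1$) and a Laplace prior. For this skewed posterior the MAP estimate and the posterior mean differ noticeably: with $y = 1$, $\sigma = 1$ and $\lambda = 1$, the MAP is exactly $0$ (soft-thresholding of $y$ at $\lambda\sigma^2$), whereas the posterior mean is about $0.50$ and the posterior standard deviation about $0.75$.

```python
import numpy as np

# Hypothetical 1D posterior: Gaussian likelihood (Phi = 1) and Laplace prior,
# p(x|y) ∝ exp(-(y - x)^2 / (2 sigma^2) - lam * |x|)
y_obs, sigma, lam = 1.0, 1.0, 1.0

x = np.linspace(-10.0, 10.0, 400_001)
dx = x[1] - x[0]
neg_log_post = (y_obs - x) ** 2 / (2 * sigma**2) + lam * np.abs(x)
post = np.exp(-(neg_log_post - neg_log_post.min()))  # shift for numerical safety
post /= np.sum(post) * dx                             # normalise on the grid

x_map = x[np.argmin(neg_log_post)]                      # maximum-a-posteriori
x_mmse = np.sum(x * post) * dx                          # posterior mean (MMSE)
x_std = np.sqrt(np.sum((x - x_mmse) ** 2 * post) * dx)  # posterior standard deviation

# For this model the MAP has a closed form: soft-thresholding of y at lam * sigma^2
x_map_exact = np.sign(y_obs) * max(abs(y_obs) - lam * sigma**2, 0.0)
print(f"MAP={x_map:.4f} (exact {x_map_exact}), mean={x_mmse:.4f}, std={x_std:.4f}")
```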

One of the most significant drawbacks of Bayesian imaging methods is that they are known to be computationally expensive, despite continuous efforts to improve their scalability (Pereyra, 2016; Durmus et al., 2018; Pereyra et al., 2020, 2022; Klatzer et al., 2023).

2.5.1 Maximum-a-posteriori estimation

The MAP estimator is particularly interesting in high-dimensional problems like RI imaging, as its formulation allows us to bypass the need for sampling from the posterior. Consequently, its computational footprint is significantly reduced. The likelihood and prior terms can be rewritten as $p(\bm{y}|\bm{x}) = \exp[-f(\bm{x},\bm{y})]$ and $p(\bm{x}) = \exp[-g(\bm{x})]$, respectively, where the functions $f$ and $g$ are the likelihood and prior potentials. Using Bayes' theorem in Equation 7, and noting that the evidence does not depend on $\bm{x}$, we can rewrite the MAP estimation as follows

$$\hat{\bm{x}}_{\text{MAP}} = \operatorname*{argmax}_{\bm{x}\in\mathbb{R}^{N}} p(\bm{x}|\bm{y}) = \operatorname*{argmax}_{\bm{x}\in\mathbb{R}^{N}} p(\bm{y}|\bm{x})\,p(\bm{x})\,. \tag{8}$$

The previous optimisation problem can be reformulated using the monotonicity of the logarithm as follows

$$\begin{aligned}
\hat{\bm{x}}_{\text{MAP}} &= \operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}} \; -\log p(\bm{y}|\bm{x}) - \log p(\bm{x}) \\
&= \operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}} \; f(\bm{x},\bm{y}) + g(\bm{x})\,.
\end{aligned} \tag{9}$$

One advantage of the MAP estimator is that Equation 9 can be tackled efficiently with optimisation algorithms. We refer the reader to Pereyra (2019) for a deeper analysis of MAP estimation.

Coming back to the RI imaging inverse problem from Equation 3, we can define a (white) Gaussian likelihood,

$$p(\bm{y}|\bm{x}) \propto \exp\left[-\frac{1}{2\sigma^{2}}\left\|\bm{y} - \Phi\bm{x}\right\|_{2}^{2}\right] \tag{10}$$

and a sparsity-inducing Laplace-type prior defined as

$$p(\bm{x}) \propto \exp\left[-\lambda\big\|\Psi^{\dagger}\bm{x}\big\|_{1}\right]\,. \tag{11}$$

Upon substitution of Equations 10 and 11 into Equation 9, the MAP optimisation problem coincides with the one in Equation 6. Therefore, the MAP reconstruction, $\hat{\bm{x}}_{\text{MAP}}$, matches $\hat{\bm{x}}$ from Equation 6. Hence, sparsity-based approaches are MAP estimators with a prior based on the sparsity-promoting $\ell_1$ norm in a given dictionary, e.g., wavelets.
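Since Equation 6 is not reproduced in this section, the following is only a minimal sketch under stated assumptions: the sparsity dictionary is the identity ($\Psi = \mathbf{I}$), $\Phi$ is a small dense random matrix, and the MAP problem of Equations 9 to 11 is solved with a standard forward-backward (ISTA) iteration that alternates a gradient step on $f$ with the proximal operator of $g$ (soft-thresholding). The function names are hypothetical, and this is not claimed to be the solver used in the works cited above.

```python
import numpy as np

def soft_threshold(z, thr):
    """Proximal operator of thr * ||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def ista_l1_map(y, Phi, sigma, lam, n_iter=2000):
    """Forward-backward (ISTA) for the MAP problem
        min_x ||y - Phi x||_2^2 / (2 sigma^2) + lam * ||x||_1,
    i.e. Equations 9 to 11 with the assumed choice Psi = identity.
    """
    step = sigma**2 / np.linalg.norm(Phi, 2) ** 2        # 1 / Lip(grad f)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad_f = Phi.T @ (Phi @ x - y) / sigma**2        # gradient of the likelihood potential
        x = soft_threshold(x - step * grad_f, step * lam)  # prox of the prior potential
    return x

# Hypothetical sparse test problem
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 50)) / np.sqrt(100)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
sigma, lam = 0.1, 5.0
y = Phi @ x_true + sigma * rng.normal(size=100)
x_map = ista_l1_map(y, Phi, sigma, lam)
print("non-zero entries of x_map:", np.flatnonzero(x_map))
```

At the solution, the first-order optimality condition of the $\ell_1$ problem holds: $[\Phi^{\top}(\bm{y} - \Phi\hat{\bm{x}})]_j / \sigma^2 = \lambda\,\mathrm{sign}(\hat{x}_j)$ wherever $\hat{x}_j \neq 0$, and its magnitude is at most $\lambda$ elsewhere.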

2.5.2 Uncertainty quantification: more than a point estimate

Computing a good reconstruction for an inverse problem of the form of Equation 3 can itself be challenging. Moreover, a reconstruction alone is often insufficient for scientific applications that require further quantification of the result. This demand motivates uncertainty quantification, which provides more than a point estimate. The Bayesian framework provides powerful tools for uncertainty quantification. For example, if we choose the MAP estimator as our reconstruction following the model in Section 2.5.1, we obtain the same reconstruction as in Section 2.2, namely the solution of Equation 6. However, within the Bayesian framework, we can also sample from the posterior and estimate the posterior standard deviation, perform a Bayesian hypothesis test of some image structure (Cai et al., 2018a; Price et al., 2021b), or compute other pixel-wise uncertainty measures such as local credible intervals (LCI; Cai et al., 2018a; Price et al., 2019).

2.5.3 Bayesian inference via MCMC sampling

Recent developments (Durmus et al., 2022) have considerably reduced the computational complexity of sampling from high-dimensional posterior distributions in imaging inverse problems. Proximal MCMC sampling algorithms (Pereyra, 2016; Durmus et al., 2018) extend the class of posterior distributions that can be studied by allowing the use of non-smooth terms. Sparse regularisers have been widely used in RI imaging (Carrillo et al., 2012, 2014; Pratley et al., 2017; Cai et al., 2018a) and are usually enforced through a non-smooth $\ell_1$ term.

Let $\pi$ denote the target probability distribution that we are interested in sampling from, which in our case is the posterior $p(\bm{x}|\bm{y})$. We consider a Langevin diffusion process on $\mathbb{R}^{N}$ whose stationary distribution is $\pi$. Assuming that $\log\pi$ is continuously differentiable with a Lipschitz gradient, we write the Langevin diffusion as the following stochastic process

$$\mathrm{d}L(t) = \frac{1}{2}\nabla\log\pi\left[L(t)\right]\mathrm{d}t + \mathrm{d}W(t)\,, \quad L(0) = l_{0}\,, \tag{12}$$

where $W$ is an $N$-dimensional Brownian motion. A common discrete-time approximation of the Langevin diffusion is the forward Euler scheme, known as the Euler-Maruyama approximation (Kloeden & Platen, 2011). Applying it to Equation 12 with a time increment of $2\delta$, where $\delta$ is the step size, gives the unadjusted Langevin algorithm (ULA),

$$\bm{l}^{(m+1)} = \bm{l}^{(m)} + \delta\nabla\log\pi\left[\bm{l}^{(m)}\right] + \sqrt{2\delta}\,\bm{w}^{(m+1)}\,, \tag{13}$$

where $\bm{w}^{(m+1)} \sim \mathcal{N}(0, \mathbf{I}_{N})$ is the discrete counterpart of $W(t)$. Because of the discretisation, the ULA-based Markov chain does not converge exactly to $\pi$ but to an approximation of it, with an asymptotic bias that vanishes as $\delta \to 0$. The bias can be removed with a subsequent Metropolis-Hastings (MH) accept-reject step, which, however, increases the algorithm's computational complexity. The ULA algorithm with the subsequent MH step is known as the Metropolis-adjusted Langevin algorithm (MALA).
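The sketch below runs ULA (Equation 13) on many independent chains in parallel for a hypothetical standard Gaussian target, $\pi = \mathcal{N}(0, 1)$, for which $\nabla\log\pi(x) = -x$. It illustrates the asymptotic bias discussed above: for this target the ULA recursion is linear, $l^{(m+1)} = (1-\delta)\,l^{(m)} + \sqrt{2\delta}\,w^{(m+1)}$, so its stationary variance is $2\delta / (1 - (1-\delta)^2) = 1/(1 - \delta/2)$ rather than $1$. The function name `ula` is an illustrative choice.

```python
import numpy as np

def ula(grad_log_pi, x0, delta, n_steps, rng):
    """Unadjusted Langevin algorithm (Equation 13), vectorised over parallel chains."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + delta * grad_log_pi(x) + np.sqrt(2 * delta) * noise
    return x

# Hypothetical target: standard Gaussian, so grad log pi(x) = -x
rng = np.random.default_rng(1)
delta = 0.1
samples = ula(lambda x: -x, np.zeros(20_000), delta, 300, rng)

# ULA is biased: its stationary variance is 1 / (1 - delta / 2), not 1
print(f"empirical var={samples.var():.4f}, "
      f"ULA theory={1 / (1 - delta / 2):.4f}, target=1")
```

An MH correction step (MALA) would remove this bias at the cost of extra likelihood evaluations per iteration.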

The ULA algorithm requires $\log\pi$ to be continuously differentiable with a Lipschitz gradient. Let us now consider $\pi(\bm{x}) \propto \exp[-f(\bm{x}) - g(\bm{x})]$, where $f \in \mathcal{C}^{1}$ has a Lipschitz gradient and $g$ is non-smooth but is a lower semicontinuous convex function that admits a proximal operator (Parikh & Boyd, 2014). Proximal MCMC algorithms (Pereyra, 2016) relax the smoothness assumption by approximating the non-smooth term $g$ with its Moreau-Yosida envelope $g^{\gamma}$. The gradient of the Moreau-Yosida approximation satisfies

$$\nabla g^{\gamma}(\bm{x}) = \frac{1}{\gamma}\left(\bm{x} - \text{prox}_{\gamma}^{g}(\bm{x})\right)\,, \tag{14}$$
$$\text{prox}_{\gamma}^{g}(\bm{x}) := \operatorname*{argmin}_{\bm{u}\in\mathbb{R}^{N}} \left\{ g(\bm{u}) + \frac{1}{2\gamma}\left\|\bm{u} - \bm{x}\right\|_{2}^{2} \right\}\,, \tag{15}$$

where $\gamma > 0$ is the Moreau-Yosida approximation parameter; the proximal operator may or may not have a closed-form expression. Consequently, the non-smooth target density $\pi$ is approximated by the smooth density $\pi_{\gamma}$, which replaces $g$ with its Moreau-Yosida approximation $g^{\gamma}$. The Markov chain targeting $\pi_{\gamma}$ reads

$$\begin{aligned}
\bm{l}^{(m+1)} = {} & \left(1 - \frac{\delta}{\gamma}\right)\bm{l}^{(m)} + \frac{\delta}{\gamma}\,\text{prox}_{\gamma}^{g}\left(\bm{l}^{(m)}\right) \\
& - \delta\nabla f\left(\bm{l}^{(m)}\right) + \sqrt{2\delta}\,\bm{w}^{(m+1)}\,,
\end{aligned} \tag{16}$$

and it is known as the Moreau-Yosida regularised ULA (MYULA). If we add an MH step targeting the non-differentiable distribution $\pi$, the MCMC algorithm is known as proximal MALA (Px-MALA). These proximal MCMC algorithms can be further accelerated by replacing the Euler-Maruyama approximation with the more involved Runge-Kutta-Chebyshev approximation (Abdulle et al., 2018), giving rise to the SK-ROCK algorithm (Pereyra et al., 2020).
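A quick numerical check of Equations 14 and 15 for a scalar Laplace-type potential $g(u) = \lambda|u|$, an illustrative choice: its proximal operator is soft-thresholding at $\lambda\gamma$, and its Moreau-Yosida envelope is the Huber function. The sketch compares the gradient formula of Equation 14 with a central finite difference of the envelope; the helper names are hypothetical.

```python
import numpy as np

lam, gamma = 1.0, 0.2   # hypothetical prior weight and Moreau-Yosida parameter

def prox_abs(x, gamma):
    """prox_{gamma g}(x) for g(u) = lam * |u|: soft-thresholding at lam * gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - lam * gamma, 0.0)

def moreau_envelope(x, gamma):
    """g^gamma(x) = g(p) + (p - x)^2 / (2 gamma), with p = prox_{gamma g}(x)."""
    p = prox_abs(x, gamma)
    return lam * np.abs(p) + (p - x) ** 2 / (2 * gamma)

def grad_moreau_envelope(x, gamma):
    """Equation 14: (x - prox_{gamma g}(x)) / gamma."""
    return (x - prox_abs(x, gamma)) / gamma

x = np.linspace(-2.0, 2.0, 81)
h = 1e-6
fd_grad = (moreau_envelope(x + h, gamma) - moreau_envelope(x - h, gamma)) / (2 * h)
max_err = np.max(np.abs(fd_grad - grad_moreau_envelope(x, gamma)))
print(f"max |finite difference - Equation 14| = {max_err:.2e}")
```

The envelope is smooth with a $(1/\gamma)$-Lipschitz gradient and lies below $g$, which is what makes the Langevin step of Equation 16 well defined.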

Cai et al. (2018a) exploited the MYULA and Px-MALA algorithms to sample from the posterior in the RI imaging problem. The model is based on a Gaussian likelihood as in Equation 10 and a sparsity promoting prior akin to Equation 11. However, the framework can be used with more complex noise models (Melidonis et al., 2023), e.g. Poisson noise. In Cai et al. (2018a), the RI image reconstruction relies on the minimum mean squared error (MMSE) estimator based on the posterior mean, while in Cai et al. (2018b), the MAP is considered.
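A minimal MYULA sketch (Equation 16) for a hypothetical scalar analogue of this model, with a Gaussian likelihood ($\Phi = 1$) and a Laplace prior $g(x) = \lambda|x|$, whose proximal operator is soft-thresholding. The step size is set to half of $1/(\text{Lip}(\nabla f) + 1/\gamma)$, a conservative choice for this sketch, and the sample mean is compared with a quadrature estimate of the true posterior mean; any remaining gap reflects the biases due to $\gamma$ and $\delta$ discussed above. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, thr):
    """Elementwise soft-thresholding: prox of thr * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def myula(grad_f, prox_g, x0, delta, gamma, n_steps, rng):
    """MYULA iteration (Equation 16), run on many independent chains at once."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = ((1 - delta / gamma) * x + (delta / gamma) * prox_g(x, gamma)
             - delta * grad_f(x) + np.sqrt(2 * delta) * noise)
    return x

# Hypothetical scalar model: f(x) = (y - x)^2 / (2 sigma^2), g(x) = lam * |x|
y_obs, sigma, lam = 1.0, 1.0, 1.0
grad_f = lambda x: (x - y_obs) / sigma**2
prox_g = lambda x, gamma: soft_threshold(x, lam * gamma)

gamma = 0.02                                # Moreau-Yosida parameter
delta = 0.5 / (1 / sigma**2 + 1 / gamma)    # conservative step size
rng = np.random.default_rng(2)
samples = myula(grad_f, prox_g, np.zeros(20_000), delta, gamma, 1_500, rng)

# Reference: posterior mean of exp(-f - g) by quadrature on a fine grid
x = np.linspace(-10.0, 10.0, 200_001)
dens = np.exp(-(y_obs - x) ** 2 / (2 * sigma**2) - lam * np.abs(x))
ref_mean = np.sum(x * dens) / np.sum(dens)
print(f"MYULA mean: {samples.mean():.3f}, quadrature mean: {ref_mean:.3f}")
```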

3 Scalable Bayesian data-driven imaging with uncertainty quantification

QuantifAI (code available at github.com/astro-informatics/QuantifAI), a scalable Bayesian data-driven method with uncertainty quantification, is motivated by three principles:

  1. Scalability: The RI imaging inverse problem demands scalability for a method to be useful in real astronomical data scenarios such as the SKA. The most time-consuming operation is evaluating the measurement operator $\Phi$ in the likelihood function, so it is essential to minimise the number of likelihood evaluations. For these reasons, we restrict ourselves to the MAP estimator for our reconstruction, which corresponds to the solution of a convex optimisation problem that can be solved quickly. We need to avoid sampling-based approaches, as they are prohibitively expensive computationally.

  2. High-quality reconstructions: To improve the quality of our reconstruction, we consider data-driven or learned priors that can better encode the expected image structures. In Section 2.4, we have already seen that data-driven approaches can better represent complex imaging priors and provide reconstructions superior to those obtained with handcrafted priors, such as sparsity-promoting priors based on wavelet dictionaries.

  3. Uncertainty quantification: There are many ways to quantify uncertainty based on sampling the posterior distribution. However, sampling-based methods are prohibitively expensive, and one of our key criteria is computational scalability. Therefore, we restrict ourselves to log-concave posteriors, which is equivalent to requiring the sum of our potentials, $f+g$, to be convex, and to explicit potentials. As we describe in more detail in Section 4, the first restriction enables efficient methods relying on the concentration of probability for high-dimensional log-concave distributions (Pereyra, 2017). Consequently, we can approximate posterior quantities without resorting to sampling. These methods are orders of magnitude faster, resulting in a scalable Bayesian UQ method. In a nutshell, we require the posterior potential to be convex and explicit for scalable UQ. The likelihood potential is typically convex for RI imaging problems, so we enforce the prior potential $g$ to be convex and explicit. The requirement of explicit potentials will be explained in Section 4.

We continue by introducing the data-driven convex regularisers and the optimisation algorithm used to compute the MAP estimation for the proposed method.

3.1 Learned convex regularisers

As stated before, we need an expressive regulariser that is convex and has an explicit potential. Modern regularisers based on deep neural networks, such as convolutional neural networks, used in RI imaging reconstruction methods generally satisfy neither of these two constraints. The explicit-potential constraint, required by the UQ approach, excludes a range of denoisers whose potentials are defined only implicitly. For instance, PnP approaches (Terris et al., 2022) only require denoising the image, without explicitly computing the regularisation potential. A typical iteration of a PnP algorithm reads

$$\bm{x}_{k+1} = \mathrm{D}\left(\bm{x}_{k} - \gamma\nabla f(\bm{x}_{k})\right)\,, \quad (\forall k\in\mathbb{N})\,, \tag{17}$$

where $\mathrm{D}$ is the denoiser, $f$ is the data-fidelity term, $\gamma$ is the step size, and $k$ is the iteration number. The algorithm's convergence can be guaranteed if $\mathrm{D}$ and the step size satisfy certain conditions (Pesquet et al., 2021; Ryu et al., 2019). Even if the denoiser $\mathrm{D}$ is associated with a convex regulariser, we cannot use it in our approach, as we must be able to evaluate the potential.
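The iteration in Equation 17 can be sketched as follows. The denoiser here is a hypothetical stand-in, not a trained network: a circular $[1/4, 1/2, 1/4]$ smoothing filter, which is symmetric with eigenvalues $\cos^2(\omega/2) \in [0, 1]$ and hence firmly nonexpansive. The data-fidelity term is assumed to be $f(\bm{x}) = \frac{1}{2}\|\bm{y} - \Phi\bm{x}\|_2^2$ with a small random $\Phi$. Note that the loop only ever calls the denoiser; the potential associated with $\mathrm{D}$ is never evaluated, which is precisely the limitation discussed above.

```python
import numpy as np

def smoothing_denoiser(x):
    """Hypothetical linear denoiser: circular [1/4, 1/2, 1/4] smoothing.

    Symmetric with eigenvalues cos^2(w / 2) in [0, 1], hence firmly nonexpansive.
    """
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def pnp_forward_backward(y, Phi, denoiser, n_iter=2000):
    """PnP iteration of Equation 17 with f(x) = ||y - Phi x||_2^2 / 2."""
    gamma = 1.0 / np.linalg.norm(Phi, 2) ** 2   # step size 1 / Lip(grad f)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = denoiser(x - gamma * Phi.T @ (Phi @ x - y))  # gradient step, then denoise
    return x, gamma

rng = np.random.default_rng(3)
n = 64
Phi = rng.normal(size=(2 * n, n)) / np.sqrt(2 * n)
x_true = np.sin(np.linspace(0.0, 4.0 * np.pi, n, endpoint=False))
y = Phi @ x_true + 0.05 * rng.normal(size=2 * n)
x_hat, gamma = pnp_forward_backward(y, Phi, smoothing_denoiser)

# The output should be (close to) a fixed point of the PnP operator
residual = np.linalg.norm(
    x_hat - smoothing_denoiser(x_hat - gamma * Phi.T @ (Phi @ x_hat - y)))
print(f"fixed-point residual: {residual:.2e}")
```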

Mukherjee et al. (2020) proposed a learned convex regulariser parametrised by the architecture of a deep input-convex neural network (ICNN, Amos et al., 2016), which is convex by construction. The training of the regulariser is done with an adversarial framework introduced by Lunz et al. (2018).
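To illustrate convexity by construction, the following is a minimal input-convex network in the spirit of Amos et al. (2016), with random weights standing in for learned ones and softplus activations; both are assumptions of this sketch, and the specific architecture and adversarial training of Mukherjee et al. (2020) are not reproduced. The output is convex in $\bm{x}$ because every hidden-to-hidden weight matrix is elementwise non-negative and the activation is convex and non-decreasing; the usage lines check the convexity inequality on random pairs. The class name `TinyICNN` is hypothetical.

```python
import numpy as np

def softplus(t):
    """Convex, non-decreasing activation; logaddexp avoids overflow."""
    return np.logaddexp(0.0, t)

class TinyICNN:
    """Minimal input-convex network, convex in x by construction.

    z_1 = softplus(Wx_0 x + b_0), z_{k+1} = softplus(Wz_k z_k + Wx_k x + b_k),
    output = sum(z_K). Random weights stand in for learned ones.
    """

    def __init__(self, dim, width=16, depth=3, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = [rng.normal(size=(width, dim)) / np.sqrt(dim) for _ in range(depth)]
        self.b = [rng.normal(size=width) for _ in range(depth)]
        # Hidden-to-hidden weights must be non-negative to preserve convexity
        self.Wz = [np.abs(rng.normal(size=(width, width))) / width
                   for _ in range(depth - 1)]

    def __call__(self, x):
        z = softplus(self.Wx[0] @ x + self.b[0])
        for Wz, Wx, b in zip(self.Wz, self.Wx[1:], self.b[1:]):
            z = softplus(Wz @ z + Wx @ x + b)
        return float(np.sum(z))   # non-negative sum of convex functions

net = TinyICNN(dim=10)
rng = np.random.default_rng(1)
violations = 0
for _ in range(1000):
    u, v = rng.normal(size=10), rng.normal(size=10)
    theta = rng.uniform()
    lhs = net(theta * u + (1 - theta) * v)
    rhs = theta * net(u) + (1 - theta) * net(v)
    violations += lhs > rhs + 1e-9
print("convexity violations:", violations)   # expected 0
```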

Very recently, a learnable convex-ridge regulariser neural network (CRR-NN, code available at https://github.com/axgoujon/convex_ridge_regularizers; Goujon et al., 2023b) has been proposed, which has the required properties of being convex and having an explicit potential. In addition, the model is designed to be reliable and interpretable while still being expressive enough to provide excellent reconstruction quality. The CRR-NN regulariser, $R_{\bm{\theta}}$, has the form

$$R_{\bm{\theta}}: \mathbb{R}^{N} \to \mathbb{R}, \quad R_{\bm{\theta}}(\bm{x}) = \sum_{i=1}^{N_{\text{C}}}\sum_{k}\psi_{i}\left(\left(\bm{h}_{i}\ast\bm{x}\right)[k]\right)\,, \tag{18}$$

where $\bm{h}_{i}$ are learnable 2D convolution kernels, $(\bm{h}_{i}\ast\bm{x})[k]$ denotes the $k$-th pixel of the resulting convolution, $N_{\text{C}}$ is the number of channels (kernels), $\psi_{i}: \mathbb{R} \to \mathbb{R}$ are learnable non-linear convex profile functions with a Lipschitz continuous derivative, i.e., $\psi_{i} \in C^{1,1}(\mathbb{R})$, and $\bm{\theta}$ in $R_{\bm{\theta}}$ represents all learnable parameters.
The convexity constraint on the learnable profile functions $\psi_{i}$ is enforced by making their derivatives, the pointwise activation functions $\sigma_{i} = \psi_{i}^{\prime}: \mathbb{R} \to \mathbb{R}$, monotonically increasing, with $\sigma_{i} \in C^{0,1}_{\uparrow}(\mathbb{R})$, where $C^{0,1}_{\uparrow}(\mathbb{R})$ is the set of scalar Lipschitz continuous and increasing functions on $\mathbb{R}$. The $\sigma_{i}$ functions are chosen as learnable linear splines, which have been shown to be more expressive than ReLU functions (Propositions 3.3 and 3.5 in Goujon et al., 2023b, §III.B). The main difference between a prior based on the CRR-NN and one based on a wavelet dictionary is that the kernels (or filters) and the activation (or thresholding) functions are learned in the former, whereas they are fixed, or handcrafted, in the latter. We refer the reader to Goujon et al. (2023b, Figs. 5 and 6) for examples of learned kernels $\bm{h}_{i}$ and activation functions for two trainings with different noise levels. See Goujon et al. (2023b) and Bohra et al. (2020) for more information on learnable splines.
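A minimal sketch of Equation 18 and of its gradient, $\nabla R_{\bm{\theta}}(\bm{x}) = \sum_{i} \bm{h}_{i}^{\top} \sigma_{i}(\bm{h}_{i} \ast \bm{x})$, where $\bm{h}_i^{\top}$ denotes the adjoint of the convolution. Several simplifications are assumptions of this sketch and not properties of the trained CRR-NN: convolutions are circular and computed with FFTs, the kernels are random rather than learned, and each $\sigma_{i}$ is a clipped linear function $\sigma_i(t) = \max(-a_i, \min(t, a_i))$, a non-decreasing linear spline whose profile $\psi_i$ is the Huber function. The class name is hypothetical; the official implementation linked above should be consulted for the actual model.

```python
import numpy as np

def huber(t, a):
    """Convex profile psi with psi'(t) = clip(t, -a, a)."""
    return np.where(np.abs(t) <= a, 0.5 * t**2, a * np.abs(t) - 0.5 * a**2)

class ConvexRidgeRegulariser:
    """Sketch of R(x) = sum_i sum_k psi_i((h_i * x)[k]) (Equation 18)."""

    def __init__(self, kernels, thresholds, shape):
        self.a = np.asarray(thresholds, dtype=float)[:, None, None]
        # Transfer functions of the zero-padded kernels, shape (N_C, H, W)
        self.H = np.stack([np.fft.fft2(h, s=shape) for h in kernels])

    def _conv(self, x):
        """One filtered image per channel: (h_i * x), circular convolution."""
        return np.real(np.fft.ifft2(self.H * np.fft.fft2(x)))

    def _conv_adjoint(self, z):
        """Adjoint of the channel-wise convolutions, summed over channels."""
        return np.real(np.fft.ifft2(np.conj(self.H) * np.fft.fft2(z))).sum(axis=0)

    def __call__(self, x):
        return float(np.sum(huber(self._conv(x), self.a)))

    def grad(self, x):
        """grad R(x) = sum_i h_i^T sigma_i(h_i * x), sigma_i = clip(., -a_i, a_i)."""
        return self._conv_adjoint(np.clip(self._conv(x), -self.a, self.a))

rng = np.random.default_rng(4)
shape = (32, 32)
kernels = rng.normal(size=(8, 5, 5)) / 5.0
reg = ConvexRidgeRegulariser(kernels, np.linspace(0.05, 0.5, 8), shape)

x, v, x2 = (rng.normal(size=shape) for _ in range(3))
eps = 1e-6
fd = (reg(x + eps * v) - reg(x - eps * v)) / (2 * eps)   # directional derivative
dd = np.sum(reg.grad(x) * v)                              # same, from the gradient
print(f"finite difference {fd:.6f} vs gradient {dd:.6f}")
print("midpoint convexity holds:", reg(0.5 * (x + x2)) <= 0.5 * (reg(x) + reg(x2)))
```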

In the spirit of PnP approaches, the CRR-NN training is based on the denoising problem that reads

$$\bm{x}^{\ast} = \operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}} \frac{1}{2}\left\|\bm{x} - \bm{y}\right\|_{2}^{2} + \lambda R_{\bm{\theta}}(\bm{x})\,, \tag{19}$$

where $\bm{y}$ is a noisy version of $\bm{x}$ and $\lambda$ is a parameter controlling the regularisation strength. The denoising problem is solved by computing the fixed point of a gradient-descent iteration, which is unique because the objective in Equation 19 is strongly convex. A gradient step on Equation 19 reads

$$T_{R_{\bm{\theta}},\lambda,\alpha}(\bm{x}) = \bm{x} - \alpha\left((\bm{x} - \bm{y}) + \lambda\nabla R_{\bm{\theta}}(\bm{x})\right)\,, \tag{20}$$

where $\alpha$ is the step size. Convergence is guaranteed if the step size satisfies $\alpha \in \left(0, 2/(1 + \lambda\,\text{Lip}(\nabla R_{\bm{\theta}}))\right)$, where $\text{Lip}(\cdot)$ denotes the Lipschitz constant. By composing $t$ gradient-descent updates of Equation 20, i.e., a $t$-fold composition, we obtain a multi-gradient-step denoiser, which we denote $T_{R_{\bm{\theta}},\lambda,\alpha}^{t}$ following the notation of Goujon et al. (2023b).
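The $t$-step denoiser $T_{R_{\bm{\theta}},\lambda,\alpha}^{t}$ can be sketched in 1D with a hypothetical single-channel ridge regulariser: the kernel is a fixed circular finite difference $D$ and the profile is a Huber function with threshold $a$, so that $\nabla R(\bm{x}) = D^{\top}\mathrm{clip}(D\bm{x}, -a, a)$ and $\text{Lip}(\nabla R) \le \|D\|^2 \le 4$. The step size $\alpha = 1/(1 + 4\lambda)$ is a conservative choice inside the admissible interval above. Because each update is a gradient step on the objective of Equation 19, that objective should decrease monotonically, which the sketch records. All names and values are illustrative assumptions, not the trained CRR-NN.

```python
import numpy as np

A = 0.05  # hypothetical threshold a of the single clipped-linear activation

def diff(x):
    """Circular forward differences: (D x)[k] = x[k+1] - x[k]."""
    return np.roll(x, -1) - x

def diff_adjoint(z):
    """Adjoint of diff: (D^T z)[k] = z[k-1] - z[k]."""
    return np.roll(z, 1) - z

def R(x):
    """Ridge regulariser sum_k psi((D x)[k]) with a Huber profile psi."""
    t = diff(x)
    return np.sum(np.where(np.abs(t) <= A, 0.5 * t**2, A * np.abs(t) - 0.5 * A**2))

def grad_R(x):
    """grad R(x) = D^T sigma(D x), with sigma = psi' = clip(., -A, A)."""
    return diff_adjoint(np.clip(diff(x), -A, A))

def t_step_denoiser(y, lam, t):
    """t-fold composition of the gradient step of Equation 20, started at y."""
    alpha = 1.0 / (1.0 + lam * 4.0)   # Lip(grad R) <= ||D||^2 * Lip(sigma) <= 4
    x = y.copy()
    energies = []
    for _ in range(t):
        x = x - alpha * ((x - y) + lam * grad_R(x))
        energies.append(0.5 * np.sum((x - y) ** 2) + lam * R(x))  # Equation 19 objective
    return x, energies

rng = np.random.default_rng(5)
x_clean = np.repeat([0.0, 1.0, -0.5, 0.5], 64)        # piecewise-constant signal
y = x_clean + 0.2 * rng.standard_normal(x_clean.size)
lam = 2.0
x_den, energies = t_step_denoiser(y, lam, t=300)
print(f"MSE noisy: {np.mean((y - x_clean) ** 2):.4f}, "
      f"MSE denoised: {np.mean((x_den - x_clean) ** 2):.4f}")
```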

The denoising problem in Equation 19 can then be formulated as a fixed-point problem for the $t$-step denoiser $T_{R_{\bm{\theta}},\lambda,\alpha}^{t}$ as follows,

$$T_{R_{\bm{\theta}},\lambda,\alpha}^{t}(\bm{y}) \approx \bm{x}\,. \tag{21}$$

We build the CRR-NN training by penalising the residual of the fixed-point problem in Equation 21 with a loss function $\mathcal{L}$, for a training set of pairs of noiseless and noisy images $\{\bm{x}^{(m)}, \bm{y}^{(m)}\}_{m=1}^{M}$, which reads

$$\bm{\theta}_{t}^{\ast},\lambda_{t}^{\ast}\in\operatorname*{argmin}_{\bm{\theta},\lambda}\sum_{m=1}^{M}\mathcal{L}\left(T_{R_{\bm{\theta}},\lambda,\alpha}^{t}(\bm{y}^{(m)}),\bm{x}^{(m)}\right)\,. \tag{22}$$

Once the denoiser is trained, we define our prior potential as

$$g(\bm{x})=\frac{\lambda}{\mu}R_{\bm{\theta}}(\mu\bm{x})\,, \tag{23}$$

where, for brevity, we write $\bm{\theta},\lambda$ instead of $\bm{\theta}_{t}^{\ast},\lambda_{t}^{\ast}$, and where the scaling parameter $\mu$ is added to boost performance, following Goujon et al. (2023b). The optimisation algorithm requires the Lipschitz constant of the gradient of the potential in Equation 23, which satisfies

$$\mathrm{Lip}(\nabla g)=\lambda\,\mu\,\mathrm{Lip}(\nabla R_{\bm{\theta}})\leq\lambda\,\mu\,\|\mathbf{W}^{T}\Sigma_{\infty}\mathbf{W}\|\,, \tag{24}$$

where the upper bound is derived in Goujon et al. (2023b, Prop. IV.1), $\Sigma_{\infty}=\mathrm{diag}(\|\sigma_{1}^{\prime}\|_{\infty},\ldots,\|\sigma_{N_{C}}^{\prime}\|_{\infty})$, and $\mathbf{W}=[\bm{w}_{1}\cdots\bm{w}_{N_{C}}]^{T}$, where $\bm{w}_{i}$ corresponds to the filter $\bm{h}_{i}$ through $\bm{h}_{i}\ast\bm{x}=\bm{w}_{i}^{T}\bm{x}$.
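For completeness, the equality in Equation 24 follows from the chain rule and a change of variables, using only the definition of $g$ in Equation 23 and assuming $\mu>0$:

```latex
\nabla g(\bm{x}) = \frac{\lambda}{\mu}\,\mu\,\nabla R_{\bm{\theta}}(\mu\bm{x})
                 = \lambda\,\nabla R_{\bm{\theta}}(\mu\bm{x}),
\qquad
\frac{\|\nabla g(\bm{x})-\nabla g(\bm{x}')\|}{\|\bm{x}-\bm{x}'\|}
  = \lambda\,\mu\,
    \frac{\|\nabla R_{\bm{\theta}}(\bm{u})-\nabla R_{\bm{\theta}}(\bm{u}')\|}{\|\bm{u}-\bm{u}'\|},
\quad \bm{u}=\mu\bm{x},\ \bm{u}'=\mu\bm{x}'.
```

Since $\bm{x}\mapsto\mu\bm{x}$ is a bijection, taking the supremum over $\bm{x}\neq\bm{x}'$ is the same as taking it over $\bm{u}\neq\bm{u}'$, which gives $\mathrm{Lip}(\nabla g)=\lambda\,\mu\,\mathrm{Lip}(\nabla R_{\bm{\theta}})$.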

3.2 Computing our reconstruction: the MAP

In our case, computing the MAP estimate reduces to solving a convex optimisation problem. Following Equation 9, we solve

$$\hat{\bm{x}}_{\text{MAP}}=\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}}\frac{1}{2\sigma^{2}}\left\|\bm{y}-\Phi\bm{x}\right\|_{2}^{2}+\frac{\lambda}{\mu}R_{\bm{\theta}}(\mu\bm{x})+\iota_{\mathbb{R}^{N}}(\bm{x})\,, \tag{25}$$

where we additionally include $\iota_{\mathbb{R}^{N}}$, an indicator function that constrains the reconstructed image to be real. The proximal operator of the indicator function of a convex set is the projection onto that set. Hence, for the last term of Equation 25, the proximal operator of the reality constraint is the projection onto the real numbers, i.e., taking the real part, which we write as $\text{Re}(\cdot)$. We assume a (white) Gaussian likelihood, and the prior term is based on a previously trained CRR-NN, which is smooth with Lipschitz-continuous gradients. However, the non-smoothness of the reality constraint forces us to rely on proximal algorithms (Parikh & Boyd, 2014) rather than an accelerated gradient descent method (Nesterov, 2018). Specifically, we use the FISTA algorithm (Beck & Teboulle, 2009).

The optimisation requires the gradients of the likelihood and prior terms,

$$\nabla_{\bm{x}}f(\bm{x},\bm{y})=\frac{1}{\sigma^{2}}\Phi^{\dagger}(\Phi\bm{x}-\bm{y})\,, \tag{26}$$
$$\nabla g(\bm{x})=\lambda\,\nabla R_{\bm{\theta}}(\mu\bm{x})\,, \tag{27}$$

where, in our case, $(\cdot)^{\dagger}$ denotes the complex conjugate transpose.

To ensure convergence of the algorithm, we use a stepsize $\tau\leq 1/L$ (Algorithm 1 uses $\tau=0.98/L$), where $L=\mathrm{Lip}(\nabla_{\bm{x}}f(\bm{x},\bm{y})+\nabla g(\bm{x}))$. A simple bound for this Lipschitz constant is

$$\begin{aligned} L &\leq \mathrm{Lip}(\nabla_{\bm{x}}f(\bm{x},\bm{y}))+\mathrm{Lip}(\nabla g(\bm{x})) = L_{\text{likelihood}}+L_{\text{prior-CRR-NN}}\\ &\leq \frac{\|\Phi^{\dagger}\Phi\|}{\sigma^{2}}+\lambda\,\mu\,\|\mathbf{W}^{T}\Sigma_{\infty}\mathbf{W}\|\,, \end{aligned} \tag{28}$$

where we have used Equation 24, and $\|\Phi^{\dagger}\Phi\|$ denotes the spectral norm, which for a linear operator equals its largest singular value. In the simplified setting with gridded visibilities considered in Section 2.1, $\|\Phi^{\dagger}\Phi\|=1$. For a more realistic measurement operator, the largest singular value can be computed iteratively with the power method (Golub & van Loan, 2013).
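The power method mentioned above can be sketched as follows. This is a minimal NumPy illustration, assuming a hypothetical gridded-visibility operator $\Phi$ given by an orthonormal 2D FFT followed by a binary sampling mask, for which $\|\Phi^{\dagger}\Phi\|=1$ as stated for Section 2.1. The helper `lipschitz_bound` (a hypothetical name) simply evaluates the right-hand side of Equation 28, with $\|\mathbf{W}^{T}\Sigma_{\infty}\mathbf{W}\|$ supplied by the user.

```python
import numpy as np


def power_method_norm(op, adj_op, shape, n_iter=100, seed=0):
    """Estimate ||Phi^dagger Phi||, the largest eigenvalue of the self-adjoint
    positive semi-definite operator Phi^dagger Phi, by power iteration."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    x /= np.linalg.norm(x)
    estimate = 0.0
    for _ in range(n_iter):
        z = adj_op(op(x))
        estimate = np.linalg.norm(z)  # ||Phi^dagger Phi x|| with ||x|| = 1
        x = z / estimate
    return estimate


def lipschitz_bound(phi_norm, sigma, lam, mu, wsw_norm):
    """Right-hand side of Equation 28: ||Phi^dagger Phi|| / sigma^2 + lam * mu * ||W^T Sigma W||."""
    return phi_norm / sigma**2 + lam * mu * wsw_norm


shape = (64, 64)
rng = np.random.default_rng(1)
mask = (rng.random(shape) < 0.3).astype(float)  # hypothetical uv-sampling pattern


def phi(x):
    return mask * np.fft.fft2(x, norm="ortho")


def phi_adj(v):
    return np.fft.ifft2(mask * v, norm="ortho")


phi_norm = power_method_norm(phi, phi_adj, shape)
print(phi_norm)  # close to 1 for a masked orthonormal FFT
```

For a non-gridded operator (e.g. with a degridding kernel), only `phi` and `phi_adj` would change; the iteration itself needs nothing beyond forward and adjoint applications.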

We initialise the optimisation with the naturally weighted dirty image, $\bm{x}_{(0)}=\text{Re}(\Phi^{\dagger}\bm{y})$. The procedure is summarised in Algorithm 1. We run at most $N_{\text{max}}$ iterations and stop earlier if the relative change between consecutive iterates falls below a tolerance $\xi$. The stepsize is computed from the bound in Equation 28.

Algorithm 1 FISTA (Beck & Teboulle, 2009) tackling (25)

Input: $R_{\bm{\theta}}$, $\Phi$, $\sigma$, $\mu$, $\lambda$, $\xi$, $N_{\text{max}}$, $a_{(1)}=1$, $\bm{z}_{(1)}=\bm{x}_{(0)}=\text{Re}(\Phi^{\dagger}\bm{y})$, $\tau=0.98/L$.
Output: $\hat{\bm{x}}_{\text{MAP}}$
for $n=1,\ldots,N_{\text{max}}$ do
    $\bm{x}_{(n)}=\bm{z}_{(n)}-\tau\left(\frac{1}{\sigma^{2}}\text{Re}\left(\Phi^{\dagger}(\Phi\bm{z}_{(n)}-\bm{y})\right)+\lambda\nabla R_{\bm{\theta}}(\mu\bm{z}_{(n)})\right)$
    $a_{(n+1)}=\frac{1}{2}\left(1+\sqrt{4a_{(n)}^{2}+1}\right)$
    $\bm{z}_{(n+1)}=\bm{x}_{(n)}+\frac{a_{(n)}-1}{a_{(n+1)}}\left(\bm{x}_{(n)}-\bm{x}_{(n-1)}\right)$
    if $\|\bm{x}_{(n)}-\bm{x}_{(n-1)}\|/\|\bm{x}_{(n-1)}\|<\xi$ then
        break
    end if
end for
set $\hat{\bm{x}}_{\text{MAP}}=\bm{x}_{(n)}$
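A minimal NumPy sketch of Algorithm 1 follows. It is not the authors' implementation: the trained CRR-NN is replaced by a hypothetical quadratic finite-difference regulariser (`toy_R`, `grad_toy_R`) with $\mathrm{Lip}(\nabla R)\leq 8$, and $\Phi$ is a hypothetical masked orthonormal FFT, so that $\|\Phi^{\dagger}\Phi\|=1$ and the bound in Equation 28 becomes $1/\sigma^{2}+8\lambda\mu$. The image, mask, and hyperparameter values are illustrative only.

```python
import numpy as np


def toy_R(x):
    """Hypothetical convex stand-in for the CRR-NN: R(x) = 0.5 * ||D x||^2."""
    dx = np.roll(x, -1, axis=0) - x
    dy = np.roll(x, -1, axis=1) - x
    return 0.5 * (np.sum(dx**2) + np.sum(dy**2))


def grad_toy_R(x):
    """Gradient D^T D x of toy_R (periodic finite differences)."""
    dx = np.roll(x, -1, axis=0) - x
    dy = np.roll(x, -1, axis=1) - x
    return (np.roll(dx, 1, axis=0) - dx) + (np.roll(dy, 1, axis=1) - dy)


def fista_map(y, phi, phi_adj, grad_R, sigma, lam, mu, L, n_max=500, tol=1e-6):
    """FISTA iterations of Algorithm 1 for the MAP problem in Equation 25."""
    tau = 0.98 / L
    x_prev = np.real(phi_adj(y))  # x_(0): real part of the dirty image
    z = x_prev.copy()
    a = 1.0
    for _ in range(n_max):
        # Gradient step on the smooth part; Re(.) is the projection onto real images.
        grad = np.real(phi_adj(phi(z) - y)) / sigma**2 + lam * grad_R(mu * z)
        x = z - tau * grad
        a_next = 0.5 * (1.0 + np.sqrt(4.0 * a**2 + 1.0))
        z = x + ((a - 1.0) / a_next) * (x - x_prev)  # momentum (extrapolation) step
        rel_change = np.linalg.norm(x - x_prev) / np.linalg.norm(x_prev)
        x_prev, a = x, a_next
        if rel_change < tol:
            break
    return x_prev


rng = np.random.default_rng(0)
n = 32
x_true = np.zeros((n, n))
x_true[10:22, 12:20] = 1.0
mask = (rng.random((n, n)) < 0.4).astype(float)
mask[0, 0] = 1.0  # keep the zero spatial frequency


def phi(x):
    return mask * np.fft.fft2(x, norm="ortho")


def phi_adj(v):
    return np.fft.ifft2(mask * v, norm="ortho")


sigma, lam, mu = 0.05, 1.0, 1.0
noise = sigma * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
y = phi(x_true) + mask * noise
L = 1.0 / sigma**2 + lam * mu * 8.0  # Equation 28 with ||Phi^dagger Phi|| = 1


def objective(x):
    """Equation 25 for real x, with the toy prior (lam / mu) * R(mu * x)."""
    return np.sum(np.abs(y - phi(x)) ** 2) / (2 * sigma**2) + (lam / mu) * toy_R(mu * x)


x0 = np.real(phi_adj(y))
x_map = fista_map(y, phi, phi_adj, grad_toy_R, sigma, lam, mu, L)
print(objective(x0), objective(x_map))
```

Because the toy prior is smooth, the only non-smooth ingredient is the reality constraint, handled here by taking the real part of the likelihood gradient, as in step 4 of Algorithm 1.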

4 Scalable uncertainty quantification

Enforcing a log-concave posterior with an explicit prior potential opens the door to scalable UQ methods that would otherwise be out of reach; the restriction to log-concave posteriors is the price we pay for this scalability. Our approach builds on Pereyra (2017), which exploits concentration phenomena occurring in high-dimensional log-concave posteriors: because the posterior probability mass concentrates in particular regions of the parameter space, the Bayesian highest posterior density region can be approximated in log-concave models. The approximation requires the MAP estimate, $\hat{\bm{x}}_{\text{MAP}}$, which we have already computed since it is our chosen point estimate. This result allows us to extract information about the posterior probability density function without MCMC sampling. In this section, we introduce the main result we exploit for UQ, describe the proposed scalable UQ methods, and explain how we validate our results with Langevin-based MCMC sampling algorithms.

4.1 Highest Posterior Density Regions

Let us define a posterior credible region with credible level $100(1-\alpha)\%$ as a set $C_{\alpha}\subseteq\mathbb{R}^{N}$ satisfying

$$p(\bm{x}\in C_{\alpha}|\bm{y})=\int_{\bm{x}\in\mathbb{R}^{N}}p(\bm{x}|\bm{y})\,\mathbbm{1}_{C_{\alpha}}(\bm{x})\,{\rm d}\bm{x}=1-\alpha\,, \tag{29}$$

where $\mathbbm{1}_{C_{\alpha}}$ equals one if $\bm{x}\in C_{\alpha}$ and zero otherwise. Many regions satisfy this equation; we focus on the highest posterior density (HPD) region, defined as

$$C_{\alpha}:=\left\{\bm{x}\in\mathbb{R}^{N}:f(\bm{x})+g(\bm{x})\leq\gamma_{\alpha}\right\}\,, \tag{30}$$

where $f$ and $g$ are the potentials of our likelihood and prior terms, and $\gamma_{\alpha}$ is a constant defining a level set of the log-posterior such that Equation 29 holds. Among all regions with credible level $100(1-\alpha)\%$, the HPD region has minimum volume (Robert, 2007, §5.5).

Our posterior, $p(\bm{x}|\bm{y})=\exp[-f(\bm{x})-g(\bm{x})]/Z$, where $Z$ is the Bayesian evidence, is log-concave on $\mathbb{R}^{N}$. Then, following Pereyra (2017, Theorem 3.1), for any $\alpha\in\left(4\exp(-N/3),1\right)$, the HPD region $C_{\alpha}$ from Equation 30 is contained in

$$\hat{C}_{\alpha}=\left\{\bm{x}\in\mathbb{R}^{N}:f(\bm{x})+g(\bm{x})\leq\hat{\gamma}_{\alpha}\right\}\,, \tag{31}$$

where

$$\hat{\gamma}_{\alpha}=f(\hat{\bm{x}}_{\text{MAP}})+g(\hat{\bm{x}}_{\text{MAP}})+\sqrt{N}\,\tau_{\alpha}+N\,, \tag{32}$$

with the positive constant $\tau_{\alpha}=\sqrt{16\log(3/\alpha)}$, which is independent of $p(\bm{x}|\bm{y})$.
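Equation 32 is cheap to evaluate once $f+g$ is known at the MAP. The plain-Python sketch below (the function name and arguments are hypothetical) also checks the admissible range of $\alpha$ required by Pereyra (2017, Theorem 3.1); the numerical values in the example are illustrative only.

```python
import math


def hpd_threshold(potential_at_map, n_dim, alpha):
    """Approximate HPD threshold gamma_hat_alpha of Equation 32.

    potential_at_map: f(x_MAP) + g(x_MAP); n_dim: N; alpha: 1 - credible level.
    """
    lower = 4.0 * math.exp(-n_dim / 3.0)  # underflows harmlessly to 0.0 for large N
    if not lower < alpha < 1.0:
        raise ValueError("alpha must lie in (4 exp(-N/3), 1)")
    tau_alpha = math.sqrt(16.0 * math.log(3.0 / alpha))
    return potential_at_map + math.sqrt(n_dim) * tau_alpha + n_dim


# Example: a 256 x 256 image at the 99% credible level.
gamma_hat = hpd_threshold(potential_at_map=1000.0, n_dim=256 * 256, alpha=0.01)
print(gamma_hat)  # about 6.9e4; the N term dominates
```

The example makes the conservativeness discussed next tangible: for large images the additive term $N$ dominates the threshold.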

Theorem 3.2 in Pereyra (2017) provides an error analysis of this approximation: $0\leq\hat{\gamma}_{\alpha}-\gamma_{\alpha}\leq\tau_{\alpha}\sqrt{N}+N$. The upper bound on the error therefore grows linearly with the dimension $N$, making $\hat{C}_{\alpha}$ a stable approximation of $C_{\alpha}$. The lower bound, together with the convexity of $f+g$, guarantees the inclusion $C_{\alpha}\subseteq\hat{C}_{\alpha}$; consequently, the approximation is conservative and overestimates the uncertainty.

This result, which enables UQ without posterior sampling, makes clear where the constraints on the prior come from. Convexity of the prior potential is required to guarantee a log-concave posterior, given that the likelihood potential is convex. The prior potential $g$ must also be explicit, so that the approximate HPD threshold in Equation 32 can be computed.

4.2 MAP-based UQ methods

We now present several scalable UQ schemes based on this fast, approximate, implicit representation of the HPD region. For all methods, we assume that the MAP estimate $\hat{\bm{x}}_{\text{MAP}}$ and the approximate HPD threshold $\hat{\gamma}_{\alpha}$ have already been computed.

4.2.1 Bayesian hypothesis testing of structure

A useful UQ tool is a knock-out hypothesis test, which assesses whether a surrogate image still belongs to the HPD region (Cai et al., 2018a, b; Price et al., 2021b). First, the surrogate image $\bm{x}_{\text{sgt}}$ is constructed by modifying the reconstruction $\hat{\bm{x}}_{\text{MAP}}$. Then, it suffices to check whether

$$f(\bm{x}_{\text{sgt}})+g(\bm{x}_{\text{sgt}})\leq\hat{\gamma}_{\alpha}\,. \tag{33}$$

If the inequality holds, the test is inconclusive, as $\bm{x}_{\text{sgt}}$ still belongs to the approximate HPD region. If the inequality does not hold, we can conclude that $\bm{x}_{\text{sgt}}$ lies outside the HPD region with $100(1-\alpha)\%$ confidence.
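In code, the knock-out test of Equation 33 reduces to a single comparison. The sketch below uses a hypothetical helper name; `potential` stands for any callable evaluating $f+g$ (for instance, built from the likelihood and the CRR-NN prior), and the quadratic potential in the example is a toy stand-in used only to make the outcome easy to verify by hand.

```python
import numpy as np


def knockout_test(potential, x_surrogate, gamma_hat):
    """Knock-out test of Equation 33.

    Returns True when f + g at the surrogate exceeds gamma_hat, i.e. the surrogate
    lies outside the approximate HPD region and the hypothesis is rejected at the
    100(1 - alpha)% level; returns False when the test is inconclusive.
    """
    return bool(potential(x_surrogate) > gamma_hat)


def toy_potential(x):
    """Hypothetical quadratic potential 0.5 * ||x||^2 standing in for f + g."""
    return 0.5 * float(np.sum(x**2))


gamma_hat = 10.0
print(knockout_test(toy_potential, np.full((2, 2), 1.0), gamma_hat))  # potential 2: inconclusive
print(knockout_test(toy_potential, np.full((2, 2), 3.0), gamma_hat))  # potential 18: rejected
```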

This test can answer a variety of questions about the reconstructed image. One example is to interrogate a structure in the image to determine whether it is a reconstruction artefact or a physical feature. For this question, the surrogate is the reconstruction with the region of interest artificially inpainted with information from its surroundings. We then evaluate Equation 33 on this inpainted surrogate to see whether the test is conclusive.

The image inpainting algorithm follows Cai et al. (2018b) but is adapted to the CRR-NN-based prior. We first select a region of interest $\Omega_{\text{D}}\subseteq\Omega$, a subset of (typically contiguous) pixels, where $\Omega$ denotes the set of all image pixels. The region $\Omega_{\text{D}}$ is then inpainted with background information by iteratively minimising $R_{\bm{\theta}}$ according to the scheme

$$\bm{x}_{\text{sgt},(m+1)}=\hat{\bm{x}}_{\text{MAP}}\,\mathbbm{1}_{\Omega-\Omega_{\text{D}}}+\left(\bm{x}_{\text{sgt},(m)}-\alpha\lambda\nabla R_{\bm{\theta}}\left(\mu\,\bm{x}_{\text{sgt},(m)}\right)\right)\mathbbm{1}_{\Omega_{\text{D}}}\,, \tag{34}$$

where $\mathbbm{1}$ denotes indicator functions applied pixel-wise, and $\mathbbm{1}_{\Omega-\Omega_{\text{D}}}$ is shorthand for $\mathbbm{1}_{\Omega}-\mathbbm{1}_{\Omega_{\text{D}}}$. Each iteration takes a gradient step with the CRR-NN on the surrogate image and updates only the region of interest. The hyperparameters $\alpha$, $\lambda$, and $\mu$ are set as in Algorithm 1.
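A minimal NumPy sketch of the inpainting scheme in Equation 34 follows, under stated assumptions: the iteration is started from $\hat{\bm{x}}_{\text{MAP}}$ (the initialisation is our assumption), the number of iterations is a hypothetical choice, and the CRR-NN gradient is replaced by the gradient of a hypothetical quadratic finite-difference regulariser. With this stand-in, the scheme becomes a Jacobi-type relaxation that fills $\Omega_{\text{D}}$ smoothly from its boundary.

```python
import numpy as np


def grad_toy_R(x):
    """Gradient of a hypothetical stand-in R(x) = 0.5 * ||D x||^2 (not the CRR-NN)."""
    dx = np.roll(x, -1, axis=0) - x
    dy = np.roll(x, -1, axis=1) - x
    return (np.roll(dx, 1, axis=0) - dx) + (np.roll(dy, 1, axis=1) - dy)


def inpaint_region(x_map, region, grad_R, lam, mu, step, n_iter=500):
    """Iterate Equation 34: gradient steps on the prior applied only inside the
    region of interest (region == 1); pixels outside keep their MAP values."""
    outside = 1.0 - region
    x_sgt = x_map.copy()
    for _ in range(n_iter):
        x_sgt = x_map * outside + (x_sgt - step * lam * grad_R(mu * x_sgt)) * region
    return x_sgt


# Toy example: a bright blob on a flat background of 0.2.
x_map = np.full((32, 32), 0.2)
x_map[13:17, 13:17] = 1.0
region = np.zeros_like(x_map)
region[10:20, 10:20] = 1.0
x_sgt = inpaint_region(x_map, region, grad_toy_R, lam=1.0, mu=1.0, step=0.2)
print(x_sgt[10:20, 10:20].max())  # close to the background value 0.2
```

The stepsize is chosen so that `step * lam * mu * 8 < 2`, the stability condition for this stand-in regulariser; the blob is removed and the surrogate can then be passed to the test in Equation 33.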

Alternatively, Repetti et al. (2019) proposed a more sophisticated method for hypothesis testing of structure, which also exploits the approximation in Equations 31-32. The method, dubbed Bayesian uncertainty quantification by optimisation (BUQO), answers the hypothesis test by studying the intersection of two sets. The first is the approximate HPD region defined in Equation 31, which is built from the MAP estimate. The second is the set of feasible inpainted images, given a region of interest and a set of constraints encoding desired properties. If the intersection is empty, the structure of interest is considered present in the image with confidence level $100(1-\alpha)\%$, with $\alpha$ as in Equation 31. Tang & Repetti (2023) extended BUQO to inpainting with data-driven models. However, these methods require solving an expensive optimisation problem that does not scale to the high-dimensional settings considered in this work.

Another example is to interrogate the reconstruction to determine whether its fine structure is physical or likely an artefact. To construct the surrogate image, we smooth the region of interest $\Omega_{\text{D}}$ by convolving the reconstruction with a Gaussian kernel $G(0,\Sigma)$,

$$\bm{x}_{\text{sgt}}=\hat{\bm{x}}_{\text{MAP}}\,\mathbbm{1}_{\Omega-\Omega_{\text{D}}}+\left(\hat{\bm{x}}_{\text{MAP}}\ast G(0,\Sigma)\right)\mathbbm{1}_{\Omega_{\text{D}}}\,, \tag{35}$$

where $\ast$ denotes 2D convolution, and then evaluate the test in Equation 33.
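The smoothed surrogate of Equation 35 can be sketched with NumPy alone. This assumes an isotropic kernel, $\Sigma=s^{2}\bm{I}$ with width `s_pix` in pixels (a hypothetical parameterisation), truncated at three standard deviations, and zero padding at the image borders; the resulting surrogate would then be passed to the test in Equation 33.

```python
import numpy as np


def gaussian_kernel_1d(s_pix):
    """Normalised 1D Gaussian kernel truncated at 3 standard deviations."""
    radius = int(np.ceil(3 * s_pix))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / s_pix) ** 2)
    return k / k.sum()


def gaussian_smooth(img, s_pix):
    """Separable 2D Gaussian convolution with zero padding (same output size)."""
    k = gaussian_kernel_1d(s_pix)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def smoothed_surrogate(x_map, region, s_pix):
    """Equation 35: keep x_map outside the region and its smoothed version inside."""
    return x_map * (1.0 - region) + gaussian_smooth(x_map, s_pix) * region


rng = np.random.default_rng(0)
x_map = 1.0 + 0.1 * rng.standard_normal((32, 32))  # fine-scale structure on a flat level
region = np.zeros_like(x_map)
region[10:20, 10:20] = 1.0
x_sgt = smoothed_surrogate(x_map, region, s_pix=1.5)
print(x_map[10:20, 10:20].std(), x_sgt[10:20, 10:20].std())
```

Separability keeps the cost linear in the kernel width; an anisotropic $\Sigma$ would need a full 2D kernel instead.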

4.2.2 Local credible intervals

Local credible intervals (LCIs) quantify the spatial uncertainty per pixel at different resolutions. LCIs can be interpreted as Bayesian error bars for each pixel or superpixel, where a superpixel is a group of contiguous pixels. Cai et al. (2018a) computed LCIs with MCMC methods, and Cai et al. (2018b) later computed them from the MAP-based approximate HPD region. Price et al. (2019) also used MAP-based LCIs in another imaging inverse problem, mass-mapping, i.e., the reconstruction of the weak gravitational lensing convergence.

Let $\Omega=\{\Omega_{i}\}_{i=1}^{M}$ denote the set of superpixels that partition the domain of $\bm{x}$, so that $\Omega_{i}\cap\Omega_{j}=\emptyset$ for all $i\neq j$ and $\Omega=\cup_{i}\Omega_{i}$. We denote by $\bm{\zeta}_{\Omega_{i}}$ the indicator of superpixel $\Omega_{i}$, which is one for pixels belonging to $\Omega_{i}$ and zero otherwise. Varying the superpixel size, $\|\bm{\zeta}_{\Omega_{i}}\|_{0}$, allows us to visualise the LCIs at different scales. Computing the LCIs amounts to computing an upper and a lower bound for each superpixel. Each bound is the constant value that must be added to or subtracted from the superpixel region for the modified image to exit the approximate HPD region $\hat{C}_{\alpha}$. In other words, for each superpixel we compute the values that saturate the HPD region.

We can isolate the superpixel region by defining the following surrogate image

$$\bm{x}_{i,\xi}=\hat{\bm{x}}_{\text{MAP}}(\bm{I}-\bm{\zeta}_{\Omega_{i}})+(\xi+\bar{\bm{x}}_{\text{MAP},\Omega_{i}})\,\bm{\zeta}_{\Omega_{i}}\,, \tag{36}$$

where $\bar{\bm{x}}_{\text{MAP},\Omega_{i}}$ is the mean value of the MAP pixels in superpixel $\Omega_{i}$ and $\xi\in\mathbb{R}$. That is, the superpixel is replaced by its mean value shifted uniformly by $\xi$. The bounds for superpixel $\Omega_{i}$ are computed as

$$\xi_{+,\Omega_{i}}=\max_{\xi}\left\{\xi\,\middle|\,f(\bm{x}_{i,\xi},\bm{y})+g(\bm{x}_{i,\xi})\leq\hat{\gamma}_{\alpha},\ \xi\in[0,+\infty)\right\}\,, \tag{37}$$
$$\xi_{-,\Omega_{i}}=\min_{\xi}\left\{\xi\,\middle|\,f(\bm{x}_{i,\xi},\bm{y})+g(\bm{x}_{i,\xi})\leq\hat{\gamma}_{\alpha},\ \xi\in(-\infty,0]\right\}\,, \tag{38}$$

where we use the threshold $\hat{\gamma}_{\alpha}$ defined in Equation 32. Each bound can be recast as the root of the scalar function $\xi\mapsto f(\bm{x}_{i,\xi},\bm{y})+g(\bm{x}_{i,\xi})-\hat{\gamma}_{\alpha}$ on the corresponding half-line. Since $f+g$ is convex and $\bm{x}_{i,\xi}$ is affine in $\xi$, this function is convex in $\xi$, so standard root-finding algorithms are well suited; in practice, we use the bisection method (Burden & Faires, 1989). Price et al. (2021a) proposed a faster way to compute the superpixel bounds by exploiting linearity, which could be used to further accelerate the computation of $\xi_{+,\Omega_{i}}$ and $\xi_{-,\Omega_{i}}$.

Once the bounds have been computed, we can collate the results for all superpixels and use the length of the LCIs to visualise the reconstruction uncertainty. The length of the LCI for superpixel $\Omega_{i}$ is defined as $l_{i}=\xi_{+,\Omega_{i}}-\xi_{-,\Omega_{i}}$, which we can visualise as an uncertainty image

$$\bm{\xi}=\sum_{i}\left(\xi_{+,\Omega_{i}}-\xi_{-,\Omega_{i}}\right)\bm{\zeta}_{\Omega_{i}}\,. \tag{39}$$
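The bound computation in Equations 36 to 39 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the QuantifAI implementation: `objective` is a hypothetical callable standing in for $f(\cdot,\bm{y})+g(\cdot)$, `gamma_alpha` plays the role of $\hat{\gamma}_{\alpha}$, and superpixels are assumed to be non-overlapping square blocks of side `sp`. The bisection assumes that $\xi=0$ lies inside the region and that the feasible set on each half-line is an interval, which holds when the objective is convex in $\xi$.

```python
import numpy as np


def superpixel_image(x_map, mask, xi):
    """Eq. (36): set the superpixel to its MAP mean plus a uniform offset xi."""
    x = x_map.astype(float)  # copy; pixels outside the superpixel are unchanged
    x[mask] = x_map[mask].mean() + xi
    return x


def bisect_bound(feasible, direction, tol=1e-4, max_iter=200):
    """Signed bound of Eqs. (37)-(38) along direction +1 or -1.

    Assumes feasible(0) is True and that feasibility holds on an interval
    containing 0, as it does for an objective that is convex in xi.
    """
    lo, hi = 0.0, 1.0
    # Grow the bracket until its upper end leaves the region.
    for _ in range(64):
        if not feasible(direction * hi):
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("region appears unbounded in this direction")
    # Bisection: lo is always feasible, hi is always infeasible.
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if feasible(direction * mid):
            lo = mid
        else:
            hi = mid
    return direction * lo


def lci_length_map(x_map, objective, gamma_alpha, sp, **kwargs):
    """Eq. (39): image of LCI lengths xi_plus - xi_minus, one value per superpixel."""
    lengths = np.zeros(x_map.shape)
    n_rows, n_cols = x_map.shape
    for r in range(0, n_rows, sp):
        for c in range(0, n_cols, sp):
            mask = np.zeros(x_map.shape, dtype=bool)
            mask[r:r + sp, c:c + sp] = True

            def feasible(xi):
                return objective(superpixel_image(x_map, mask, xi)) <= gamma_alpha

            xi_plus = bisect_bound(feasible, +1, **kwargs)
            xi_minus = bisect_bound(feasible, -1, **kwargs)
            lengths[mask] = xi_plus - xi_minus
    return lengths
```

For instance, with the toy objective `0.5 * np.sum(x**2)`, a zero MAP image and `gamma_alpha = 2.0`, each 2×2 superpixel gives $\xi_{\pm}=\pm1$ and hence an LCI length of 2. Note that every superpixel needs its own pair of bisections, which is the cost discussed in the next subsection.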

Replacing the pixels of the studied superpixel by their mean, $\bar{\bm{x}}_{\text{MAP},\Omega_{i}}$, constitutes a deviation from the original MAP reconstruction. We find it more physical to shift the averaged superpixel than to shift the original pixels belonging to the superpixel by a constant value. This choice is a further approximation on top of the approximation of the HPD region in Equation 31. Using superpixels improves sensitivity and reduces computing time at the expense of a lower-resolution LCI map, which can be a sensible trade-off for very large images.

We will later validate the computed LCIs against the posterior standard deviation, computed from posterior samples at different superpixel sizes. This requires turning each posterior sample into an image with $M$ superpixels, where the value of each superpixel is set to the mean of the values of the pixels it contains.
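The superpixel averaging used for this validation can be sketched as below. It is a minimal NumPy illustration that assumes square, non-overlapping superpixels of side `sp` that divide the image dimensions; `superpixelise` and `superpixel_posterior_std` are hypothetical helper names.

```python
import numpy as np


def superpixelise(img, sp):
    """Replace every sp x sp block by its mean (block values broadcast back to pixels)."""
    h, w = img.shape
    if h % sp or w % sp:
        raise ValueError("superpixel size must divide the image dimensions")
    block_means = img.reshape(h // sp, sp, w // sp, sp).mean(axis=(1, 3))
    return np.kron(block_means, np.ones((sp, sp)))


def superpixel_posterior_std(samples, sp):
    """Pixel-wise standard deviation of superpixelised posterior samples.

    samples: array of shape (n_samples, h, w).
    """
    coarse = np.stack([superpixelise(s, sp) for s in samples])
    return coarse.std(axis=0)
```

Here `np.std` uses its default `ddof=0`; with tens of thousands of samples the choice of normalisation is immaterial.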

4.2.3 Fast pixel uncertainty quantification at different scales

The MAP-based LCIs described in the previous section are orders of magnitude faster to compute than their MCMC-based counterparts (Cai et al., 2018a, b). Nevertheless, computing the LCIs still requires evaluating the likelihood potential, $f$, several times for each superpixel. As mentioned, evaluating the likelihood potential is by far the most time-consuming operation. Since the number of superpixels grows with the image size, these repeated evaluations might make the LCIs impractical for SKA-scale problems.

To overcome this issue, we propose a new approach relying on a wavelet decomposition of the MAP reconstruction,

$$\hat{\bm{x}}_{\text{MAP}}=\mathbf{\Psi}\,\hat{\bm{a}}_{\text{MAP}}=\sum_{i=1}^{L}\mathbf{\Psi}_{i}\,\hat{a}_{\text{MAP},i}\,, \tag{40}$$

with a wavelet dictionary $\mathbf{\Psi}$ whose $L$ atoms are denoted $\mathbf{\Psi}_{i}$. We define the hard-thresholding operator with threshold $\xi_{\text{th}}$ for $\bm{a}\in\mathbb{C}^{L}$,

$$S_{\text{hard},\,\xi_{\text{th}}}(\bm{a})=\left[S_{\text{hard},\,\xi_{\text{th}}}(a_{1}),\ldots,S_{\text{hard},\,\xi_{\text{th}}}(a_{L})\right]^{T}\,, \tag{41}$$

as the point-wise application of the following hard-thresholding function

$$S_{\text{hard},\,\xi_{\text{th}}}(a_{i})=\begin{cases}0, & \text{if } |a_{i}|\leq\xi_{\text{th}},\\ a_{i}, & \text{otherwise}.\end{cases} \tag{42}$$

Let $\hat{\xi}_{\text{th}}$ be the largest threshold for which the thresholded image still lies within the approximate HPD region, i.e., for which it saturates the HPD region:

$$\hat{\xi}_{\text{th}}=\max_{\xi_{\text{th}}}\Big\{\xi_{\text{th}}\;\Big|\;f(\hat{\bm{x}}_{\text{MAP},\,\xi_{\text{th}}},\bm{y})+g(\hat{\bm{x}}_{\text{MAP},\,\xi_{\text{th}}})\leq\hat{\gamma}_{\alpha},\ \ \hat{\bm{x}}_{\text{MAP},\,\xi_{\text{th}}}=\bm{\Psi}\,S_{\text{hard},\,\xi_{\text{th}}}(\hat{\bm{a}}_{\text{MAP}}),\ \ \xi_{\text{th}}\in[0,+\infty)\Big\}\,. \tag{43}$$

We can compute the threshold bound with a root-finding method, as for the LCIs. The advantage of this formulation is that we solve only one root-finding problem, as opposed to two per superpixel (one for each bound) in the LCI calculation. This change considerably reduces the number of likelihood evaluations and, therefore, the computational cost of the UQ method.
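A minimal sketch of this single root-finding problem follows. It assumes a generic synthesis operator `synth` (standing in for $\bm{\Psi}$), a hypothetical `objective` callable standing in for $f(\cdot,\bm{y})+g(\cdot)$, and a bisection over $\xi_{\text{th}}\in[0,\max_i|\hat{a}_{\text{MAP},i}|]$. Because hard thresholding removes coefficients in discrete jumps, the objective is piecewise constant in $\xi_{\text{th}}$; the bisection further assumes it is nondecreasing, which is a heuristic rather than a guarantee.

```python
import numpy as np


def hard_threshold(a, xi_th):
    """Eqs. (41)-(42): zero every coefficient with |a_i| <= xi_th, keep the others."""
    return np.where(np.abs(a) <= xi_th, 0, a)


def max_threshold(a_map, synth, objective, gamma_alpha, tol=1e-4, max_iter=200):
    """Approximate Eq. (43): largest xi_th whose surrogate stays in the region.

    Assumes xi_th = 0 is feasible (it reproduces the MAP itself) and that the
    objective is nondecreasing in xi_th, so bisection brackets the crossing.
    """
    def feasible(t):
        return objective(synth(hard_threshold(a_map, t))) <= gamma_alpha

    lo, hi = 0.0, float(np.max(np.abs(a_map)))
    if feasible(hi):  # even the fully thresholded (all-zero) surrogate is feasible
        return hi
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Given the returned threshold `t`, the pixel-domain uncertainty estimate discussed below is `x_map - synth(hard_threshold(a_map, t))`.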

By computing the difference between the MAP, $\hat{\bm{x}}_{\text{MAP}}$, and the thresholded surrogate, $\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}}}$, we obtain an estimate of the solution’s uncertainty, which indicates possible errors in the MAP. Furthermore, we can compare the MAP and the thresholded surrogate scale by scale to estimate errors as a function of scale, thus exposing the different structures of our reconstruction.

Let us consider our wavelet transform as a multiscale transform with $J+1$ levels (Mallat, 2008; Starck et al., 2010). We can rewrite Equation 40 in terms of the coefficients at each level as

$$\hat{\bm{x}}_{\text{MAP}}=\mathbf{\Psi}\,\hat{\bm{a}}_{\text{MAP}}=\sum_{l=0}^{J}\mathbf{\Psi}_{l}\,\hat{\bm{a}}_{\text{MAP},l}\,, \tag{44}$$

where $\hat{\bm{a}}_{\text{MAP},l}$ are the coefficients corresponding to the $l$-th level of decomposition, and the zeroth level corresponds to the coarse scale. We can now build thresholded surrogate images at different scales by replacing only the MAP wavelet coefficients at a given level $l$ in Equation 44 with the corresponding coefficients of the thresholded surrogate image $\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}}}$. The thresholded surrogate image at level $j$ reads

$$\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}},\,j}=\sum_{\substack{l=0\\ l\neq j}}^{J}\mathbf{\Psi}_{l}\,\hat{\bm{a}}_{\text{MAP},l}+\mathbf{\Psi}_{j}\,\hat{\bm{a}}_{\text{MAP},\,\hat{\xi}_{\text{th}},\,j}\,, \tag{45}$$

where $\hat{\bm{a}}_{\text{MAP},\,\hat{\xi}_{\text{th}},\,j}$ corresponds to the wavelet coefficients of the thresholded surrogate image $\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}}}$ at level $j$. The errors at level $j$ can be approximated by the difference between $\hat{\bm{x}}_{\text{MAP}}$ and $\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}},\,j}$, which, from Equations 44 and 45, equals $\mathbf{\Psi}_{j}(\hat{\bm{a}}_{\text{MAP},j}-\hat{\bm{a}}_{\text{MAP},\,\hat{\xi}_{\text{th}},\,j})$, i.e., the contribution of the level-$j$ coefficients removed by the thresholding.
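The per-scale construction of Equation 45 can be sketched as follows. To keep the example dependency-free, it uses a hand-written orthonormal 2D Haar transform as a stand-in for the wavelet dictionary (the experiments use Daubechies 8 wavelets); all helper names are hypothetical, and the threshold `xi_th` is assumed to come from a search such as the one for Equation 43. Level 0 is the coarse approximation and levels $1,\ldots,J$ are detail levels from coarse to fine, matching the ordering of Equation 44.

```python
import numpy as np

SQ2 = np.sqrt(2.0)


def haar2(x):
    """One level of the orthonormal 2D Haar transform (even-sized input)."""
    a = (x[0::2] + x[1::2]) / SQ2          # low-pass along rows
    d = (x[0::2] - x[1::2]) / SQ2          # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / SQ2
    lh = (a[:, 0::2] - a[:, 1::2]) / SQ2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQ2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQ2
    return ll, (lh, hl, hh)


def ihaar2(ll, details):
    """Exact inverse of haar2."""
    lh, hl, hh = details
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQ2, (ll - lh) / SQ2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQ2, (hl - hh) / SQ2
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2], x[1::2] = (a + d) / SQ2, (a - d) / SQ2
    return x


def wavedec(x, levels):
    """Coefficients [coarse, details_1, ..., details_J], ordered coarse to fine."""
    details = []
    for _ in range(levels):
        x, det = haar2(x)
        details.append(det)
    return [x] + details[::-1]


def waverec(coeffs):
    x = coeffs[0]
    for det in coeffs[1:]:
        x = ihaar2(x, det)
    return x


def hard_threshold(a, xi_th):
    return np.where(np.abs(a) <= xi_th, 0.0, a)


def threshold_coeffs(coeffs, xi_th):
    """Hard-threshold every coefficient, as in the surrogate Psi S_hard(a_MAP)."""
    return [hard_threshold(coeffs[0], xi_th)] + [
        tuple(hard_threshold(c, xi_th) for c in det) for det in coeffs[1:]
    ]


def scale_errors(x_map, xi_th, levels):
    """Eq. (45) for j = 0..J: x_MAP minus the surrogate thresholded at level j only."""
    c_map = wavedec(x_map, levels)
    c_thr = threshold_coeffs(c_map, xi_th)
    errors = []
    for j in range(levels + 1):
        c_j = list(c_map)
        c_j[j] = c_thr[j]  # swap in the thresholded coefficients of level j
        errors.append(x_map - waverec(c_j))
    return errors
```

Because the transform is linear, the per-level errors sum to the full-image difference $\hat{\bm{x}}_{\text{MAP}}-\hat{\bm{x}}_{\text{MAP},\,\hat{\xi}_{\text{th}}}$, which gives a quick consistency check on any implementation.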

This approach to pixel-based UQ has two main advantages over the LCIs described in Section 4.2.2. The first is its reduced computational complexity: we only need to solve a single root-finding problem, which significantly reduces the number of likelihood evaluations. The second is that the entire image is considered simultaneously when saturating the HPD region. In the LCI computation, by contrast, we only perturb a small group of pixels until the HPD region, which is defined over the entire image, is saturated. This behaviour can be problematic: since the threshold $\hat{\gamma}_{\alpha}$ grows with the number of pixels, the LCI bounds become larger if the image size grows while the superpixel size is kept fixed, which is an undesirable property. Consequently, the errors predicted by the new pixel UQ approach are faster to compute and considerably tighter than the LCIs.

4.3 MCMC sampling and UQ validation

As stated before, MCMC sampling does not yet scale to the high dimensions of the RI imaging problems we target. However, sampling is still helpful for validating the UQ approaches of this paper. Posterior samples allow us to compute the posterior standard deviation, which provides a visual summary of the posterior uncertainty under the learned convex regulariser. Sampling from the posterior also allows us to compare the MAP estimator with another widely used estimator, the posterior mean (Arridge et al., 2019), which coincides with the minimum mean squared error (MMSE) estimator under some assumptions.

The negative log-posterior of the QuantifAI model with the CRR-NN reads

$$-\log p_{\text{CRR-NN}}(\bm{x}|\bm{y})\propto\frac{1}{2\sigma^{2}}\left\|\bm{y}-\Phi\bm{x}\right\|_{2}^{2}+\frac{\lambda}{\mu}R_{\bm{\theta}}(\mu\bm{x})+\iota_{\mathbb{R}^{N}}(\bm{x})\,, \tag{46}$$

Since the first two terms belong to $\mathcal{C}^{1}$ and have Lipschitz gradients, we do not need any approximation, e.g., the MY envelope, to sample from this posterior. The reality constraint is enforced directly when evaluating the gradient of the log-likelihood. The Langevin diffusion sampling algorithms reviewed in Section 2.5.3 require the gradient of the log-posterior, which is given in Equation 26 and Equation 27. In practice, we use the SK-ROCK algorithm (Pereyra et al., 2020), as it converges faster than ULA. Motivated by the consistent results of Cai et al. (2018a), we omit the subsequent MH step to accelerate the computations.
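The paper uses SK-ROCK; as a simpler illustration of the same Langevin approach, the sketch below implements the unadjusted Langevin algorithm (ULA), the Euler-Maruyama discretisation of the Langevin diffusion, without an MH correction. The toy posterior (identity forward operator, $\sigma=1$, and a quadratic prior standing in for the CRR-NN) is a hypothetical example chosen because its posterior mean is known in closed form.

```python
import numpy as np


def ula(grad_potential, x0, step, n_iter, burn_in=0, thin=1, seed=0):
    """Unadjusted Langevin algorithm: x <- x - step * grad U(x) + sqrt(2 * step) * z.

    grad_potential is the gradient of the negative log-posterior U.
    Samples after burn_in are kept every `thin` iterations.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for k in range(n_iter):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_potential(x) + np.sqrt(2.0 * step) * noise
        if k >= burn_in and (k - burn_in) % thin == 0:
            samples.append(x.copy())
    return np.array(samples)
```

With the potential $\tfrac12\|\bm{y}-\bm{x}\|_2^2+\tfrac12\|\bm{x}\|_2^2$, the posterior mean is $\bm{y}/2$, so `ula(lambda x: (x - y) + x, ...)` should return samples averaging close to `y / 2`. Without an MH step, ULA targets a slightly biased distribution whose error shrinks with the step size.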

The negative log-posterior of the analysis formulation of the model from Cai et al. (2018a), with a wavelet-based sparsity-promoting prior, reads

$$-\log p_{\text{WAV}}(\bm{x}|\bm{y})\propto\frac{1}{2\sigma^{2}}\left\|\bm{y}-\Phi\bm{x}\right\|_{2}^{2}+\lambda\big\|\Psi^{\dagger}\bm{x}\big\|_{1}+\iota_{\mathbb{R}^{N}}(\bm{x})\,, \tag{47}$$

which includes the non-smooth $\ell_{1}$-norm prior term and the reality constraint, which we again apply to the gradient of the log-likelihood. We resort to the MY envelope, with parameter $\gamma$, of the sparsity-promoting prior term, as shown in Equation 14. The proximal operator of the prior term has a closed-form solution that reads

$$S_{\text{soft},\,\beta_{\text{th}}}(\bm{a})=\left[S_{\text{soft},\,\beta_{\text{th}}}(a_{1}),\ldots,S_{\text{soft},\,\beta_{\text{th}}}(a_{L})\right]^{T}\,, \tag{48}$$

where $\bm{a}=\Psi^{\dagger}\bm{x}$ and the following soft-thresholding function is applied element-wise (for an orthonormal $\Psi$, the proximal operator in the image domain is then $\Psi\,S_{\text{soft},\,\beta_{\text{th}}}(\Psi^{\dagger}\bm{x})$):

$$S_{\text{soft},\,\beta_{\text{th}}}(a_{i})=\begin{cases}0, & \text{if } |a_{i}|\leq\beta_{\text{th}},\\ \dfrac{a_{i}}{|a_{i}|}\left(|a_{i}|-\beta_{\text{th}}\right), & \text{otherwise}.\end{cases} \tag{49}$$

The threshold $\beta_{\text{th}}$ used in practice is $\lambda\gamma$, the product of the regularisation strength and the parameter of the MY approximation. See Cai et al. (2018a) for more details on sampling the model with a wavelet-based regularisation. In practice, we again rely on the SK-ROCK algorithm for sampling and avoid using an MH step for the reasons mentioned above.
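The soft-thresholding operator of Equation 49 and the resulting Moreau-Yosida gradient can be sketched as follows. The sketch assumes an orthonormal wavelet transform, passed in as hypothetical callables `psi` and `psi_adj`, so that the proximal operator of $\gamma\lambda\|\Psi^{\dagger}\cdot\|_1$ is $\Psi\,S_{\text{soft},\,\lambda\gamma}(\Psi^{\dagger}\cdot)$; it then uses the standard identity $\nabla g_{\gamma}(\bm{x})=(\bm{x}-\operatorname{prox}_{\gamma g}(\bm{x}))/\gamma$ for the MY envelope $g_{\gamma}$.

```python
import numpy as np


def soft_threshold(a, beta):
    """Eq. (49) for real or complex a: shrink each magnitude by beta, keep its phase."""
    a = np.asarray(a)
    mag = np.abs(a)
    safe_mag = np.where(mag > 0, mag, 1.0)  # avoids 0/0; those entries are zeroed anyway
    return a * np.where(mag > beta, 1.0 - beta / safe_mag, 0.0)


def my_gradient_l1(x, lam, gamma, psi, psi_adj):
    """Gradient of the MY envelope of g(x) = lam * ||Psi^dagger x||_1.

    Assumes Psi is orthonormal, so prox_{gamma g}(x) = Psi S_soft(Psi^dagger x)
    with threshold beta_th = lam * gamma.
    """
    prox = psi(soft_threshold(psi_adj(x), lam * gamma))
    return (x - prox) / gamma
```

With `psi` the identity, the gradient equals $\lambda$ times the sign of a coefficient whose magnitude exceeds $\lambda\gamma$, and $x/\gamma$ below it: the familiar Huber-like behaviour of the smoothed $\ell_1$ norm.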

5 Experimental results

In this section, we compare the QuantifAI model with the wavelet-based model presented in Cai et al. (2018a, b), as it is one of the few RI imaging methods providing UQ capabilities. We use a simulated setup in which real reconstructed RI images serve as the ground truth. We focus on the UQ capabilities of the methods, while also considering reconstruction performance.

5.1 Dataset

The base images used in our experiments are those from Cai et al. (2018a): (i) the HI region of the M31 galaxy in Figure 1 (a), (ii) the W28 supernova remnant in Figure 2 (a), (iii) the 3C288 radio galaxy in Figure 3 (a), and (iv) the Cygnus A radio galaxy in Figure 4 (a). All the images are $256\times256$ pixels, except for Cygnus A, which is $256\times512$. As the ground truth images are reconstructed from real observations, some minor residuals and background structures from the original reconstructions are noticeable in the log-scale images; see, for example, Figure 3 (b). The ground truth image values have been normalised to a unitless range between 0 and 1, and the colour bars in the reconstruction figures therefore follow this range.

The previous images correspond to $\bm{x}$ in our observational model from Equation 3. The complex Gaussian noise $\bm{n}\in\mathbb{C}^{M}$ is composed of independent and identically distributed (i.i.d.) samples, each following a complex Gaussian distribution, $n_{i}\sim\mathcal{N}_{\mathcal{C}}(0,\sigma^{2})$, which implies that $\text{Re}(n_{i})$ and $\text{Im}(n_{i})$ are independent and distributed as $\mathcal{N}(0,\sigma^{2}/2)$ (Tse & Viswanath, 2005). The noise level is set to obtain a specific input signal-to-noise ratio (ISNR) for each image. For all the experiments, we set the ISNR to 30 dB, and the noise standard deviation is computed as follows

$$\sigma=\frac{\|\bm{\Phi}\bm{x}\|_{2}}{\sqrt{M}}\,10^{-\text{ISNR}/20}\,. \tag{50}$$
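Equation 50 and the complex Gaussian noise model can be sketched as below, assuming the clean visibilities $\bm{\Phi}\bm{x}$ are already available as a NumPy array; the function names are hypothetical.

```python
import numpy as np


def noise_std(vis_clean, isnr_db):
    """Eq. (50): sigma = ||Phi x||_2 / sqrt(M) * 10^(-ISNR/20)."""
    return np.linalg.norm(vis_clean) / np.sqrt(vis_clean.size) * 10.0 ** (-isnr_db / 20.0)


def add_complex_noise(vis_clean, isnr_db, seed=0):
    """Add i.i.d. n_i ~ CN(0, sigma^2): real and imaginary parts each ~ N(0, sigma^2 / 2)."""
    rng = np.random.default_rng(seed)
    sigma = noise_std(vis_clean, isnr_db)
    shape = vis_clean.shape
    n = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * sigma / np.sqrt(2.0)
    return vis_clean + n, sigma
```

Since $\mathbb{E}|n_i|^2=\sigma^2$, Equation 50 makes the ratio between the mean visibility power $\|\bm{\Phi}\bm{x}\|_2^2/M$ and the noise variance equal to $10^{\text{ISNR}/10}$, i.e., $10^{3}$ for 30 dB.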

To mimic the $uv$-plane coverage, we reuse the Fourier mask from Cai et al. (2018a, Fig. 2) to generate the visibilities $\bm{y}$. The variable sampling density profile was taken from Puy et al. (2011) and represents a 10% coverage of the Fourier plane. In the experiments of this section, we work with gridded visibilities, with around $1.3\times10^{4}$ visibilities for Cygnus A and $6.5\times10^{3}$ visibilities for the other images. Validating the UQ techniques through MCMC sampling requires a large number of iterations. Using gridded visibilities allows us to base the forward operator $\mathbf{\Phi}$ from Equation 3 on the FFT (Cooley & Tukey, 1965), which helps alleviate the computational burden of the validation. Section 6 presents results with ungridded visibilities.
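A gridded forward operator of this kind can be sketched as a unitary 2D FFT followed by selection of the sampled Fourier cells. This is a simplified illustration, not the operator used in the paper: the boolean `mask` is assumed to be given (for example, a random mask with roughly 10% coverage standing in for the variable density profile), `make_gridded_operator` is a hypothetical name, and no instrumental effects are modelled.

```python
import numpy as np


def make_gridded_operator(mask):
    """Return (phi, phi_adj) for a masked, unitary 2D FFT."""
    idx = np.flatnonzero(mask)

    def phi(x):
        # Unitary FFT, then keep only the sampled Fourier cells.
        return np.fft.fft2(x, norm="ortho").ravel()[idx]

    def phi_adj(y):
        # Zero-fill the unsampled cells, then apply the unitary inverse FFT.
        full = np.zeros(mask.size, dtype=complex)
        full[idx] = y
        return np.fft.ifft2(full.reshape(mask.shape), norm="ortho")

    return phi, phi_adj
```

With `norm="ortho"` the FFT is unitary, so its adjoint is the inverse FFT and the adjoint of the selection is zero-filling. For real images, a gradient step on $\tfrac12\|\bm{y}-\bm{\Phi}\bm{x}\|_2^2$ would use the real part of `phi_adj(phi(x) - y)`.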

5.2 Models and experiment settings

5.2.1 RI imaging models

The CRR-NN in the QuantifAI model is a pretrained model with $t=5$, trained with Gaussian white noise of standard deviation $\sigma=5$, and with parameters $\mu=20$ and $\lambda=5\times10^{4}$. The model was trained on natural images from the BSD500 dataset (Arbeláez et al., 2011), which contains images of landscapes, people, animals and objects, among others. The images were scaled to the $[0,255]$ range, and training used the $\ell_{1}$ norm as the loss function in Equation 22 with the Adam optimiser (Kingma & Ba, 2017). The training parameters followed Goujon et al. (2023b, §VI.A).

The wavelet dictionary $\Psi$ used in the wavelet-based model is composed of Daubechies 8 wavelets (Daubechies, 1992) with a multiresolution level $J=4$, following the setup from Cai et al. (2018a, b). The regularisation parameter $\lambda_{\text{WAV}}$ was set to $1\times10^{2}$.

The regularisation strengths of both models, $\lambda$ and $\lambda_{\text{WAV}}$, were set by a grid search to maximise the MAP reconstruction SNR. We observed that the reconstruction SNR of QuantifAI is significantly more robust to the choice of regularisation strength than that of the wavelet-based model.

5.2.2 Optimisation settings

For QuantifAI, we used the optimisation algorithm shown in Algorithm 1. The wavelet-based model also requires a proximal algorithm due to its non-smooth component; to provide a fair comparison, we used the FISTA algorithm (Beck & Teboulle, 2009), presented in more detail in Appendix A. In these experiments, we assumed that the noise level $\sigma$ is known. If the noise level is unknown, it may be estimated by standard techniques (Price et al., 2021b). The tolerance criterion $\xi$ of both algorithms was set to $10^{-5}$, and the maximum number of iterations to $1.5\times10^{4}$. Both optimisation algorithms converged before reaching this maximum.
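For reference, a generic FISTA iteration (Beck & Teboulle, 2009) can be sketched as below. It is not the implementation described in Appendix A: the stopping rule on the relative change of the iterate is an assumption, and `grad_f`, `prox_g` and the Lipschitz constant are supplied by the caller (for the wavelet-based model with an orthonormal basis, `prox_g` would apply soft thresholding to the wavelet coefficients).

```python
import numpy as np


def soft_threshold(a, beta):
    mag = np.abs(a)
    return a * np.where(mag > beta, 1.0 - beta / np.where(mag > 0, mag, 1.0), 0.0)


def fista(grad_f, lipschitz, prox_g, x0, tol=1e-5, max_iter=15000):
    """Minimise f(x) + g(x), with f smooth (L-Lipschitz gradient) and g proximable.

    prox_g(v, step) must return prox_{step * g}(v).
    """
    x = np.array(x0, dtype=float)
    z, t = x.copy(), 1.0
    step = 1.0 / lipschitz
    for _ in range(max_iter):
        x_new = prox_g(z - step * grad_f(z), step)      # forward-backward step at z
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        # Stop on a small relative change of the iterate (assumed criterion).
        converged = np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x_new), 1e-12)
        x, t = x_new, t_new
        if converged:
            break
    return x
```

For example, with $f(\bm{x})=\tfrac12\|\bm{y}-\bm{x}\|_2^2$ and $g=\lambda\|\cdot\|_1$, the minimiser is the soft-thresholded data $S_{\text{soft},\lambda}(\bm{y})$, which the sketch recovers.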

5.2.3 MCMC sampling settings

We generate $5\times10^{4}$ samples from each posterior distribution, with $5\times10^{4}$ burn-in iterations and a thinning factor of 10. The burn-in iterations are discarded so that the chain can pass its transient period before samples are retained. The thinning factor is the number of iterations performed between two retained samples, chosen so that the retained samples can be considered approximately independent draws from the target distribution. The sampling algorithm therefore ran for a total of $5\times10^{4}+10\times5\times10^{4}=5.5\times10^{5}$ iterations for each model. We set the number of stages of the SK-ROCK algorithm (Pereyra et al., 2020), one of its main hyperparameters, to 10. Since posterior sampling is only used for validation, we chose the sampling parameters to favour accurate reconstructions and posterior samples over speed.

The wavelet-based model requires the MY envelope approximation to guarantee the chain’s convergence, as described in Section 2.5.3 and Section 4.3. The MY approximation parameter $\gamma$ was set to the inverse of the Lipschitz constant of the likelihood gradient; cf. the first term of Equation 28.

The choice of the step sizes is critical to ensure that the chains converge to the target distribution in a reasonable amount of time. Each step size is chosen as a function of the Lipschitz constant of the corresponding posterior gradient. The step sizes $\delta_{\text{Q}}$ and $\delta_{\text{W}}$, corresponding to the QuantifAI and wavelet-based models, respectively, are computed as follows

$$\delta_{\text{Q}}=\frac{\kappa_{\text{Q}}}{L_{\text{likelihood}}+L_{\text{prior-CRR-NN}}}\,,\qquad\delta_{\text{W}}=\frac{\kappa_{\text{W}}}{L_{\text{likelihood}}+\gamma^{-1}}\,, \tag{51}$$

where the Lipschitz constant bounds are given in Equation 28, and $\kappa_{\text{Q}}$ and $\kappa_{\text{W}}$ are two positive constants smaller than one, here both set to 0.98. We followed the advice of Durmus et al. (2018) and Cai et al. (2018a) to set the sampling parameters.
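As an illustration of Equation 51, the sketch below estimates the operator norm by power iteration and forms a step size. It assumes that $L_{\text{likelihood}}=\|\bm{\Phi}\|_2^2/\sigma^2$, the Lipschitz constant of the gradient of $\tfrac{1}{2\sigma^2}\|\bm{y}-\bm{\Phi}\bm{x}\|_2^2$; the prior constant is passed in, and the helper names are hypothetical.

```python
import numpy as np


def op_norm_sq(phi, phi_adj, shape, n_iter=100, seed=0):
    """Power iteration for ||Phi||_2^2, the largest eigenvalue of Phi^dagger Phi."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    x /= np.linalg.norm(x)
    val = 0.0
    for _ in range(n_iter):
        x = phi_adj(phi(x))
        val = np.linalg.norm(x)  # ||Phi^dagger Phi x|| for the current unit vector
        x = x / val
    return val


def step_size(kappa, l_likelihood, l_prior):
    """Eq. (51): delta = kappa / (L_likelihood + L_prior)."""
    return kappa / (l_likelihood + l_prior)
```

For the wavelet-based model, $\gamma$ is set to $1/L_{\text{likelihood}}$ (Section 5.2.3), so $\gamma^{-1}=L_{\text{likelihood}}$ and Equation 51 reduces to $\delta_{\text{W}}=\kappa_{\text{W}}/(2L_{\text{likelihood}})$.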

5.2.4 UQ settings

We set $\alpha=0.01$ in all the UQ methods, corresponding to a 99% credible level. For both models, we used the bisection algorithm to compute the LCIs and the fast pixel UQ at different scales, with a tolerance of $10^{-4}$ and a maximum of 200 iterations. For the fast pixel UQ at different scales, we used the same wavelet dictionary as in the wavelet-based model.

The inpainting algorithm uses the same stopping criterion as Algorithm 1. In this case, the tolerance is set to $5\times10^{-6}$, and the maximum number of iterations to $1.5\times10^{4}$. The CRR-NN used for the inpainting is the same as the one in the QuantifAI model.

The Gaussian blurring kernel $G(0,\Sigma)$ from Equation 35 is set using $\Sigma=\sigma_{G}^{2}I_{2\times 2}$, with $\sigma_{G}=3.5$ pixels and a truncation radius of 7 pixels, giving a kernel $G\in\mathbb{R}^{15\times 15}$.
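The kernel above can be built as follows. This sketch assumes that the kernel is normalised to unit sum and that the blurred surrogate is obtained by a same-size convolution with reflective boundary padding; neither choice is specified in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(sigma=3.5, radius=7):
    """Isotropic Gaussian G(0, sigma^2 I) truncated at `radius` pixels and normalised to unit sum."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()


def blur(img, kernel):
    """Blur an image with a symmetric kernel, using reflective padding to keep the image size."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)  # shape (H, W, 2r+1, 2r+1)
    # Correlation equals convolution here because the Gaussian kernel is symmetric.
    return np.einsum("ijkl,kl->ij", windows, kernel)


G = gaussian_kernel()
print(G.shape)  # (15, 15)
```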

5.3 Image reconstruction

Table 1: Reconstruction performance of the different point estimates for the dataset images in terms of SNR with respect to the ground truth. We compare the MAP and the MMSE reconstruction of the wavelet-based and the QuantifAI model. We include the dirty reconstruction as a reference. We observe that the MAP estimation from QuantifAI outperforms the other reconstructions from the wavelet-based prior and all the MMSE estimations.
Reconstruction SNR [dB]
Images | Dirty | Wavelet-based prior MMSE | Wavelet-based prior MAP | QuantifAI MMSE | QuantifAI MAP
W28 | 3.39 | 18.17 | 23.04 | 23.38 | 26.85
M31 | 5.01 | 23.78 | 25.52 | 24.61 | 27.48
3C288 | 7.02 | 14.31 | 14.15 | 23.23 | 24.10
Cygnus A | 4.60 | 20.52 | 17.53 | 25.36 | 30.25

We present the RI image reconstructions of the four ground truth test images in Figures 1 to 4. In each figure, we compare the wavelet-based and QuantifAI models and include the dirty reconstruction as a reference. The reconstructions are compared using the signal-to-noise ratio (SNR), expressed in dB and defined as

$$\text{SNR}(\bm{x},\bm{x}_{\text{gt}})=-20\log_{10}\left(\frac{\|\bm{x}_{\text{gt}}-\bm{x}\|_{2}}{\|\bm{x}_{\text{gt}}\|_{2}}\right)\,,\tag{52}$$

where $\bm{x}_{\text{gt}}$ is the reference or ground truth image, $\bm{x}$ is the estimate, and $\|\cdot\|_{2}$ is the usual $\ell_{2}$ norm.
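Equation 52 translates directly into code. In this NumPy sketch, `np.linalg.norm` applied to a 2D array returns its Frobenius norm, i.e. the $\ell_2$ norm of the flattened image.

```python
import numpy as np


def snr_db(x, x_gt):
    """Reconstruction SNR in dB (Equation 52): -20 log10(||x_gt - x||_2 / ||x_gt||_2)."""
    return -20.0 * np.log10(np.linalg.norm(x_gt - x) / np.linalg.norm(x_gt))


x_gt = np.ones((8, 8))
print(snr_db(0.9 * x_gt, x_gt))  # approximately 20.0: a 10% relative error gives 20 dB
```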

The quantitative reconstruction results are presented in Table 1. The MAP reconstruction from QuantifAI performs significantly better than its wavelet-based counterpart for every image in our dataset, with gains between 2.0 dB and 12.7 dB and an average gain of about 7 dB. These improvements are difficult to see by eye in the reconstructed images; however, they become evident when inspecting the MAP reconstruction errors in the fourth column. Turning to the sampling results, the MMSE reconstructions show a similar behaviour in favour of QuantifAI. The MAP estimate is considerably faster to compute than the MMSE estimate, as it relies on optimisation rather than posterior sampling; recall that the MMSE estimate is obtained by averaging posterior samples. In addition, for QuantifAI the MAP consistently provides better reconstruction performance than the MMSE.
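The gains quoted above can be checked directly from the MAP columns of Table 1:

```python
import numpy as np

# MAP SNR values [dB] from Table 1, in the order W28, M31, 3C288, Cygnus A.
snr_map_wavelet = np.array([23.04, 25.52, 14.15, 17.53])
snr_map_quantifai = np.array([26.85, 27.48, 24.10, 30.25])

gain = snr_map_quantifai - snr_map_wavelet  # approximately [3.81, 1.96, 9.95, 12.72]
print(gain.min(), gain.max(), gain.mean())  # about 1.96, 12.72 and 7.11 dB
```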

The posterior standard deviation provides a qualitative way to validate the posterior model and its uncertainties. The posterior standard deviation correlates more strongly with the MAP reconstruction error for the QuantifAI model than for the wavelet-based model. In addition, QuantifAI's posterior standard deviation is lower than its wavelet-based counterpart, which agrees with QuantifAI's smaller reconstruction error. For example, for image W28, Figure 2(j) shows that the posterior standard deviation is large near the edges of the ground truth image. Reassuringly, QuantifAI's MAP reconstruction error shows the same behaviour.

The performance results showcase the expressive power of the CRR-NN-based prior, even though the regulariser is constrained to be convex. The results also confirm its generalisation power: although it was trained on natural images, the CRR-NN provides remarkable reconstruction performance on astronomical images, together with meaningful posterior standard deviations.

The reconstructions using the wavelet-based prior model exhibit some low-intensity artefacts in the Cygnus A and 3C288 images, as shown in Figures 4(h) and 3(h). These artefacts arise from the noise-like patterns in the ground truth images, which originate from real observations, combined with the thresholding of the orthogonal wavelet basis. Such patterns are absent in the M31 and W28 images because the noise was removed in a preprocessing step, as seen in Figures 1(b) and 2(b) in comparison with Figures 4(b) and 3(b). The regularisation strength for the wavelet-based model was selected to maximise the reconstruction SNR. The chosen value is lower than in Cai et al. (2018b), which explains the observed patterns and the finer details in our reconstructions. Employing a wavelet dictionary instead of an orthogonal wavelet basis and adding a positivity constraint could mitigate these artefacts.

[Figure 1 panels. Top row: (a) ground truth (linear scale), (b) ground truth, (c) dirty reconstruction, (d) dirty reconstruction error. Middle row, wavelet-based prior: (e) MMSE reconstruction, (f) posterior standard deviation, (g) MAP reconstruction, (h) MAP reconstruction error. Bottom row, QuantifAI: (i) MMSE reconstruction, (j) posterior standard deviation, (k) MAP reconstruction, (l) MAP reconstruction error.]
Figure 1: RI image reconstructions for M31. The images are shown in $\log_{10}$ scale except for subfigure (a). Top row: the first two images show the ground truth intensity image in linear and $\log_{10}$ scales, respectively. The third image shows the dirty reconstruction, computed by applying the pseudo-inverse of the measurement operator to the observations. The fourth image shows the error of the dirty reconstruction with respect to the ground truth. Middle row: results of the wavelet-based model. The first and second columns show the minimum mean squared error (MMSE) estimator and the posterior standard deviation, respectively, both computed from the $5\times10^{4}$ generated posterior samples. The third column shows the MAP reconstruction obtained with an optimisation algorithm, and the fourth column shows the error of the MAP reconstruction with respect to the ground truth. Bottom row: results of the QuantifAI model, with the columns in the same order as for the wavelet-based reconstructions in the middle row. For every reconstruction, the SNR with respect to the ground truth is displayed in the top left corner. Compared with the wavelet-based model, QuantifAI recovers a reconstruction with a higher SNR and shows more meaningful uncertainties, as can be seen by comparing the posterior standard deviation with the MAP reconstruction error.
Figure 2: RI image reconstructions for W28. The order of the images follows the M31 results presented in Figure 1. Compared with the wavelet-based model, QuantifAI recovers a reconstruction with a higher SNR and shows more meaningful uncertainties, which can be seen by comparing the posterior standard deviation and the MAP reconstruction error.
Figure 3: RI image reconstructions for 3C288. The order of the images follows the M31 results presented in Figure 1. Compared with the wavelet-based model, QuantifAI recovers a reconstruction with a higher SNR and shows more meaningful uncertainties, which can be seen by comparing the posterior standard deviation and the MAP reconstruction error.
Figure 4: RI image reconstructions for Cygnus A. The order of the images follows the M31 results presented in Figure 1. Compared with the wavelet-based model, QuantifAI recovers a reconstruction with a higher SNR and shows more meaningful uncertainties, which can be seen by comparing the posterior standard deviation and the MAP reconstruction error.

5.4 Hypothesis testing of image structure

We start with hypothesis tests of image structure, which are the most scalable UQ technique we study. First, a surrogate image is created by modifying a region of interest of the reconstruction. Carrying out the hypothesis test then requires only one further evaluation of the likelihood and prior potentials. The test helps to answer a scientific question quantitatively at a $100(1-\alpha)\%$ credible level. The scientific question depends on how the surrogate image is constructed; in this work, we consider two scenarios.

[Figure 5 panels. Rows: M31, W28, Cygnus A, 3C288 (region 1) and 3C288 (region 2). Columns: (a) MAP reconstruction and (b) inpainted surrogate.]
Figure 5: Hypothesis test of different regions of the four QuantifAI MAP reconstructions for M31, W28, Cygnus A and 3C288. All images are shown in $\log_{10}$ scale. The left column shows the respective MAP reconstruction with the region of interest framed in a red rectangle. The right column shows the surrogate images inpainted using the QuantifAI prior. The first four rows show regions corresponding to physical structures present in the ground truth images. The last row corresponds to a non-physical region. Results of the hypothesis tests are summarised in Table 2.

In the first scenario, we consider a particular structure in the reconstructed intensity image and ask whether its origin is physical. For example, the structure could be a reconstruction artefact or the result of a physical process. Figure 5 illustrates this scenario for different regions of the four images. The first four inpainted regions correspond to physical structures, while the fifth region, i.e. region 2 of image 3C288, does not. The surrogate images are produced with an inpainting algorithm using QuantifAI's prior, so that the inpainted region agrees with the prior.

[Figure 6 panels. Rows: M31, W28, Cygnus A and 3C288. Columns: (a) MAP reconstruction and (b) blurred surrogate.]
Figure 6: Hypothesis test of the fine structure in the four QuantifAI MAP reconstructions for M31, W28, Cygnus A and 3C288. All images are shown in $\log_{10}$ scale. The fine structure is blurred using a Gaussian kernel with a standard deviation of 3.5 pixels and a radius of 7 pixels. The blurred surrogate images in the second column are constructed by blurring the MAP reconstructions shown in the first column. The hypothesis tests are performed with the QuantifAI model.

The second scenario is to blur the finer structure in the reconstructed image and perform a hypothesis test to determine whether the blurred structure is physical. The test is illustrated in Figure 6. In this case, all four blurred regions correspond to physical structures.

In both scenarios, we compare the MAP-based hypothesis test described in this work with a sampling-based approach used for validation. In the MAP-based approach, we build the HPD region of Equation 31 with the approximation in Equation 32 and use the MAP estimate as the reconstruction. In the sampling-based approach, we use the MMSE estimate, i.e. the mean of the posterior samples, as the reconstruction and compute the threshold defining the HPD region by applying the quantile function to the potentials of the posterior samples, following Cai et al. (2018a, §5.2).
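The two thresholds and the resulting decision rule can be sketched as below. We assume that Equation 32 takes the form of the approximation of Pereyra (2017), $\tilde{\gamma}_\alpha=(f+g)(\hat{\bm{x}}_{\text{MAP}})+\tau_\alpha\sqrt{N}+N$ with $\tau_\alpha=\sqrt{16\log(3/\alpha)}$, and that the image has $N=256\times256$ pixels. Under these assumptions, the sketch reproduces the MAP isocontour reported for M31 in Table 2; the function names are hypothetical.

```python
import numpy as np


def map_hpd_threshold(potential_at_map, n_pixels, alpha=0.01):
    """Approximate HPD threshold (Pereyra 2017): (f+g)(x_map) + tau_alpha * sqrt(N) + N."""
    tau_alpha = np.sqrt(16.0 * np.log(3.0 / alpha))
    return potential_at_map + tau_alpha * np.sqrt(n_pixels) + n_pixels


def sampling_hpd_threshold(sample_potentials, alpha=0.01):
    """Empirical (1 - alpha) quantile of the potentials (f+g) of posterior samples."""
    return float(np.quantile(sample_potentials, 1.0 - alpha))


def reject_null(potential_at_surrogate, threshold):
    """Reject the null hypothesis when the surrogate lies outside the HPD region."""
    return bool(potential_at_surrogate > threshold)


# M31 values from Table 2 (rescaled by 1e5), assuming a 256 x 256 image.
gamma_map = map_hpd_threshold(0.310e5, 256 * 256)
print(gamma_map / 1e5)                  # about 0.990, the MAP isocontour in Table 2
print(reject_null(1.161e5, gamma_map))  # True: the inpainted structure is deemed physical
```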

Table 2: Hypothesis test results for the inpainted surrogates in Figure 5 using the QuantifAI model. The function $(f+g)(\cdot)$ corresponds to the combined potential of the likelihood and the prior. The reconstruction $\hat{\bm{x}}^{*}$ is the point estimate used in the sampling and optimisation scenarios, namely the MMSE and the MAP, respectively. SK-ROCK denotes the posterior sampling technique. The image $\hat{\bm{x}}^{*,\text{sgt}}$ is the surrogate image, in which the areas of interest shown in Figure 5 have been inpainted. The isocontours, or thresholds, $\hat{\gamma}_{0.01}$ are computed with $\alpha=0.01$, giving a 99% credible region. In the MAP rows, the threshold is computed with the approximation in Equation 32; in the SK-ROCK rows, it is computed from the posterior samples following Cai et al. (2018a). The symbols ✓ and ✗ in the Ground truth column indicate whether or not the inpainted region contains a physical structure from the ground truth, respectively. In the last column, ✓ indicates that the hypothesis test is conclusive. All values are given in units of $10^{5}$. QuantifAI correctly rejects the hypothesis for all images where the tested structure is physical, except for Cygnus A. In all cases, the MAP-based and MCMC sampling-based results agree with each other.
Images | Test area | Ground truth | Method | Point estimate $(f+g)(\hat{\bm{x}}^{*})$ | Surrogate $(f+g)(\hat{\bm{x}}^{*,\text{sgt}})$ | Isocontour $\hat{\gamma}_{0.01}$ | Hypothesis test
M31 | 1 | ✓ | SK-ROCK | 0.340 | 1.159 | 0.742 | ✓
M31 | 1 | ✓ | MAP | 0.310 | 1.161 | 0.990 | ✓
Cygnus A | 1 | ✓ | SK-ROCK | 0.125 | 0.182 | 0.848 | ✗
Cygnus A | 1 | ✓ | MAP | 0.105 | 0.169 | 1.450 | ✗
W28 | 1 | ✓ | SK-ROCK | 0.222 | 4.649 | 0.612 | ✓
W28 | 1 | ✓ | MAP | 0.196 | 4.699 | 0.876 | ✓
3C288 | 1 | ✓ | SK-ROCK | 0.257 | 1.918 | 0.659 | ✓
3C288 | 1 | ✓ | MAP | 0.229 | 1.908 | 0.908 | ✓
3C288 | 2 | ✗ | SK-ROCK | 0.257 | 0.257 | 0.659 | ✗
3C288 | 2 | ✗ | MAP | 0.229 | 0.229 | 0.908 | ✗

Table 2 presents the results of the inpainting hypothesis tests, with the inpainted surrogates shown in Figure 5. The MAP- and sampling-based results are consistent for all the images studied, with the threshold computed from the posterior samples being slightly tighter than the MAP-based approximation. The hypothesis tests correctly classify the structures in images M31, W28 and 3C288, including both regions of the latter. The UQ methods cannot make a strong statistical statement about the structure in the Cygnus A image: as the inpainted region contains only a tiny physical structure, the potential of the inpainted surrogate remains close to those of the MAP and MMSE estimates and well below the threshold. For comparison between the models, Appendix B.1 reports the results of the same inpainting experiment for the wavelet-based model, where the wavelet prior is used to inpaint the region of interest to allow a fair comparison. All results from the wavelet-based model agree with those of QuantifAI.

Table 3: Hypothesis test results for the blurred surrogates of Figure 6 using the QuantifAI model. The description of Table 2 also applies to this table. All values are given in units of $10^{5}$. QuantifAI correctly rejects the hypothesis in all cases, and the MAP-based outcomes agree with their MCMC sampling-based counterparts.
Images | Method | Initial $(f+g)(\hat{\bm{x}}^{*})$ | Surrogate $(f+g)(\hat{\bm{x}}^{*,\text{sgt}})$ | Isocontour $\hat{\gamma}_{0.01}$ | Hypothesis test
M31 | SK-ROCK | 0.340 | 1.905 | 0.742 | ✓
M31 | MAP | 0.310 | 1.906 | 0.990 | ✓
Cygnus A | SK-ROCK | 0.125 | 9.642 | 0.848 | ✓
Cygnus A | MAP | 0.105 | 9.643 | 1.450 | ✓
W28 | SK-ROCK | 0.222 | 8.389 | 0.612 | ✓
W28 | MAP | 0.196 | 8.387 | 0.876 | ✓
3C288 | SK-ROCK | 0.257 | 0.920 | 0.659 | ✓
3C288 | MAP | 0.229 | 0.922 | 0.908 | ✓

Table 3 presents the results for the blurred surrogates of Figure 6. For all images, the hypothesis test concludes that the blurred fine structure is physical, as the potential of the surrogate falls outside the HPD region. The MAP- and sampling-based results are consistent with each other.

The different hypothesis tests show consistent results between the sampling-based and the highly scalable MAP-based approaches. In addition, the hypothesis test results are consistent between the QuantifAI and wavelet-based models. We note that the MAP-based approach requires only one further measurement operator evaluation to carry out the hypothesis test, which provides a highly scalable way to answer scientific questions about the uncertainty of RI imaging reconstructions.

5.5 Local credible intervals

We exploited the approximation of the HPD region from Section 4.1, based on the MAP estimate and a credible level of 99%. The approximate HPD regions were then used to compute the LCIs, whose lengths per pixel are visualised as images in Figure 7. The LCI lengths are displayed after subtracting the mean LCI length over all superpixels in the image, which is shown in the top left corner of each image. The UQ results for QuantifAI are presented for two superpixel sizes, 4×4 and 8×8; we omit the LCIs of the wavelet-based prior for conciseness. The posterior standard deviations at the two superpixel sizes are included for comparison with the LCIs, which are obtained with a significantly faster MAP-based UQ technique. We find reasonable agreement between the structure in the LCI plots and the posterior standard deviation. For example, for the 3C288 image with a superpixel size of 8×8, the LCIs are tighter in the two elliptical regions and in the small connecting structure at the centre of the image. The corresponding posterior standard deviation is also smaller in these regions, which is expected as most of the observed signal is concentrated there. Since the LCIs and the posterior standard deviation represent different quantities, we would not expect exact agreement even if the LCIs were computed without any approximation.
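One plausible way to obtain the superpixel posterior standard deviations and the centred LCI maps shown in Figure 7 is sketched below. It assumes that a superpixel value is the block mean of the image over that superpixel and takes the LCI map as given; the function names are hypothetical.

```python
import numpy as np


def block_mean(img, b):
    """Average an (H, W) image over non-overlapping b x b superpixels (H and W divisible by b)."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).mean(axis=(1, 3))


def superpixel_posterior_std(samples, b):
    """Posterior standard deviation of superpixel means, from samples of shape (S, H, W)."""
    means = np.stack([block_mean(s, b) for s in samples])
    return means.std(axis=0)


def centred_lci(lci_map):
    """Return LCI - <LCI> and the mean <LCI>, the quantities displayed in Figure 7."""
    mean_lci = float(lci_map.mean())
    return lci_map - mean_lci, mean_lci
```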

As expected, larger superpixels have tighter LCIs, as seen from the mean LCIs shown in the top left corner of the subfigures in Figure 7. The reconstructions are naturally less uncertain at larger scales due to the properties of our measurement operator, as the visibilities are generally concentrated at low frequencies. In addition, varying the value of a larger superpixel saturates the HPD region faster than varying that of a small superpixel. We also computed the LCIs for superpixels of size 16×16, which we do not show for conciseness. The corresponding mean LCI values are 0.20, 0.08, 0.24 and 0.07 for the images in the same order as in Figure 7.

Comparing the mean LCI values of the four reconstructions in Figure 7, we notice that two of them, M31 and 3C288, have higher uncertainty than the rest. The higher the uncertainty, the larger the mean LCI, as the superpixel values need larger changes before they saturate the HPD region. Image 3C288 with a superpixel size of 4×4 is an example where the LCIs have saturated, as their mean is close to unity (n.b. the images are scaled to the range $[0,1]$); as a result, the detailed structure of the LCI image is lost. This saturation highlights the need to select the superpixel size appropriately for the case at hand.

[Figure 7 panels. Columns: M31, W28, 3C288 and Cygnus A. Rows: MAP estimation; $\mathrm{LCI}-\langle\mathrm{LCI}\rangle$ and posterior standard deviation for superpixel size 4×4; $\mathrm{LCI}-\langle\mathrm{LCI}\rangle$ and posterior standard deviation for superpixel size 8×8.]
Figure 7: Length of the local credible intervals (LCIs), cf. Bayesian error bars, computed with a 99% credible level using superpixel sizes of 4×4 and 8×8. Each column represents one of the four images in our dataset. The first row shows the MAP estimation of each image at its original resolution. The second row displays the variation of the LCIs around their mean, which is recorded in a box in the upper left corner. This display choice allows us to better visualise the structure of the LCIs while retaining the information on their mean. The third row presents the posterior standard deviation computed with the same superpixel size. The fourth and fifth rows present the equivalent information for the superpixel size of 8×8. There is reasonable agreement between the uncertainty captured by the LCIs and the posterior standard deviation.

5.6 Fast pixel uncertainty quantification at different scales

The results of the fast pixel UQ method for the images M31 and W28 are reported in Figure 8. We use the error between the MAP estimate and the ground truth image, i.e. the true error, to validate the uncertainty predicted by the fast UQ method. The true error at different scales can be computed following Equation 45,

$$\hat{\bm{x}}_{\text{GT},\,j}=\sum_{\substack{l=0\\ l\neq j}}^{J}\mathbf{\Psi}_{l}\,\bm{a}_{\text{GT},l}+\mathbf{\Psi}_{j}\,\hat{\bm{a}}_{\text{MAP},\,j}\,,\tag{53}$$

where $\bm{a}_{\text{GT},l}$ are the wavelet decomposition coefficients of the ground truth image at multiresolution level $l$. That is, we replace the ground truth image's wavelet coefficients at a single level $j$ with the corresponding coefficients $\hat{\bm{a}}_{\text{MAP},j}$ of the MAP reconstruction.
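Equation 53 can be illustrated with a simple orthogonal Haar multiresolution, used here as a stand-in for the wavelet dictionary $\mathbf{\Psi}$ of this work (an assumption). For Haar, the approximation at level $j$ is the block average over $2^j\times2^j$ pixels, so each level component $\mathbf{\Psi}_j\bm{a}_j$ is the difference of two consecutive block averages, and all components sum back to the image. By linearity, $\hat{\bm{x}}_{\text{GT},j}-\bm{x}_{\text{GT}}$ is then the level-$j$ component of the MAP error. The index convention (0 for the coarse approximation, 1 for the finest details) is ours.

```python
import numpy as np


def block_average(x, b):
    """Project x onto images that are constant on b x b blocks (Haar approximation space)."""
    h, w = x.shape
    m = x.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
    return np.repeat(np.repeat(m, b, axis=0), b, axis=1)


def haar_level_components(x, J):
    """Split x into [coarse approximation, detail level 1 (finest), ..., detail level J]."""
    approx = [x] + [block_average(x, 2**j) for j in range(1, J + 1)]
    details = [approx[j - 1] - approx[j] for j in range(1, J + 1)]
    return [approx[J]] + details  # the components telescope back to x


def true_error_image(x_gt, x_map, j, J):
    """Equation 53: ground truth with its level-j component replaced by that of the MAP."""
    gt = haar_level_components(x_gt, J)
    mp = haar_level_components(x_map, J)
    return x_gt - gt[j] + mp[j]
```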

We observe good agreement between the predicted and true errors at the different multiresolution levels. The errors are overestimated, which can be explained by two factors. First, the approximation of the HPD region is conservative, as discussed in Pereyra (2017). Second, the MAP estimate already misses some of the fine, high-frequency structures in the ground truth images, as can be seen in the QuantifAI MAP reconstruction errors in subfigures 1(l) and 2(l). This missing high-frequency structure is expected due to the properties of the measurement operator discussed in Section 2.1.

The structure of the chosen wavelet representation, $\mathbf{\Psi}$, which underpins the UQ method, can be observed in the predicted errors. This structure is visible mainly at the finer scales (higher frequencies) of W28, which contains point sources. The wavelet structure should be taken into account when analysing the predicted reconstruction errors.

This fast pixel UQ method allows us to approximate the reconstruction errors at different scales for a fraction of the computational cost of the LCI-based pixel UQ method. The number of measurement operator evaluations is reduced by three orders of magnitude, resulting in an ultra-fast and truly scalable pixel UQ method. Furthermore, a single nonlinear equation solve, i.e. a root-finding problem, suffices for the new pixel UQ method to predict the errors at all scales, whereas the LCI computation must be repeated for each superpixel size.

[Figure 8 panels. Columns: (a) thresholded MAP and (b) MAP estimation for M31; (c) thresholded MAP and (d) MAP estimation for W28. Subsequent rows: predicted errors and true errors for M31 and W28 at wavelet levels 4, 3, 2 and 1, and for all levels combined.]
Figure 8: Fast pixel uncertainty quantification (UQ) with the QuantifAI model on the images M31 and W28. The first two columns correspond to M31 and the last two to W28. The first row displays the thresholded MAP reconstruction that saturates the HPD region alongside the original MAP reconstruction. The following rows compare, at each wavelet scale, the predicted error of the thresholded MAP computed with the fast pixel UQ method against the MAP reconstruction error obtained using the ground truth images. The last row shows the cumulative error when considering all scales.

5.7 Computation time

Table 4 summarises the wall-clock computation times for both models, QuantifAI and the wavelet-based model. Tables 4 and 5 report results only for the W28 image, as they are representative of the other images. All computations for both models were performed on an NVIDIA A100 40GB GPU using PyTorch (Paszke et al., 2019). The QuantifAI model requires less computation time than its wavelet-based counterpart. One reason is the lightweight CRR-NN model, whose gradient and potential are fast to evaluate. Note that the regularisation strength affects the number of iterations and could be changed to favour faster convergence; here, it was chosen to optimise the MAP reconstruction quality.

The results in Table 4 highlight the importance of relying on optimisation-based rather than sampling-based reconstructions when focusing on the scalability of the method. The computation times of the MAP and of the MMSE, which relies on MCMC sampling, differ by approximately four orders of magnitude. Focusing on UQ, posterior sampling is 60 times slower than computing the LCIs with 8×8 superpixels and more than 37500 times slower than the fast pixel UQ proposed in this work. The new fast pixel UQ therefore provides an extremely rapid approach to pixel-based UQ, over 630 times faster than the 8×8 LCIs.

The evaluation of the measurement operator is the most time-consuming operation in a realistic large-scale RI imaging scenario, so targeting scalability requires monitoring the number of measurement operator evaluations. Table 5 summarises the number of measurement operator evaluations required by the UQ techniques. The results are shown only for the QuantifAI model, as they are representative of the wavelet-based model. As mentioned before, optimisation-based reconstructions require far fewer evaluations than sampling-based ones. We highlight the reduction in the number of evaluations for the UQ tasks: approximately 2 to 3 orders of magnitude between sampling and the LCIs, and a further 3 orders of magnitude between the LCIs and the fast pixel UQ. Overall, the fast pixel UQ requires about 6 orders of magnitude fewer measurement operator evaluations than MCMC sampling. The MAP estimation for the CRR required 1082 measurement operator evaluations. However, the algorithm's settings were chosen to maximise the reconstruction SNR; by modifying the regularisation parameter of the CRR-based prior, we can reduce the number of evaluations by an order of magnitude. Recent developments in code parallelisation for RI imaging reconstruction algorithms (Pratley et al., 2019a; Pratley & McEwen, 2019), available at https://github.com/astro-informatics/purify and https://github.com/astro-informatics/sopt, could be integrated to push the scalability of the method further.

Table 4: Wall-clock computation times, in seconds, for the W28 image with both models being compared.
Model | MAP optim. | Posterior sampling | LCIs 8×8 | Fast pixel UQ
Wavelet-based | 0.94 | $36.0\times10^{3}$ | 149.7 |
QuantifAI | 0.64 | $6.44\times10^{3}$ | 108.2 | 0.17
Table 5: Number of measurement operator evaluations used by QuantifAI for the W28 image. We do not distinguish between the measurement operator and its adjoint; therefore, evaluating the log-likelihood gradient counts as two evaluations of the measurement operator. The fast pixel UQ requires three and six orders of magnitude fewer evaluations than the LCIs and the MCMC sampling, respectively.
MCMC sampling | LCIs 8×8 | LCIs 16×16 | Fast pixel UQ
$11\times10^{6}$ | $81.5\times10^{3}$ | $21.2\times10^{3}$ | 28
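As a check on the orders of magnitude quoted above, the sketch below takes base-10 logarithms of the ratios of the evaluation counts in Table 5. The quoted figures are rounded; the gap between sampling and the LCIs, for instance, depends on the superpixel size.

```python
import math

# Measurement-operator evaluations for the W28 image (Table 5).
# A log-likelihood gradient counts as two evaluations (forward + adjoint).
evaluations = {
    "MCMC sampling": 11e6,
    "LCIs 8x8": 81.5e3,
    "LCIs 16x16": 21.2e3,
    "fast pixel UQ": 28,
}


def orders_of_magnitude(a: str, b: str) -> float:
    """log10 of the ratio between the evaluation counts of methods a and b."""
    return math.log10(evaluations[a] / evaluations[b])


for a, b in [("MCMC sampling", "LCIs 8x8"), ("MCMC sampling", "LCIs 16x16"),
             ("LCIs 8x8", "fast pixel UQ"), ("MCMC sampling", "fast pixel UQ")]:
    print(f"{a} vs {b}: {orders_of_magnitude(a, b):.1f} orders of magnitude")
```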

6 More realistic experiment

In the previous section, we validated the proposed UQ methods of QuantifAI in a simple setting with gridded visibilities. This choice allowed us to use the FFT for the forward model, which made it possible to run MCMC algorithms for UQ validation in a reasonable amount of time. In this section, we showcase QuantifAI in a more realistic experiment using simulated visibility patterns from the MeerKAT radio telescope (Jonas & MeerKAT Team, 2016). The main difference is that the visibilities are realistic and ungridded, which requires the non-uniform FFT (NUFFT) for the forward model.
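For reference, the operator that the NUFFT approximates can be written as a direct non-uniform discrete Fourier transform. The sketch below is a brute-force $O(NM)$ version for $N$ pixels and $M$ visibilities, assuming direction cosines $(l, m)$ for the pixels, $(u, v)$ coordinates in wavelengths, the sign convention shown, and no w-term or primary beam. It is not the torchkbnufft API, which replaces the direct sum by Kaiser-Bessel interpolation onto an oversampled grid followed by an FFT; a direct transform like this one is mainly useful for testing a fast operator on small problems.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dft_forward(image: Sequence[float], lm: Sequence[Point],
                uv: Sequence[Point]) -> List[complex]:
    """Direct non-uniform DFT: y_k = sum_n x_n exp(-2i*pi*(u_k*l_n + v_k*m_n))."""
    return [
        sum(x * cmath.exp(-2j * math.pi * (u * l + v * m))
            for x, (l, m) in zip(image, lm))
        for (u, v) in uv
    ]


def dft_adjoint(vis: Sequence[complex], lm: Sequence[Point],
                uv: Sequence[Point]) -> List[complex]:
    """Adjoint of dft_forward: maps visibilities back to (complex) pixel values."""
    return [
        sum(y * cmath.exp(2j * math.pi * (u * l + v * m))
            for y, (u, v) in zip(vis, uv))
        for (l, m) in lm
    ]
```

A quick sanity check for any fast replacement is the adjoint test $\langle \mathbf{\Phi}x, y\rangle = \langle x, \mathbf{\Phi}^{\dagger}y\rangle$ on random inputs.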

6.1 Dataset and experiment settings

[Figure 9 panels: (a) 1 hour, (b) 2 hours, (c) 4 hours and (d) 8 hours of synthesis time.]
Figure 9: The four sets of simulated ungridded visibilities for the MeerKAT radio telescope with synthesis times of 1, 2, 4 and 8 hours.

We have simulated four single-frequency ungridded visibility patterns of differing synthesis times for MeerKAT. The start frequency is set to 1400 MHz with a channel width of 10 MHz. The pointing position was selected at random and set to (13h18m54.86s, −15d36m04.25s) in the J2000 reference frame; it was kept fixed for the four generated datasets. We used a publicly available code (https://github.com/ratt-ru/simms) based on the CASA simulation software (The CASA Team et al., 2022). The synthesis times are 1, 2, 4 and 8 hours, with a constant integration time of 240 s. Each dataset has a field of view of approximately 1 deg$^{2}$. The numbers of visibilities of the datasets are $3\times10^{4}$, $6\times10^{4}$, $1.2\times10^{5}$ and $2.4\times10^{5}$, respectively. Figure 9 presents the simulated visibility patterns. We reuse the images described in Section 5.1 as the ground truth, keeping their original dimensions.
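The visibility counts above agree with a simple back-of-the-envelope estimate. The sketch below assumes all 64 MeerKAT dishes, one visibility per baseline per 240 s integration, a single channel and a single correlation product, and no autocorrelations or flagging; the simulation tool's actual output may differ in these details.

```python
def n_visibilities(n_antennas: int, synthesis_time_s: float,
                   integration_time_s: float, n_channels: int = 1) -> int:
    """Visibility count for one correlation product, ignoring autocorrelations and flagging."""
    n_baselines = n_antennas * (n_antennas - 1) // 2
    n_dumps = int(synthesis_time_s // integration_time_s)
    return n_baselines * n_dumps * n_channels


for hours in (1, 2, 4, 8):
    n = n_visibilities(64, hours * 3600, 240)
    print(f"{hours} h: {n} visibilities (~{n:.1e})")
```

With 2016 baselines and 15 integrations per hour, the counts double with each doubling of the synthesis time, as in Table 6.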

To cope with the ungridded visibilities, we replace the FFT-based forward operator $\mathbf{\Phi}$ of Equation 3. We rely on the PyTorch-based NUFFT implementation of Muckley et al. (2020) (https://github.com/mmuckley/torchkbnufft), which is based on Kaiser-Bessel gridding (Fessler & Sutton, 2003). We use the same images and Gaussian noise model presented in Section 5.1, and the same trained CRR-NN in QuantifAI. We reuse the previously introduced hyperparameter values of QuantifAI except for $\lambda$, which we tuned to maximise the reconstruction SNR, obtaining $10^{4}$, $1.4\times10^{4}$, $1.9\times10^{4}$ and $2.25\times10^{4}$ for the four datasets, respectively.
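Tuning $\lambda$ to maximise the reconstruction SNR is an oracle procedure, since the SNR needs the ground truth. The sketch below illustrates this with the usual definition $\mathrm{SNR} = 20\log_{10}(\|x\|_2/\|x-\hat{x}\|_2)$, which we assume matches the one used here, and replaces the QuantifAI reconstruction with a hypothetical toy: denoising ($\mathbf{\Phi} = I$) with an $\ell_1$ prior, whose MAP is soft thresholding at $\lambda\sigma^{2}$. The toy signal, perturbation and grid values are arbitrary.

```python
import math
from typing import List, Sequence


def snr_db(x_true: Sequence[float], x_est: Sequence[float]) -> float:
    """Reconstruction SNR in dB: 20 log10(||x_true|| / ||x_true - x_est||) (assumed definition)."""
    signal = math.sqrt(sum(t * t for t in x_true))
    error = math.sqrt(sum((t - e) ** 2 for t, e in zip(x_true, x_est)))
    return 20 * math.log10(signal / error)


def toy_map(y: Sequence[float], lam: float, sigma: float) -> List[float]:
    """Closed-form MAP of the toy problem argmin_x ||y - x||^2 / (2 sigma^2) + lam ||x||_1,
    i.e. soft thresholding of y at lam * sigma^2."""
    thr = lam * sigma ** 2
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in y]


def grid_search(y, x_true, sigma, lambdas):
    """Return the lambda maximising the oracle SNR; note that this needs x_true."""
    return max(lambdas, key=lambda lam: snr_db(x_true, toy_map(y, lam, sigma)))


x_true = [0.0, 0.0, 3.0, 0.0, -2.0, 0.0, 0.0, 1.0]
noise = [0.2, -0.3, 0.1, 0.25, -0.15, 0.05, -0.2, 0.3]  # fixed toy perturbation
y = [t + n for t, n in zip(x_true, noise)]
best = grid_search(y, x_true, sigma=0.2, lambdas=[0.0, 1.0, 3.0, 10.0, 30.0])
print("selected lambda:", best)
```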

6.2 Results

[Figure 10 panels: columns 1h, 2h, 4h and 8h; rows: dirty reconstruction, MAP reconstruction, oracle error, and predicted error at levels 4, 3, 2 and 1.]
Figure 10: Reconstructions and fast pixel uncertainty quantification (UQ) with the QuantifAI model for the M31 image with the four sets of simulated MeerKAT ungridded visibilities. Each column corresponds to one of the four datasets, with synthesis times of 1, 2, 4 and 8 hours. The first row shows the dirty reconstruction. The MAP reconstruction is presented in the second row, while the oracle error, which is not accessible with real data, is shown in the third row. The last four rows show the pixel UQ at the different decomposition levels.
Table 6: Main results of QuantifAI for the M31 image with the realistic ungridded MeerKAT visibility patterns of differing synthesis times. The number of visibilities increases proportionally to the synthesis time, and the reconstruction SNR increases with it.
Metric | 1h | 2h | 4h | 8h
Number of visibilities | $3\times10^{4}$ | $6\times10^{4}$ | $1.2\times10^{5}$ | $2.4\times10^{5}$
MAP reconstruction SNR [dB] | 25.29 | 28.39 | 31.94 | 34.42
Reconstruction: measurement op. evaluations | 3288 | 2916 | 3006 | 3114
Reconstruction: wall-clock time [s] | 17.01 | 28.19 | 53.24 | 105.25
UQ: measurement op. evaluations | 26 | 28 | 30 | 30
UQ: wall-clock time [s] | 0.28 | 0.44 | 0.73 | 1.26

Figure 10 shows the reconstructions and the fast pixel UQ maps for the M31 image, with one column per dataset. The results for the other images are deferred to Appendix C. Table 6 reports the reconstruction SNR, the number of measurement operator evaluations and the wall-clock computing time. The reconstructions are of good quality in terms of SNR, and the SNR increases with the synthesis time, as expected from the increasing Fourier coverage seen in Figure 9. The wall-clock reconstruction time also increases with the synthesis time, because each evaluation of the measurement operator becomes more expensive as the number of visibilities grows.

Even in this more challenging setting, we find a good correlation between the predicted and the oracle errors. The pixel UQ maps still show some patterns characteristic of the multiscale wavelet representation used, which should be taken into account when analysing them. The fast pixel UQ requires only a small fraction of the time and of the measurement operator evaluations needed for the MAP reconstruction: 26 to 30 evaluations against roughly 3000 for M31 (Table 6).
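To make the size of that fraction concrete, the sketch below computes, from Table 6, the share of measurement operator evaluations and wall-clock time that the fast pixel UQ represents relative to the MAP reconstruction for M31.

```python
# Table 6 (M31): measurement-operator evaluations and wall-clock times per dataset.
hours = (1, 2, 4, 8)
map_evals = [3288, 2916, 3006, 3114]
uq_evals = [26, 28, 30, 30]
map_time_s = [17.01, 28.19, 53.24, 105.25]
uq_time_s = [0.28, 0.44, 0.73, 1.26]

for h, me, ue, mt, ut in zip(hours, map_evals, uq_evals, map_time_s, uq_time_s):
    # Percentages of the MAP cost spent on the fast pixel UQ.
    print(f"{h} h: UQ uses {100 * ue / me:.1f}% of the evaluations "
          f"and {100 * ut / mt:.1f}% of the wall-clock time of the MAP")
```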

7 Discussion

In this work, we have used synthetic datasets, starting with gridded visibilities and ending with realistic ungridded visibility patterns. Handling real data is beyond the scope of this work, which focuses on the proposed methodology and the validation of the UQ techniques. Challenges that arise when handling large-scale observations include the calibration of direction-independent and direction-dependent effects. Further studies of the performance of QuantifAI on more realistic images with large and varying dynamic ranges and bright point sources are left for future work, as is the extension of QuantifAI to incorporate a frequency axis in the reconstruction to cope with multi-frequency observations. Although QuantifAI already reconstructs images with UQ maps from $\sim10^{5}$ visibilities in less than two minutes, there is ongoing work to exploit existing C++ parallelisation capabilities to scale the method further. A detailed performance and computing time benchmark is planned for this future implementation.

The current approach of setting the regularisation strength of the CRR-NN, $\lambda$, by grid search is not applicable to real data, as it requires access to the ground truth. There are several ways to circumvent this problem in the large-scale setting. One way forward is to consider a subset of the observations to alleviate the computational burden and to rely on the empirical Bayesian approach of Vidal et al. (2020) and De Bortoli et al. (2020) to estimate the regularisation parameter directly from the observed data. Another is to follow a heuristic approach similar to that of Terris et al. (2022). Identifying the best strategy for QuantifAI is left for future work.

The fast pixel UQ proposed in this article improves the tightness of the UQ bounds with respect to the LCIs by considering the entire image when saturating the HPD region. Nevertheless, these pixel-level UQ maps are intended as visual aids accompanying the reconstructions. They can direct the astronomer's attention to specific regions of the image, where a subsequent hypothesis test can be carried out using the techniques described in this work. The choice of wavelet basis affects the fast pixel UQ maps, so it should be taken into account when analysing them. One way to improve the fast pixel UQ maps would be to replace the orthogonal wavelet basis with a more performant dictionary, such as that of SARA (Carrillo et al., 2012), which has been shown to be well adapted to RI images.

8 Conclusions

In this work, we propose a new method coined QuantifAI that addresses uncertainty quantification in radio-interferometric (RI) imaging with data-driven (learned) priors in very high-dimensional settings. We have focused on three fundamental points in the RI imaging pipeline: scalability, estimation performance, and uncertainty quantification (UQ).

Our model builds upon a principled Bayesian framework for UQ, which is known to be computationally expensive when relying on MCMC sampling methods. In this work, however, we leverage convex optimisation techniques to compute the maximum-a-posteriori (MAP) estimate, the point estimate of the posterior distribution that we use as the reconstruction. We restrict our model to a log-concave posterior distribution to remain highly scalable while still supporting Bayesian UQ, and we guarantee log-concavity by using convex potentials for both the likelihood and the prior. In this scenario, we can exploit an approximation of the highest posterior density (HPD) region, which only requires the MAP estimate (Pereyra, 2017) and bypasses expensive sampling techniques.
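The approximate HPD region mentioned above is, in the form usually quoted from Pereyra (2017), the set $\{x : f(x) + g(x) \le \gamma_\alpha\}$ with $\gamma_\alpha = f(\hat{x}_{\text{MAP}}) + g(\hat{x}_{\text{MAP}}) + \sqrt{N}\tau_\alpha + N$ and $\tau_\alpha = \sqrt{16\log(3/\alpha)/N}$, valid for $4\exp(-N/3) < \alpha < 1$. The sketch below evaluates this threshold and the corresponding membership test; the function names are illustrative, and the posterior potential $f + g$ must be supplied by the user.

```python
import math


def hpd_threshold(map_potential: float, n_pixels: int, alpha: float) -> float:
    """Approximate HPD level gamma_alpha (form quoted from Pereyra, 2017).

    map_potential: posterior potential f(x_MAP) + g(x_MAP)
    n_pixels:      image dimension N
    alpha:         1 - credibility level, e.g. 0.01 for a 99% region
    """
    if not 4 * math.exp(-n_pixels / 3) < alpha < 1:
        raise ValueError("alpha outside the range where the approximation holds")
    tau_alpha = math.sqrt(16 * math.log(3 / alpha) / n_pixels)
    return map_potential + math.sqrt(n_pixels) * tau_alpha + n_pixels


def in_hpd_region(potential_value: float, gamma: float) -> bool:
    """An image x lies in the approximate HPD region iff f(x) + g(x) <= gamma."""
    return potential_value <= gamma
```

Note that only the MAP potential enters the threshold, which is why no sampling is needed.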

We include data-driven priors, which can encode complex information learned implicitly from training data and are therefore more expressive. The learned priors allow us to improve performance with respect to previous models based on handcrafted priors (Cai et al., 2018b), e.g., wavelet-based sparsity-promoting priors. To support fast UQ techniques, our prior potential must be convex; hence, we adopt the recently introduced learnable convex-ridge regulariser neural network (CRR-NN, Goujon et al., 2023b). The CRR-NN-based prior is performant and reliable, and has been shown to be robust to data distribution shifts. The QuantifAI model combines an analytic, physically motivated likelihood with the learned CRR-NN-based prior. As this work focuses on the methodology, we have only considered small problems, i.e., images of 256×256 pixels. Nevertheless, QuantifAI can be integrated into existing distributed frameworks (Pratley et al., 2019a; Pratley & McEwen, 2019), which is the focus of ongoing work.

Numerical experiments are conducted with four images representative of RI imaging. We compare the QuantifAI model with the model of Cai et al. (2018b), which uses a wavelet-based prior. Our results show a considerable improvement in reconstruction performance for QuantifAI. We validate our results against posterior samples from MCMC sampling algorithms, from which we compute the posterior standard deviation. We find that QuantifAI produces more meaningful posterior standard deviations than the wavelet-based model. We also include numerical experiments with simulated ungridded MeerKAT visibilities, where we report QuantifAI's performance and computing times for up to $\sim10^{5}$ visibilities.

We explore several MAP-based UQ techniques that rely on the approximate HPD region. We carry out hypothesis tests of image structure to assess whether structures observed in the reconstructions are physical. We then compute local credible intervals (LCIs, cf. Bayesian error bars) to measure the pixel-wise uncertainty. These two approaches were proposed by Cai et al. (2018b), and in this work we validate them against MCMC posterior sampling. Although LCIs are already a scalable alternative to sampling-based methods for pixel-wise UQ, they remain expensive for SKA-size data. Therefore, we propose a novel pixel-wise UQ technique that approximates pixel errors at different scales and requires three orders of magnitude fewer measurement operator evaluations than the LCIs. The new approach is based on thresholding the coefficients of a wavelet representation of the reconstruction until the HPD region saturates, and is six orders of magnitude faster than sampling-based techniques in terms of measurement operator evaluations.
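The thresholding idea behind the fast pixel UQ can be illustrated with a deliberately simplified sketch, which is not the exact algorithm of this work: a single global soft threshold $t$ is applied to the wavelet coefficients of $\hat{x}_{\text{MAP}}$ and increased by bisection until the thresholded image reaches the HPD level $\gamma$; the magnitude of the removed component then serves as a pixel-wise error proxy. The `analysis` and `synthesis` callables stand in for $\Psi^{\dagger}$ and $\Psi$ (hypothetical names), the bisection assumes the potential grows monotonically with $t$, and a multiscale version would repeat the procedure per decomposition level.

```python
import math
from typing import Callable, List

Vector = List[float]


def soft(c: Vector, t: float) -> Vector:
    """Entrywise soft thresholding at level t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in c]


def fast_pixel_error(x_map: Vector,
                     potential: Callable[[Vector], float],
                     gamma: float,
                     analysis: Callable[[Vector], Vector],
                     synthesis: Callable[[Vector], Vector],
                     n_iter: int = 50) -> Vector:
    """Single-threshold sketch: largest t keeping synthesis(soft(coeffs, t)) in the HPD region."""
    coeffs = analysis(x_map)
    lo, hi = 0.0, max(abs(v) for v in coeffs)  # t = 0 reproduces x_map, which is feasible
    if potential(synthesis(soft(coeffs, hi))) <= gamma:
        lo = hi  # even removing every coefficient stays inside the region
    else:
        for _ in range(n_iter):  # bisection on the saturation threshold
            mid = 0.5 * (lo + hi)
            if potential(synthesis(soft(coeffs, mid))) <= gamma:
                lo = mid
            else:
                hi = mid
    x_thr = synthesis(soft(coeffs, lo))
    return [abs(a - b) for a, b in zip(x_map, x_thr)]
```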

QuantifAI is an approach with the potential to be highly scalable and performant for UQ in RI imaging. In this work, we have compared QuantifAI with a wavelet-based model using numerical experiments and a variety of metrics. However, as both models rely on the Bayesian framework, we could also perform Bayesian model comparison, a principled approach to model selection, to determine which model the data favour. Recent developments by McEwen et al. (2023) extend model comparison to the learned setting, with data-driven priors. Ongoing work focuses on implementing the proposed methodology in the existing RI imaging frameworks purify (https://github.com/astro-informatics/purify; Carrillo et al., 2014; Pratley et al., 2018, 2019b) and sopt (https://github.com/astro-informatics/sopt; Carrillo et al., 2012; Onose et al., 2016), to exploit massively parallelised computing environments (Pratley et al., 2019a; Pratley & McEwen, 2019) and realise the potential for scalability. In the near future, we plan to benchmark the speed and scalability of QuantifAI in a highly realistic setting.

A new perspective is to relax the convexity constraint on the prior by exploiting the fact that the posterior potential needs to be convex (rather than the prior) and that the RI imaging likelihood is already strongly convex. The relaxation of the CRR regulariser has been studied in a very recent work (Goujon et al., 2023a), where a weakly-convex-ridge-regulariser neural network (WCRR-NN) has been proposed. If adopted, the WCRR-NN could further enhance the expressiveness of the regulariser and the reconstruction performance of QuantifAI in the RI imaging problem.

Acknowledgements

This work is supported by the UK Research and Innovation (UKRI) and Engineering and Physical Sciences Research Council (EPSRC) grant numbers EP/W007673/1 (LEXCI) and EP/T007346/1 (BOLT). MM is supported by the UCL Centre for Doctoral Training in Data Intensive Science (STFC Training grant ST/P006736/1).

We acknowledge the Python packages used in this work: IPython (Pérez & Granger, 2007), Jupyter (Kluyver et al., 2016), Matplotlib (Hunter, 2007), PyTorch (Paszke et al., 2019), NumPy (Harris et al., 2020), Astropy (Astropy Collaboration et al., 2022) and scikit-image (van der Walt et al., 2014).

Data Availability

We provide the QuantifAI PyTorch-based Python package in a publicly available repository (https://github.com/astro-informatics/QuantifAI). We are in favour of reproducible research, which is why the images, visibilities, scripts, trained CRR-NN model and code to reproduce this article's experiments can be found in the aforementioned repository.

References

Appendix A MAP calculation for the wavelet-based prior model

The MAP estimation for the wavelet-based model can be recast as the following optimisation problem:

$$
\hat{\bm{x}}_{\text{MAP}} = \operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{N}} \frac{1}{2\sigma^{2}}\left\|\bm{y}-\Phi\bm{x}\right\|_{2}^{2} + \lambda_{\text{wav}}\big\|\Psi^{\dagger}\bm{x}\big\|_{1} + \iota_{\mathbb{R}^{N}}(\bm{x})\,, \tag{54}
$$

where $\lambda_{\text{wav}}$ is the regularisation strength of the wavelet-based prior. The FISTA algorithm used to estimate the MAP is presented in Algorithm 2, where we use the soft-thresholding operator, $\text{soft}(\cdot)$, defined in Equation 49. The Lipschitz constant that defines the step size can be set to $L_{\text{wav}} = \|\Phi^{\dagger}\Phi\|/\sigma^{2}$.

1 Input: $\Psi$, $\Phi$, $\sigma$, $\lambda_{\text{wav}}$, $\xi$, $a_{(1)}=1$, $\bm{z}_{(1)}=\bm{x}_{(0)}=\text{Re}(\Phi^{\dagger}\bm{y})$, $\tau_{\text{wav}}=0.98/L_{\text{wav}}$.
2 Output: $\hat{\bm{x}}_{\text{MAP}}$
3 for $n=1,\ldots,N_{\text{max}}$ do
4     $\tilde{\bm{z}}_{(n)}=\bm{z}_{(n)}-\frac{\tau_{\text{wav}}}{\sigma^{2}}\text{Re}\left(\Phi^{\dagger}(\Phi\bm{z}_{(n)}-\bm{y})\right)$
5     $\bm{x}_{(n)}=\tilde{\bm{z}}_{(n)}+\Psi\left(\text{soft}_{\lambda_{\text{wav}}\tau_{\text{wav}}}(\Psi^{\dagger}\tilde{\bm{z}}_{(n)})-\Psi^{\dagger}\tilde{\bm{z}}_{(n)}\right)$
6     $a_{(n+1)}=\frac{1}{2}\left(1+\sqrt{4a_{(n)}^{2}+1}\right)$
7     $\bm{z}_{(n+1)}=\bm{x}_{(n)}+\frac{a_{(n)}-1}{a_{(n+1)}}\left(\bm{x}_{(n)}-\bm{x}_{(n-1)}\right)$
8     if $\|\bm{x}_{(n)}-\bm{x}_{(n-1)}\|/\|\bm{x}_{(n-1)}\|<\xi$ then
9         break
10    end if
11 end for
12 set $\hat{\bm{x}}_{\text{MAP}}=\bm{x}_{(n)}$
Algorithm 2 FISTA (Beck & Teboulle, 2009) tackling (54)
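A minimal pure-Python sketch of Algorithm 2 is given below for a small real-valued problem, assuming $\Psi$ is the identity (an orthonormal basis reduces to this case after the change of variables $c = \Psi^{\dagger}x$) and a real matrix $\Phi$, so that $\text{Re}(\Phi^{\dagger}\cdot)$ becomes a transpose. The Lipschitz constant $L_{\text{wav}}$ is estimated by power iteration. All function names are illustrative and unrelated to the QuantifAI package.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, y: Vector) -> Vector:
    """Apply the adjoint (the transpose, for a real matrix) of A to y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(v: Vector) -> float:
    return math.sqrt(sum(t * t for t in v))


def soft(v: Vector, thr: float) -> Vector:
    """Soft-thresholding operator applied entrywise."""
    return [math.copysign(max(abs(t) - thr, 0.0), t) for t in v]


def power_iteration(A: Matrix, n_iter: int = 100) -> float:
    """Estimate ||A^T A||, the largest eigenvalue of A^T A."""
    n = len(A[0])
    x = [1.0 / math.sqrt(n)] * n
    val = 0.0
    for _ in range(n_iter):
        w = rmatvec(A, matvec(A, x))
        val = norm(w)
        x = [t / val for t in w]
    return val


def fista_l1(Phi: Matrix, y: Vector, sigma: float, lam: float,
             tol: float = 1e-8, n_max: int = 1000) -> Vector:
    """FISTA for min_x ||y - Phi x||^2 / (2 sigma^2) + lam ||x||_1 with Psi = identity."""
    tau = 0.98 / (power_iteration(Phi) / sigma ** 2)  # step size 0.98 / L_wav
    x_prev = rmatvec(Phi, y)                           # x_(0) = Phi^T y
    z, a = list(x_prev), 1.0                           # z_(1) = x_(0), a_(1) = 1
    for _ in range(n_max):
        grad = rmatvec(Phi, [p - b for p, b in zip(matvec(Phi, z), y)])
        z_tilde = [zi - tau / sigma ** 2 * g for zi, g in zip(z, grad)]  # gradient step (line 4)
        x = soft(z_tilde, lam * tau)                   # proximal step (line 5 with Psi = I)
        a_next = 0.5 * (1.0 + math.sqrt(4.0 * a * a + 1.0))
        z = [xn + (a - 1.0) / a_next * (xn - xp) for xn, xp in zip(x, x_prev)]
        rel_change = norm([xn - xp for xn, xp in zip(x, x_prev)]) / max(norm(x_prev), 1e-12)
        x_prev, a = x, a_next
        if rel_change < tol:                           # stopping rule with tol playing xi
            break
    return x_prev
```

With $\Phi = I$, the problem's solution is $\text{soft}_{\lambda\sigma^{2}}(\bm{y})$, which gives a quick correctness check.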

Appendix B Wavelet-based Bayesian uncertainty quantification

B.1 Hypothesis testing of image structure

Figure 11 and Table 7 present the results of the hypothesis testing of structure for the model with the sparsity-promoting prior.

Table 7: Hypothesis test results for the inpainted surrogates of Figure 11, generated with the wavelet-based model. All values are in units of $10^{4}$. The description of Table 2 also applies to this table.
Image | Test area | Ground truth | Method | Initial $(f+g)(\hat{\bm{x}}^{*})$ | Surrogate $(f+g)(\hat{\bm{x}}^{*,\text{sgt}})$ | Isocontour $\hat{\gamma}_{0.01}$ | Hypothesis test
M31 | 1 | | SK-ROCK | 0.448 | **1.396** | 1.105 |
M31 | 1 | | MAP | 0.359 | **1.335** | 1.039 |
Cygnus A | 1 | | SK-ROCK | 0.480 | 0.533 | **1.639** |
Cygnus A | 1 | | MAP | 0.444 | 0.514 | **1.789** |
W28 | 1 | | SK-ROCK | 0.353 | **5.190** | 0.879 |
W28 | 1 | | MAP | 0.284 | **5.204** | 0.964 |
3C288 | 1 | | SK-ROCK | 0.729 | **2.487** | 1.398 |
3C288 | 1 | | MAP | 0.654 | **2.409** | 1.333 |
3C288 | 2 | | SK-ROCK | 0.729 | 0.729 | **1.398** |
3C288 | 2 | | MAP | 0.654 | 0.654 | **1.333** |
[Figure 11 panels: (a) MAP reconstruction and (b) inpainted surrogate for each tested region.]
Figure 11: Hypothesis tests of different regions of the four images, M31, W28, Cygnus A and 3C288. All images are shown in $\log_{10}$ scale. The figure is similar to Figure 5, but the wavelet-based model has been used to generate the MAP, and the wavelet prior was used to inpaint the surrogate image.
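The decision rule of the hypothesis test can be sketched as follows: the null hypothesis that the structure is not physical is rejected at level $\alpha$ when the inpainted surrogate, from which the structure has been removed, falls outside the approximate HPD region, i.e. when its potential exceeds $\hat{\gamma}_\alpha$. Applied to the wavelet-based values of Table 7, the sketch flags the tests in which the surrogate potential exceeds the isocontour.

```python
# Rows of Table 7 (units of 1e4): image, test area, method,
# potential of the inpainted surrogate, HPD isocontour gamma_0.01.
rows = [
    ("M31", 1, "SK-ROCK", 1.396, 1.105),
    ("M31", 1, "MAP", 1.335, 1.039),
    ("Cygnus A", 1, "SK-ROCK", 0.533, 1.639),
    ("Cygnus A", 1, "MAP", 0.514, 1.789),
    ("W28", 1, "SK-ROCK", 5.190, 0.879),
    ("W28", 1, "MAP", 5.204, 0.964),
    ("3C288", 1, "SK-ROCK", 2.487, 1.398),
    ("3C288", 1, "MAP", 2.409, 1.333),
    ("3C288", 2, "SK-ROCK", 0.729, 1.398),
    ("3C288", 2, "MAP", 0.654, 1.333),
]


def reject_null(surrogate_potential: float, gamma: float) -> bool:
    """Reject H0 ('the structure is not physical') at level alpha when the
    structure-free surrogate lies outside the approximate HPD region."""
    return surrogate_potential > gamma


for image, area, method, surrogate, gamma in rows:
    verdict = "structure physical" if reject_null(surrogate, gamma) else "inconclusive"
    print(f"{image} area {area} ({method}): {verdict}")
```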

Appendix C More realistic experiment results

Figures 12, 13 and 14 present the results obtained with the realistic ungridded MeerKAT visibility patterns for the W28, Cygnus A and 3C288 images, respectively. Tables 8, 9 and 10 show the corresponding quantitative results.

[Figure 12 panels: columns 1h, 2h, 4h and 8h; rows: dirty reconstruction, MAP reconstruction, oracle error, and predicted error at levels 4, 3, 2 and 1.]
Figure 12: Reconstructions and fast pixel uncertainty quantification (UQ) with the QuantifAI model for the W28 image with the four sets of simulated MeerKAT ungridded visibilities. Each column corresponds to one of the four datasets, with synthesis times of 1, 2, 4 and 8 hours. The first row shows the dirty reconstruction. The MAP reconstruction is presented in the second row, while the oracle error, which is not accessible with real data, is shown in the third row. The last four rows show the pixel UQ at the different decomposition levels.
Table 8: Main results of QuantifAI for the W28 image with the realistic ungridded MeerKAT visibility patterns of differing synthesis times. The number of visibilities increases proportionally to the synthesis time, and the reconstruction SNR increases with it.
Metric | 1h | 2h | 4h | 8h
Number of visibilities | $3\times10^{4}$ | $6\times10^{4}$ | $1.2\times10^{5}$ | $2.4\times10^{5}$
MAP reconstruction SNR [dB] | 23.88 | 25.89 | 27.40 | 28.56
Reconstruction: measurement op. evaluations | 6976 | 6124 | 5270 | 4062
Reconstruction: wall-clock time [s] | 34.83 | 59.25 | 93.61 | 137.2
UQ: measurement op. evaluations | 30 | 30 | 32 | 32
UQ: wall-clock time [s] | 0.31 | 0.47 | 0.78 | 1.35
[Figure 13 panels: columns 1h, 2h, 4h and 8h; rows: dirty reconstruction, MAP reconstruction, oracle error, and predicted error at levels 4, 3, 2 and 1.]
Figure 13: Reconstructions and fast pixel uncertainty quantification (UQ) with the QuantifAI model for the Cygnus A image with the four sets of simulated MeerKAT ungridded visibilities. Each column corresponds to one of the four datasets, with synthesis times of 1, 2, 4 and 8 hours. The first row shows the dirty reconstruction. The MAP reconstruction is presented in the second row, while the oracle error, which is not accessible with real data, is shown in the third row. The last four rows show the pixel UQ at the different decomposition levels.
Table 9: Main results of QuantifAI for the Cygnus A image with the realistic ungridded MeerKAT visibility patterns of differing synthesis times. The number of visibilities increases proportionally to the synthesis time, and the reconstruction SNR increases with it.
Metric | 1h | 2h | 4h | 8h
Number of visibilities | $3\times10^{4}$ | $6\times10^{4}$ | $1.2\times10^{5}$ | $2.4\times10^{5}$
MAP reconstruction SNR [dB] | 25.72 | 27.84 | 28.40 | 28.74
Reconstruction: measurement op. evaluations | 6482 | 5172 | 3686 | 2692
Reconstruction: wall-clock time [s] | 36.17 | 51.71 | 66.79 | 91.79
UQ: measurement op. evaluations | 30 | 30 | 30 | 32
UQ: wall-clock time [s] | 0.35 | 0.50 | 0.77 | 1.36
[Figure 14 panels: columns 1h, 2h, 4h and 8h; rows: dirty reconstruction, MAP reconstruction, oracle error, and predicted error at levels 4, 3, 2 and 1.]
Figure 14: Reconstructions and fast pixel uncertainty quantification (UQ) with the QuantifAI model for the 3C288 image with the four sets of simulated MeerKAT ungridded visibilities. Each column corresponds to one of the four datasets, with synthesis times of 1, 2, 4 and 8 hours. The first row shows the dirty reconstruction. The MAP reconstruction is presented in the second row, while the oracle error, which is not accessible with real data, is shown in the third row. The last four rows show the pixel UQ at the different decomposition levels.
Table 10: Main results of QuantifAI for the 3C288 image with the realistic ungridded MeerKAT visibility patterns of differing synthesis times. The number of visibilities increases proportionally to the synthesis time, and the reconstruction SNR increases with it.
Metric | 1h | 2h | 4h | 8h
Number of visibilities | $3\times10^{4}$ | $6\times10^{4}$ | $1.2\times10^{5}$ | $2.4\times10^{5}$
MAP reconstruction SNR [dB] | 25.01 | 27.00 | 29.62 | 33.02
Reconstruction: measurement op. evaluations | 2730 | 2198 | 1976 | 1856
Reconstruction: wall-clock time [s] | 13.93 | 21.22 | 34.99 | 62.68
UQ: measurement op. evaluations | 26 | 28 | 32 | 32
UQ: wall-clock time [s] | 0.28 | 0.44 | 0.78 | 1.33