HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: arydshln

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: arXiv.org perpetual non-exclusive license
arXiv:2404.06695v1 [eess.IV] 10 Apr 2024

Spiral Scanning and Self-Supervised Image Reconstruction Enable Ultra-Sparse Sampling Multispectral Photoacoustic Tomography

Yutian Zhong    Xiaoming Zhang    Zongxin Mo    Shuangyang Zhang    Wufan Chen    \IEEEmembershipSenior Member, IEEE    and Li Qi This work was supported in part by National Natural Science Foundation of China (62371220), Guangdong Basic and Applied Basic Research Foundation (2021A1515012542, 2022A1515011748), and Guangdong Pearl River Talented Young Scholar Program (2017GC010282). (Corresponding authors: Li Qi.)The authors are with School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, China; and Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, China. (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Abstract

Multispectral photoacoustic tomography (PAT) is an imaging modality that utilizes the photoacoustic effect to achieve non-invasive and high-contrast imaging of internal tissues. However, the hardware cost and computational demand of a multispectral PAT system consisting of up to thousands of detectors are huge. To address this challenge, we propose an ultra-sparse spiral sampling strategy for multispectral PAT, which we named U3S-PAT. Our strategy employs a sparse ring-shaped transducer that, when switching excitation wavelengths, simultaneously rotates and translates. This creates a spiral scanning pattern with multispectral angle-interlaced sampling. To solve the highly ill-conditioned image reconstruction problem, we propose a self-supervised learning method that is able to introduce structural information shared during spiral scanning. We simulate the proposed U3S-PAT method on a commercial PAT system and conduct in vivo animal experiments to verify its performance. The results show that even with a sparse sampling rate as low as 1/30, our U3S-PAT strategy achieves similar reconstruction and spectral unmixing accuracy as non-spiral dense sampling. Given its ability to dramatically reduce the time required for three-dimensional multispectral scanning, our U3S-PAT strategy has the potential to perform volumetric molecular imaging of dynamic biological activities.

{IEEEkeywords}

Sparsely sampled image reconstruction, photoacoustic tomography, implicit neural representation, prior embedding.

1 Introduction

\IEEEPARstart

Photoacoustic tomography (PAT) is a novel imaging approach that merges the high contrast of optical imaging with the deep imaging depth of ultrasound to produce cross-sectional images within tissue [1][6]. Multispectral PAT captures images at multiple wavelengths, enabling the identification of concentration and location of photoacoustic absorbers based on spectral unmixing. This capability aids in the precise visualization of various tissue components, such as fat, blood, and tumors [7][14].

Conventional PAT imaging system usually employs a linear or ring-shaped transducer for data acquisition, and the acquired images are 2D cross-sectional images. With linear translation scanning, a 3D volume image can be obtained, as shown in Fig.1 (a). To further perform multispectral imaging, one image is obtained per excitation wavelength. The total data amount and acquisition time increase with the number of wavelengths and slices. Therefore, multispectral and 3D PAT imaging has a heavy time cost and is very inefficient.

In practical applications, reducing the data acquisition burden can be accomplished through sparse sampling, which is usually implemented by reducing the total number of transducer elements [15][17]. It cuts down the cost of the transducer and signal acquisition equipment. However, as the number of detectors decreases, the angular coverage becomes sparser, posing challenges for achieving high-quality image reconstruction. Novel image reconstruction algorithms including iterative model-based approaches [18][21] and recently introduced deep-learning-based (DL-based) techniques [22, 23] have been devised to address this issue, yet preserving detailed structural information while avoiding excessive smoothing remains challenging. Currently, sparse sampling PAT systems only achieve a minimum sparse sampling rate of <<< 1/4, in order to maintain acceptable image reconstruction quality. Notably, these sparse sampling techniques do not reduce the total image acquisition time for multispectral imaging.

Refer to caption
Figure 1: (a) Traditional transducer fixed dense sampling (DS) strategy: the detection array consists of a large number of elements (e.g. Nd=128Nd128\rm Nd=128roman_Nd = 128). A dense set of signal is acquired at each slice at multiple wavelengths. (b) The proposed ultra-sparse spiral sampling (U3S-PAT) strategy: the sparse transducer (e.g. Nd=21Nd21\rm Nd=21roman_Nd = 21) rotates and translates linearly while switching excitation wavelength, thus creating a spiral scanning trajectory. The scanning time and amount of data can be dramatically reduced.

This brings us to an important question: is there any space to further reduce the sparse sampling rate? One way to do this is to make use of the redundant spatial-spectral information during scanning. As a proof of concept, our group recently proposed a multispectral sparse sampled PAT method, named interlaced sparse sampling PAT (ISS-PAT) [24], where the sparse ring-shaped transducer array rotates by an angle for each wavelength switch. This scheme leverages the shared information during each laser excitation, it offers image reconstruction quality similar to dense sampling at a sparse sampling rate as low as 1/8. Yet, ISS-PAT can only image a single slice, and for 3D imaging accomplished by linear scanning, the scanning time will still be increased accordingly.

Except for utilizing shared spectral information, we can also leverage the redundant spatial information shared during 3D scanning for efficient imaging. For example, spiral scanning, which rotates and translates the detector at the same time, is a highly efficient imaging concept widely adopted in clinical X-ray computed tomography (X-ray CT) imaging [25][28]. Recently, spiral scanning has been introduced to photoacoustic imaging. Initially proposed in [29], spiral volumetric PAT based on a high-end spherical array transducer has been used to characterize brown adipose tissue [30] and generate panoramic images of mice [31]. In these studies, spiral scanning is used to compensate for the limited imaging view of the spherical transducer. Their data acquisition is still dense sampling, and the cost of a spherical transducer with a multi-channel synchronized data acquisition system is very high. Therefore, optimizing the scanning time and system burdens is a problem to be solved, especially in multispectral 3D imaging.

For cross-sectional PAT with a ring-shaped transducer, the axial displacement between two slices is on the order of sub-millimeters. Therefore, there is a wealth of information shared between slices that may enable sparser sampling. Inspired by this, we propose to combine the technique of spiral scanning with multispectral interlaced sparse sampling to further reduce the sparse sampling rate. As shown in Fig.1 (b), our approach, which is dubbed ultra-sparse spiral sampling PAT or U3S-PAT, involves the simultaneous rotation and translation of the sparse ring-shaped transducer each time the excitation wavelength is changed. By doing so, our U3S-PAT facilitates multi-wavelength and multi-slice cross-sectional imaging that covers the entire region of interest with much fewer measurements. To solve the highly ill-conditioned image reconstruction problem, we develop a novel self-supervised image reconstruction model that does not require any ground truth for network training and only relies on intrinsic information of the raw PA signal. Provided by our unique U3S-PAT scanning strategy, we incorporate a structural prior image that fuses the redundancy information among slices and wavelengths to guide image reconstruction. To assess the performance of the proposed U3S-PAT strategy, we simulate the acquisition of spiral scanning data on a commercial PAT system and conduct in vivo animal experiments. The results demonstrate that even with a sparse sampling rate as low as 1/30, the reconstruction image quality and spectral unmixing accuracy achieved using our U3S-PAT are similar to those obtained with full-angle, all-wavelength dense sampling.

2 Methods

2.1 Multispectral ultra-sparse spiral sampling PAT

Our U3S-PAT method is based on a cross-sectional multispectral PAT system with a ring-shaped array transducer. Currently, most of popular PAT systems are based on the dense sampling strategy [Fig.1 (a)], where the transducer array consists of a large number of elements (e.g., the total number of elements Nd=128Nd128\rm Nd=128roman_Nd = 128) to ensure dense angular coverage. Either the transducer or the imaging target is linearly translated for imaging of different slices. At each slice position, multiple sets of signal are obtained for multispectral imaging. Therefore, the dense sampling scheme inevitably generates a large amount of data, i.e., the total size of acquired data is W×Nd×Nt×MWNdNtM\rm W\times Nd\times Nt\times Mroman_W × roman_Nd × roman_Nt × roman_M, where NtNt\rm Ntroman_Nt is the number of temporal samples obtained by each detector, WW\rm Wroman_W is the number of excitation wavelengths in each slice and MM\rm Mroman_M is the number of scanned slices.

In our U3S-PAT strategy [Fig.1 (b)], a sparse transducer array is used (e.g. Nd=21Nd21\rm Nd=21roman_Nd = 21). During scanning, each time the excitation wavelength is switched from one another, in the meantime, the transducer rotates for a small angle around the imaged object and translates a specified distance along the axial direction. Therefore, the slice position, scanning wavelength, and angle of the transducer are changed per image simultaneously. A spiral scanning trajectory is formed for the transducer. To make full use of the angular spacing between two adjacent transducer elements, the transducer is rotated such that each measurement is evenly distributed. That means, letting φ𝜑\varphiitalic_φ be the angle between two adjacent elements, the angle ΔθΔ𝜃\mathrm{\Delta}\thetaroman_Δ italic_θ rotated by the transducer as the laser transits from one wavelength to another is Δθ=φ/WΔ𝜃𝜑W\mathrm{\Delta}\theta=\varphi/\rm{W}roman_Δ italic_θ = italic_φ / roman_W. Therefore, the rotation angle at the tomographic position αm(m{1,2,M})subscript𝛼𝑚𝑚12M\alpha_{m}\left({m\in\left\{{1,2,\cdots\rm{M}}\right\}}\right)italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_m ∈ { 1 , 2 , ⋯ roman_M } ) is denoted as θmsubscript𝜃𝑚\theta_{m}italic_θ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, where θm=m×Δθsubscript𝜃𝑚𝑚Δ𝜃\theta_{m}=m\times\mathrm{\Delta}\thetaitalic_θ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_m × roman_Δ italic_θ.

As a comparison, under our U3S-PAT strategy, the total size of the acquired data is only r×Nd×Nt×M𝑟NdNtMr\rm{\times Nd\times Nt\times M}italic_r × roman_Nd × roman_Nt × roman_M, where r𝑟ritalic_r is the downsampling rate of transducer elements. Therefore, compared to dense sampling, the total sparse sampling rate is 1/rW1𝑟W1/r\rm{W}1 / italic_r roman_W. For example, for W=5W5\rm W=5roman_W = 5, compared to dense sampling using 128 elements, we can reduce the amount of data to 1/30 with 21 elements, i.e. r1/6𝑟16r\approx 1/6italic_r ≈ 1 / 6. More importantly, our U3S-PAT strategy reduces multispectral scanning time during 3D imaging. To image the same number of slices, the time it takes for traditional transducer-fixed PAT to perform WW\rm Wroman_W wavelength imaging can be reduced to only 1/W1W\rm 1/W1 / roman_W. Since only one wavelength is used at each slice, the imaging speed of U3S-PAT is limited by the wavelength switching time. Therefore, our U3S-PAT may enable 3D multispectral imaging of dynamic scenes such as the beating heart, thus making possible the visualization of oxyhemoglobin (HbO2subscriptHbO2\rm{HbO}_{2}roman_HbO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) and deoxyhemoglobin (HbHb\rm{Hb}roman_Hb) within the heart during cardiac cycles.

2.2 Image reconstruction based on implicit neural representation

The U3S-PAT principle results in ultra-sparse data. Therefore, reconstructing high-quality images of all the slices at all wavelengths from such few measurements is a significant challenge. Recent developments in deep learning (DL) techniques offer powerful tools to tackle this kind of problem, but well-developed supervised DL methods require high-quality ground truth images which is difficult to obtain in sparse sampled PAT. Alternatively, neural representations that do not require ground truth data for training have recently garnered significant attention [32][35], with application on CT and MR reconstruction recently demonstrated [32]. Inspired by these works, rather than learning end-to-end map**, we propose a prior-embedded multi-layer perceptron (MLP) to learn the implicit neural representation of the target image from U3S-PAT data without any densely sampled reference.

Refer to caption
Figure 2: (a) Schematic of the image acquisition and reconstruction procedure of U3S-PAT. The prior-embedded MLP network takes the detected signal from five adjacent slices as input, and reconstructs five images at the center slice position corresponding to the five excitation wavelengths. (b) Schematic diagram of the implicit neural representation-based image reconstruction model. The wavelength and angle interlaced raw data is combined to form a dense signal and then reconstructed into a structural prior image. The image is embedded into the MLP network to guide multi-wavelength image reconstruction.

Since the transducer in cross-sectional PAT has a certain slice thickness limited by its focus, there is redundant information shared between adjacent slices if the translation distance is smaller than the slice thickness. Making use of this information should be able to improve image reconstruction. Fig.2 shows the principle of our self-supervised image reconstruction method. As shown in Fig.2 (a), in our U3S-PAT strategy, only one wavelength is used at each slice, and each slice corresponds to a single wavelength. Specifically, since the imaging wavelength loops through λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to λWsubscript𝜆W\lambda_{\rm{W}}italic_λ start_POSTSUBSCRIPT roman_W end_POSTSUBSCRIPT, the wavelength index at αmsubscript𝛼𝑚\alpha_{m}italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is given by ω=(m1)mod(W)+1𝜔𝑚1modW1\omega=\left({m-1}\right)\rm{mod}(\rm{W})+1italic_ω = ( italic_m - 1 ) roman_mod ( roman_W ) + 1, where modmod\rm{mod}roman_mod is the modulus operator and ω=1,2,W𝜔12W\omega=1,2,\cdots\rm{W}italic_ω = 1 , 2 , ⋯ roman_W. To reconstruct images at all wavelengths, the remaining spectral information can be induced from neighboring slices. Consequently, the input to our network model is a set of data of continuous slices, and the total number of slices is determined by the number of excitation wavelengths, i.e. WW\rm{W}roman_W, such that all wavelengths are covered. The output of the network is consisting of multi-wavelength images of the center slice position. Thus, the main function of the network model is to transfer the missing spectral information from adjacent slices to the center slice.

Fig.2 (b) shows the schematic of the reconstruction network at a given tomographic position when W=5W5\rm W=5roman_W = 5. Considering the interlaced scanning principle of U3S-PAT, we can combine all the acquired spirally scanned signal to form a dense set of signal with a high angular coverage density. From the combined signal we can then reconstruct a fusion image that integrates information from different wavelengths, transducer rotation angles, and tomographic positions. Therefore, this fusion image has rich, high-quality structural information, and thus can be used as prior knowledge to guide image reconstruction. To do so, the weights of the MLP network are embedded with internal information from the prior image to serve as an initialization for the search target image representation. Starting from this pre-embedded initialization, the network is optimized to identify the optimal points in the function space based solely on sparsely sampled measurements.

Our method involves three steps to derive the final images. Firstly, the prior image is encoded as an implicit neural representation by embedding the entire spatial image field into network parameters. Secondly, using the prior-embedded network as initialization, for training network to learn the reconstruction of multi-wavelength data, we utilized the sparse sampling data obtained at the given slice as well as its neighboring slices acquired at different wavelengths to perform the constrained reconstruction. Finally, by traversing all spatial coordinates in the image space, the learned MLP generates the reconstructed images, i.e. images of the center slice at λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to λ5subscript𝜆5\lambda_{5}italic_λ start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT. By performing the above steps at each tomographic position, a complete set of multi-wavelength images at all slices can be obtained.

2.2.1 Network architecture

The neural network in our approach is implemented as a 4-layer MLP network comprising 512 neural nodes in width. In each layer except the final layer, we employed a periodic activation function, known for its effectiveness in capturing fine details in signals [35]. The image is represented as a continuous function within the neural network. We define the network, denoted as ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT, with the parameter ξ𝜉\xiitalic_ξ, as follows:

ξ:cvwithc[0,1)n,v\left.\mathcal{M}_{\xi}:c\rightarrow v~{}with~{}c\in[0,1)^{n},~{}v\in\mathbb{R% }\right.caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT : italic_c → italic_v italic_w italic_i italic_t italic_h italic_c ∈ [ 0 , 1 ) start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT , italic_v ∈ blackboard_R (1)

The network function ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT takes the normalized spatial coordinates c𝑐citalic_c as input and produces the corresponding intensity value v𝑣vitalic_v as output. By map** coordinates to image intensities, the network function ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT encapsulates the internal information of the entire image within its parameters ξ𝜉\xiitalic_ξ. Thus, we consider the network structure ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT with parameters ξ𝜉\xiitalic_ξ as a neural representation of the images because it captures the essential characteristics of the images.

2.2.2 Prior embedding

We incorporate the prior images xprsuperscript𝑥𝑝𝑟x^{pr}italic_x start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT into the network during initialization. The coordinate-based MLP network ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT is employed to map the spatial coordinates to their corresponding intensity values in the prior images xprsuperscript𝑥𝑝𝑟x^{pr}italic_x start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT. Mathematically, this can be expressed as ξ:cixipr:subscript𝜉subscript𝑐𝑖superscriptsubscript𝑥𝑖𝑝𝑟\left.\mathcal{M}_{\xi}:c_{i}\rightarrow x_{i}^{pr}\right.caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT : italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT → italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT, where i𝑖iitalic_i denotes the index of coordinates in the image’s spatial field. With a total of N𝑁Nitalic_N pixels in the image denoted as {ci,xipr}i=1Nsuperscriptsubscriptsubscript𝑐𝑖superscriptsubscript𝑥𝑖𝑝𝑟𝑖1𝑁\left\{{c_{i},x_{i}^{pr}}\right\}_{i=1}^{N}{ italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. The initially randomized MLP network ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT is optimized using the following objective function:

ξ*=argmin𝜉1Ni=1Nξ(ci)xipr22superscript𝜉𝜉𝑎𝑟𝑔𝑚𝑖𝑛1𝑁superscriptsubscript𝑖1𝑁superscriptsubscriptnormsubscript𝜉subscript𝑐𝑖superscriptsubscript𝑥𝑖𝑝𝑟22\xi^{*}=\underset{\xi}{argmin}\frac{1}{N}{\sum_{i=1}^{N}\left\|{\mathcal{M}_{% \xi}\left(c_{i}\right)-x_{i}^{pr}}\right\|_{2}^{2}}italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = underitalic_ξ start_ARG italic_a italic_r italic_g italic_m italic_i italic_n end_ARG divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∥ caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT ( italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (2)

Following the optimization process, ξ*subscriptsuperscript𝜉\mathcal{M}_{\xi^{*}}caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT is updated to encode the internal information of the prior images xprsuperscript𝑥𝑝𝑟x^{pr}italic_x start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT. This encoding is achieved by combining the optimized network parameters ξ*superscript𝜉\xi^{*}italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT with ξ*subscriptsuperscript𝜉\mathcal{M}_{\xi^{*}}caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. To maintain clarity and distinction, we denote the resulting MLP network as prsuperscript𝑝𝑟\mathcal{M}^{pr}caligraphic_M start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT, representing the prior-embedded MLP network. In other words, the prior images xprsuperscript𝑥𝑝𝑟x^{pr}italic_x start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT are equivalent to the output of this embedded MLP network, denoted as xpr=ξ*=prsuperscript𝑥𝑝𝑟subscriptsuperscript𝜉superscript𝑝𝑟x^{pr}=\mathcal{M}_{\xi^{*}}=\mathcal{M}^{pr}italic_x start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = caligraphic_M start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT.

2.2.3 Network training

Utilizing the pre-embedded MLP network prsuperscript𝑝𝑟\mathcal{M}^{pr}caligraphic_M start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT and the given sparse measurements y𝑦yitalic_y, where y𝑦yitalic_y includes sparse sampling data for each tomographic position. We aim to train the network to acquire the neural representation of target images. The target images x𝑥xitalic_x, being dependent on the coordinate-based MLP network ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT and its parameter ξ𝜉\xiitalic_ξ, are utilized to define the data term as min𝑥(Ax,y)=min𝜉(Aξ,y)𝑥𝑚𝑖𝑛𝐴𝑥𝑦𝜉𝑚𝑖𝑛𝐴subscript𝜉𝑦\underset{x}{min}\mathcal{~{}}\mathcal{E}\left({Ax,y}\right)=\underset{\xi}{% min}\mathcal{~{}}\mathcal{E}\left(A\mathcal{M}_{\xi},y\right)underitalic_x start_ARG italic_m italic_i italic_n end_ARG caligraphic_E ( italic_A italic_x , italic_y ) = underitalic_ξ start_ARG italic_m italic_i italic_n end_ARG caligraphic_E ( italic_A caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT , italic_y ), where (Ax,y)𝐴𝑥𝑦\mathcal{E}\left({Ax,y}\right)caligraphic_E ( italic_A italic_x , italic_y ) is a data term used to measure the error between the Ax𝐴𝑥Axitalic_A italic_x and y𝑦yitalic_y variables to ensure that the data is consistent with the sensor measurements. Matrix A𝐴Aitalic_A represents the forward model of the imaging system. The function \mathcal{E}caligraphic_E is a distance metric such as L1 or L2 norm. This formulation enables the optimization of the MLP parameter space to translate into the optimization of the image space. The network ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT undergoes training through the minimization of the L2 parametric loss, leveraging the initialization provided by the prior embedding network prsuperscript𝑝𝑟\mathcal{M}^{pr}caligraphic_M start_POSTSUPERSCRIPT italic_p italic_r end_POSTSUPERSCRIPT. Consequently, the optimization objective for the network ξsubscript𝜉\mathcal{M}_{\xi}caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT can be expressed as follows:

ξ*=argmin𝜉Aξy22,x*=ξ*formulae-sequencesuperscript𝜉𝜉𝑎𝑟𝑔𝑚𝑖𝑛superscriptsubscriptnorm𝐴subscript𝜉𝑦22superscript𝑥subscriptsuperscript𝜉\xi^{*}=\underset{\xi}{argmin}\left\|{A\mathcal{M}_{\xi}-y}\right\|_{2}^{2},~{% }x^{*}=\mathcal{M}_{\xi^{*}}italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = underitalic_ξ start_ARG italic_a italic_r italic_g italic_m italic_i italic_n end_ARG ∥ italic_A caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT - italic_y ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT (3)

The forward model A𝐴Aitalic_A utilized in this paper is the universal back-projection (UBP) transform, which possesses differentiability properties. By combining this with the differentiable orthorectified model of the PAT imaging system, a link between the image space and the transducer space is established.

To reconstruct a complete set of multi-wavelength images, we employ a joint constraint reconstruction using the sparse sampling data obtained from not only the given slice position, but also from neighboring slices acquired at multiple wavelengths. Therefore, for reconstructing multi-wavelength images of tomographic position αmsubscript𝛼𝑚\alpha_{m}italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, since its corresponding excitation wavelength is λωsubscript𝜆𝜔\lambda_{\omega}italic_λ start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT, where ω=(m1)mod(W)+1𝜔𝑚1𝑚𝑜𝑑W1\omega=\left({m-1}\right)mod(\rm{W})+1italic_ω = ( italic_m - 1 ) italic_m italic_o italic_d ( roman_W ) + 1, the reconstruction loss for the λωsubscript𝜆𝜔\lambda_{\omega}italic_λ start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT image can be expressed as:

ξ*=argmin𝜉Aξyαm22,x*=ξ*formulae-sequencesuperscript𝜉𝜉𝑎𝑟𝑔𝑚𝑖𝑛superscriptsubscriptnorm𝐴subscript𝜉subscript𝑦subscript𝛼𝑚22superscript𝑥subscriptsuperscript𝜉\xi^{*}=\underset{\xi}{argmin}\left\|{A\mathcal{M}_{\xi}-y_{\alpha_{m}}}\right% \|_{2}^{2},~{}x^{*}=\mathcal{M}_{\xi^{*}}italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = underitalic_ξ start_ARG italic_a italic_r italic_g italic_m italic_i italic_n end_ARG ∥ italic_A caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT (4)

For the excitation wavelengths other than λωsubscript𝜆𝜔\lambda_{\omega}italic_λ start_POSTSUBSCRIPT italic_ω end_POSTSUBSCRIPT, the image reconstruction loss is:

ξ*=argmin𝜉(Aξyαm22+δAξyαm+k22),superscript𝜉𝜉𝑎𝑟𝑔𝑚𝑖𝑛superscriptsubscriptnorm𝐴subscript𝜉subscript𝑦subscript𝛼𝑚22𝛿superscriptsubscriptnorm𝐴subscript𝜉subscript𝑦subscript𝛼𝑚𝑘22\displaystyle\xi^{*}=\underset{\xi}{argmin}\left({\left\|{A\mathcal{M}_{\xi}-y% _{\alpha_{m}}}\right\|_{2}^{2}+\delta\left\|{A\mathcal{M}_{\xi}-y_{\alpha_{m+k% }}}\right\|_{2}^{2}}\right),italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = underitalic_ξ start_ARG italic_a italic_r italic_g italic_m italic_i italic_n end_ARG ( ∥ italic_A caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_δ ∥ italic_A caligraphic_M start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_m + italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (5)
x*=ξ*superscript𝑥subscriptsuperscript𝜉\displaystyle~{}x^{*}=\mathcal{M}_{\xi^{*}}italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT

Where k{1W+12,2W+12,,WW+12}𝑘1W122W12WW12k\rm\in\left\{{1-\frac{W+1}{2},2-\frac{W+1}{2},\cdots,W-\frac{W+1}{2}}\right\}italic_k ∈ { 1 - divide start_ARG roman_W + 1 end_ARG start_ARG 2 end_ARG , 2 - divide start_ARG roman_W + 1 end_ARG start_ARG 2 end_ARG , ⋯ , roman_W - divide start_ARG roman_W + 1 end_ARG start_ARG 2 end_ARG }, k0𝑘0k\neq 0italic_k ≠ 0, and δ𝛿\deltaitalic_δ is a hyperparameter.

2.2.4 Images inference

Once the network has undergone training, the reconstructed images can be obtained by deducing the trained network across all spatial coordinates within the image field. Mathematically, this can be represented as x*:{ci,ξ*(ci)}i=1N:superscript𝑥superscriptsubscriptsubscript𝑐𝑖subscriptsuperscript𝜉subscript𝑐𝑖𝑖1𝑁x^{*}:\left\{{c_{i},\mathcal{M}_{\xi^{*}}\left(c_{i}\right)}\right\}_{i=1}^{N}italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT : { italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , caligraphic_M start_POSTSUBSCRIPT italic_ξ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT, where i𝑖iitalic_i denotes the index of the coordinates. The resulting intensity values at these coordinates form the reconstructed multi-wavelength images x*superscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT at each tomographic position.

3 Experimental setup

3.1 U3S-PAT data acquisition

All experiments are conducted using a commercial multispectral PAT platform (MSOT inVision128, iThera Medical GmbH, Germany). The system employs a ring transducer array comprising 128 elements and a radius of 40.5 mm. The transducer elements have a central frequency of 5 MHz. The system is equipped with a pulsed optical parametric oscillator (OPO) laser (SpotLight 600, InnoLas, Germany), capable of emitting pulsed laser light ranging from 680 nm to 980 nm. The laser pulses are emitted through 10 irradiation ports, ensuring a uniform distribution of laser energy across the surface of the imaged object. The laser operates at a pulse repetition frequency of 10 Hz, with a wavelength switching time of 30 ms. When irradiating a cylindrical sample with a diameter of 20.0 mm, the laser creates a ring-shaped illumination with a width of 8.0 mm on the sample’s surface, as shown in Fig.3 (a).

Refer to caption
Figure 3: (a) Schematic of the detection geometry of the commercial PAT system used in this study. (b) The detailed transducer layout of U3S-PAT simulated on the used PAT system. α𝛼\alphaitalic_α, λ𝜆\lambdaitalic_λ, and θ𝜃\thetaitalic_θ indicate the scanning slice position, excitation wavelength, and transducer rotation angle, respectively.
Refer to caption
Figure 4: The image reconstruction results of Nd=21Nd21\rm Nd=21roman_Nd = 21 transducer: (a) PAT images obtained by different methods at different slice positions and wavelengths. (b) The structural prior image obtained by U3S-PAT. (c) Absolute normalized error map between the image reconstructed by each method and the reference dense sampled image.
Refer to caption
Figure 5: The image reconstruction results of Nd=16Nd16\rm Nd=16roman_Nd = 16 transducer: (a) PAT images obtained by different methods at different slice positions and wavelengths. (b) The structural prior image obtained by U3S-PAT. (c) Absolute normalized error map between the image reconstructed by each method and the reference dense sampled image.

To showcase the effectiveness of the U3S-PAT method, the imaging configuration of the U3S-PAT scheme is simulated by leveraging a densely sampled dataset acquired by our MSOT system. By selectively retaining the signals acquired only at the sparse transducer locations and discarding the remaining densely sampled data, the spirally scanned and sparsely sampled data can be obtained. We use W=5W5\rm W=5roman_W = 5 wavelengths for U3S-PAT in all our experiments, including 700 nm, 730 nm, 760 nm, 800 nm and 850 nm. We perform separate image reconstructions for two distinct numbers of transducer elements. As shown in Fig.3 (b), one transducer array consists of Nd=21Nd21\rm Nd=21roman_Nd = 21 elements, corresponding to an element down sampling rate of similar-to\sim1/6, with a rotation angle of Δθ=2.57\mathrm{\Delta}\theta=2.57{{}^{\circ}}roman_Δ italic_θ = 2.57 start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT per step. The other transducer array consists of Nd=16Nd16\rm Nd=16roman_Nd = 16 elements, corresponding to an element down sampling rate of 1/8, with a rotation angle of Δθ=3.37\mathrm{\Delta}\theta=3.37{{}^{\circ}}roman_Δ italic_θ = 3.37 start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT per step. With the above setup, the fused prior image used during image reconstruction covers the whole angular spacing.

In the U3S-PAT imaging system, the transducer array is horizontally translated and rotated during the scanning process. The translation distance in spectral unmixing experiments, also known as the slice spacing, is set to 0.5 mm, which is smaller than the 800 μμ\upmuroman_μm focus detection sensitivity field of the laser. This scheme ensures that anatomical information is shared between neighboring images. While the translation distance in the image reconstruction experiments is set to 1.0 mm, because the focal zone of each transducer element is characterized by the -6 dB zone (i.e. the area in which the sensitivity drops to half its maximum value), and the actual detection region is larger than the focus detection sensitivity region.

3.2 Small animal imaging experiments

In vivo animal experiments are conducted using healthy female nude mice (8 weeks old) on the commercial PAT system described in Fig.3. All animal experimental procedures are approved by the animal ethics committee of Southern Medical University and conducted following current guidelines. To minimize image artifacts caused by respiratory movements, the nude mice are anesthetized. Throughout the data acquisition process, the imaging chamber is filled with water and maintained at a constant temperature of 37°C. The speed of sound in the medium is set to 1536 m/s.

3.3 Implementation of image reconstruction

In our experimental setup, all algorithms 111https://github.com/zhongniuniu/U3S-PAT are implemented using the PyTorch framework. The size of the reconstructed PAT image is defined as 220 × 220 pixels. For the prior embedding process, the training objective defined in Eq. (2) is optimized using the Adam optimizer, with a learning rate of 0.0001, over a total of 2000 training iterations. The reconstruction network is trained optimally, with the MLP network initialized using the prior embedding, and the objectives stated in Eq. (3) are trained for 2000 iterations using the Adam optimizer, with a learning rate of 0.00001. The hyperparameter δ𝛿\deltaitalic_δ in Eq. (5) is set to 0.8.

4 Experimental results

4.1 PAT images reconstruction at different locations under five wavelengths

Fig.4 (a) and Fig.5 (a) display the image reconstruction results and the densely sampled (DS) reference images at different slice position and different wavelength. To investigate the impact of prior embedding, we also obtain the reconstruction results of U3S-PAT without the use of the prior image, i.e., random initialization is used during network training. As can be seen, both the sparsely sampled (SS) and U3S-PAT (NoPrior) methods suffer from significant background artifacts at the two transducer settings. In contrast, our U3S-PAT method effectively eliminates these artifacts, yielding reconstructed images of much better quality with well-defined anatomical structures and organ boundaries. When comparing Fig.4 (a) and Fig.5 (a), it could be seen that with even fewer transducer elements, the proposed method still produced acceptable results. In contrast, the image quality of the SS and U3S-PAT (NoPrior) reconstructions degraded heavily as the number of transducer elements decreased.

By comparing U3S-PAT with or without prior embedding, we observe that the introduction of prior significantly enhances image sharpness and reduces artifacts. Notably, despite utilizing the same prior image and relying solely on sparsely sampled data at each stage for image reconstruction, the resulting images accurately capture continuous tissue variations at different slice locations. To further observe the accuracy of the reconstructed images, absolute normalized error images between the reconstructed images obtained using each method and the corresponding reference images are calculated. Fig.4 (c) and Fig.5 (c) display the difference image between the reconstructed image at 21.0 mm / 700 nm and the corresponding DS reference. The U3S-PAT method exhibits the smallest error, emphasizing its superior performance compared to other methods. Moreover, comparing the results obtained from Nd=21Nd21\rm Nd=21roman_Nd = 21 and Nd=16Nd16\rm Nd=16roman_Nd = 16, the U3S-PAT images vary little with the number of transducer elements, while the absolute difference images of SS and U3S-PAT (NoPrior) became significantly worse as fewer transducer are used.

Table 1: PSNR /// SSIM OF IMAGE RECONSTUCTION RESULTS OBTAINED USING DIFFERENT METHODS. PSNR, PEAK SIGNAL TO NOISE RATIO; SSIM, STRUCTURAL SIMILARITY. (MEAN, Bold: BEST).
Methods 700 nm 730 nm 760 nm 800 nm 850 nm
Number of transducer elements = 21
SS 14.51 / 0.5814 16.72 / 0.6461 19.40 / 0.6565 17.91 / 0.7387 13.71 / 0.7318
U3S-PAT (NoPrior) 30.60 / 0.7741 28.22 / 0.7068 28.71 / 0.7191 31.53 / 0.8099 33.50 / 0.8510
U3S-PAT 34.76 / 0.8767 36.06 / 0.9011 35.45 / 0.8911 37.61 / 0.9225 37.47 / 0.9143
Number of transducer elements = 16
SS 13.67 / 0.5254 16.34 / 0.5962 17.57 / 0.5871 17.10 / 0.6532 17.15 / 0.6702
U3S-PAT (NoPrior) 28.71 / 0.7196 26.50 / 0.6659 26.26 / 0.6522 28.83 / 0.7489 30.41 / 0.7824
U3S-PAT 33.06 / 0.8362 33.87 / 0.8536 33.17 / 0.8356 34.18 / 0.8528 34.42 / 0.8442

To further characterize the performance of U3S-PAT, we show the enlarged details of the images obtained using different methods in Fig.6. Upon observation of the magnified images, our U3S-PAT method exhibits superior structural visibility compared to both the SS method and U3S-PAT without prior embedding. Even at Nd=16, the quality of the U3S-PAT image remains unaffected, with only minor edge blurring observed in the region indicated by the red arrow. These findings highlight the effectiveness of the proposed self-supervised image reconstruction algorithm.

Next, we perform quantitative evaluation of the image reconstruction results. We use peak signal noise ratio (PSNR) and structural similarity (SSIM) for evaluation, and the obtained results are presented in Table I. At Nd=21Nd21\rm Nd=21roman_Nd = 21, the U3S-PAT results showed an average increase of 120.48%percent\%% in PSNR and 34.31%percent\%% in SSIM compared to the SS results. At ND=16, the PSNR increased by 106.15%percent\%% and SSIM increased by 39.25%percent\%% on average. This means that the intensity values obtained by the U3S-PAT method are closer to the reference DS image in both settings. In addition, when Nd=21Nd21\rm Nd=21roman_Nd = 21 and Nd=16Nd16\rm Nd=16roman_Nd = 16, U3S-PAT improves PSNR by 18.87%percent\%% and 19.89%percent\%%, and SSIM by 16.70%percent\%% and 18.30%percent\%%, respectively, compared to the U3S-PAT (NoPrior) method. U3S-PAT method consistently achieves superior performance across all metrics when utilizing prior embedding.

Refer to caption
Figure 6: Comparison of enlarged image reconstruction results of different transducer settings.
Refer to caption
Figure 7: The multispectral image reconstruction results at a single slice position (90.5 mm). (a) PAT images of five different wavelengths obtained by different methods. (b) The structural prior image obtained by U3S-PAT. (c) Spectral unmixing results of HbO2subscriptHbO2\rm{HbO}_{2}roman_HbO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and HbHb\rm{Hb}roman_Hb.

4.2 Spectral unmixing results

Fig.7 shows the image reconstruction and spectral un-mixing results of the in vivo animal experiment with Nd=21Nd21\rm Nd=21roman_Nd = 21. The images reconstructed by different methods are depicted in Fig.7 (a). Analysis of the images reconstructed using the SS and U3S-PAT (NoPrior) methods reveals the presence of numerous artifacts, attributable to inadequate measurement angle and the absence of prior constraints. In contrast, the images obtained with the U3S-PAT method exhibit cleaner results, closely resembling the DS image. Fig.7 (c) illustrates the distribution images of HbO2subscriptHbO2\rm{HbO}_{2}roman_HbO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and HbHb\rm{Hb}roman_Hb, obtained by applying the linear spectral unmixing algorithm to each set of images. The spectral unmixing outcomes highlight the influence of artifacts in the reconstructed images on the unmixing accuracy. The results achieved by the U3S-PAT method align closely with the DS method, enabling clear differentiation of the distributions of HbO2subscriptHbO2\rm{HbO}_{2}roman_HbO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and HbHb\rm{Hb}roman_Hb along organ contours.

The quantitative evaluation of the spectral unmixing experimental results is presented in Table II. The results demonstrate that the U3S-PAT method outperforms other methods in terms of PSNR, SSIM, and Dice metrics: compared to the SS method, the PSNR and SSIM increased for 100.50%percent\%% and 27.00%percent\%%, respectively, and the Hb-Dice and HbO2-Dice increased for 202.39%percent\%% and 320.59%percent\%%, respectively.

Table 2: QUANTITATIVE ANALYSIS OF IMAGE RECONSTRUCTION AND SPECTRAL UNMIXING RESULTS OF DIFFERENT METHODS. Dice: DICE COEFFICIENT. (MEAN, Bold: BEST).
Methods PSNR SSIM Hb-Dice HbO2-Dice
SS 17.91 0.7180 0.2926 0.1986
U3S-PAT (NoPrior) 32.27 0.8482 0.7656 0.6530
U3S-PAT 35.91 0.9119 0.8848 0.8356

4.3 Ablation studies

4.3.1 Impact of network structure on image reconstruction performance

Selecting an appropriate network structure is crucial to ensure the optimal performance of the proposed U3S-PAT strategy. The function is influenced by the network’s depth and width, which refer to the number of layers and neurons per layer, respectively. We obtained reconstruction results for the MLP network with various depths and widths at Nd=21Nd21\rm Nd=21roman_Nd = 21 and slice spacing =1.0absent1.0=1.0= 1.0 mm. The results are summarized in Table III. From the table, it can be observed that increasing the number of layers may cause inadequate optimization and lead to inferior reconstruction outcomes, and a certain increase in the width of the network can improve the performance of our reconstruction. Therefore, we selected the structure of a 4-layer MLP with 512 neural nodes as the backbone for our network.

Table 3: QUANTITATIVE ANALYSIS OF IMAGE RECONSTRUCTION RESULTS UNDER DIFFERENT NETWORK STRUCTURES. (MEAN, Bold: BEST).
Network Structure PSNR / SSIM (Nd=21normal-Nd21\rm Nd=21roman_Nd = 21)
Width =512 4 Layers 36.65 / 0.9132
6 Layers 33.12 / 0.7956
8 Layers 36.01 / 0.8989
Width =256 8 Layers 36.27 / 0.9011
16 Layers 26.83 / 0.4765
20 Layers 22.98 / 0.2715

4.3.2 Impact of slice spacing on image reconstruction performance

Selecting an appropriate slice spacing is essential to balance the data volume and quality of reconstructed images. To evaluate its impact on image quality, we reconstructed PAT images at different slice spacing and the results are depicted in Fig.8. As can be seen, as the slice spacing increases, the reconstructed images become increasingly blurred and artifacts emerge. The blurring can also be seen in the prior images, which confirms that the feasibility of our U3S-PAT strategy comes from redundant spatial information shared between adjacent slices during spiral scanning.

Refer to caption
Figure 8: Comparison of image reconstruction results of different slice spacings.

4.3.3 Impact of number of transducer elements on image reconstruction performance

To analyze the impact of element down sampling rate on the reconstruction performance, we conducted PAT image reconstructions with varying numbers of transducer elements. We have tested transducer settings of Nd = 4, 8, 16, 21, 26, 32, and 64 elements, and the obtained quantitative PSNR and SSIM results with respect to the transducer with Nd=128 are presented in Fig.9. As can be seen, as the number of transducer elements used in the measurement increases, both PSNR and SSIM increase accordingly. Specifically, optimal image quality is achieved when Nd is larger than 21. When a smaller number of elements is used, both PSNR and SSIM decrease rapidly.

5 Discussion

One of the primary drawbacks of multispectral PAT lies in the dense signal acquisition that necessitates high performance detector and system. The presented U3S-PAT method offers a new approach to address the challenge by proposing a novel spiral sparse sampling strategy, thus enables multispectral 3D signal acquisition with far less measurement compared to traditional dense sampling PAT. It is able to alleviate the need for multi-channel parallel signal acquisition and may significantly reduce system cost. Moreover, although has not been confirmed experimentally in our current study, our method is able to reduce the multispectral imaging time dramatically.

Refer to caption
Figure 9: PSNR and SSIM of the image reconstruction results obtained at different number of transducer elements.

The success of our method is based on the full use of spatial-spectral redundancy information during spiral scanning, which is induced to the image reconstruction of U3S-PAT. Our self-supervised neural network model eliminates the need for ground truth datasets and only relies on the raw acquired signal. The inclusion of the unique structural prior information provided by our U3S-PAT strategy significantly improves the reconstruction outcomes. Notably, our method can be used to further improve the imaging capability of dense sampling systems because the same wavelength-interlaced spiral scanning principle can also be applied on these systems.

The performance of the proposed U3S-PAT method is confirmed on a series of in vivo animal experiments. However, in our current study, our U3S-PAT idea is validated based on signal simulated on a commercial transducer-fixed multispectral PAT system, its feasibility needs to be evaluated on a hardware-ready system in the future. The other limitation of our U3S-PAT method is that it is based on cross-sectional PAT geometry with a ring-shaped transducer. Its applicability on linear or spherical transducer settings has to be investigated.

6 Conclusion

In this work, we propose a novel ultra-sparse spiral sampling strategy for multispectral PAT. The strategy, which we named U3S-PAT, incorporates a novel spiral scanning principle with wavelength and transducer angle multiplexed based on a sparse ring-shaped transducer. We develop a self-supervised image reconstruction model that incorporates structural prior knowledge of spiral scanning to enhance image quality. The effectiveness of our U3S-PAT strategy is demonstrated through in vivo animal experiments acquired on a commercial PAT system, and a total sparse sampling rate down to 1/30 is demonstrated. Our U3S-PAT provides a promising solution for the development of more efficient 3D multispectral PAT technology.

\appendices

References

  • [1] L. V. Wang and S. Hu, “Photoacoustic Tomography: in vivo Imaging from Organelles to Organs,” Science, vol. 335, no. 6075, pp. 1458–1462, Mar. 2012.
  • [2] L. V. Wang and J. Yao, “A practical guide to photoacoustic tomography in the life sciences,” Nat. Methods, vol. 13, no. 8, pp. 627–638, Jul. 2016.
  • [3] P. K. Upputuri and M. Pramanik, “Recent advances toward preclinical and clinical translation of photoacoustic tomography: a review,” J. Biomed. Opt., vol. 22, no. 4, p. 041006, Nov. 2016.
  • [4] P. K. Upputuri and M. Pramanik, “Dynamic in vivo imaging of small animal brain using pulsed laser diode-based photoacoustic tomography system,” J. Biomed. Opt., vol. 22, no. 09, p. 1, Sep. 2017.
  • [5] K. Tang et al., “Advanced Image Post-Processing Methods for Photoacoustic Tomography: A Review,” Photonics, vol. 10, no. 7, p. 707, Jul. 2023.
  • [6] Y. Zhong et al., “Unsupervised Fusion of Misaligned PAT and MRI Images via Mutually Reinforcing Cross-Modality Image Generation and Registration,” IEEE Trans. Med. Imaging, pp. 1–1, Jan. 2024.
  • [7] D. Razansky et al., “Multispectral opto-acoustic tomography of deep-seated fluorescent proteins in vivo,” Nat. Photonics, vol. 3, no. 7, pp. 412–417, Jul. 2009.
  • [8] X. Luís Deán-Ben, N. C. Deliolanis, V. Ntziachristos, and D. Razansky, “Fast unmixing of multispectral optoacoustic data with vertex component analysis,” Opt. Lasers Eng., vol. 58, pp. 119–125, Jul. 2014.
  • [9] J. Glatz, N. C. Deliolanis, A. Buehler, D. Razansky, and V. Ntziachristos, “Blind source unmixing in multi-spectral optoacoustic tomography,” Opt. Express, vol. 19, no. 4, p. 3175, Feb. 2011.
  • [10] S. Tzoumas, N. Deliolanis, S. Morscher, and V. Ntziachristos, “Unmixing Molecular Agents From Absorbing Tissue in Multispectral Optoacoustic Tomography,” IEEE Trans. Med. Imaging, vol. 33, no. 1, pp. 48–60, Jan. 2014.
  • [11] K. Tang et al., “Learning Spatially Variant Degradation for Unsupervised Blind Photoacoustic Tomography Image Restoration,” Photoacoustics, vol. 32, pp. 100536–100536, Aug. 2023.
  • [12] S. Zhang et al., “MRI Information-Based Correction and Restoration of Photoacoustic Tomography,” IEEE Trans. Med. Imaging, vol. 41, no. 9, pp. 2543–2555, Sep. 2022.
  • [13] L. Qi et al., “Photoacoustic Tomography Image Restoration With Measured Spatially Variant Point Spread Functions,” IEEE Trans. Med. Imaging, vol. 40, no. 9, pp. 2318–2328, May 2021.
  • [14] Z. Liang et al., “Automatic 3-D segmentation and volumetric light fluence correction for photoacoustic tomography based on optimal 3-D graph search,” Med. Image Anal., vol. 75, pp. 102275–102275, Jan. 2022.
  • [15] L. Yao and H. Jiang, “Photoacoustic image reconstruction from few-detector and limited-angle data,” Biomed. Opt. Express, vol. 2, no. 9, p. 2649, Aug. 2011.
  • [16] J. Meng et al., “High-speed, sparse-sampling three-dimensional photoacoustic computed tomography in vivo based on principal component analysis,” J. Biomed. Opt., vol. 21, no. 7, p. 076007, Jul. 2016.
  • [17] Y. Wang et al., “Enhancing sparse-view photoacoustic tomography with combined virtually parallel projecting and U3S-PATially adaptive filtering,” Biomed. Opt. Express, vol. 9, no. 9, p. 4569, Aug. 2018.
  • [18] H. Yu and G. Wang, “Compressed sensing based interior tomography,” Phys. Med. Biol., vol. 54, no. 13, pp. 4341–4341, Jun. 2009.
  • [19] G.-H. Chen, J. Tang, and S. Leng, “Prior image constrained compressed sensing (PICCS): A method to accurately reconstruct dynamic CT images from highly undersampled projection data sets,” Med. Phys., vol. 35, no. 2, pp. 660–663, Jan. 2008.
  • [20] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, no. 17, pp. 4777–4807, Aug. 2008.
  • [21] S. Ravishankar and Y. Bresler, “MR Image Reconstruction From Highly Undersampled k-Space Data by Dictionary Learning,” IEEE Trans. Med. Imaging, vol. 30, no. 5, pp. 1028–1041, May 2011.
  • [22] N. Davoudi, X. L. Deán-Ben, and D. Razansky, “Deep learning optoacoustic tomography with sparse data,” Nat. Mach. Intell., vol. 1, no. 10, pp. 453–460, Sep. 2019.
  • [23] A. Hauptmann et al., “Model-Based Learning for Accelerated, Limited-View 3-D Photoacoustic Tomography,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1382–1393, Mar. 2018.
  • [24] X. Li et al., “Multispectral Interlaced Sparse Sampling Photoacoustic Tomography,” IEEE Trans. Med. Imaging, vol. 39, no. 11, pp. 3463–3474, Oct. 2020.
  • [25] N. R. Mollet et al., “High-Resolution Spiral Computed Tomography Coronary Angiography in Patients Referred for Diagnostic Conventional Coronary Angiography,” Circulation, vol. 112, no. 15, pp. 2318–2323, Oct. 2005.
  • [26] C. B. Ahn, J. H. Kim, and Z. H. Cho, “High-speed spiral-scan echo planar NMR imaging-I,” IEEE transactions on medical imaging, vol. 5, no. 1, pp. 2–7, 1986.
  • [27] E. C. Ford, G. S. Mageras, E. Yorke, and C. C. Ling, “Respiration-correlated spiral CT: A method of measuring respiratory-induced anatomic motion for radiation treatment planning,” Med. Phys., vol. 30, no. 1, pp. 88–97, Dec. 2002.
  • [28] T. Fuchs, M. Kachelrieß, and W. A. Kalender, “Technical advances in multi–slice spiral CT,” Eur. J. Radiol., vol. 36, no. 2, pp. 69–73, Nov. 2000.
  • [29] X. L. Deán-Ben, T. F. Fehm, S. J. Ford, S. Gottschalk, and D. Razansky, “Spiral volumetric optoacoustic tomography visualizes multi-scale dynamics in mice,” Light Sci. Appl., vol. 6, no. 4, pp. e16247–e16247, Nov. 2016.
  • [30] A. Ron, X. L. Deán-Ben, J. Reber, V. Ntziachristos, and D. Razansky, “Characterization of Brown Adipose Tissue in a Diabetic Mouse Model with Spiral Volumetric Optoacoustic Tomography,” Mol. Imag. Biol., vol. 21, no. 4, pp. 620–625, Nov. 2018.
  • [31] S. K. Kalva, X. L. Deán-Ben, M. Reiss, and D. Razansky, “Head-to-tail imaging of mice with spiral volumetric optoacoustic tomography,” Photoacoustics, vol. 30, p. 100480, Apr. 2023.
  • [32] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis,” in Proc. Europ. Conf. Comp. Visi(ECCV), pp. 405–421, 2020.
  • [33] S. M. A. Eslami et al., “Neural scene representation and rendering,” Science, vol. 360, no. 6394, pp. 1204–1210, Jun. 2018.
  • [34] M. Tancik et al., “Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains,” in Proc. Adv. Neural Inf.Process. Syst. (NeurIPS), Jun. 2020.
  • [35] V. Sitzmann, J. N. P. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein, “Implicit Neural Representations with Periodic Activation Functions,” in Proc. Adv. Neural Inf.Process. Syst. (NeurIPS), Jun. 2020.