Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling

Haoran Li1, Xingjian Li2, Jiahua Shi3, Huaming Chen4, Bo Du5, Daisuke Kihara6,
Johan Barthelemy7, Jun Shen1 and Min Xu2
1School of Computing and Information Technology, University of Wollongong, Australia
2Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, USA
3Centre for Nutrition and Food Sciences, The University of Queensland, Australia
4School of Electrical and Information Engineering, University of Sydney, Australia
5Department of Business Strategy and Innovation, Griffith University, Australia
6Department of Biological Sciences, Purdue University, USA
7NVIDIA, USA
Corresponding author.
Abstract

Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that facilitates the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches for cryo-ET images have drawn widespread interest in the biological sector. However, existing methods rely heavily on manually labeled data, whose annotation requires highly professional skills, thereby hindering the adoption of fully-supervised approaches for cryo-ET images. Some unsupervised domain adaptation (UDA) approaches have been designed to enhance segmentation network performance using unlabeled data. However, applying these methods directly to cryo-ET image segmentation tasks remains challenging due to two main issues: 1) the source data, usually obtained through simulation, contain a certain level of noise, while the target data, collected directly from raw data in real-world scenarios, have unpredictable noise levels; 2) the source data used for training typically consist of known macromolecules, while the target domain data are often unknown, causing the model's segmenter to be biased towards the known macromolecules and leading to a domain shift problem. To address these challenges, we introduce the first voxel-wise unsupervised domain adaptation approach, termed Vox-UDA, designed specifically for cryo-ET subtomogram segmentation. Vox-UDA incorporates a noise generation module that simulates target-like noise in the source dataset for cross-noise-level adaptation. Additionally, we propose a denoised pseudo-labeling strategy based on an improved Bilateral Filter to alleviate the domain shift problem. Experimental results on both simulated and real cryo-ET subtomogram datasets demonstrate the superiority of our proposed approach compared to state-of-the-art UDA methods.

Index Terms:
Cryo-Electron Tomography, Volumetric image segmentation, Unsupervised domain adaptation, Deep learning.

I Introduction

Cryo-Electron Tomography (cryo-ET) is a cutting-edge imaging technique that enables three-dimensional views of biological samples in a native frozen-hydrated state [1]. This automatic electron tomography technique allows biologists to capture high-resolution structures of macromolecular complexes [2], which plays an important role in drug discovery and disease treatment. Inspired by the development of deep learning research in recent years, some efforts have been made in cryo-ET image analysis, especially for the subtomogram segmentation task [3, 4, 5, 6]. Subtomogram segmentation is a 3D segmentation task that aims to extract meaningful information about the target macromolecule at the voxel level. However, existing methods [4, 5, 7, 8] rely heavily on manual annotations, which are highly subjective and resource-intensive.

Figure 1: Some examples of the subtomograms and their corresponding segmentation masks. This figure shows: (a) a simulated 3D cryo-ET subtomogram; (b) its grey-scale ground truth segmentation mask; (c) the binary segmentation mask obtained by pre-processing (b), where we apply a threshold (300 in this paper) to turn the grey-scale mask into a binary one; (d) and (e) a real 3D cryo-ET subtomogram and its binary mask, respectively.

To tackle the challenges of data annotation, classical unsupervised domain adaptation (UDA) methods transfer knowledge from labeled source domains to unlabeled target domains. Ganin et al. [9] proposed the first UDA approach through adversarial learning, which has become the most commonly used framework for UDA tasks [10, 11]. Some other works [12, 13, 14] have proposed generation-based approaches, which synthesize target-like images from the source ones and apply supervised learning using the synthesized data with their original groundtruth masks. However, these approaches were primarily designed for 2D images and cannot perform well on 3D tasks. Some recent approaches have explored UDA on 3D images [15, 16, 17]; however, all of these volumetric UDA approaches first cut the 3D input into 2D slices for the network input, leading to a loss of spatial information.

In this paper, we introduce a UDA approach that uses large simulated macromolecular data [18, 19] as the source domain dataset and a real dataset as the target domain dataset. With the development of data simulation techniques, the acquisition of cryo-ET subtomogram data is no longer limited to traditional biological methods. Given the structure of macromolecules, existing generative methods [20, 21] can directly produce realistic synthetic datasets with corresponding voxel-level segmentation masks, which can be seen as a zero-cost alternative to traditional methods that require high-end equipment and enormous human expertise. Nevertheless, the significant disparities between the two domains bring new challenges for the UDA task. Firstly, the simulated data are generated with fixed parameters, yielding a fixed noise value in each subtomogram (often 0.03 dB or 0.05 dB), while the noise rate in the real dataset is unpredictable. Some examples of the subtomograms are shown in Fig. 1. Secondly, although subtomogram segmentation is a binary segmentation task, the simulated subtomograms and the real ones often do not share the same molecular categories, which biases the segmentation network towards the simulated ones and leads to the domain shift problem.

To address the aforementioned challenges, we propose a voxel-wise UDA framework, termed Vox-UDA, for cryo-ET subtomogram segmentation. Vox-UDA consists of a noise generation module (NGM) and a denoised pseudo-labeling (DPL) strategy. NGM generates Gaussian noise from a subset of the target dataset and applies it to the source samples to create target-like noise. Meanwhile, DPL improves the existing bilateral filter, making it more suitable for 3D grayscale images by replacing the pixel difference in one of its Gaussian kernels with a gradient difference. While denoising, DPL preserves edge information as much as possible to obtain undistorted pseudo-labels. These pseudo-labels provide additional supervision signals to address the domain shift problem, thereby enhancing the model's performance on the target data.

In a nutshell, our contributions are as follows:

  • To the best of our knowledge, we are the first to establish a paradigm for voxel-wise UDA segmentation in cryo-ET images (termed Vox-UDA). Our approach eliminates the reliance on large amounts of labeled real data by transferring knowledge learned from zero-cost simulated data to real data, enabling segmentation on real cryo-ET subtomograms.

  • Our Vox-UDA includes a noise generation module (NGM) and a denoised pseudo-labeling (DPL) strategy, which simulate target-like noise and provide additional supervision signals to address the domain shift problem.

  • We propose an improved bilateral filter that, by being sensitive to changes in gradients, preserves edge information as much as possible while eliminating noise, in order to obtain high-quality pseudo-labels.

  • Extensive experimental results demonstrate the superiority of Vox-UDA over state-of-the-art UDA methods on subtomogram segmentation. Moreover, our method even outperforms fully supervised methods on some metrics.

The rest of the paper is organized as follows. A brief literature review is presented in Section II. We provide the details of our proposed Vox-UDA in Section III. Experimental results and visualizations are shown in Section IV, followed by the conclusion in Section V.

Figure 2: Overview of our proposed Vox-UDA framework. IBF denotes the improved Bilateral Filter, which is detailed in Fig. 3. We use different colors to represent different flows. Best viewed in color.

II Related Work

II-A Unsupervised Domain Adaptation for Vision Tasks

Under the unsupervised domain adaptation (UDA) setting, two types of datasets are used for training: the source domain dataset, which is fully labeled, and the target domain dataset, which is unlabeled. The first UDA approach was proposed by [9]; it uses adversarial learning to transfer a model trained on source data to target data without introducing additional annotations. Since UDA greatly expands a model's generalization ability, allowing it to be adapted to new domains without labeled data in the target domain, it has been introduced into various tasks, e.g., classification [22, 23, 24, 25], object detection [26, 27, 28, 29] and semantic segmentation [30, 31, 32, 33]. As cryo-ET subtomogram segmentation is a segmentation task, in this paper we mainly focus on the UDA approaches applied to semantic segmentation. For segmentation, UDA methods aim to eliminate cross-domain discrepancies at both the feature level and the pixel level. Zou et al. proposed a class-aware UDA approach based on self-training to handle the class-imbalance problem [34]. Zheng et al. designed a dual-path framework that fuses equirectangular projections and tangent projections for panoramic semantic segmentation [35]. Additionally, UDA has achieved excellent results in medical image segmentation [36, 37, 7]. Ji et al. introduced an attention-based method that learns hierarchical consistencies and transfers more discriminative information between the source and target domains [38]. However, although these methods achieve great performance in UDA segmentation, they are primarily designed for 2D images and are therefore not suitable for cryo-ET subtomogram segmentation, as tomograms are volumetric images. To handle this UDA challenge in 3D segmentation tasks, Shin et al. [15] proposed a cross-modality translation method to generate synthetic 3D target volumes from source 2D scans. Xu et al. [16] applied a fast Fourier transform to convert input 2D slices into the frequency domain, and used a consistency loss to constrain the feature domain and the frequency domain simultaneously to achieve UDA. As discussed in Sec. I, all the existing 3D UDA approaches reduce the problem to 2D UDA tasks by slicing, rather than transferring directly in the three-dimensional voxel space, which leads to information loss.

II-B Cryo-Electron Tomography

Cryo-electron tomography (cryo-ET) integrates cryogenic specimen preparation, electron microscopy for data acquisition, and tomographic reconstruction for 3D visualization [39]. This technique makes it possible to capture structural information of macromolecules in an ultra-low temperature environment, which is of significant importance in biology and medicine. A cryo-ET subtomogram is a small cubic sub-volume extracted from a tomogram; normally, each subtomogram contains only a single macromolecular complex. Inspired by recent advancements in deep learning, its applications to cryo-ET have drawn widespread interest for their potential to aid cryo-ET tasks, e.g., subtomogram alignment [40, 41], subtomogram classification [42, 43] and subtomogram segmentation [6, 44, 45]. However, deep learning methods rely on large amounts of annotated data, which is particularly challenging to obtain for cryo-ET images. Bandyopadhyay et al. proposed a domain randomization-based approach to enhance the generalization performance of the model in subtomogram classification [46]. Zhu et al. proposed a weakly supervised approach that uses only 2D-level annotations for voxel segmentation to alleviate the burden of annotation [3]. In this paper, we propose a UDA approach for subtomogram segmentation, which utilizes a large amount of cost-free annotated simulated data for knowledge transfer, enabling the segmentation network to generalize to real cryo-ET subtomograms.

III Method

Our proposed framework is based on VoxResNet [47], a state-of-the-art method designed for fully-supervised voxel-level segmentation. As can be seen from Fig. 2, VoxResNet takes the combination of the outputs from the second convolution layer, the second VoxReS module, the fourth VoxReS module, and the last VoxReS module as its final output.

Given a source domain dataset $\mathcal{S}=\left\{x^{s}_{i},y^{s}_{i}\right\}^{N}_{i=1}$ and a target domain dataset $\mathcal{T}=\left\{x^{t}_{j}\right\}^{M}_{j=1}$, where $x_{i}$ represents the input 3D subtomogram and $y_{i}$ denotes the 3D groundtruth mask, we aim to train a voxel segmentation network for the target domain using groundtruth supervision signals from the source domain only. Fig. 2 illustrates the details of the proposed Vox-UDA. As can be seen from the figure, Vox-UDA takes $x^{s}_{i}$, $x^{t}_{j}$ and a subset of $\mathcal{T}$ as input. This subset $\mathcal{T}_{N_{sampled}}$, which contains $N_{sampled}$ samples, is randomly sampled from $\mathcal{T}$. $\mathcal{T}_{N_{sampled}}$ is then sent to the noise generation module (NGM) to obtain the target-like voxel-wise Gaussian noise $\epsilon\sim\mathcal{N}(0,\sigma_{t}^{2}\mathbf{I})$. Further, $\epsilon$ is added to the source input $x^{s}_{i}$ to produce the updated input $x^{s^{\prime}}_{i}$. $x^{s}_{i}$, $x^{s^{\prime}}_{i}$ and $x^{t}_{j}$ are all passed to the student network to obtain the segmentation loss $\mathcal{L}_{seg}$, the consistency loss $\mathcal{L}_{con}$ and the discriminator loss $\mathcal{L}_{dis}$ for optimization. Following [7, 16], we set the same weight for all losses. Hence, the overall loss can be written as

$$\mathcal{L}=\mathcal{L}_{seg}+\mathcal{L}_{con}+\mathcal{L}_{dis}. \tag{1}$$

Furthermore, to handle the domain shift problem, we design a denoised pseudo-labeling strategy. $x^{t}_{j}$ is first passed through the improved Bilateral Filter (IBF) to remove its noise and is then sent to the teacher network to obtain the pseudo-label, which is used to tune the student network for better performance. Note that the threshold $\eta$ used for pseudo-label selection is set to $0.85$ and the teacher network is updated via exponential moving average (EMA).
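The teacher update and the pseudo-label selection described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the EMA decay of 0.99, the dictionary representation of the network parameters, and the rule that a voxel is kept when its most likely class has probability of at least $\eta$ are all hypothetical choices made for the example; only $\eta = 0.85$ comes from the text.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Exponential moving average of parameters.

    teacher <- decay * teacher + (1 - decay) * student, per parameter.
    `decay` is an assumed value; the text does not specify it here.
    """
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

def select_pseudo_labels(foreground_prob, eta=0.85):
    """Binarize teacher probabilities and flag confident voxels.

    Assumption: a voxel is kept when the probability of its predicted
    class (foreground or background) is at least `eta`.
    """
    confidence = np.maximum(foreground_prob, 1.0 - foreground_prob)
    labels = (foreground_prob >= 0.5).astype(np.int64)
    keep = confidence >= eta
    return labels, keep

# Toy parameters and toy teacher outputs (hypothetical values).
teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student)

probs = np.array([0.95, 0.60, 0.10, 0.40])
labels, keep = select_pseudo_labels(probs)
```

In this sketch only the voxels flagged by `keep` would contribute to the pseudo-label loss of the student network; the remaining voxels are ignored.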

III-A Noise Generation Module

Inspired by recent approaches [48, 49, 50] that use the segmentation network as a denoiser for noise elimination in 2D space, we extend this insight to three-dimensional space and propose a new noise generation module. Given a sample $x^{t}_{n}$ from the input $\mathcal{T}_{N_{sampled}}$, we first apply the Discrete Fourier transform (DFT) to obtain its frequency information

$$\hat{x}_{n}(u,v,\zeta)=\xi[x^{t}_{n}], \tag{2}$$

where $u$, $v$ and $\zeta$ represent the spatial frequencies of the Fourier transform, and $\xi$ denotes the Discrete Fourier transform. In the frequency domain, low-frequency information corresponds to the textural details of the target object, while the high-frequency information usually contains most of the noise together with a small amount of edge information about the object. To obtain the noise contained in the high-frequency information, $\hat{x}_{n}(u,v,\zeta)$ is then passed to a high-pass filter to eliminate the textural details contained in the low-frequency information

$$\hat{x}_{n}^{\prime}(u,v,\zeta)=H_{high}(u,v,\zeta)\,\hat{x}_{n}(u,v,\zeta), \tag{3}$$

where $H_{high}(u,v,\zeta)$ denotes the high-pass filter. The filter rate is set to $24.4\%$, which means that only $24.4\%$ of the frequency information is retained while the rest is filtered out (see Sec. IV-C for detailed discussions). The inverse Discrete Fourier transform (iDFT) is further applied to recover voxel-level information from the filtered frequency domain $\hat{x}_{n}^{\prime}(u,v,\zeta)$:

$$x^{t^{\prime}}_{n}=\xi^{-1}[\hat{x}_{n}^{\prime}], \tag{4}$$
Figure 3: Proposed improved Bilateral Filter. (a) Both domain filtering and range filtering are applied to a sub-figure of size $3\times 3\times 3$ extracted from the input target subtomogram. (b) Deploying Laplace transform to obtain the gradient changes used in range filtering.

where $\xi^{-1}[\cdot]$ denotes the inverse Discrete Fourier transform. As discussed in Sec. I, the noise level of each input from the target domain is unpredictable; hence, instead of using the noise from a single $x^{t}_{n}$, we calculate the average noise level $\overline{x}^{t^{\prime}}_{n}$ over the whole subset. Since deep learning models are more sensitive to noise that conforms to a probability distribution [50], we use Gaussian noise as the input noise for noise generation (other types of noise are compared in the ablation studies in Sec. IV-C2). Therefore, instead of directly adding $\overline{x}^{t^{\prime}}_{n}$ to the source input $x^{s}_{i}$, we only take its variance $\sigma^{2}_{t}$ and generate a Gaussian noise based on

$$\epsilon=\mathcal{N}(0,\sigma_{t}^{2}\mathbf{I}), \tag{5}$$

where $\mathcal{N}(0,\sigma_{t}^{2}\mathbf{I})$ denotes randomly generated Gaussian noise with expectation equal to $0$ and variance equal to $\sigma_{t}^{2}$. The updated source input is then obtained through

$$x^{s^{\prime}}_{i}=x^{s}_{i}+\epsilon. \tag{6}$$
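The pipeline of Eqs. (2) to (6) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the high-pass filter is assumed to be a hard radial mask that keeps the 24.4% of DFT coefficients farthest from the zero frequency, and the "average noise level" is read as the voxel-wise mean of the high-frequency residuals, whose variance gives $\sigma_t^2$ (averaging per-volume variances would be another plausible reading). The function names and the random toy volumes are hypothetical.

```python
import numpy as np

def high_pass_component(volume, keep_fraction=0.244):
    """Spatial-domain high-frequency part of a 3D volume.

    Assumption: a hard radial mask keeps the `keep_fraction` of DFT
    coefficients that lie farthest from the zero frequency.
    """
    spectrum = np.fft.fftn(volume)                       # Eq. (2)
    freqs = [np.fft.fftfreq(n) for n in volume.shape]
    u, v, z = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(u**2 + v**2 + z**2)
    cutoff = np.quantile(radius, 1.0 - keep_fraction)
    mask = radius >= cutoff                              # H_high
    filtered = spectrum * mask                           # Eq. (3)
    return np.real(np.fft.ifftn(filtered))               # Eq. (4)

def estimate_noise_variance(target_subset, keep_fraction=0.244):
    """Variance of the mean high-frequency residual over the subset."""
    residuals = [high_pass_component(x, keep_fraction) for x in target_subset]
    mean_residual = np.mean(residuals, axis=0)
    return float(np.var(mean_residual))

def add_target_like_noise(source_volume, sigma2, rng):
    """Add zero-mean Gaussian noise with variance sigma2, Eqs. (5)-(6)."""
    noise = rng.normal(0.0, np.sqrt(sigma2), size=source_volume.shape)
    return source_volume + noise

# Toy data standing in for the sampled target subset and a source volume.
rng = np.random.default_rng(0)
target_subset = [rng.normal(size=(32, 32, 32)) for _ in range(4)]
sigma2 = estimate_noise_variance(target_subset)
noisy_source = add_target_like_noise(np.zeros((32, 32, 32)), sigma2, rng)
```

Because only the scalar $\sigma_t^2$ is transferred, the source volume receives freshly sampled noise on every call, which is consistent with the text's choice to inject noise that follows a probability distribution rather than a fixed residual.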
TABLE I: Comparison of experimental results on UDA cryo-ET subtomogram segmentation between Vox-UDA and previous non-voxel-level UDA approaches. Best results are in bold font.
Source macromolecules: 1bxn, 1f1b and 1yg6

| Method | mIoU | mIoU_ribo | mIoU_26S | mIoU_TRiC | Dice | Dice_ribo | Dice_26S | Dice_TRiC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| w/o adaptation | 9.7 | 11.1 | 2.6 | 2.6 | 17.4 | 19.9 | 5.1 | 5.0 |
| Fully Supervised | 46.0 | 49.5 | 20.0 | 34.6 | 61.6 | 65.7 | 30.0 | 48.3 |
| DANN [9] | 38.4 | 43.0 | 6.5 | 11.7 | 53.0 | 59.1 | 11.2 | 16.8 |
| PDAM [12] | 39.8 | 43.3 | 13.8 | 22.6 | 55.1 | 59.6 | 21.5 | 31.9 |
| ASC [16] | 40.4 | 43.4 | 19.2 | 23.3 | 55.8 | 59.7 | 28.7 | 32.7 |
| LE-UDA [13] | 41.5 | 44.7 | 18.4 | 23.9 | 56.8 | 61.0 | 28.4 | 32.6 |
| Vox-UDA (w NGM) | 48.5 | 50.6 | 32.2 | 38.6 | 64.4 | 66.8 | 47.1 | 51.5 |
| Vox-UDA (w BF) | 49.1 | 50.4 | 30.5 | 39.2 | 64.5 | 67.1 | 46.9 | 50.7 |
| Vox-UDA (w IBF) | 50.3 | 53.8 | 28.8 | 41.3 | 65.9 | 68.5 | 44.0 | 52.8 |
Figure 4: Visualization of subtomogram segmentation results using 1bxn, 1f1b and 1yg6 as the source datasets. We use UCSF Chimera [51] for 3D cryo-ET visualization.

$x^{s}_{i}$ and $x^{s^{\prime}}_{i}$ are both sent to the student network to obtain the consistency loss $\mathcal{L}_{con}$. Following VoxResNet, we take the output feature embeddings from the same layers as the input of the loss function

$$\begin{aligned}\mathcal{L}_{con}={}&\lambda_{1}\mathcal{L}_{BN}(f_{c},f^{\prime}_{c})+\lambda_{2}\mathcal{L}_{BN}(f_{v2},f^{\prime}_{v2})+\lambda_{3}\mathcal{L}_{BN}(f_{v4},f^{\prime}_{v4})\\&+\lambda_{4}\mathcal{L}_{BN}(f_{v6},f^{\prime}_{v6}),\end{aligned} \tag{7}$$

where $\mathcal{L}_{BN}$ denotes the cosine similarity loss and $\lambda_{n}$ denotes the weights that control the relative importance of the different consistency losses. Although NGM is introduced to simulate target-like noise, it is impossible to create a noise environment in the source domain that is entirely the same as that of the target domain. On the other hand, while the shallower layers of the decoder contain more textural information, the deeper layers contain more edge information [52]. Therefore, instead of an equal superposition, we assign different weights to the different layers to control the weighting of the texture and edge consistency losses (see the detailed discussion in Sec. IV-C).
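As a rough illustration of Eq. (7), the weighted consistency loss can be sketched as below. Several details are assumptions made for the example: the cosine similarity loss is taken to be one minus the cosine similarity of the flattened feature maps, the four feature maps are random stand-ins for $f_c$, $f_{v2}$, $f_{v4}$ and $f_{v6}$, and the weights are placeholders, not the values studied in Sec. IV-C.

```python
import numpy as np

def cosine_loss(a, b, eps=1e-8):
    """One minus the cosine similarity of two flattened feature maps.

    Assumption: this is the form of the 'cosine similarity loss';
    the text does not spell out the exact expression.
    """
    a, b = a.ravel(), b.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

def consistency_loss(clean_feats, noisy_feats, weights):
    """Weighted sum of per-layer cosine losses, following Eq. (7)."""
    return sum(w * cosine_loss(f, g)
               for w, f, g in zip(weights, clean_feats, noisy_feats))

# Random stand-ins for the four feature maps; placeholder weights.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8, 8, 8)) for _ in range(4)]
weights = [0.1, 0.2, 0.3, 0.4]

loss_identical = consistency_loss(feats, feats, weights)
loss_opposite = consistency_loss(feats, [-f for f in feats], weights)
```

With this form, identical features give a loss close to zero and exactly opposite features give twice the sum of the weights, so each $\lambda_n$ directly scales how strongly a given layer is pushed towards noise-invariant features.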

III-B Denoised Pseudo-Labeling

Although the NGM can narrow the noise-level gap between the two domains, the segmentation network is, as mentioned above, still biased towards the source data due to the domain shift problem. Therefore, we provide an extra supervision signal for optimization through pseudo-labeling. However, because the noise level in the target domain is unknown and such noise may lead to distorted pseudo-labels that further harm the performance of the model, we propose a denoised pseudo-labeling strategy instead. Unlike existing pseudo-labeling methods, which add an extra training step, we use the student-teacher structure [53]. Before $x^{t}_{j}$ is sent to the teacher network to obtain the pseudo-label, we first perform denoising on $x^{t}_{j}$. We designed three different denoising methods: 1) directly using NGM for denoising, 2) using the Bilateral Filter for denoising, and 3) using our improved Bilateral Filter (IBF) for noise reduction.

TABLE II: Experimental results on UDA cryo-ET subtomogram segmentation under different dataset settings. Different from the results reported in Table I, we set 2byu, 2h12 and 21db as the source datasets.
Source macromolecules: 2byu, 2h12 and 21db

| Method | mIoU | mIoU_ribo | mIoU_26S | mIoU_TRiC | Dice | Dice_ribo | Dice_26S | Dice_TRiC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| w/o adaptation | 12.7 | 14.2 | 3.7 | 3.0 | 22.2 | 24.7 | 7.2 | 5.8 |
| Fully Supervised | 46.0 | 49.5 | 20.0 | 34.6 | 61.6 | 65.7 | 30.0 | 48.3 |
| DANN [9] | 31.9 | 36.1 | 3.8 | 6.2 | 45.9 | 51.9 | 6.4 | 8.7 |
| PDAM [12] | 39.1 | 43.1 | 10.9 | 15.7 | 54.1 | 59.4 | 17.7 | 24.0 |
| ASC [16] | 41.7 | 45.2 | 24.7 | 13.3 | 56.9 | 61.2 | 38.1 | 19.8 |
| LE-UDA [13] | 43.1 | 46.4 | 21.9 | 22.3 | 58.6 | 62.6 | 33.8 | 32.4 |
| Vox-UDA (w NGM) | 47.5 | 50.1 | 27.5 | 34.7 | 63.2 | 66.3 | 41.0 | 46.8 |
| Vox-UDA (w BF) | 48.0 | 50.3 | 27.8 | 34.4 | 63.8 | 66.7 | 39.9 | 47.0 |
| Vox-UDA (w IBF) | 49.5 | 52.4 | 28.3 | 35.1 | 65.2 | 68.9 | 41.3 | 47.7 |
Figure 5: Additional visualization of subtomogram segmentation results using 2byu, 2h12 and 21db as the source datasets.

NGM Denoising. $x^{t}_{j}$ is directly sent to the NGM to obtain its noise $x^{t^{\prime}}_{j}$. Hence, the denoised image can be represented as $\widetilde{x}^{t}_{j}=\left(x^{t}_{j}-x^{t^{\prime}}_{j}\right)$.
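A short sketch makes the effect of this subtraction concrete. Assuming, as an illustration only, that the NGM high-pass filter is a hard radial mask keeping the 24.4% of DFT coefficients farthest from the zero frequency, subtracting the extracted noise from the input is equivalent to low-pass filtering the input with the complementary mask. The function names and the random toy volume are hypothetical.

```python
import numpy as np

def radial_high_pass_mask(shape, keep_fraction=0.244):
    """Boolean mask selecting the highest-frequency DFT coefficients.

    Assumption: a hard radial cutoff; the exact filter shape is not
    specified in the text.
    """
    freqs = [np.fft.fftfreq(n) for n in shape]
    u, v, z = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(u**2 + v**2 + z**2)
    return radius >= np.quantile(radius, 1.0 - keep_fraction)

def ngm_denoise(volume, keep_fraction=0.244):
    """Subtract the high-frequency residual from the input volume."""
    mask = radial_high_pass_mask(volume.shape, keep_fraction)
    noise = np.real(np.fft.ifftn(np.fft.fftn(volume) * mask))
    return volume - noise

# Toy volume standing in for a target subtomogram.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 16))
denoised = ngm_denoise(x)

# Same result obtained by keeping only the complementary low frequencies.
low_mask = ~radial_high_pass_mask(x.shape)
low_pass = np.real(np.fft.ifftn(np.fft.fftn(x) * low_mask))
```

Since the result is a plain low-pass version of the input, any edge content that lives in the removed frequency band is discarded together with the noise.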

Bilateral Filter Denoising. Although noise can be partially removed through frequency-domain analysis, some edge information is also eliminated, which distorts the pseudo-labels. We therefore deploy a non-linear approach, the Bilateral Filter [54], as the denoiser instead of the NGM. The Bilateral Filter (BF) consists of a domain Gaussian kernel and a range Gaussian kernel: the former suppresses noise, and the latter retains as much edge information as possible during filtering. BF uses a sliding window that extracts a $3 \times 3 \times 3$ sub-volume for each filtering operation. Given the central voxel $v_p$ and the remaining voxels $v_q, q \in V$ of the sub-volume, the updated voxel can be represented as

$$v'_q = \text{BF}(v_q) = \frac{\sum_{q \in V} G_{\sigma_d}(\|p-q\|)\, G_{\sigma_r}(v_q - v_p) \times v_q}{\sum_{q \in V} G_{\sigma_d}(\|p-q\|)\, G_{\sigma_r}(v_q - v_p)}, \tag{8}$$

where $\|\cdot\|$ denotes the Euclidean distance [55], $\sigma_d$ and $\sigma_r$ denote the domain hyperparameter and the range hyperparameter, and $G_\sigma$ denotes the Gaussian kernel

$$G_\sigma(x) = \frac{e^{-\frac{x^2}{2\sigma^2}}}{2\pi\sigma}. \tag{9}$$

Hence, the denoised image can be represented as $\widetilde{x}^t_j = \text{BF}(x^t_j)$.
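The filtering step in Eq. (8) and Eq. (9) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `gaussian_kernel` and `bilateral_filter_3d` are hypothetical, the window is assumed to include the central voxel itself, and edge-replicated padding at the volume border is an assumption that the text does not specify.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel of Eq. (9); its constant factor cancels in Eq. (8)."""
    return np.exp(-(x ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma)

def bilateral_filter_3d(volume, sigma_d, sigma_r):
    """Filter a 3D volume with a 3x3x3 sliding window (Eq. (8)).

    Assumptions: the central voxel is part of the window, and the border
    is handled by replicating edge voxels.
    """
    volume = np.asarray(volume, dtype=np.float64)
    padded = np.pad(volume, 1, mode="edge")
    H, W, D = volume.shape
    numerator = np.zeros_like(volume)
    denominator = np.zeros_like(volume)
    for dh in (-1, 0, 1):
        for dw in (-1, 0, 1):
            for dd in (-1, 0, 1):
                # Neighbor v_q at offset (dh, dw, dd) from every central voxel v_p.
                neighbor = padded[1 + dh:1 + dh + H,
                                  1 + dw:1 + dw + W,
                                  1 + dd:1 + dd + D]
                # Domain kernel: Euclidean distance between positions p and q.
                dist = np.sqrt(dh * dh + dw * dw + dd * dd)
                # Range kernel: intensity difference v_q - v_p.
                weight = (gaussian_kernel(dist, sigma_d)
                          * gaussian_kernel(neighbor - volume, sigma_r))
                numerator += weight * neighbor
                denominator += weight
    return numerator / denominator
```

Because every output voxel is a weighted average of its neighborhood with positive weights, the filtered values stay within the intensity range of the input, and voxels whose intensity differs strongly from the center contribute little, which is how edges are retained.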

IBF Denoising. The key idea of the Bilateral Filter is to use two separate Gaussian kernels for different tasks. However, the range kernel introduced for retaining edge information mainly considers voxel-level color differences. This can achieve satisfactory results in RGB space, but in grayscale space there may be larger differences in brightness, which could affect the effectiveness of this kernel. We therefore propose an improved Bilateral Filter (IBF), which uses the gradient of each voxel instead of its value in the range kernel for edge retention. In detail, we extend the Laplace operator [56] to three dimensions and calculate the gradient of each voxel $v_q$ in the $h$, $w$, and $d$ (height, width, and depth) directions. Since voxel space is discrete, the gradient of $v_q$ in each direction can be represented as

$$\frac{\partial \Delta}{\partial \vec{v}_q^h} = \Delta(v_{q+1}^h, v_q^w, v_q^d) - \Delta(v_{q-1}^h, v_q^w, v_q^d), \tag{10}$$
$$\frac{\partial \Delta}{\partial \vec{v}_q^w} = \Delta(v_q^h, v_{q+1}^w, v_q^d) - \Delta(v_q^h, v_{q-1}^w, v_q^d), \tag{11}$$
$$\frac{\partial \Delta}{\partial \vec{v}_q^d} = \Delta(v_q^h, v_q^w, v_{q+1}^d) - \Delta(v_q^h, v_q^w, v_{q-1}^d), \tag{12}$$

where $v_q^h$, $v_q^w$ and $v_q^d$ denote the values of $v_q$ in the $h$, $w$, and $d$ directions, and $\Delta$ denotes the Laplace operator. Moreover, if a voxel belongs to the same object as the central voxel $v_p$ (i.e., it lies inside the object), their gradients should be similar; otherwise, the difference between the two gradients should be large. We can therefore replace the second (range) kernel in Eq. 8 and obtain the improved Bilateral Filter (IBF):

$$v'_q = \text{IBF}(v_q) = \frac{\sum_{q \in V} G_{\sigma_d}(\|p-q\|)\, G_{\sigma_r}\!\left(\frac{\partial \Delta}{\partial \vec{v}_p} - \frac{\partial \Delta}{\partial \vec{v}_q}\right) \times v_q}{\sum_{q \in V} G_{\sigma_d}(\|p-q\|)\, G_{\sigma_r}\!\left(\frac{\partial \Delta}{\partial \vec{v}_p} - \frac{\partial \Delta}{\partial \vec{v}_q}\right)}, \tag{13}$$

where

$$\frac{\partial \Delta}{\partial \vec{v}_p} - \frac{\partial \Delta}{\partial \vec{v}_q} = \left(\frac{\partial \Delta}{\partial \vec{v}_p^h}, \frac{\partial \Delta}{\partial \vec{v}_p^w}, \frac{\partial \Delta}{\partial \vec{v}_p^d}\right) - \left(\frac{\partial \Delta}{\partial \vec{v}_q^h}, \frac{\partial \Delta}{\partial \vec{v}_q^w}, \frac{\partial \Delta}{\partial \vec{v}_q^d}\right). \tag{14}$$

Consequently, the denoised image is denoted as $\widetilde{x}^t_j = \text{IBF}(x^t_j)$.
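A rough sketch of the gradient-based range kernel in Eq. (10) to Eq. (14) follows. Several choices here are assumptions rather than details given in the text: the operator $\Delta$ is taken to be the raw voxel intensity, so the gradients are plain central differences; the three-component gradient difference of Eq. (14) is reduced to a scalar with the Euclidean norm before it enters the Gaussian; and the border uses edge replication. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel of Eq. (9)."""
    return np.exp(-(x ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma)

def directional_gradients(field):
    """Central differences along h, w, d (Eqs. (10)-(12)); shape (3, H, W, D)."""
    padded = np.pad(field, 1, mode="edge")
    grad_h = padded[2:, 1:-1, 1:-1] - padded[:-2, 1:-1, 1:-1]
    grad_w = padded[1:-1, 2:, 1:-1] - padded[1:-1, :-2, 1:-1]
    grad_d = padded[1:-1, 1:-1, 2:] - padded[1:-1, 1:-1, :-2]
    return np.stack([grad_h, grad_w, grad_d])

def improved_bilateral_filter_3d(volume, sigma_d, sigma_r):
    """3x3x3 bilateral filter whose range kernel compares gradients (Eq. (13))."""
    volume = np.asarray(volume, dtype=np.float64)
    grads = directional_gradients(volume)  # assumption: Delta acts on raw intensities
    H, W, D = volume.shape
    pad_v = np.pad(volume, 1, mode="edge")
    pad_g = np.pad(grads, ((0, 0), (1, 1), (1, 1), (1, 1)), mode="edge")
    numerator = np.zeros_like(volume)
    denominator = np.zeros_like(volume)
    for dh in (-1, 0, 1):
        for dw in (-1, 0, 1):
            for dd in (-1, 0, 1):
                window = (slice(1 + dh, 1 + dh + H),
                          slice(1 + dw, 1 + dw + W),
                          slice(1 + dd, 1 + dd + D))
                neighbor = pad_v[window]
                # Eq. (14): gradient of v_p minus gradient of v_q, reduced by its norm.
                grad_diff = np.linalg.norm(grads - pad_g[(slice(None),) + window], axis=0)
                dist = np.sqrt(dh * dh + dw * dw + dd * dd)
                weight = (gaussian_kernel(dist, sigma_d)
                          * gaussian_kernel(grad_diff, sigma_r))
                numerator += weight * neighbor
                denominator += weight
    return numerator / denominator
```

The only difference from the standard Bilateral Filter is the argument of the range kernel: neighbors whose local gradient resembles that of the central voxel receive a large weight, regardless of their absolute brightness.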

The denoised sample $\widetilde{x}^t_j$ is then sent to the teacher network, and its output is thresholded with $\eta$ to obtain the pseudo-label. The pseudo-label is sent back to the student network as a supervision signal for the target flow.
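The denoise, predict, and threshold sequence can be sketched as follows. This is an illustration under assumptions: `denoise` and `teacher` are hypothetical callables standing in for any of the three denoisers and for the teacher network, the teacher is assumed to output per-voxel foreground probabilities, and the value of $\eta$ used below is a placeholder rather than the paper's setting.

```python
import numpy as np

def denoised_pseudo_label(x_target, denoise, teacher, eta):
    """Return a binary pseudo-label for one target subtomogram.

    denoise: callable mapping a volume to its denoised version (hypothetical).
    teacher: callable mapping a volume to foreground probabilities (hypothetical).
    eta:     probability threshold for accepting a voxel as foreground.
    """
    x_denoised = denoise(x_target)          # denoise before the teacher sees the input
    probabilities = teacher(x_denoised)     # teacher prediction on the denoised input
    return (probabilities > eta).astype(np.uint8)

# Toy usage with stand-ins: identity denoiser and a sigmoid "teacher".
toy_input = np.array([[-2.0, 0.5], [3.0, -0.1]])
toy_teacher = lambda v: 1.0 / (1.0 + np.exp(-v))
toy_label = denoised_pseudo_label(toy_input, lambda v: v, toy_teacher, eta=0.5)
```

The student network would then be trained on the original (noisy) target sample against this label, so the denoising affects only the supervision signal and not the student's input.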

IV Experiments

IV-A Experimental Settings

IV-A1 Datasets and Evaluation Metrics

We conduct experiments on two types of datasets: a simulated dataset and a real dataset.

Simulated Dataset. The simulated dataset, used as the source dataset, is generated following the same process as [57]. We choose six representative macromolecule complexes and divide them into two groups that serve as two separate source datasets (1bxn, 1f1b, and 1yg6; 2byu, 2h12, and 21db). Each macromolecule complex is simulated at two noise levels, with SNRs of 0.03 and 0.05, and each noise level contains 500 samples. Following existing work [40, 58], all input subtomograms are resized to $32^3$. The simulated dataset contains 6,000 samples in total (3,000 samples for each source dataset).

Real Dataset. The real dataset, used as the target dataset, is the public dataset Poly-GA [59], which contains 66 26S subtomograms, 66 TRiC subtomograms and 901 Ribosome subtomograms (1,033 samples in total). Each subtomogram is also rescaled to $32^3$.

For evaluation, the mean intersection over union (mIoU) and the Dice similarity coefficient (Dice) are used to measure segmentation performance.
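For reference, the two metrics can be computed for a single binary mask as in the sketch below. The function name `iou_and_dice` is hypothetical, the convention of returning 1.0 for two empty masks is an assumption, and the way the paper averages these values over samples or classes to obtain mIoU is not specified here.

```python
import numpy as np

def iou_and_dice(pred, target):
    """Intersection over union and Dice coefficient for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    iou = intersection / union if union > 0 else 1.0
    dice = 2.0 * intersection / total if total > 0 else 1.0
    return float(iou), float(dice)
```

For a single mask pair the two metrics are linked by Dice = 2 IoU / (1 + IoU), which is why the Dice scores in the tables are consistently higher than the corresponding mIoU scores.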

IV-A2 Implementation Details

We use VoxResNet as our base architecture. The whole model is trained on a single NVIDIA A100 Tensor Core GPU with 80GB memory. For training, we use the Adam optimizer with an initial learning rate of 1e-3. The model is trained for 300 epochs with a batch size of 16. The learning rate is decayed by $90\%$ every 100 epochs. The sampled number $N_{sampled}$ and the filter rate $\rho$ are empirically set to 10 and $24.4\%$, respectively (see discussions in Sec IV-C). For the improved Bilateral Filter, the domain hyperparameter $\sigma_d$ and the range hyperparameter $\sigma_r$ are set to $120$ and $1.2$, respectively. We use the Sobel operator as the Laplacian operator.
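The learning-rate schedule described above can be written out explicitly. The sketch assumes that "decayed by 90%" means the rate is multiplied by 0.1 at epochs 100 and 200; the function name `step_decay_lr` is hypothetical and no specific framework scheduler is implied.

```python
def step_decay_lr(epoch, base_lr=1e-3, step_size=100, factor=0.1):
    """Learning rate at a given epoch under step decay.

    Assumption: a 90% decay corresponds to multiplying by factor=0.1
    once every step_size epochs.
    """
    return base_lr * factor ** (epoch // step_size)

# Rates used over the 300 training epochs: roughly 1e-3, 1e-4 and 1e-5.
schedule = [step_decay_lr(epoch) for epoch in range(300)]
```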

Figure 6: The t-SNE visualization of learned features. (a) After UDA. (b) Before UDA.

IV-A3 Baselines

As there are no existing methods designed for voxel-level UDA, we implement several traditional and state-of-the-art UDA approaches for 2D image segmentation on our task, including a single-discriminator-based method (DANN [9]) and an image-synthesis-based method (PDAM [12]). We also include two recent approaches designed for volumetric images (ASC [16] and LE-UDA [13]), which cut 3D images into 2D slices and apply UDA in the 2D setting. Following existing UDA methods [12, 38, 32], we also include a “w/o adaptation” setting and a “Fully Supervised” setting for comparison. The “w/o adaptation” setting is an original VoxResNet trained on the source dataset without adaptation. The “Fully Supervised” setting is a VoxResNet trained with full supervision on the labelled target dataset, which serves as the upper bound.

IV-B Comparisons With State-of-the-arts

We report the segmentation results on the Poly-GA dataset in Table I, using [1bxn, 1f1b, and 1yg6] as the source dataset. As can be observed in the table, our approach outperforms all the state-of-the-art methods. Compared with “w/o adaptation”, DANN and PDAM do boost the model’s performance, but the gain is small relative to our Vox-UDA (w IBF) (e.g., PDAM achieves 55.1 in Dice while Vox-UDA (w IBF) achieves 65.9). Compared with the two recent UDA methods, our proposed Vox-UDA (w IBF) still excels at target subtomogram segmentation, with significant improvements in both mIoU (40.4 → 50.3) and Dice (56.8 → 65.9). We also report results for an additional UDA setting in Table II, using the other three macromolecules [2byu, 2h12, and 21db] as the source dataset; our proposed method again achieves state-of-the-art performance over all the compared approaches. It is worth noting that in both tables, Vox-UDA even surpasses the “Fully Supervised” setting on the vast majority of the metrics (e.g., “Fully Supervised” achieves 46.0 in mIoU in Table II, while our Vox-UDA (w IBF) achieves 49.5).

IV-B1 Segmentation Results Visualization

Fig 4 shows the segmentation results on the Poly-GA dataset using 1bxn, 1f1b and 1yg6 as the source dataset. Because the proposed noise generation module simulates the target noise environment on the source data, our model’s robustness to noise is significantly enhanced (e.g., compared with the segmentation results of ASC and PDAM, our results focus more on the macromolecules than on the surrounding noise). Compared with LE-UDA, our segmentation results exhibit better texture details thanks to the proposed denoised pseudo-labeling strategy, which prevents the model from being biased toward the source data and addresses the domain shift problem. We also provide additional visualization results using 2byu, 2h12 and 21db as the source dataset in Fig 5.

IV-B2 Feature Visualization

As shown in Fig 6, we visualize the feature embeddings learned by the segmentation network with the commonly used t-SNE [60] method. Fig 6(a) and Fig 6(b) represent the visualization results of “after adaptation” and “before adaptation”, respectively. As can be observed from the figure, the distribution of the source and target features learned from our proposed method is more consistent compared to the distribution without adaptation. This indicates that our method has achieved knowledge transfer and generalized the model to the target data.

IV-C Ablation Study

TABLE III: Ablation study on the components of our proposed model on Poly-GA dataset. NGM and PL denote noise generation module and pseudo-label, respectively. Best results are in bold font.
Source macromolecules: 1bxn, 1f1b and 1yg6
Method mIoU mIoU_ribo mIoU_26S mIoU_TRiC Dice Dice_ribo Dice_26S Dice_TRiC
Baseline 31.9 36.1 3.8 6.2 45.9 51.9 6.4 8.7
w/o NGM 41.9 43.9 20.1 37.9 57.5 60.2 31.0 50.4
w/o PL 46.2 49.1 23.3 31.6 61.8 65.3 36.3 44.0
Vox-UDA(Ours) 50.3 53.8 28.8 41.3 65.9 68.5 44.0 52.8
TABLE IV: Ablation study on the hyperparameters of our proposed model on Poly-GA dataset. $N_{sample}$, $\rho$ and $\lambda_n$ denote the sampled number of target data used for noise generation, the high-pass filter rate and the weights for different losses, respectively.
Source macromolecules: 1bxn, 1f1b and 1yg6
$N_{sample}$ mIoU Dice | $\rho$ mIoU Dice | $\lambda_1 \rightarrow \lambda_4$ mIoU Dice | $\sigma_d$ mIoU Dice | $\sigma_r$ mIoU Dice
5 41.2 57.1 | 8.4% 41.7 57.3 | [0.1, 0.1, 0.4, 0.4] 44.4 60.2 | 100 49.3 64.9 | 0.8 47.5 63.2
10 50.3 65.9 | 17.8% 43.5 59.6 | [0.2, 0.2, 0.3, 0.3] 50.3 65.9 | 120 50.3 65.9 | 1.0 48.0 63.8
15 43.6 59.4 | 24.4% 50.3 65.9 | [0.3, 0.3, 0.2, 0.2] 45.3 61.0 | 140 49.1 64.5 | 1.2 50.3 65.9
20 42.6 58.3 | 42.2% 41.0 56.8 | [0.4, 0.4, 0.1, 0.1] 42.2 57.8 | 160 47.0 62.5 | 1.4 49.5 65.2

IV-C1 Effectiveness of the Improved Bilateral Filter

We conduct a comprehensive set of experiments to validate the effectiveness of the proposed improved Bilateral Filter (IBF) for denoised pseudo-labeling. We run experiments with each of the three denoisers introduced in Sec III-B and report the segmentation results in Table I and Table II. As can be seen from the tables, compared with using the NGM as the denoiser, employing the Bilateral Filter (BF) brings a performance improvement (i.e., mIoU increased $1.2\%$ in Table I and Dice increased $1.0\%$ in Table II). This is because BF preserves some edge information while denoising, thereby avoiding pseudo-label distortion. However, as discussed in Sec III-B, the range kernel of BF is not well suited to grayscale inputs. Our proposed IBF addresses this drawback by using a Laplacian transform, which allows the range kernel to focus on gradient changes in the voxel space rather than value changes. As a result, our model achieves its best performance with the proposed improved Bilateral Filter (i.e., mIoU increased $2.4\%$ in Table I and Dice increased $3.3\%$ in Table II).

Figure 7: Ablation study of the proposed noise generation process. We set two different types of noise, Poisson noise and Speckle noise, to replace the Gaussian noise used in NGM for noise generation.

IV-C2 Effectiveness of the Noise Generation Process

As mentioned in Sec III-A, we choose Gaussian noise in the NGM for noise generation. To justify this choice, we provide additional experiments in Fig 7, using Poisson noise [61] and Speckle noise [62], respectively, as the added noise for the NGM. Given the variance $\sigma^2_t$ obtained through the average noise level $\overline{x}^{t'}_n$ of the subset, the Poisson noise can be represented as $\epsilon_p = \pi(\sigma_t)$, and the Speckle noise can be formulated as

$$\epsilon_s = x^s_i \times \mathcal{N}(0, \sigma_t^2 \mathbf{I}), \tag{15}$$

where $x^s_i$ denotes the input source image. As can be seen from the figure, using Gaussian noise achieves the best performance compared with the other two noise types.
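The three noise variants compared in this ablation can be sampled as in the sketch below. It is illustrative only: the function name `make_noise` is hypothetical, the Gaussian term is assumed to be $\mathcal{N}(0, \sigma_t^2 \mathbf{I})$, and $\pi(\sigma_t)$ is interpreted as a Poisson draw with rate $\sigma_t$, which the text does not state explicitly. How the noise is combined with the source image follows Sec III-A and is not reproduced here.

```python
import numpy as np

def make_noise(x_source, sigma_t, kind, rng):
    """Sample a noise volume with the same shape as the source image.

    kind: "gaussian", "poisson" or "speckle".
    Assumptions: Poisson noise uses sigma_t as its rate; Speckle noise
    follows Eq. (15), i.e. the source image times zero-mean Gaussian noise.
    """
    x_source = np.asarray(x_source, dtype=np.float64)
    if kind == "gaussian":
        return rng.normal(0.0, sigma_t, size=x_source.shape)
    if kind == "poisson":
        return rng.poisson(lam=sigma_t, size=x_source.shape).astype(np.float64)
    if kind == "speckle":
        return x_source * rng.normal(0.0, sigma_t, size=x_source.shape)
    raise ValueError(f"unknown noise type: {kind}")
```

Note the structural difference between the variants: Gaussian and Poisson noise are independent of the signal, whereas Speckle noise is multiplicative and therefore vanishes wherever the source image is zero.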

IV-C3 Effectiveness of Different Proposed Modules

We evaluate Vox-UDA under the same experimental setting as Table I for the ablation study and use Vox-UDA (w IBF) as the final result of our proposed method. Table III shows the effectiveness of each module in our method. For comparison, we build a baseline that uses only a single discriminator with VoxResNet, which has the same model structure as DANN [9]. From Table III, we observe that both “w/o NGM” and “w/o PL” achieve better performance than “Baseline”, which demonstrates the effectiveness of the two proposed modules in dealing with the challenges of UDA in subtomogram segmentation. Meanwhile, compared with using either of these two modules alone, the full Vox-UDA achieves a significant performance improvement (i.e., mIoU improved by $15.8\%$ and Dice improved by $11.0\%$).

IV-C4 Hyperparameter Analysis

We further evaluate the hyperparameters of our approach. As shown in Table IV, we evaluate the sampled number $N_{sample}$, the high-pass filter rate $\rho$, the weights $\lambda_n$ for the consistency losses, the domain hyperparameter $\sigma_d$ and the range hyperparameter $\sigma_r$. $N_{sample}$ controls the number of target samples used for noise generation. As discussed in the previous sections, the main goal of the noise generation module is to simulate target-like noise for inputs from the source domain. Because the noise level is not evenly distributed across the target dataset, we use random sampling instead of the whole dataset. $N_{sample}$ is therefore a key hyperparameter: a value that is either too large or too small has a negative impact on the model’s performance.
As reported in Table IV, $N_{sample}=10$ achieves the best performance (i.e., mIoU increased $18\%$ compared to $N_{sample}=20$ and Dice increased $11\%$ compared to $N_{sample}=15$). $\rho$ is the filter rate that controls how much information is retained for further processing. As mentioned above, noise is usually contained in the high-frequency information of 2D or 3D images. However, the boundary between the high-frequency and low-frequency object information in the high-pass filter is typically determined subjectively. Hence, we try different percentages of retained high-frequency information to see which works best in our framework. Experimental results show that $\rho=24.4\%$ works best in our proposed approach (i.e., mIoU increased $16\%$ compared to $\rho=17.8\%$ and Dice increased $10\%$ compared to $\rho=8.4\%$). $\lambda_n$ is the weight that controls the relative importance of the different consistency losses.
As mentioned in Sec III-A, for segmentation tasks high-level features focus more on edge details and low-level features focus on textural information. We therefore set the same value for $\lambda_1$ and $\lambda_2$ for the low-level features, and the same value for $\lambda_3$ and $\lambda_4$ for the high-level ones (see Eq. 7). As can be seen in Table IV, $\lambda_1=0.2$, $\lambda_2=0.2$, $\lambda_3=0.3$ and $\lambda_4=0.3$ achieve the best performance compared to other settings (i.e., our approach increases $19\%$ on mIoU and $14\%$ on Dice). We further evaluate the effect of $\sigma_d$ and $\sigma_r$. Experimental results show that our settings are the most reasonable: increasing or otherwise altering $\sigma_d$ and $\sigma_r$ does not improve the model’s performance.
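The percentages quoted for $N_{sample}$ and $\lambda_n$ appear to be relative gains over the compared setting rather than absolute differences in score. The short check below, with a hypothetical helper `relative_gain`, recomputes four of them from the Table IV entries; reading the $\lambda_n$ figures as a comparison against the [0.4, 0.4, 0.1, 0.1] setting is an assumption that happens to match the quoted values.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# Values taken from Table IV (best setting versus a compared setting).
gain_miou_n20 = relative_gain(50.3, 42.6)     # N_sample=10 vs 20, mIoU: about 18%
gain_dice_n15 = relative_gain(65.9, 59.4)     # N_sample=10 vs 15, Dice: about 11%
gain_miou_lambda = relative_gain(50.3, 42.2)  # lambda setting, mIoU: about 19%
gain_dice_lambda = relative_gain(65.9, 57.8)  # lambda setting, Dice: about 14%
```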

V Conclusion

In this paper, we propose the first voxel-level unsupervised domain adaptation approach, termed Vox-UDA, for the subtomogram segmentation task. Vox-UDA consists of a Noise Generation Module (NGM) and a denoised pseudo-labeling (DPL) strategy. The NGM takes a subset of target samples as input and generates target-like Gaussian noise for the source domain data. DPL is based on a student-teacher learning framework and uses denoised target domain data to produce pseudo-labels for the original target data, which boosts segmentation performance by reducing the effect of domain shift. Additionally, we propose an improved Bilateral Filter (IBF) to provide denoised target data for DPL, thereby enhancing the quality of the pseudo-labels. The proposed IBF uses a 3D Laplacian operator to calculate the gradient of each voxel in the $h$, $w$, and $d$ directions, and replaces value differences with gradient differences to improve the performance of bilateral filtering in grayscale space. We have conducted extensive experiments to demonstrate the strong performance of our method. We anticipate that our method can contribute to cryo-ET research in terms of methodology and possibly enhanced interpretability. Furthermore, we suggest that future research focus on enhancing the scalability of our method for a broader range of biomedical 3D image segmentation tasks.

Acknowledgments

The authors acknowledge NVIDIA and its research support team for the help provided to conduct this work. This work was partially supported by the Australian Research Council (ARC) Industrial Transformation Training Centres (ITTC) for Innovative Composites for the Future of Sustainable Mining Equipment under Grant IC220100028. This work was partially supported by U.S. NIH grants R01GM134020 and P41GM103712, and NSF grants DBI-1949629, DBI-2238093, IIS-2007595, IIS-2211597, and MCB-2205148. This work was supported in part by Oracle Cloud credits and related resources provided by Oracle for Research, and by computational resources from the AMD HPC Fund.

References

  • [1] C. M. Oikonomou and G. J. Jensen, “Cellular electron cryotomography: toward structural biology in situ,” Annual review of biochemistry, vol. 86, pp. 873–896, 2017.
  • [2] W. Wan and J. A. Briggs, “Cryo-electron tomography and subtomogram averaging,” Methods in enzymology, vol. 579, pp. 329–367, 2016.
  • [3] X. Zhu, J. Chen, X. Zeng, J. Liang, C. Li, S. Liu, S. Behpour, and M. Xu, “Weakly supervised 3d semantic segmentation using cross-image consensus and inter-voxel affinity relations,” in Proceedings of the IEEE/CVF International Conference on Computer Vision.   IEEE, 2021, pp. 2834–2844.
  • [4] B. Zhou, H. Yu, X. Zeng, X. Yang, J. Zhang, and M. Xu, “One-shot learning with attention-guided segmentation in cryo-electron tomography,” Frontiers in Molecular Biosciences, vol. 7, p. 613347, 2021.
  • [5] J. E. Heebner, C. Purnell, R. K. Hylton, M. Marsh, M. A. Grillo, and M. T. Swulius, “Deep learning-based segmentation of cryo-electron tomograms,” JoVE (Journal of Visualized Experiments), no. 189, p. e64435, 2022.
  • [6] H. Zhu, C. Wang, Y. Wang, Z. Fan, M. R. Uddin, X. Gao, J. Zhang, X. Zeng, and M. Xu, “Unsupervised multi-task learning for 3d subtomogram image alignment, clustering and segmentation,” in 2022 IEEE International Conference on Image Processing (ICIP).   IEEE, 2022, pp. 2751–2755.
  • [7] C. Li, D. Liu, H. Li, Z. Zhang, G. Lu, X. Chang, and W. Cai, “Domain adaptive nuclei instance segmentation and classification via category-aware feature alignment and pseudo-labelling,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2022, pp. 715–724.
  • [8] P. Naylor, M. Laé, F. Reyal, and T. Walter, “Segmentation of nuclei in histopathology images by deep regression of the distance map,” IEEE transactions on medical imaging, vol. 38, no. 2, pp. 448–459, 2018.
  • [9] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” Journal of machine learning research, vol. 17, no. 59, pp. 1–35, 2016.
  • [10] G. van Tulder and M. de Bruijne, “Unpaired, unsupervised domain adaptation assumes your domains are already similar,” Medical Image Analysis, vol. 87, p. 102825, 2023.
  • [11] J. Zhang, H. Chao, A. Dhurandhar, P.-Y. Chen, A. Tajer, Y. Xu, and P. Yan, “Spectral adversarial mixup for few-shot unsupervised domain adaptation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2023, pp. 728–738.
  • [12] D. Liu, D. Zhang, Y. Song, F. Zhang, L. O’Donnell, H. Huang, M. Chen, and W. Cai, “Pdam: A panoptic-level feature alignment framework for unsupervised domain adaptive instance segmentation in microscopy images,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 154–165, 2020.
  • [13] Z. Zhao, F. Zhou, K. Xu, Z. Zeng, C. Guan, and S. K. Zhou, “Le-uda: Label-efficient unsupervised domain adaptation for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 42, no. 3, pp. 633–646, 2022.
  • [14] S. Cicek, N. Xu, Z. Wang, H. Jin, and S. Soatto, “Disentangled image generation for unsupervised domain adaptation,” in European Conference on Computer Vision.   Springer, 2020, pp. 662–665.
  • [15] H. Shin, H. Kim, S. Kim, Y. Jun, T. Eo, and D. Hwang, “Sdc-uda: Volumetric unsupervised domain adaptation framework for slice-direction continuous cross-modality medical image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.   IEEE, 2023, pp. 7412–7421.
  • [16] Z. Xu, H. Gong, X. Wan, and H. Li, “Asc: Appearance and structure consistency for unsupervised domain adaptation in fetal brain mri segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2023, pp. 325–335.
  • [17] J. Xian, X. Li, D. Tu, S. Zhu, C. Zhang, X. Liu, X. Li, and X. Yang, “Unsupervised cross-modality adaptation via dual structural-oriented guidance for 3d medical image segmentation,” IEEE Transactions on Medical Imaging, 2023.
  • [18] F. Eisenstein, R. Danev, and M. Pilhofer, “Improved applicability and robustness of fast cryo-electron tomography data acquisition,” Journal of structural biology, vol. 208, no. 2, pp. 107–114, 2019.
  • [19] W. J. Hagen, W. Wan, and J. A. Briggs, “Implementation of a cryo-electron tomography tilt-scheme optimized for high resolution subtomogram averaging,” Journal of structural biology, vol. 197, no. 2, pp. 191–198, 2017.
  • [20] A. Martinez-Sanchez, L. Lamm, M. Jasnin, and H. Phelippeau, “Simulating the cellular context in synthetic datasets for cryo-electron tomography,” IEEE Transactions on Medical Imaging, pp. 1–1, 2024.
  • [21] P. Harar, L. Herrmann, P. Grohs, and D. Haselbach, “Faket: Simulating cryo-electron tomograms with neural style transfer,” arXiv preprint arXiv:2304.02011, 2023.
  • [22] A. Sharma, T. Kalluri, and M. Chandraker, “Instance level affinity-based transfer for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.   IEEE, 2021, pp. 5361–5371.
  • [23] N. Xiao and L. Zhang, “Dynamic weighted learning for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.   IEEE, 2021, pp. 15242–15251.
  • [24] J. Zhang, J. Huang, Z. Tian, and S. Lu, “Spectral unsupervised domain adaptation for visual recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.   IEEE, 2022, pp. 9829–9840.
  • [25] J. Yang, J. Liu, N. Xu, and J. Huang, “Tvt: Transferable vision transformer for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.   IEEE, 2023, pp. 520–530.
  • [26] D. Guan, J. Huang, A. Xiao, S. Lu, and Y. Cao, “Uncertainty-aware unsupervised domain adaptation in object detection,” IEEE Transactions on Multimedia, vol. 24, pp. 2502–2514, 2021.
  • [27] F. Yu, D. Wang, Y. Chen, N. Karianakis, T. Shen, P. Yu, D. Lymberopoulos, S. Lu, W. Shi, and X. Chen, “Sc-uda: Style and content gaps aware unsupervised domain adaptation for object detection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.   IEEE, 2022, pp. 382–391.
  • [28] G. Mattolin, L. Zanella, E. Ricci, and Y. Wang, “Confmix: Unsupervised domain adaptation for object detection via confidence-based mixing,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.   IEEE, 2023, pp. 423–433.
  • [29] J. Yoo, I. Chung, and N. Kwak, “Unsupervised domain adaptation for one-stage object detector using offsets to bounding box,” in European Conference on Computer Vision.   Springer, 2022, pp. 691–708.
  • [30] J. Dong, Y. Cong, G. Sun, Z. Fang, and Z. Ding, “Where and how to transfer: Knowledge aggregation-induced transferability perception for unsupervised domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1664–1681, 2021.
  • [31] S. Lee, J. Hyun, H. Seong, and E. Kim, “Unsupervised domain adaptation for semantic segmentation by content transfer,” in Proceedings of the AAAI conference on Artificial Intelligence, vol. 35, no. 9, 2021, pp. 8306–8315.
  • [32] J. Zhu, Y. Guo, G. Sun, L. Yang, M. Deng, and J. Chen, “Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–18, 2023.
  • [33] X. Zhao, N. C. Mithun, A. Rajvanshi, H.-P. Chiu, and S. Samarasekera, “Unsupervised domain adaptation for semantic segmentation with pseudo label self-refinement,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.   IEEE, 2024, pp. 2399–2409.
  • [34] Y. Zou, Z. Yu, B. Kumar, and J. Wang, “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,” in Proceedings of the European conference on computer vision (ECCV).   Springer, 2018, pp. 289–305.
  • [35] X. Zheng, J. Zhu, Y. Liu, Z. Cao, C. Fu, and L. Wang, “Both style and distortion matter: Dual-path unsupervised domain adaptation for panoramic semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.   IEEE, 2023, pp. 1285–1295.
  • [36] Y. Zhang, Y. Wang, L. Xu, Y. Yao, W. Qian, and L. Qi, “St-gan: A swin transformer-based generative adversarial network for unsupervised domain adaptation of cross-modality cardiac segmentation,” IEEE Journal of Biomedical and Health Informatics, 2023.
  • [37] Q. Xie, Y. Li, N. He, M. Ning, K. Ma, G. Wang, Y. Lian, and Y. Zheng, “Unsupervised domain adaptation for medical image segmentation by disentanglement learning and self-training,” IEEE Transactions on Medical Imaging, 2022.
  • [38] W. Ji and A. C. Chung, “Unsupervised domain adaptation for medical image segmentation using transformer with meta attention,” IEEE Transactions on Medical Imaging, 2023.
  • [39] R. I. Koning, “Cryo-electron tomography of cellular microtubules,” Methods in cell biology, vol. 97, pp. 455–473, 2010.
  • [40] X. Zeng and M. Xu, “Gum-net: Unsupervised geometric matching for fast and accurate 3d subtomogram image alignment and averaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.   IEEE, 2020, pp. 4073–4084.
  • [41] F. P. de Isidro-Gómez, J. Vilas, P. Losana, J. Carazo, and C. O. S. Sorzano, “A deep learning approach to the automatic detection of alignment errors in cryo-electron tomographic reconstructions,” Journal of Structural Biology, vol. 216, no. 1, p. 108056, 2024.
  • [42] W. Wan, S. Khavnekar, and J. Wagner, “Stopgap: an open-source package for template matching, subtomogram alignment and classification,” Acta Crystallographica Section D: Structural Biology, vol. 80, no. 5, 2024.
  • [43] X. Du, H. Wang, Z. Zhu, X. Zeng, Y.-W. Chang, J. Zhang, E. Xing, and M. Xu, “Active learning to classify macromolecular structures in situ for less supervision in cryo-electron tomography,” Bioinformatics, vol. 37, no. 16, pp. 2340–2346, 2021.
  • [44] N. Nguyen, C. Bohak, D. Engel, P. Mindek, O. Strnad, P. Wonka, S. Li, T. Ropinski, and I. Viola, “Finding nano-ötzi: cryo-electron tomography visualization guided by learned segmentation,” IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 10, pp. 4198–4214, 2022.
  • [45] M. Siggel, R. K. Jensen, V. J. Maurer, J. Mahamid, and J. Kosinski, “Colabseg: An interactive tool for editing, processing, and visualizing membrane segmentations from cryo-et data,” Journal of Structural Biology, p. 108067, 2024.
  • [46] H. Bandyopadhyay, Z. Deng, L. Ding, S. Liu, M. R. Uddin, X. Zeng, S. Behpour, and M. Xu, “Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization,” Bioinformatics, vol. 38, no. 4, pp. 977–984, 2022.
  • [47] H. Chen, Q. Dou, L. Yu, J. Qin, and P.-A. Heng, “Voxresnet: Deep voxelwise residual networks for brain segmentation from 3d mr images,” NeuroImage, vol. 170, pp. 446–455, 2018.
  • [48] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [49] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning.   PMLR, 2021, pp. 8162–8171.
  • [50] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2noise: Learning image restoration without clean data,” in International Conference on Machine Learning.   PMLR, 2018, pp. 2965–2974.
  • [51] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, and T. E. Ferrin, “Ucsf chimera—a visualization system for exploratory research and analysis,” Journal of computational chemistry, vol. 25, no. 13, pp. 1605–1612, 2004.
  • [52] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017.
  • [53] K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C.-L. Li, “Fixmatch: Simplifying semi-supervised learning with consistency and confidence,” Advances in neural information processing systems, vol. 33, pp. 596–608, 2020.
  • [54] M. Elad, “On the origin of the bilateral filter and ways to improve it,” IEEE Transactions on image processing, vol. 11, no. 10, pp. 1141–1151, 2002.
  • [55] L. Wang, Y. Zhang, and J. Feng, “On the euclidean distance of images,” IEEE transactions on pattern analysis and machine intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.
  • [56] L. J. Van Vliet, I. T. Young, and G. L. Beckers, “A nonlinear laplace operator as edge detector in noisy images,” Computer vision, graphics, and image processing, vol. 45, no. 2, pp. 167–195, 1989.
  • [57] X. Zeng, A. Kahng, L. Xue, J. Mahamid, Y.-W. Chang, and M. Xu, “High-throughput cryo-et structural pattern mining by unsupervised deep iterative subtomogram clustering,” Proceedings of the National Academy of Sciences, vol. 120, no. 15, p. e2213149120, 2023.
  • [58] X. Liao, W. Li, Q. Xu, X. Wang, B. Jin, X. Zhang, Y. Wang, and Y. Zhang, “Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.   IEEE, 2020, pp. 9394–9402.
  • [59] Q. Guo, C. Lehmer, A. Martínez-Sánchez, T. Rudack, F. Beck, H. Hartmann, M. Pérez-Berlanga, F. Frottin, M. S. Hipp, F. U. Hartl et al., “In situ structure of neuronal c9orf72 poly-ga aggregates reveals proteasome recruitment,” Cell, vol. 172, no. 4, pp. 696–705, 2018.
  • [60] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. 11, 2008.
  • [61] M. Carlavan and L. Blanc-Féraud, “Sparse poisson noisy image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1834–1846, 2011.
  • [62] A. Maity, A. Pattanaik, S. Sagnika, and S. Pani, “A comparative study on approaches to speckle noise reduction in images,” in International Conference on Computational Intelligence and Networks.   IEEE, 2015, pp. 148–155.