Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling
Abstract
Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that facilitates the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches for cryo-ET images have drawn widespread interest in the biological sciences. However, existing methods rely heavily on manually labeled data, which requires highly specialized skills, thereby hindering the adoption of fully supervised approaches for cryo-ET images. Some unsupervised domain adaptation (UDA) approaches have been designed to enhance segmentation network performance using unlabeled data. However, applying these methods directly to cryo-ET image segmentation remains challenging due to two main issues: 1) the source data, usually obtained through simulation, contain a certain level of noise, while the target data, collected directly from real-world raw data, have unpredictable noise levels; 2) the source data used for training typically consist of known macromolecules, while the target domain data are often unknown, causing the model's segmenter to be biased towards the known macromolecules and leading to a domain shift problem. To address these challenges, we introduce the first voxel-wise unsupervised domain adaptation approach, termed Vox-UDA, specifically for cryo-ET subtomogram segmentation. Vox-UDA incorporates a noise generation module to simulate target-like noise in the source dataset for cross-noise-level adaptation. Additionally, we propose a denoised pseudo-labeling strategy based on an improved Bilateral Filter to alleviate the domain shift problem. Experimental results on both simulated and real cryo-ET subtomogram datasets demonstrate the superiority of our proposed approach compared to state-of-the-art UDA methods.
Index Terms:
Cryo-Electron Tomography, Volumetric image segmentation, Unsupervised domain adaptation, Deep learning.
I Introduction
Cryo-Electron Tomography (cryo-ET) is a cutting-edge imaging technique that enables three-dimensional views of biological samples in a native frozen-hydrated state [1]. This automatic electron tomography technique allows biologists to capture high-resolution structures of macromolecular complexes [2], which play an important role in drug discovery and disease treatment. Inspired by the development of deep learning research in recent years, some efforts have been made in cryo-ET image analysis, especially for the subtomogram segmentation task [3, 4, 5, 6]. Subtomogram segmentation is a 3D segmentation task that aims to extract meaningful information about the target macromolecule at the voxel level. However, existing methods [4, 5, 7, 8] rely heavily on manual annotations, which are highly subjective and resource-intensive.
![Refer to caption](x1.png)
To tackle the challenges of data annotation, classical unsupervised domain adaptation (UDA) methods transfer knowledge from labeled source domains to unlabeled target domains. Ganin et al. [9] proposed the first UDA approach through adversarial learning, which has become the most commonly used framework for UDA tasks [10, 11]. Other works [12, 13, 14] have proposed generation-based approaches that synthesize target-like images from the source ones and apply supervised learning using the synthesized data with their original groundtruth masks. However, these approaches were primarily designed for 2D images and cannot perform well on 3D tasks. Some recent approaches have explored UDA on 3D images [15, 16, 17]; however, all those volumetric UDA approaches first cut the 3D input into 2D slices for the network input, leading to the loss of spatial information.
In this paper, we introduce a UDA approach that uses large simulated macromolecular data [18, 19] as the source domain dataset and a real dataset as the target domain dataset. With the development of data simulation techniques, the acquisition of cryo-ET subtomogram data is no longer limited to traditional biological methods. Given the structure of macromolecules, existing generative methods [20, 21] can directly produce realistic synthetic datasets with corresponding voxel-level segmentation masks, which can be seen as a zero-cost alternative to traditional methods that require high-end equipment and enormous human expertise. Nevertheless, the significant disparities between the two domains bring new challenges for the UDA task. Firstly, the simulated data are generated with fixed parameters, yielding a fixed noise level in each subtomogram (often an SNR of 0.03 or 0.05, as in Sec. IV-A1), while the noise level is unpredictable in the real dataset. Some examples of the subtomograms are shown in Fig. 1. Secondly, although subtomogram segmentation is a binary segmentation task, the simulated subtomograms and the real ones often do not share the same molecular categories, which biases the segmentation network towards the simulated ones and leads to the domain shift problem.
To address the aforementioned challenges, we propose a voxel-wise UDA framework, termed Vox-UDA, for cryo-ET subtomogram segmentation. Vox-UDA consists of a noise generation module (NGM) and a denoised pseudo-labeling (DPL) strategy. NGM generates Gaussian noise from a subset of the target dataset and applies it to the source samples to create target-like noise conditions. Meanwhile, DPL improves the existing bilateral filter, making it more suitable for 3D grayscale images by replacing the voxel-value difference in one of its Gaussian kernels with a gradient difference. While denoising, DPL preserves edge information as much as possible to obtain undistorted pseudo-labels. These pseudo-labels provide additional supervision signals to address the domain shift problem, thereby enhancing the model's performance on the target data.
In a nutshell, our contributions are as follows:
- To the best of our knowledge, we are the first to establish a paradigm for voxel-wise UDA segmentation in cryo-ET images (termed Vox-UDA). Our approach eliminates the reliance on large amounts of labeled real data by transferring knowledge learned from zero-cost simulated data to real data, enabling segmentation on real cryo-ET subtomograms.
- Our Vox-UDA includes a noise generation module (NGM), which simulates target-like noise in the source data, and a denoised pseudo-labeling (DPL) strategy, which provides additional supervision signals to address the domain shift problem.
- We propose an improved bilateral filter that, by being sensitive to changes in gradients, preserves edge information as much as possible while eliminating noise in order to obtain high-quality pseudo-labels.
- Extensive experimental results demonstrate the superiority of Vox-UDA over state-of-the-art UDA methods on subtomogram segmentation. Moreover, our method even outperforms the fully supervised setting on some metrics.
The rest of the paper is organized as follows. A brief literature review is presented in Section II. We provide the details of our proposed Vox-UDA in Section III. Experimental results and visualizations are shown in Section IV, followed by the conclusion in Section V.
![Refer to caption](x2.png)
II Related Work
II-A Unsupervised Domain Adaptation for Vision Tasks
Under the unsupervised domain adaptation (UDA) setting, two types of datasets are used for training: the source domain dataset, which is fully labeled, and the target domain dataset, which is unlabeled. The first UDA approach was proposed by [9]; it transfers a model trained on source data to target data through adversarial learning, without introducing additional annotations. Since UDA allows a model to be adapted to new domains without requiring labeled data in the target domain, it has been introduced into various tasks, e.g., classification [22, 23, 24, 25], object detection [26, 27, 28, 29] and semantic segmentation [30, 31, 32, 33]. As cryo-ET subtomogram segmentation is a segmentation task, in this paper we mainly focus on the UDA approaches applied to semantic segmentation. For segmentation, UDA methods aim to eliminate cross-domain discrepancies at both the feature level and the pixel level. Zou et al. proposed a class-aware UDA approach based on self-training to handle the class-imbalance problem [34]. Zheng et al. designed a dual-path framework that fuses equirectangular and tangent projections for panoramic semantic segmentation [35]. Additionally, UDA has achieved excellent results in medical image segmentation [36, 37, 7]. Ji et al. introduced an attention-based method that learns hierarchical consistencies and transfers more discriminative information between the source and target domains [38]. However, although these methods achieve great performance in UDA segmentation, they are primarily designed for 2D images and are therefore not suitable for cryo-ET subtomogram segmentation, as tomograms are volumetric images. To handle the UDA challenge in 3D segmentation tasks, Shin et al. [15] proposed a cross-modality translation method to generate synthetic 3D target volumes from source 2D scans. Xu et al. [16] applied a fast Fourier transform to convert input 2D slices into the frequency domain, and used a consistency loss to simultaneously constrain both the feature domain and the frequency domain to achieve UDA. As discussed in Sec. I, all the existing 3D UDA approaches reduce the problem to 2D UDA tasks by slicing, rather than directly transferring in the three-dimensional voxel space, which leads to information loss.
II-B Cryo-Electron Tomography
Cryo-electron tomography (cryo-ET) integrates cryogenic specimen preparation, electron microscopy for data acquisition, and tomographic reconstruction for 3D visualization [39]. This technique captures structural information of macromolecules in an ultra-low temperature environment, which is of significant importance in biology and medicine. A cryo-ET subtomogram is a small cubic sub-volume extracted from a tomogram; normally, each subtomogram contains only a single macromolecular complex. Following recent advances in deep learning, its applications to cryo-ET have drawn widespread interest for their potential to aid cryo-ET tasks, e.g., subtomogram alignment [40, 41], subtomogram classification [42, 43] and subtomogram segmentation [6, 44, 45]. However, deep learning methods rely on large amounts of annotated data, which is particularly challenging to obtain for cryo-ET images. Bandyopadhyay et al. proposed a domain randomization-based approach to enhance the generalization performance of the model in subtomogram classification [46]. Zhu et al. proposed a weakly supervised approach that uses only 2D-level annotation for voxel segmentation to alleviate the annotation burden [3]. In this paper, we propose a UDA approach for subtomogram segmentation, which utilizes a large amount of cost-free annotated simulated data for knowledge transfer, enabling the segmentation network to generalize to real cryo-ET subtomograms.
III Method
Our proposed framework is based on VoxResNet [47], a state-of-the-art method designed for fully supervised voxel-level segmentation. As can be seen from Fig. 2, VoxResNet takes the combination of the outputs from the second convolution layer, the second VoxRes module, the fourth VoxRes module, and the last VoxRes module as its final output.
Given a source domain dataset $\mathcal{D}_s=\{(x_s^i, y_s^i)\}_{i=1}^{N_s}$ and a target domain dataset $\mathcal{D}_t=\{x_t^j\}_{j=1}^{N_t}$, where $x$ represents the input 3D subtomogram and $y$ denotes the 3D groundtruth mask, we aim to train a voxel segmentation network for the target domain using only groundtruth supervision signals from the source domain. Fig. 2 illustrates the details of the proposed Vox-UDA. As can be seen from the figure, Vox-UDA takes $x_s$, $x_t$ and a subset $\hat{x}_t$ of $x_t$ as input. This subset is randomly sampled from $x_t$ and contains $N_{sample}$ samples. $\hat{x}_t$ is then sent to the noise generation module (NGM) to obtain the target-like voxel-wise Gaussian noise $n_t$. Further, $n_t$ is added to the source input $x_s$ to produce the updated input $x_s^n$. $x_s$, $x_s^n$ and $x_t$ are all passed to the student network to acquire the segmentation loss $\mathcal{L}_{seg}$, the consistency loss $\mathcal{L}_{con}$ and the discriminator loss $\mathcal{L}_{dis}$ for optimization. Following [7, 16], we set the same weight for the different losses. Hence, the overall loss can be written as

$$\mathcal{L} = \mathcal{L}_{seg} + \mathcal{L}_{con} + \mathcal{L}_{dis}. \tag{1}$$

Furthermore, to handle the domain shift problem, we design a denoised pseudo-labeling strategy. $x_t$ is sent to the improved Bilateral Filter (IBF) to remove its noise and then to the teacher network to obtain the pseudo-label, which is then used to tune the student network for better performance. Note that pseudo-labels are selected with a threshold $\tau$, and the teacher network is updated via exponential moving average (EMA).
III-A Noise Generation Module
Inspired by recent approaches such as [48, 49, 50], which use the segmentation network as a denoiser for noise elimination in 2D space, we extend this insight to three-dimensional space and propose a new noise generation module. Given a sample $\hat{x}_t^i$ from the input subset $\hat{x}_t$, we first apply the Discrete Fourier transform (DFT) to obtain its frequency information

$$F^i(u, v, w) = \mathcal{F}\big(\hat{x}_t^i\big), \tag{2}$$

where $u$, $v$ and $w$ represent the spatial frequencies of the Fourier transform, and $\mathcal{F}$ denotes the Discrete Fourier transform. In the frequency domain, low-frequency information corresponds to the textural details of the target object, while large amounts of noise, together with a small amount of edge information about the object, are usually contained in the high-frequency information. To obtain the noise contained in the high-frequency information, $F^i$ is then passed to a high-pass filter to eliminate the textural details contained in the low-frequency information

$$F_h^i(u, v, w) = H\big(F^i(u, v, w)\big), \tag{3}$$

where $H$ denotes the high-pass filter. Its filter rate $\alpha$ determines the fraction of the frequency information that remains, while the rest is filtered out (see Sec. IV-C for detailed discussions). The Inverse Discrete Fourier transform (iDFT) is further applied to recover voxel-level information from the filtered frequency domain $F_h^i$:

$$n^i = \mathcal{F}^{-1}\big(F_h^i(u, v, w)\big), \tag{4}$$

where $\mathcal{F}^{-1}$ denotes the Inverse Discrete Fourier Transform.

![Refer to caption](x3.png)

As discussed in Sec. I, the noise level of each input from the target domain is unpredictable; hence, instead of using the noise $n^i$ from a single $\hat{x}_t^i$, we calculate the average noise $\bar{n}$ over the whole subset. Since deep learning models are more sensitive to noise that conforms to a probability distribution [50], we use Gaussian noise as the input noise for noise generation (we also compare other types of noise; ablation studies are provided in Sec IV-C2). Therefore, instead of directly adding $\bar{n}$ to the source input $x_s$, we only take its variance $\sigma_t^2$ and generate a Gaussian noise based on it:

$$n_t = G\big(0, \sigma_t^2\big), \tag{5}$$

where $G$ denotes a randomly generated Gaussian noise with expectation equal to $0$ and variance equal to $\sigma_t^2$. The updated source input is then obtained through

$$x_s^n = x_s + n_t. \tag{6}$$
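The noise generation steps above (DFT, high-pass filtering, iDFT, variance estimation, Gaussian sampling) can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the radial shape of the high-pass mask, the function names, and the default `keep_rate=0.244` (taken from the best-performing filter rate in the hyperparameter table) are assumptions, and the variance is computed literally from the voxel-wise mean of the per-sample residuals.

```python
import numpy as np

def highpass_residual(volume, keep_rate):
    """High-frequency part of a 3D volume: DFT -> high-pass mask -> inverse DFT."""
    spectrum = np.fft.fftshift(np.fft.fftn(volume))
    # Distance of each frequency bin from the zero-frequency centre.
    axes = [np.arange(n) - n // 2 for n in volume.shape]
    u, v, w = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(u.astype(float) ** 2 + v ** 2 + w ** 2)
    # Assumed filter shape: keep roughly the `keep_rate` fraction of bins
    # that lie farthest from the centre (the highest frequencies).
    cutoff = np.quantile(radius, 1.0 - keep_rate)
    filtered = spectrum * (radius >= cutoff)
    return np.real(np.fft.ifftn(np.fft.ifftshift(filtered)))

def target_noise_std(target_subset, keep_rate=0.244):
    """Standard deviation of the averaged high-frequency residual of the subset."""
    residuals = [highpass_residual(x, keep_rate) for x in target_subset]
    return float(np.mean(residuals, axis=0).std())

def add_target_like_noise(source, sigma, rng):
    """Add zero-mean Gaussian noise with the estimated std to a source volume."""
    return source + rng.normal(0.0, sigma, size=source.shape)
```

In use, `target_noise_std` would be called once on the sampled target subset, and `add_target_like_noise` would then be applied to each source subtomogram before it is fed to the student network.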
Table I. Source macromolecules: 1bxn, 1f1b and 1yg6

| Method | mIoU | mIoU (ribo) | mIoU (26S) | mIoU (TRiC) | Dice | Dice (ribo) | Dice (26S) | Dice (TRiC) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| w/o adaptation | 9.7 | 11.1 | 2.6 | 2.6 | 17.4 | 19.9 | 5.1 | 5.0 |
| Fully Supervised | 46.0 | 49.5 | 20.0 | 34.6 | 61.6 | 65.7 | 30.0 | 48.3 |
| DANN [9] | 38.4 | 43.0 | 6.5 | 11.7 | 53.0 | 59.1 | 11.2 | 16.8 |
| PDAM [12] | 39.8 | 43.3 | 13.8 | 22.6 | 55.1 | 59.6 | 21.5 | 31.9 |
| ASC [16] | 40.4 | 43.4 | 19.2 | 23.3 | 55.8 | 59.7 | 28.7 | 32.7 |
| LE-UDA [13] | 41.5 | 44.7 | 18.4 | 23.9 | 56.8 | 61.0 | 28.4 | 32.6 |
| Vox-UDA (w NGM) | 48.5 | 50.6 | 32.2 | 38.6 | 64.4 | 66.8 | 47.1 | 51.5 |
| Vox-UDA (w BF) | 49.1 | 50.4 | 30.5 | 39.2 | 64.5 | 67.1 | 46.9 | 50.7 |
| Vox-UDA (w IBF) | 50.3 | 53.8 | 28.8 | 41.3 | 65.9 | 68.5 | 44.0 | 52.8 |
![Refer to caption](x4.png)
The source input $x_s$ and its noisy version $x_s^n$ are both sent to the student network to obtain the consistency loss $\mathcal{L}_{con}$. Following VoxResNet, we also take the output feature embeddings from the same four layers as the input of the loss function

$$\mathcal{L}_{con} = \sum_{k=1}^{4} \lambda_k \, \ell_{cos}\big(f_k(x_s), f_k(x_s^n)\big), \tag{7}$$

where $f_k(\cdot)$ denotes the feature embedding from the $k$-th of these layers, $\ell_{cos}$ denotes the cosine similarity loss and $\lambda_k$ denotes the weights that control the relative importance among the different consistency losses. Although NGM is introduced to simulate target-like noise, it is impossible to create a noise environment in the source domain that is entirely the same as that of the target domain. On the other hand, while the shallower layers of the network contain more textural information, the deeper layers contain more edge information [52]. To account for both constraints, instead of an equal superposition, we assign different weights to the different layers to balance the texture and edge consistency losses (see detailed discussions in Sec IV-C).
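A layer-weighted consistency loss of this kind can be sketched as below. The sketch assumes that the cosine similarity loss is defined as one minus the cosine similarity of the flattened feature embeddings, which the text does not state explicitly; the function names are hypothetical, and the default weights are the best-performing setting reported in the hyperparameter analysis.

```python
import numpy as np

def cosine_loss(a, b, eps=1e-8):
    """Assumed form of the cosine similarity loss: 1 - cos(a, b) on flattened features."""
    a, b = np.ravel(a), np.ravel(b)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

def consistency_loss(feats_clean, feats_noisy, weights=(0.2, 0.2, 0.3, 0.3)):
    """Weighted sum of per-layer cosine losses between clean and noisy source features."""
    return sum(
        w * cosine_loss(fc, fn)
        for w, fc, fn in zip(weights, feats_clean, feats_noisy)
    )
```

With these weights, the two shallower (texture-oriented) layers contribute less than the two deeper (edge-oriented) layers, matching the unequal superposition described above.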
III-B Denoised Pseudo-Labeling
Although the NGM can narrow the noise level gap between the two domains, the segmentation network is, as mentioned above, still biased towards the source data due to the domain shift problem. Therefore, we provide an extra supervision signal for optimization through pseudo-labeling. However, because the noise level in the target domain is unknown and such noise may distort the pseudo-labels and further harm the performance of the model, we propose a denoised pseudo-labeling strategy instead. Unlike existing pseudo-labeling methods that add an extra training step, we use the student-teacher structure [53]. Before the target input $x_t$ is sent to the teacher network to obtain the pseudo-label, we first denoise it. We designed three different denoising methods: directly using the NGM, using the Bilateral Filter, and using our improved Bilateral Filter (IBF).
Table II. Source macromolecules: 2byu, 2h12 and 21db

| Method | mIoU | mIoU (ribo) | mIoU (26S) | mIoU (TRiC) | Dice | Dice (ribo) | Dice (26S) | Dice (TRiC) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| w/o adaptation | 12.7 | 14.2 | 3.7 | 3.0 | 22.2 | 24.7 | 7.2 | 5.8 |
| Fully Supervised | 46.0 | 49.5 | 20.0 | 34.6 | 61.6 | 65.7 | 30.0 | 48.3 |
| DANN [9] | 31.9 | 36.1 | 3.8 | 6.2 | 45.9 | 51.9 | 6.4 | 8.7 |
| PDAM [12] | 39.1 | 43.1 | 10.9 | 15.7 | 54.1 | 59.4 | 17.7 | 24.0 |
| ASC [16] | 41.7 | 45.2 | 24.7 | 13.3 | 56.9 | 61.2 | 38.1 | 19.8 |
| LE-UDA [13] | 43.1 | 46.4 | 21.9 | 22.3 | 58.6 | 62.6 | 33.8 | 32.4 |
| Vox-UDA (w NGM) | 47.5 | 50.1 | 27.5 | 34.7 | 63.2 | 66.3 | 41.0 | 46.8 |
| Vox-UDA (w BF) | 48.0 | 50.3 | 27.8 | 34.4 | 63.8 | 66.7 | 39.9 | 47.0 |
| Vox-UDA (w IBF) | 49.5 | 52.4 | 28.3 | 35.1 | 65.2 | 68.9 | 41.3 | 47.7 |
![Refer to caption](x5.png)
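NGM Denoising. The target input $x_t$ is directly sent to the NGM to obtain its noise $n(x_t)$ (Eqs. 2-4). Hence, the denoised image can be represented as $\tilde{x}_t = x_t - n(x_t)$.
Bilateral Filter Denoising. Although noise can be partially removed through frequency domain analysis, some edge information will also be eliminated, leading to distortion of the pseudo-labels. Therefore, we further deploy a non-linear approach, the Bilateral Filter [54], as the denoiser instead of the NGM. The Bilateral Filter (BF) consists of a domain Gaussian kernel and a range Gaussian kernel; the former is used to eliminate noise, and the latter retains edge information as much as possible during filtering. BF uses a sliding window, extracting a sub-volume $\Omega$ for each filtering operation. Given the central voxel $v_c$ at position $p_c$ and the remaining voxels $v_j$ at positions $p_j$ in the sub-volume, the updated voxel $\tilde{v}_c$ can be represented as

$$\tilde{v}_c = \frac{\sum_{j \in \Omega} G_{\sigma_d}\big(\lVert p_c - p_j \rVert\big)\, G_{\sigma_r}\big(\lvert v_c - v_j \rvert\big)\, v_j}{\sum_{j \in \Omega} G_{\sigma_d}\big(\lVert p_c - p_j \rVert\big)\, G_{\sigma_r}\big(\lvert v_c - v_j \rvert\big)}, \tag{8}$$

where $\lVert \cdot \rVert$ denotes the Euclidean distance [55], $\sigma_d$ and $\sigma_r$ denote the domain hyperparameter and range hyperparameter, and $G_{\sigma}$ denotes the Gaussian kernel

$$G_{\sigma}(z) = \exp\left(-\frac{z^2}{2\sigma^2}\right). \tag{9}$$

Hence, the denoised image can be represented as $\tilde{x}_t = \mathrm{BF}(x_t)$.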
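The standard bilateral filter described here can be sketched for a 3D volume as follows. This is a brute-force illustration of the textbook filter, not the authors' code; the window radius and the default values of `sigma_d` and `sigma_r` are hypothetical placeholders chosen only to make the example run.

```python
import numpy as np

def gaussian(z, sigma):
    """Unnormalised Gaussian kernel; normalisation cancels in the filter ratio."""
    return np.exp(-(z ** 2) / (2.0 * sigma ** 2))

def bilateral_filter_3d(vol, radius=1, sigma_d=1.0, sigma_r=0.1):
    """Bilateral filter over a (2*radius+1)^3 sliding window."""
    vol = np.asarray(vol, dtype=float)
    padded = np.pad(vol, radius, mode="reflect")
    num, den = np.zeros_like(vol), np.zeros_like(vol)
    nx, ny, nz = vol.shape
    offsets = range(-radius, radius + 1)
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                # Neighbour volume shifted by (dx, dy, dz).
                shifted = padded[radius + dx:radius + dx + nx,
                                 radius + dy:radius + dy + ny,
                                 radius + dz:radius + dz + nz]
                # Domain kernel on spatial distance, range kernel on value difference.
                weight = (gaussian(np.sqrt(dx * dx + dy * dy + dz * dz), sigma_d)
                          * gaussian(shifted - vol, sigma_r))
                num += weight * shifted
                den += weight
    return num / den
```

Because the range kernel suppresses neighbours whose values differ strongly from the central voxel, a sharp step between two flat regions is left almost untouched while small fluctuations inside each region are averaged out.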
IBF Denoising. The key design of the Bilateral Filter is the use of two separate Gaussian kernels for different tasks. However, the range kernel introduced for retaining edge information mainly focuses on the voxel-level color difference, which can achieve satisfactory results in the RGB space; in the grayscale space, there may be larger differences in brightness, which could reduce the effectiveness of this kernel. Therefore, we further propose an improved Bilateral Filter (IBF), which uses the gradient of each voxel instead of its value in the range kernel for edge retention. In detail, we extend the Laplace operator [56] to three dimensions and calculate the gradient of each voxel in the $x$, $y$ and $z$ (height, width and depth) directions of the three-dimensional space. Since the voxel space is discrete, the gradient of a voxel $v$ in each direction can be represented as

$$\nabla_x v = v(x+1, y, z) + v(x-1, y, z) - 2v(x, y, z), \tag{10}$$

$$\nabla_y v = v(x, y+1, z) + v(x, y-1, z) - 2v(x, y, z), \tag{11}$$

$$\nabla_z v = v(x, y, z+1) + v(x, y, z-1) - 2v(x, y, z), \tag{12}$$

where $v(x, y, z)$ denotes the value of the voxel at position $(x, y, z)$, and $\nabla_x$, $\nabla_y$ and $\nabla_z$ denote the Laplace operator applied in the $x$, $y$ and $z$ directions. Moreover, if a voxel $v_j$ belongs to the same object as the central voxel $v_c$ (i.e., lies inside it), their gradients should be similar; otherwise, the difference between the two gradients should be large. Therefore, we can replace the second (range) kernel in Eq 8 and obtain the improved Bilateral Filter (IBF):

$$\tilde{v}_c = \frac{\sum_{j \in \Omega} G_{\sigma_d}\big(\lVert p_c - p_j \rVert\big)\, G_{\sigma_r}\big(\lvert \nabla v_c - \nabla v_j \rvert\big)\, v_j}{\sum_{j \in \Omega} G_{\sigma_d}\big(\lVert p_c - p_j \rVert\big)\, G_{\sigma_r}\big(\lvert \nabla v_c - \nabla v_j \rvert\big)}, \tag{13}$$

where

$$\nabla v = \nabla_x v + \nabla_y v + \nabla_z v. \tag{14}$$

Consequently, the denoised image is denoted as $\tilde{x}_t = \mathrm{IBF}(x_t)$.
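The idea of swapping the value-based range kernel for a gradient-based one can be sketched as below. This is an illustrative reading of the description rather than the authors' implementation: the per-voxel gradient is assumed to be the sum of second-order finite differences along the three axes (a discrete Laplacian), whereas the implementation details mention a Sobel operator; the function names, window radius and default hyperparameters are hypothetical.

```python
import numpy as np

def gaussian(z, sigma):
    """Unnormalised Gaussian kernel."""
    return np.exp(-(z ** 2) / (2.0 * sigma ** 2))

def laplacian_3d(vol):
    """Sum of second-order finite differences along the three axes (edge-padded)."""
    p = np.pad(vol, 1, mode="edge")
    c = p[1:-1, 1:-1, 1:-1]
    gx = p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] - 2.0 * c
    gy = p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] - 2.0 * c
    gz = p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] - 2.0 * c
    return gx + gy + gz

def improved_bilateral_filter_3d(vol, radius=1, sigma_d=1.0, sigma_r=1.2):
    """Bilateral filter whose range kernel compares gradients instead of values."""
    vol = np.asarray(vol, dtype=float)
    grad = laplacian_3d(vol)
    pv = np.pad(vol, radius, mode="reflect")
    pg = np.pad(grad, radius, mode="reflect")
    num, den = np.zeros_like(vol), np.zeros_like(vol)
    nx, ny, nz = vol.shape
    offsets = range(-radius, radius + 1)
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                window = (slice(radius + dx, radius + dx + nx),
                          slice(radius + dy, radius + dy + ny),
                          slice(radius + dz, radius + dz + nz))
                # Domain kernel on spatial distance, range kernel on gradient difference.
                weight = (gaussian(np.sqrt(dx * dx + dy * dy + dz * dz), sigma_d)
                          * gaussian(pg[window] - grad, sigma_r))
                num += weight * pv[window]
                den += weight
    return num / den
```

The only change relative to a standard bilateral filter is the argument of the range kernel, so neighbours are down-weighted when their local gradient differs from that of the central voxel rather than when their intensity differs.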
The denoised image $\tilde{x}_t$ is further sent to the teacher network to obtain the pseudo-label $\hat{y}_t$ with the threshold $\tau$. The pseudo-label is then sent back to the student network as a supervision signal for the target flow.
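The two mechanics of the student-teacher loop, thresholding the teacher output into a pseudo-label and updating the teacher by exponential moving average, can be sketched as follows. The threshold value, the EMA decay, and the interpretation of the threshold as a simple binarisation of the teacher's foreground probability are all assumptions made for illustration; the parameter dictionaries stand in for real network weights.

```python
import numpy as np

def pseudo_label(teacher_prob, tau=0.5):
    """Binarise the teacher's foreground probability map (tau is a hypothetical value)."""
    return (np.asarray(teacher_prob) >= tau).astype(np.uint8)

def ema_update(teacher_params, student_params, decay=0.99):
    """Move each teacher parameter towards the student parameter (decay is hypothetical)."""
    return {
        name: decay * teacher_params[name] + (1.0 - decay) * student_params[name]
        for name in teacher_params
    }
```

In a training loop, `pseudo_label` would be applied to the teacher's prediction on the denoised target volume, and `ema_update` would be called after each student optimisation step.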
IV Experiments
IV-A Experimental Settings
IV-A1 Datasets and Evaluation Metrics
We conduct experiments on two types of datasets: simulated dataset and real dataset.
Simulated Dataset. The simulated dataset used as the source dataset is generated following the same generation process as [57]. We choose six representative macromolecular complexes and divide them into two groups that serve as two separate source datasets (1bxn, 1f1b, and 1yg6; 2byu, 2h12, and 21db). Each macromolecular complex is simulated at two different noise levels, with SNRs of 0.03 and 0.05, and each noise level contains 500 samples. Following existing work [40, 58], all input subtomograms are resized to a common fixed size. The simulated dataset contains 6,000 samples in total (3,000 samples for each source dataset).
Real Dataset. The real dataset used as the target dataset is the public dataset Poly-GA [59], which contains 66 26S proteasome subtomograms, 66 TRiC subtomograms and 901 ribosome subtomograms (1,033 samples in total). Each subtomogram is also re-scaled to the same size as the simulated data.
For evaluation, the mean intersection over union (mIoU) and the Dice similarity coefficient (Dice) are employed to measure segmentation performance.
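For reference, the two metrics can be computed on binary masks as in the sketch below. The function names are hypothetical, and averaging the per-subtomogram IoU to obtain mIoU is an assumption about how the mean is taken.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    # Two empty masks are treated as a perfect match.
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def dice(pred, gt):
    """Dice similarity coefficient of two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    total = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / total if total else 1.0

def mean_iou(preds, gts):
    """Mean IoU over a collection of (prediction, groundtruth) mask pairs."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))
```

For a single pair of masks the two metrics are related by Dice = 2 IoU / (1 + IoU), so Dice is always at least as large as IoU.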
IV-A2 Implementation Details
We utilize VoxResNet as our base architecture. The whole model is trained on a single NVIDIA A100 Tensor Core GPU with 80GB memory. For training, we choose the Adam optimizer with an initial learning rate of 1e-3. The model is trained for 300 epochs with a batch size of 16, and the learning rate is decayed every 100 epochs. The sampled number $N_{sample}$ and the filter rate $\alpha$ are empirically set to 10 and 24.4%, respectively (see discussions in Sec IV-C). For the improved Bilateral Filter, the domain hyperparameter $\sigma_d$ and the range hyperparameter $\sigma_r$ are set to 120 and 1.2, respectively. We use the Sobel operator as the Laplacian operator.
![Refer to caption](x6.png)
IV-A3 Baselines
As there are no existing methods designed for voxel-level UDA, we implement several traditional and state-of-the-art UDA approaches for 2D image segmentation on our task, including a single-discriminator-based method (DANN [9]) and an image-synthesis-based method (PDAM [12]). We also include the two most recent approaches designed for volumetric images (ASC [16] and LE-UDA [13]), which cut 3D images into 2D slices and apply UDA in the 2D scenario. Following existing UDA methods [12, 38, 32], we also include a “w/o adaptation” setting and a “Fully Supervised” setting for comparison. The “w/o adaptation” setting is an original VoxResNet trained on the source dataset without adaptation. The “Fully Supervised” setting is a VoxResNet trained with full supervision on the labeled target dataset, serving as the upper bound.
IV-B Comparisons With State-of-the-arts
We report the segmentation results on the Poly-GA dataset in Table I using [1bxn, 1f1b, and 1yg6] as the source dataset. As can be observed in the table, our approach outperforms all the state-of-the-art methods. Compared with “w/o adaptation”, DANN and PDAM indeed boost the model's performance; however, the gain is modest compared with our Vox-UDA (w IBF) (e.g., PDAM achieves 39.8 in mIoU while Vox-UDA (w IBF) achieves 50.3). Compared with the two recent volumetric UDA methods, ASC and LE-UDA, our proposed Vox-UDA (w IBF) still excels at target subtomogram segmentation, with significant improvements in both mIoU (at least 8.8) and Dice (at least 9.1). We also report results for an additional UDA setting in Table II, using the other three macromolecules [2byu, 2h12, and 21db] as the source dataset, in which our proposed method still achieves state-of-the-art performance over all the compared approaches. It is worth noting that in both tables, our Vox-UDA even surpasses the “Fully Supervised” setting on the vast majority of the metrics (e.g., “Fully Supervised” achieves 46.0 in mIoU in Table II, while our Vox-UDA (w IBF) achieves 49.5).
IV-B1 Segmentation results Visualization
Fig 4 shows the segmentation results on the Poly-GA dataset using 1bxn, 1f1b and 1yg6 as the source dataset. As can be observed, because the proposed noise generation module simulates the target noise environment on the source data, our model's robustness to noise is significantly enhanced (e.g., compared to the segmentation results of ASC and PDAM, our results focus more on the macromolecules than on the surrounding noise). Compared with LE-UDA, our segmentation results exhibit better texture details due to the proposed denoised pseudo-labeling strategy, which keeps the model from being biased towards the source data and addresses the domain shift problem. We also provide additional visualization results using 2byu, 2h12 and 21db as the source dataset in Fig 5.
IV-B2 Feature Visualization
As shown in Fig 6, we visualize the feature embeddings learned by the segmentation network with the commonly used t-SNE [60] method. Fig 6(a) and Fig 6(b) represent the visualization results of “after adaptation” and “before adaptation”, respectively. As can be observed from the figure, the distribution of the source and target features learned from our proposed method is more consistent compared to the distribution without adaptation. This indicates that our method has achieved knowledge transfer and generalized the model to the target data.
IV-C Ablation Study
Table III. Source macromolecules: 1bxn, 1f1b and 1yg6

| Method | mIoU | mIoU (ribo) | mIoU (26S) | mIoU (TRiC) | Dice | Dice (ribo) | Dice (26S) | Dice (TRiC) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 31.9 | 36.1 | 3.8 | 6.2 | 45.9 | 51.9 | 6.4 | 8.7 |
| w/o NGM | 41.9 | 43.9 | 20.1 | 37.9 | 57.5 | 60.2 | 31.0 | 50.4 |
| w/o PL | 46.2 | 49.1 | 23.3 | 31.6 | 61.8 | 65.3 | 36.3 | 44.0 |
| Vox-UDA (Ours) | 50.3 | 53.8 | 28.8 | 41.3 | 65.9 | 68.5 | 44.0 | 52.8 |
Table IV. Source macromolecules: 1bxn, 1f1b and 1yg6

| Sampled number $N_{sample}$ | mIoU | Dice | Filter rate | mIoU | Dice | Consistency weights | mIoU | Dice | Domain hyperparameter | mIoU | Dice | Range hyperparameter | mIoU | Dice |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | 41.2 | 57.1 | | 41.7 | 57.3 | [0.1, 0.1, 0.4, 0.4] | 44.4 | 60.2 | 100 | 49.3 | 64.9 | 0.8 | 47.5 | 63.2 |
| 10 | 50.3 | 65.9 | | 43.5 | 59.6 | [0.2, 0.2, 0.3, 0.3] | 50.3 | 65.9 | 120 | 50.3 | 65.9 | 1.0 | 48.0 | 63.8 |
| 15 | 43.6 | 59.4 | 24.4% | 50.3 | 65.9 | [0.3, 0.3, 0.2, 0.2] | 45.3 | 61.0 | 140 | 49.1 | 64.5 | 1.2 | 50.3 | 65.9 |
| 20 | 42.6 | 58.3 | | 41.0 | 56.8 | [0.4, 0.4, 0.1, 0.1] | 42.2 | 57.8 | 160 | 47.0 | 62.5 | 1.4 | 49.5 | 65.2 |
IV-C1 Effectiveness of the Improved Bilateral Filter
We conduct a comprehensive set of experiments to validate the effectiveness of the proposed improved Bilateral Filter (IBF) for denoised pseudo-labeling. We run experiments with each of the three denoisers introduced in Sec III-B and report the segmentation results in both Table I and Table II. As can be seen from the tables, compared with using NGM as the denoiser, employing the Bilateral Filter (BF) indeed brings a performance improvement (i.e., mIoU increases by 0.6 in Table I and by 0.5 in Table II). This is because BF can preserve some edge information while denoising, thereby avoiding pseudo-label distortion. However, as discussed in Sec III-B, the range kernel of BF is not well suited to grayscale inputs. Our proposed IBF addresses this drawback by using a Laplacian transform, which allows the range kernel to focus on gradient changes in the voxel space rather than value changes. Therefore, our model achieves the best performance with the proposed improved Bilateral Filter (i.e., mIoU increases by a further 1.2 in Table I and 1.5 in Table II over BF).
![Refer to caption](extracted/5701126/ablation.png)
IV-C2 Effectiveness of the Noise Generation Process
As mentioned in Sec III-A, we choose Gaussian noise in NGM for noise generation. To justify this choice, we provide additional experiments in Fig 7, using Poisson noise [61] and Speckle noise [62] as the added noise for NGM, respectively. Given the variance $\sigma_t^2$ obtained through the average noise level of the subset, the Poisson noise can be represented as $P(\sigma_t^2)$, and the Speckle noise can be formulated as

$$n_{speckle} = x_s \cdot G\big(0, \sigma_t^2\big), \tag{15}$$

where $x_s$ denotes the input source image and $G(0, \sigma_t^2)$ denotes zero-mean Gaussian noise with variance $\sigma_t^2$. As can be seen from the figure, compared with the other two types of noise, using Gaussian noise achieves the best performance.
IV-C3 Effectiveness of Different Proposed Modules
We evaluate our Vox-UDA following the same experimental setting as in Table I for the ablation study and use Vox-UDA (w IBF) as the final result of our proposed method. Table III shows the effectiveness of each module in our method. For comparison, we build a baseline using only a single discriminator with VoxResNet, which has the same model structure as DANN [9]. From Table III, we observe that both “w/o NGM” and “w/o PL” achieve better performance than “Baseline”, which demonstrates the effectiveness of our two proposed modules in dealing with the challenges of UDA in subtomogram segmentation. Meanwhile, compared with using only one of the two modules, the full Vox-UDA achieves a significant performance improvement (i.e., mIoU improves by 8.4 over “w/o NGM” and by 4.1 over “w/o PL”).
IV-C4 Hyperparameter Analysis
We herein further evaluate the hyperparameters in our approach. As shown in Table IV, we evaluate the sampled number , the high-pass filter rate , the weight for consistency losses, the domain hyperparameter and range hyperparameter . is used to control the number of sampled targets for noise generation. As we discussed in the previous sections, the main goal of the noise generation module is to simulate target-like noises for the inputs from the source domain. Because the noise level of the whole target domain dataset is not evenly distributed, we choose to use random sampling instead of the whole dataset. Therefore, is the key point. Either being too large or too small for such a number will lead to a negative impact on the model’s performance. As the results reported in Table IV, achieves the best performance (i.e., increased compared to and increased compared to ). is the filter rate to control how much information is retained for further processes. As aforementioned, noise is usually contained in the high-frequency information of 2D or 3D images. However, the object information between the high frequency and the low frequency of the high-pass filter is typically determined by subjective discretion. Hence, we try different percentages of how much high-frequency information should remain to see which works better in our framework. Experimental results prove that works the best in our proposed approach (i.e., increased compared to and increased compared to ). is the weight we set to control the relative importance among different consistency losses. As mentioned in Sec III-A that for segmentation tasks, high-level features focus more on the edge details and low-level features focus on the textual information, we set the same value for and for low-level features, and the same value for and for the high-level ones (see Eq.7). As can be seen in Table IV, , , and achieve the best performance compared to other settings (i.e., our approach increases on and on ). 
We further evaluate the effect of the domain and range hyperparameters. The experimental results indicate that our settings are the most suitable: increasing or otherwise altering either hyperparameter does not improve the model's performance.
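As an illustration of how the domain and range hyperparameters enter a bilateral filter whose range kernel compares gradient responses rather than raw voxel values, the following is a minimal NumPy sketch under stated assumptions. It uses a 6-neighbour discrete Laplacian, edge-replicated borders, and Gaussian kernels. The names `laplacian_3d`, `gradient_bilateral_filter`, `sigma_d`, `sigma_r`, and `radius` are hypothetical, and the exact operator and kernel forms used in this work may differ.

```python
import itertools
import numpy as np

def laplacian_3d(vol):
    """6-neighbour discrete Laplacian with edge-replicated borders."""
    p = np.pad(vol, 1, mode="edge")
    return (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
            + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1]
            + p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2]
            - 6.0 * vol)

def gradient_bilateral_filter(vol, sigma_d, sigma_r, radius=1):
    """Bilateral filter whose range kernel uses Laplacian differences."""
    vol = np.asarray(vol, dtype=float)
    grad = laplacian_3d(vol)
    pv = np.pad(vol, radius, mode="edge")
    pg = np.pad(grad, radius, mode="edge")
    num = np.zeros_like(vol)
    den = np.zeros_like(vol)
    depth, height, width = vol.shape
    offsets = range(-radius, radius + 1)
    for dz, dy, dx in itertools.product(offsets, repeat=3):
        # Neighbour at offset (dz, dy, dx) for every voxel at once.
        sl = (slice(radius + dz, radius + dz + depth),
              slice(radius + dy, radius + dy + height),
              slice(radius + dx, radius + dx + width))
        # Domain kernel: depends only on spatial distance.
        w_d = np.exp(-(dz * dz + dy * dy + dx * dx) / (2.0 * sigma_d ** 2))
        # Range kernel: gradient difference instead of value difference.
        w_r = np.exp(-((pg[sl] - grad) ** 2) / (2.0 * sigma_r ** 2))
        w = w_d * w_r
        num += w * pv[sl]
        den += w
    return num / den
```

In this sketch, a larger `sigma_d` widens the spatial neighbourhood that contributes to each voxel, and a larger `sigma_r` makes the filter less sensitive to gradient differences, so it approaches a plain Gaussian blur.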
V Conclusion
In this paper, we propose the first voxel-level unsupervised domain adaptation approach, termed Vox-UDA, for the subtomogram segmentation task. Vox-UDA consists of a Noise Generation Module (NGM) and a denoised pseudo-labeling (DPL) strategy. NGM takes a subset of target samples as input and generates target-like Gaussian noise for the source domain data. DPL is based on a student-teacher learning framework: it uses denoised target domain data to produce pseudo labels for the original target data, which boosts segmentation performance by reducing the effect of domain shift. Additionally, we propose an improved bilateral filter (IBF) to provide denoised target data for DPL, thereby enhancing the quality of the pseudo labels. The proposed IBF uses a 3D Laplacian operator to calculate the gradient of each voxel in the x, y, and z directions, and replaces value differences with gradient differences to enhance the performance of bilateral filtering in the grayscale space. We have conducted large-scale experiments to demonstrate the strong performance of our method. We anticipate that our method can contribute to cryo-ET research in terms of methodology and, possibly, enhanced interpretability. Furthermore, we suggest that future research focus on improving the scalability of our method to a broader range of biomedical 3D image segmentation tasks.
Acknowledgments
The authors acknowledge NVIDIA and its research support team for the help provided to conduct this work. This work was partially supported by the Australian Research Council (ARC) Industrial Transformation Training Centres (ITTC) for Innovative Composites for the Future of Sustainable Mining Equipment under Grant IC220100028. This work was partially supported by U.S. NIH grants R01GM134020 and P41GM103712, NSF grants DBI-1949629, DBI-2238093, IIS-2007595, IIS-2211597, and MCB-2205148. This work was supported in part by Oracle Cloud credits and related resources provided by Oracle for Research, and the computational resources support from AMD HPC Fund.
References
- [1] C. M. Oikonomou and G. J. Jensen, “Cellular electron cryotomography: toward structural biology in situ,” Annual review of biochemistry, vol. 86, pp. 873–896, 2017.
- [2] W. Wan and J. A. Briggs, “Cryo-electron tomography and subtomogram averaging,” Methods in enzymology, vol. 579, pp. 329–367, 2016.
- [3] X. Zhu, J. Chen, X. Zeng, J. Liang, C. Li, S. Liu, S. Behpour, and M. Xu, “Weakly supervised 3d semantic segmentation using cross-image consensus and inter-voxel affinity relations,” in Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, 2021, pp. 2834–2844.
- [4] B. Zhou, H. Yu, X. Zeng, X. Yang, J. Zhang, and M. Xu, “One-shot learning with attention-guided segmentation in cryo-electron tomography,” Frontiers in Molecular Biosciences, vol. 7, p. 613347, 2021.
- [5] J. E. Heebner, C. Purnell, R. K. Hylton, M. Marsh, M. A. Grillo, and M. T. Swulius, “Deep learning-based segmentation of cryo-electron tomograms,” JoVE (Journal of Visualized Experiments), no. 189, p. e64435, 2022.
- [6] H. Zhu, C. Wang, Y. Wang, Z. Fan, M. R. Uddin, X. Gao, J. Zhang, X. Zeng, and M. Xu, “Unsupervised multi-task learning for 3d subtomogram image alignment, clustering and segmentation,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 2751–2755.
- [7] C. Li, D. Liu, H. Li, Z. Zhang, G. Lu, X. Chang, and W. Cai, “Domain adaptive nuclei instance segmentation and classification via category-aware feature alignment and pseudo-labelling,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 715–724.
- [8] P. Naylor, M. Laé, F. Reyal, and T. Walter, “Segmentation of nuclei in histopathology images by deep regression of the distance map,” IEEE transactions on medical imaging, vol. 38, no. 2, pp. 448–459, 2018.
- [9] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” Journal of machine learning research, vol. 17, no. 59, pp. 1–35, 2016.
- [10] G. van Tulder and M. de Bruijne, “Unpaired, unsupervised domain adaptation assumes your domains are already similar,” Medical Image Analysis, vol. 87, p. 102825, 2023.
- [11] J. Zhang, H. Chao, A. Dhurandhar, P.-Y. Chen, A. Tajer, Y. Xu, and P. Yan, “Spectral adversarial mixup for few-shot unsupervised domain adaptation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 728–738.
- [12] D. Liu, D. Zhang, Y. Song, F. Zhang, L. O’Donnell, H. Huang, M. Chen, and W. Cai, “Pdam: A panoptic-level feature alignment framework for unsupervised domain adaptive instance segmentation in microscopy images,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 154–165, 2020.
- [13] Z. Zhao, F. Zhou, K. Xu, Z. Zeng, C. Guan, and S. K. Zhou, “Le-uda: Label-efficient unsupervised domain adaptation for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 42, no. 3, pp. 633–646, 2022.
- [14] S. Cicek, N. Xu, Z. Wang, H. Jin, and S. Soatto, “Disentangled image generation for unsupervised domain adaptation,” in European Conference on Computer Vision. Springer, 2020, pp. 662–665.
- [15] H. Shin, H. Kim, S. Kim, Y. Jun, T. Eo, and D. Hwang, “Sdc-uda: Volumetric unsupervised domain adaptation framework for slice-direction continuous cross-modality medical image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2023, pp. 7412–7421.
- [16] Z. Xu, H. Gong, X. Wan, and H. Li, “Asc: Appearance and structure consistency for unsupervised domain adaptation in fetal brain mri segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 325–335.
- [17] J. Xian, X. Li, D. Tu, S. Zhu, C. Zhang, X. Liu, X. Li, and X. Yang, “Unsupervised cross-modality adaptation via dual structural-oriented guidance for 3d medical image segmentation,” IEEE Transactions on Medical Imaging, 2023.
- [18] F. Eisenstein, R. Danev, and M. Pilhofer, “Improved applicability and robustness of fast cryo-electron tomography data acquisition,” Journal of structural biology, vol. 208, no. 2, pp. 107–114, 2019.
- [19] W. J. Hagen, W. Wan, and J. A. Briggs, “Implementation of a cryo-electron tomography tilt-scheme optimized for high resolution subtomogram averaging,” Journal of structural biology, vol. 197, no. 2, pp. 191–198, 2017.
- [20] A. Martinez-Sanchez, L. Lamm, M. Jasnin, and H. Phelippeau, “Simulating the cellular context in synthetic datasets for cryo-electron tomography,” IEEE Transactions on Medical Imaging, pp. 1–1, 2024.
- [21] P. Harar, L. Herrmann, P. Grohs, and D. Haselbach, “Faket: Simulating cryo-electron tomograms with neural style transfer,” arXiv preprint arXiv:2304.02011, 2023.
- [22] A. Sharma, T. Kalluri, and M. Chandraker, “Instance level affinity-based transfer for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE, 2021, pp. 5361–5371.
- [23] N. Xiao and L. Zhang, “Dynamic weighted learning for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE, 2021, pp. 15 242–15 251.
- [24] J. Zhang, J. Huang, Z. Tian, and S. Lu, “Spectral unsupervised domain adaptation for visual recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2022, pp. 9829–9840.
- [25] J. Yang, J. Liu, N. Xu, and J. Huang, “Tvt: Transferable vision transformer for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, 2023, pp. 520–530.
- [26] D. Guan, J. Huang, A. Xiao, S. Lu, and Y. Cao, “Uncertainty-aware unsupervised domain adaptation in object detection,” IEEE Transactions on Multimedia, vol. 24, pp. 2502–2514, 2021.
- [27] F. Yu, D. Wang, Y. Chen, N. Karianakis, T. Shen, P. Yu, D. Lymberopoulos, S. Lu, W. Shi, and X. Chen, “Sc-uda: Style and content gaps aware unsupervised domain adaptation for object detection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, 2022, pp. 382–391.
- [28] G. Mattolin, L. Zanella, E. Ricci, and Y. Wang, “Confmix: Unsupervised domain adaptation for object detection via confidence-based mixing,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, 2023, pp. 423–433.
- [29] J. Yoo, I. Chung, and N. Kwak, “Unsupervised domain adaptation for one-stage object detector using offsets to bounding box,” in European Conference on Computer Vision. Springer, 2022, pp. 691–708.
- [30] J. Dong, Y. Cong, G. Sun, Z. Fang, and Z. Ding, “Where and how to transfer: Knowledge aggregation-induced transferability perception for unsupervised domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1664–1681, 2021.
- [31] S. Lee, J. Hyun, H. Seong, and E. Kim, “Unsupervised domain adaptation for semantic segmentation by content transfer,” in Proceedings of the AAAI conference on Artificial Intelligence, vol. 35, no. 9, 2021, pp. 8306–8315.
- [32] J. Zhu, Y. Guo, G. Sun, L. Yang, M. Deng, and J. Chen, “Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–18, 2023.
- [33] X. Zhao, N. C. Mithun, A. Rajvanshi, H.-P. Chiu, and S. Samarasekera, “Unsupervised domain adaptation for semantic segmentation with pseudo label self-refinement,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, 2024, pp. 2399–2409.
- [34] Y. Zou, Z. Yu, B. Kumar, and J. Wang, “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,” in Proceedings of the European conference on computer vision (ECCV). Springer, 2018, pp. 289–305.
- [35] X. Zheng, J. Zhu, Y. Liu, Z. Cao, C. Fu, and L. Wang, “Both style and distortion matter: Dual-path unsupervised domain adaptation for panoramic semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2023, pp. 1285–1295.
- [36] Y. Zhang, Y. Wang, L. Xu, Y. Yao, W. Qian, and L. Qi, “St-gan: A swin transformer-based generative adversarial network for unsupervised domain adaptation of cross-modality cardiac segmentation,” IEEE Journal of Biomedical and Health Informatics, 2023.
- [37] Q. Xie, Y. Li, N. He, M. Ning, K. Ma, G. Wang, Y. Lian, and Y. Zheng, “Unsupervised domain adaptation for medical image segmentation by disentanglement learning and self-training,” IEEE Transactions on Medical Imaging, 2022.
- [38] W. Ji and A. C. Chung, “Unsupervised domain adaptation for medical image segmentation using transformer with meta attention,” IEEE Transactions on Medical Imaging, 2023.
- [39] R. I. Koning, “Cryo-electron tomography of cellular microtubules,” Methods in cell biology, vol. 97, pp. 455–473, 2010.
- [40] X. Zeng and M. Xu, “Gum-net: Unsupervised geometric matching for fast and accurate 3d subtomogram image alignment and averaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2020, pp. 4073–4084.
- [41] F. P. de Isidro-Gómez, J. Vilas, P. Losana, J. Carazo, and C. O. S. Sorzano, “A deep learning approach to the automatic detection of alignment errors in cryo-electron tomographic reconstructions,” Journal of Structural Biology, vol. 216, no. 1, p. 108056, 2024.
- [42] W. Wan, S. Khavnekar, and J. Wagner, “Stopgap: an open-source package for template matching, subtomogram alignment and classification,” Acta Crystallographica Section D: Structural Biology, vol. 80, no. 5, 2024.
- [43] X. Du, H. Wang, Z. Zhu, X. Zeng, Y.-W. Chang, J. Zhang, E. Xing, and M. Xu, “Active learning to classify macromolecular structures in situ for less supervision in cryo-electron tomography,” Bioinformatics, vol. 37, no. 16, pp. 2340–2346, 2021.
- [44] N. Nguyen, C. Bohak, D. Engel, P. Mindek, O. Strnad, P. Wonka, S. Li, T. Ropinski, and I. Viola, “Finding nano-ötzi: cryo-electron tomography visualization guided by learned segmentation,” IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 10, pp. 4198–4214, 2022.
- [45] M. Siggel, R. K. Jensen, V. J. Maurer, J. Mahamid, and J. Kosinski, “Colabseg: An interactive tool for editing, processing, and visualizing membrane segmentations from cryo-et data,” Journal of Structural Biology, p. 108067, 2024.
- [46] H. Bandyopadhyay, Z. Deng, L. Ding, S. Liu, M. R. Uddin, X. Zeng, S. Behpour, and M. Xu, “Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization,” Bioinformatics, vol. 38, no. 4, pp. 977–984, 2022.
- [47] H. Chen, Q. Dou, L. Yu, J. Qin, and P.-A. Heng, “Voxresnet: Deep voxelwise residual networks for brain segmentation from 3d mr images,” NeuroImage, vol. 170, pp. 446–455, 2018.
- [48] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
- [49] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning. PMLR, 2021, pp. 8162–8171.
- [50] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2noise: Learning image restoration without clean data,” in International Conference on Machine Learning. PMLR, 2018, pp. 2965–2974.
- [51] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, and T. E. Ferrin, “Ucsf chimera—a visualization system for exploratory research and analysis,” Journal of computational chemistry, vol. 25, no. 13, pp. 1605–1612, 2004.
- [52] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017.
- [53] K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C.-L. Li, “Fixmatch: Simplifying semi-supervised learning with consistency and confidence,” Advances in neural information processing systems, vol. 33, pp. 596–608, 2020.
- [54] M. Elad, “On the origin of the bilateral filter and ways to improve it,” IEEE Transactions on image processing, vol. 11, no. 10, pp. 1141–1151, 2002.
- [55] L. Wang, Y. Zhang, and J. Feng, “On the euclidean distance of images,” IEEE transactions on pattern analysis and machine intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.
- [56] L. J. Van Vliet, I. T. Young, and G. L. Beckers, “A nonlinear laplace operator as edge detector in noisy images,” Computer vision, graphics, and image processing, vol. 45, no. 2, pp. 167–195, 1989.
- [57] X. Zeng, A. Kahng, L. Xue, J. Mahamid, Y.-W. Chang, and M. Xu, “High-throughput cryo-et structural pattern mining by unsupervised deep iterative subtomogram clustering,” Proceedings of the National Academy of Sciences, vol. 120, no. 15, p. e2213149120, 2023.
- [58] X. Liao, W. Li, Q. Xu, X. Wang, B. Jin, X. Zhang, Y. Wang, and Y. Zhang, “Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE, 2020, pp. 9394–9402.
- [59] Q. Guo, C. Lehmer, A. Martínez-Sánchez, T. Rudack, F. Beck, H. Hartmann, M. Pérez-Berlanga, F. Frottin, M. S. Hipp, F. U. Hartl et al., “In situ structure of neuronal c9orf72 poly-ga aggregates reveals proteasome recruitment,” Cell, vol. 172, no. 4, pp. 696–705, 2018.
- [60] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.” Journal of machine learning research, vol. 9, no. 11, 2008.
- [61] M. Carlavan and L. Blanc-Féraud, “Sparse poisson noisy image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1834–1846, 2011.
- [62] A. Maity, A. Pattanaik, S. Sagnika, and S. Pani, “A comparative study on approaches to speckle noise reduction in images,” in International Conference on Computational Intelligence and Networks. IEEE, 2015, pp. 148–155.