-
Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation
Authors:
Ryu Tadokoro,
Ryosuke Yamada,
Kodai Nakashima,
Ryo Nakamura,
Hirokatsu Kataoka
Abstract:
The construction of 3D medical image datasets presents several issues, including requiring significant financial costs in data collection and specialized expertise for annotation, as well as strict privacy concerns for patient confidentiality compared to natural image datasets. Therefore, it has become a pressing issue in 3D medical image segmentation to enable data-efficient learning with limited 3D medical data and supervision. A promising approach is pre-training, but improving its performance in 3D medical image segmentation is difficult due to the small size of existing 3D medical image datasets. We thus present the Primitive Geometry Segment Pre-training (PrimGeoSeg) method to enable the learning of 3D semantic features by pre-training segmentation tasks using only primitive geometric objects for 3D medical image segmentation. PrimGeoSeg performs more accurate and efficient 3D medical image segmentation without manual data collection and annotation. Further, experimental results show that PrimGeoSeg on SwinUNETR improves performance over learning from scratch on BTCV, MSD (Task06), and BraTS datasets by 3.7%, 4.4%, and 0.3%, respectively. Remarkably, the performance was equal to or better than state-of-the-art self-supervised learning despite the equal number of pre-training data. From experimental results, we conclude that effective pre-training can be achieved by looking at primitive geometric objects only. Code and dataset are available at https://github.com/SUPER-TADORY/PrimGeoSeg.
Submitted 7 January, 2024;
originally announced January 2024.
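As a rough illustration of what "pre-training segmentation tasks using only primitive geometric objects" could look like, the sketch below builds a small synthetic volume and its voxel-wise label mask from spheres and cuboids. The abstract does not state which primitives, sizes, intensities, or label scheme PrimGeoSeg uses, so the function `make_primitive_sample`, its parameters, and the two-class labelling are all assumptions made for illustration only.

```python
import random


def make_primitive_sample(size=16, n_objects=3, seed=0):
    """Return (volume, mask): a synthetic 3D image and its segmentation labels.

    Hypothetical scheme, not taken from the paper:
    label 0 = background, 1 = sphere, 2 = cuboid.
    """
    rng = random.Random(seed)
    volume = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    mask = [[[0] * size for _ in range(size)] for _ in range(size)]

    for _ in range(n_objects):
        kind = rng.choice(("sphere", "cuboid"))
        label = 1 if kind == "sphere" else 2
        cz, cy, cx = (rng.randrange(3, size - 3) for _ in range(3))
        radius = rng.randint(2, 4)
        intensity = rng.uniform(0.3, 1.0)

        for z in range(size):
            for y in range(size):
                for x in range(size):
                    dz, dy, dx = z - cz, y - cy, x - cx
                    if kind == "sphere":
                        inside = dz * dz + dy * dy + dx * dx <= radius * radius
                    else:
                        inside = max(abs(dz), abs(dy), abs(dx)) <= radius
                    if inside:
                        # Later objects overwrite earlier ones where they overlap.
                        volume[z][y][x] = intensity
                        mask[z][y][x] = label
    return volume, mask


volume, mask = make_primitive_sample(size=16, n_objects=3, seed=0)
labelled = sum(v != 0 for plane in mask for row in plane for v in row)
print("labelled voxels:", labelled)
```

Because the mask is produced by the same procedure that draws the objects, every sample comes with exact labels and no manual annotation, which is the property the abstract emphasises.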
-
PoF: Post-Training of Feature Extractor for Improving Generalization
Authors:
Ikuro Sato,
Ryota Yamada,
Masayuki Tanaka,
Nakamasa Inoue,
Rei Kawakami
Abstract:
It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role for generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. The characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner aiming to reduce a part of test loss caused by the positive loss curvature. We provide a theoretical analysis that shows the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both CIFAR-10 and CIFAR-100 datasets for only 10-epoch post-training, and on SVHN dataset for 50-epoch post-training. Source code is available at https://github.com/DensoITLab/PoF-v1.
Submitted 5 July, 2022;
originally announced July 2022.
-
Replacing Labeled Real-image Datasets with Auto-generated Contours
Authors:
Hirokatsu Kataoka,
Ryo Hayamizu,
Ryosuke Yamada,
Kodai Nakashima,
Sora Takashima,
Xinyu Zhang,
Edgar Josafat Martinez-Noriega,
Nakamasa Inoue,
Rio Yokota
Abstract:
In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human supervision, or self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and FDSL shows 82.7% top-1 accuracy when pre-trained under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy/copyright issues, labeling cost and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses, namely (i) object contours are what matter in FDSL datasets and (ii) increased number of parameters to create labels affects performance improvement in FDSL pre-training. To test the former hypothesis, we constructed a dataset that consisted of simple object contour combinations. We found that this dataset can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.
Submitted 18 June, 2022;
originally announced June 2022.
-
Search by a Metamorphic Robotic System in a Finite 3D Cubic Grid
Authors:
Ryonosuke Yamada,
Yukiko Yamauchi
Abstract:
We consider search in a finite 3D cubic grid by a metamorphic robotic system (MRS), which consists of anonymous modules. A module can perform a sliding or a rotation as long as the modules as a whole keep connectivity. As the number of modules increases, the variety of actions that the MRS can perform increases. The search problem requires the MRS to find a target in a given finite field. Doi et al. (SSS 2018) demonstrate a necessary and sufficient number of modules for search in a finite 2D square grid. We consider search in a finite 3D cubic grid and investigate the effect of common knowledge. We consider three different settings. First, we show that three modules are necessary and sufficient when all modules are equipped with a common compass, i.e., they agree on the direction and orientation of the $x$, $y$, and $z$ axes. Second, we show that four modules are necessary and sufficient when all modules agree on the direction and orientation of the vertical axis. Finally, we show that five modules are necessary and sufficient when the modules are not equipped with a common compass. Our results show that the shapes of the MRS in the 3D cubic grid have richer structure than those in the 2D square grid.
Submitted 30 November, 2021;
originally announced November 2021.
-
Pre-training without Natural Images
Authors:
Hirokatsu Kataoka,
Kazushige Okayasu,
Asato Matsumoto,
Eisuke Yamagata,
Ryosuke Yamada,
Nakamasa Inoue,
Akio Nakamura,
Yutaka Satoh
Abstract:
Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law existing in the background knowledge of the real world. Theoretically, the use of automatically generated images instead of natural images in the pre-training phase allows us to generate an infinite scale dataset of labeled images. Although the models pre-trained with the proposed Fractal DataBase (FractalDB), a database without natural images, do not necessarily outperform models pre-trained with human annotated datasets at all settings, we are able to partially surpass the accuracy of ImageNet/Places pre-trained models. The image representation with the proposed FractalDB captures a unique feature in the visualization of convolutional layers and attentions.
Submitted 21 January, 2021;
originally announced January 2021.
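The idea of generating "image patterns and their category labels" from a formula can be sketched with a generic iterated function system (IFS) rendered by the chaos game. The abstract does not describe how FractalDB samples its parameters, defines categories, or creates variation within a category, so everything below (the helper names `sample_ifs`, `render_fractal`, `make_dataset`, the parameter ranges, and the image size) is an assumption chosen only to make the concept concrete.

```python
import random


def sample_ifs(rng, n_maps=3, scale=0.4):
    """Sample affine maps (a, b, c, d, e, f): (x, y) -> (ax+by+e, cx+dy+f).

    With |a|, |b|, |c|, |d| <= 0.4 each map is contractive, so the
    chaos game below stays bounded.
    """
    maps = []
    for _ in range(n_maps):
        a, b, c, d = (rng.uniform(-scale, scale) for _ in range(4))
        e, f = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        maps.append((a, b, c, d, e, f))
    return maps


def render_fractal(maps, rng, n_points=5000, size=32, burn_in=20):
    """Run the chaos game and rasterise the visited points to a binary image."""
    x = y = 0.0
    points = []
    for i in range(n_points + burn_in):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:
            points.append((x, y))

    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    span_x = max(p[0] for p in points) - min_x
    span_y = max(p[1] for p in points) - min_y
    span = max(span_x, span_y) or 1.0

    image = [[0] * size for _ in range(size)]
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span * size))
        row = min(size - 1, int((py - min_y) / span * size))
        image[row][col] = 1
    return image


def make_dataset(n_categories=3, per_category=2, seed=0):
    """Each category is one IFS; its label is simply the category index."""
    rng = random.Random(seed)
    dataset = []
    for label in range(n_categories):
        maps = sample_ifs(rng)
        for _ in range(per_category):
            dataset.append((render_fractal(maps, rng), label))
    return dataset


dataset = make_dataset()
print(len(dataset), "labelled images generated without any natural image")
```

Since both the image and its label come from the sampled parameters, the dataset can be made as large as desired by drawing more parameter sets, which is the "infinite scale" property the abstract refers to.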
-
Extension of Sinkhorn Method: Optimal Movement Estimation of Agents Moving at Constant Velocity
Authors:
Daigo Okada,
Naotoshi Nakamura,
Takuya Wada,
Ayako Iwasaki,
Ryo Yamada
Abstract:
In the field of bioimaging, an important part of analyzing the motion of objects is tracking. We propose a method that applies the Sinkhorn distance for solving the optimal transport problem to track objects. The advantage of this method is that it can flexibly incorporate various assumptions in tracking as a cost matrix. First, we extend the Sinkhorn distance from two dimensions to three dimensions. Using this three-dimensional distance, we compare the performance of two types of tracking technique, namely tracking that associates objects that are close to each other, which conventionally uses the nearest-neighbor method, and tracking that assumes that the object is moving at constant velocity, using three types of simulation data. The results suggest that when tracking objects moving at constant velocity, our method is superior to conventional nearest-neighbor tracking as long as the added noise is not excessively large. We show that the Sinkhorn method can be applied effectively to object tracking. Our simulation data analysis suggests that when objects are moving at constant velocity, our method, which sets acceleration as a cost, outperforms the traditional nearest-neighbor method in terms of tracking objects. To apply the proposed method to real bioimaging data, it is necessary to set an appropriate cost indicator based on the movement features.
Submitted 11 July, 2019;
originally announced July 2019.
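The contrast between nearest-neighbor tracking and tracking with an acceleration cost can be illustrated with a small sketch. It uses standard Sinkhorn iterations for entropy-regularised optimal transport with uniform marginals; the cost functions, the regularisation value `reg`, the function names, and the two-object crossing scenario are assumptions for illustration and are not taken from the paper.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_cost(curr, detections):
    """Cost = squared distance from the current position to each detection."""
    return [[squared_distance(c, d) for d in detections] for c in curr]


def acceleration_cost(prev, curr, detections):
    """Cost = squared acceleration |x_next - 2*x_curr + x_prev|^2.

    It is zero when the object keeps moving at constant velocity.
    """
    cost = []
    for p, c in zip(prev, curr):
        predicted = tuple(2 * ci - pi for pi, ci in zip(p, c))
        cost.append([squared_distance(predicted, d) for d in detections])
    return cost


def sinkhorn_plan(cost, reg=1.0, n_iters=100):
    """Entropy-regularised transport plan between uniform marginals."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def assign(plan):
    """Link each object to the detection that receives most of its mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Two objects crossing each other along the x axis at constant velocity.
prev = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
curr = [(1.5, 0.0, 0.0), (2.5, 0.0, 0.0)]
detections = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # next frame, unordered

print("nearest-neighbor cost:", assign(sinkhorn_plan(nearest_cost(curr, detections))))
print("acceleration cost:   ", assign(sinkhorn_plan(acceleration_cost(prev, curr, detections))))
```

In this toy scenario the objects swap sides between frames, so a distance-only cost links each object to the wrong detection, while the acceleration cost follows the constant-velocity prediction. Changing the tracking assumption only requires swapping the cost matrix, which is the flexibility the abstract highlights.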