
Showing 1–3 of 3 results for author: Takashima, S

Searching in archive cs.
  1. arXiv:2307.14710

    cs.CV

    Pre-training Vision Transformers with Very Limited Synthesized Images

    Authors: Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

    Abstract: Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters in the mathematic…

    Submitted 30 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  2. arXiv:2303.01112

    cs.CV cs.AI cs.LG

    Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

    Authors: Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

    Abstract: Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthe…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  3. arXiv:2206.09132

    cs.CV cs.AI cs.LG

    Replacing Labeled Real-image Datasets with Auto-generated Contours

    Authors: Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

    Abstract: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and FDSL shows 82.7% top-1 accuracy…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022