
Showing 1–43 of 43 results for author: Kataoka, H

Searching in archive cs.
  1. arXiv:2404.09401 [pdf, other]

    cs.CV cs.AI

    Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

    Authors: Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

    Abstract: Diffusion Models (DMs) have shown remarkable capabilities in various image-generation tasks. However, there are growing concerns that DMs could be used to imitate unauthorized creations and thus raise copyright issues. To address this issue, we propose a novel framework that embeds personal watermarks in the generation of adversarial examples. Such examples can force DMs to generate images with vi…

    Submitted 19 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: updated references

  2. arXiv:2401.03665 [pdf, other]

    cs.CV

    Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation

    Authors: Ryu Tadokoro, Ryosuke Yamada, Kodai Nakashima, Ryo Nakamura, Hirokatsu Kataoka

    Abstract: The construction of 3D medical image datasets presents several issues, including requiring significant financial costs in data collection and specialized expertise for annotation, as well as strict privacy concerns for patient confidentiality compared to natural image datasets. Therefore, it has become a pressing issue in 3D medical image segmentation to enable data-efficient learning with limited…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted to BMVC2023 (Oral)

    Report number: 152

    Journal ref: Proceedings of the British Machine Vision Conference (BMVC), 2023

  3. arXiv:2312.10737 [pdf, other]

    cs.CV cs.RO

    Traffic Incident Database with Multiple Labels Including Various Perspective Environmental Information

    Authors: Shota Nishiyama, Takuma Saito, Ryo Nakamura, Go Ohtani, Hirokatsu Kataoka, Kensho Hara

    Abstract: A large dataset of annotated traffic accidents is necessary to improve the accuracy of traffic accident recognition using deep learning models. Conventional traffic accident datasets provide annotations on traffic accidents and other teacher labels, improving traffic accident recognition performance. However, the labels annotated in conventional datasets need to be more comprehensive to describe t…

    Submitted 19 December, 2023; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Conference paper accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023. Reason for revision: corrected a missing space between sentences in the preview's abstract, which led to an unintended URL interpretation.

  4. arXiv:2311.03650 [pdf, other]

    cs.CV

    Image Generation and Learning Strategy for Deep Document Forgery Detection

    Authors: Yamato Okamoto, Osada Genki, Iu Yahiro, Rintaro Hasegawa, Peifei Zhu, Hirokatsu Kataoka

    Abstract: In recent years, document processing has flourished and brought numerous benefits. However, there has been a significant rise in reported cases of forged document images. Specifically, recent advancements in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery. Traditional approaches for forged document images created by prevalent copy-move methods are…

    Submitted 6 November, 2023; originally announced November 2023.

  5. arXiv:2310.14581 [pdf, other]

    cs.CV cs.AI

    Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

    Authors: Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

    Abstract: Large web crawl datasets have already played an important role in learning multimodal features with high generalization capabilities. However, there are still very limited studies investigating the details or improvements of data design. Recently, a DataComp challenge has been designed to propose the best training data with the fixed models. This paper presents our solution to both filtering track…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at the ICCV 2023 Workshop on Towards the Next Generation of Computer Vision Datasets: DataComp Track

  6. arXiv:2310.01936 [pdf, ps, other]

    cs.CV

    Constructing Image-Text Pair Dataset from Books

    Authors: Yamato Okamoto, Haruto Toyonaga, Yoshihisa Ijiri, Hirokatsu Kataoka

    Abstract: Digital archiving is becoming widespread owing to its effectiveness in protecting valuable books and providing knowledge to many people electronically. In this paper, we propose a novel approach to leverage digital archives for machine learning. If we can fully utilize such digitized data, machine learning has the potential to uncover unknown insights and ultimately acquire knowledge autonomously,…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023 workshop, Towards the Next Generation of Computer Vision Datasets: General DataCentric Submission Track

  7. arXiv:2309.17083 [pdf, other]

    cs.CV

    SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

    Authors: Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

    Abstract: Pre-training is a strong strategy for enhancing visual models to efficiently train them with a limited number of labeled images. In semantic segmentation, creating annotation masks requires an intensive amount of labor and time, and therefore, a large-scale pre-training dataset with semantic labels is quite difficult to construct. Moreover, what matters in semantic segmentation pre-training has no…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV2023. Code: https://github.com/dahlian00/SegRCDB, Project page: https://dahlian00.github.io/SegRCDBPage/

  8. arXiv:2309.14759 [pdf, other]

    cs.GR cs.CV

    Diffusion-based Holistic Texture Rectification and Synthesis

    Authors: Guoqing Hao, Satoshi Iizuka, Kensho Hara, Edgar Simo-Serra, Hirokatsu Kataoka, Kazuhiro Fukui

    Abstract: We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images. Traditional texture synthesis approaches focus on generating textures from pristine samples, which necessitate meticulous preparation by humans and are often unattainable in most natural images. These challenges stem from the frequent occlusions and distortions of texture samples…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  9. arXiv:2309.01369 [pdf, other]

    cs.CV

    Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

    Authors: Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

    Abstract: The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in text-to-image diffusion models, which enables real-image-and-annotation-free training. However, the pioneering training method using the diffusion-synthetic images an…

    Submitted 15 April, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

  10. arXiv:2307.14710 [pdf, other]

    cs.CV

    Pre-training Vision Transformers with Very Limited Synthesized Images

    Authors: Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

    Abstract: Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals. Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks. These synthetic images are categorized according to the parameters in the mathematic…

    Submitted 30 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  11. arXiv:2303.02930 [pdf, other]

    cs.CV

    Scapegoat Generation for Privacy Protection from Deepfake

    Authors: Gido Kato, Yoshihiro Fukuhara, Mariko Isogawa, Hideki Tsunashima, Hirokatsu Kataoka, Shigeo Morishima

    Abstract: To protect privacy and prevent malicious use of deepfake, current studies propose methods that interfere with the generation process, such as detection and destruction approaches. However, these methods suffer from sub-optimal generalization performance to unseen models and add undesirable noise to the original image. To address these problems, we propose a new problem formulation for deepfake pre…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 5 pages, 5 figures

    MSC Class: 68T07

  12. arXiv:2303.01112 [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

    Authors: Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

    Abstract: Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthe…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  13. arXiv:2207.14455 [pdf, other]

    cs.CV

    Neural Density-Distance Fields

    Authors: Itsuki Ueda, Yoshihiro Fukuhara, Hirokatsu Kataoka, Hiroaki Aizawa, Hidehiko Shishido, Itaru Kitahara

    Abstract: The success of neural fields for 3D vision tasks is now indisputable. Following this trend, several methods aiming for visual localization (e.g., SLAM) have been proposed to estimate distance or density fields using neural fields. However, it is difficult to achieve high localization performance by only density fields-based methods such as Neural Radiance Field (NeRF) since they do not provide den…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 (poster). project page: https://ueda0319.github.io/neddf/

  14. arXiv:2206.09132 [pdf, other]

    cs.CV cs.AI cs.LG

    Replacing Labeled Real-image Datasets with Auto-generated Contours

    Authors: Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

    Abstract: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and FDSL shows 82.7% top-1 accuracy…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022

  15. arXiv:2203.09109 [pdf, other]

    cs.CV cs.CL

    Community-Driven Comprehensive Scientific Paper Summarization: Insight from cvpaper.challenge

    Authors: Shintaro Yamamoto, Hirokatsu Kataoka, Ryota Suzuki, Seitaro Shinagawa, Shigeo Morishima

    Abstract: The present paper introduces a group activity involving writing summaries of conference proceedings by volunteer participants. The rapid increase in scientific papers is a heavy burden for researchers, especially non-native speakers, who need to survey scientific literature. To alleviate this problem, we organized a group of non-native English speakers to write summaries of papers presented at a c…

    Submitted 17 March, 2022; originally announced March 2022.

  16. arXiv:2103.14146 [pdf, other]

    cs.CV

    Describing and Localizing Multiple Changes with Transformers

    Authors: Yue Qiu, Shintaro Yamamoto, Kodai Nakashima, Ryota Suzuki, Kenji Iwata, Hirokatsu Kataoka, Yutaka Satoh

    Abstract: Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single change. However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We solve the above issues from…

    Submitted 14 September, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted by ICCV2021. 18 pages, 15 figures, project page: https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/

  17. arXiv:2103.13023 [pdf, other]

    cs.CV

    Can Vision Transformers Learn without Natural Images?

    Authors: Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue

    Abstract: Can we complete pre-training of Vision Transformers (ViT) without natural images and human-annotated labels? Although a pre-trained ViT seems to heavily rely on a large-scale dataset and human-annotated labels, recent large-scale datasets contain several problems in terms of privacy violations, inadequate fairness protection, and labor-intensive annotation. In the present paper, we pre-train ViT w…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Project page: https://hirokatsukataoka16.github.io/Vision-Transformers-without-Natural-Images/

  18. arXiv:2101.08515 [pdf, other]

    cs.CV cs.LG

    Pre-training without Natural Images

    Authors: Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh

    Abstract: Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law existing in the background knowledge of the real world. Theoretically, the…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: ACCV 2020 Best Paper Honorable Mention Award, Codes are publicly available: https://github.com/hirokatsukataoka16/FractalDB-Pretrained-ResNet-PyTorch

  19. arXiv:2101.07406 [pdf, ps, other]

    cs.CV

    Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data

    Authors: Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka

    Abstract: We propose a novel network initialization method using Perlin noise for training image classification networks with a limited amount of data. Our main idea is to initialize the network parameters by solving an artificial noise classification problem, where the aim is to classify Perlin noise samples into their noise categories. Specifically, the proposed method consists of two steps. First, it gen…

    Submitted 18 January, 2021; originally announced January 2021.

    Comments: Accepted to ICPR2020

  20. arXiv:2007.06866 [pdf, other]

    cs.CV

    Alleviating Over-segmentation Errors by Detecting Action Boundaries

    Authors: Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka

    Abstract: We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptiv…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: under review

  21. arXiv:2005.09183 [pdf, other]

    cs.CV cs.CL cs.IR

    Retrieving and Highlighting Action with Spatiotemporal Reference

    Authors: Seito Kasai, Yuchi Ishikawa, Masaki Hayashi, Yoshimitsu Aoki, Kensho Hara, Hirokatsu Kataoka

    Abstract: In this paper, we present a framework that jointly retrieves and spatiotemporally highlights actions in videos by enhancing current deep cross-modal retrieval methods. Our work takes on the novel task of action highlighting, which visualizes where and when actions occur in an untrimmed video setting. Action highlighting is a fine-grained task, compared to conventional action recognition tasks whic…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted to ICIP 2020

  22. arXiv:2004.04968 [pdf, other]

    cs.CV

    Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

    Authors: Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, Yutaka Satoh

    Abstract: How can we collect and use a video dataset to further improve spatiotemporal 3D Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question in video recognition, we have conducted an exploration study using a couple of large-scale video datasets and 3D CNNs. In the early era of deep neural networks, 2D CNNs were better than 3D CNNs in the context of video recogni…

    Submitted 10 April, 2020; originally announced April 2020.

    Comments: Codes and pre-trained models are publicly available: https://github.com/kenshohara/3D-ResNets-PyTorch

  23. arXiv:2003.12263 [pdf, other]

    cs.CV

    Weakly Supervised Dataset Collection for Robust Person Detection

    Authors: Munetaka Minoguchi, Ken Okayama, Yutaka Satoh, Hirokatsu Kataoka

    Abstract: To construct an algorithm that can provide robust person detection, we present a dataset with over 8 million images that was produced in a weakly supervised manner. Through labor-intensive human annotation, the person detection research community has produced relatively small datasets containing on the order of 100,000 images, such as the EuroCity Persons dataset, which includes 240,000 bounding b…

    Submitted 1 May, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

    Comments: The paper is under consideration at Pattern Recognition Letters. Project page: https://github.com/cvpaperchallenge/FashionCultureDataBase_DLoader

  24. arXiv:2003.00187 [pdf, other]

    cs.CV cs.LG

    Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

    Authors: Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

    Abstract: Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization…

    Submitted 12 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted to ICPR2020

  25. arXiv:1905.07666 [pdf, other]

    cs.CV

    What Do Adversarially Robust Models Look At?

    Authors: Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima

    Abstract: In this paper, we address the open question: "What do adversarially robust models look at?" Recently, it has been reported in many works that there exists a trade-off between standard accuracy and adversarial robustness. According to prior works, this trade-off is rooted in the fact that adversarially robust and standard accurate models might depend on very different sets of features. However, i…

    Submitted 18 May, 2019; originally announced May 2019.

  26. arXiv:1811.06943 [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Automatic Paper Summary Generation from Visual and Textual Information

    Authors: Shintaro Yamamoto, Yoshihiro Fukuhara, Ryota Suzuki, Shigeo Morishima, Hirokatsu Kataoka

    Abstract: Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts. In response to this situation, this paper proposes the paper summary generation (PSG) task using a simple but effective method to automatically generate an academic paper summary…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: International Conference on Machine Vision 2018, Munich, Germany

  27. arXiv:1809.10581 [pdf]

    cs.SD eess.AS

    Acoustic Probing for Estimating the Storage Time and Firmness of Tomatoes and Mandarin Oranges

    Authors: Hidetomo Kataoka, Takashi Ijiri, Kohei Matsumura, Jeremy White, Akira Hirabayashi

    Abstract: This paper introduces an acoustic probing technique to estimate the storage time and firmness of fruits; we emit an acoustic signal to fruit from a small speaker and capture the reflected signal with a tiny microphone. We collect reflected signals for fruits with various storage times and firmness conditions, using them to train regressors for estimation. To evaluate the feasibility of our acousti…

    Submitted 30 April, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: 8 pages, 9 figures. After submitting the first version, we continued measurements and found some results indicating that the conditions of our measurement devices may have influenced the estimation results. We are still continuing experiments.

  28. arXiv:1809.08391 [pdf, other]

    cs.CV cs.AI

    Understanding Fake Faces

    Authors: Ryota Natsume, Kazuki Inoue, Yoshihiro Fukuhara, Shintaro Yamamoto, Shigeo Morishima, Hirokatsu Kataoka

    Abstract: Face recognition research is one of the most active topics in computer vision (CV), and deep neural networks (DNN) are now filling the gap between human-level and computer-driven performance levels in face verification algorithms. However, although the performance gap appears to be narrowing in terms of accuracy-based expectations, a curious question has arisen; specifically, "Face understanding o…

    Submitted 22 September, 2018; originally announced September 2018.

    Comments: 11 pages, 3 figures, ECCV 2018 Workshop on Brain-Driven Computer Vision (BDCV)

  29. arXiv:1805.11850 [pdf, ps, other]

    cs.CV cs.CL

    Neural Joking Machine : Humorous image captioning

    Authors: Kota Yoshida, Munetaka Minoguchi, Kenichiro Wani, Akio Nakamura, Hirokatsu Kataoka

    Abstract: What is an effective expression that draws laughter from human beings? In the present paper, in order to consider this question from an academic standpoint, we have a computer generate an image caption that draws a "laugh". We construct a system that outputs funny captions based on the image captioning methods proposed in the computer vision field. Moreover, we also propose the Funny Score, which flexibly gi…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted to CVPR 2018 Language & Vision Workshop

  30. arXiv:1804.02675 [pdf, ps, other]

    cs.CV

    Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB

    Authors: Tomoyuki Suzuki, Hirokatsu Kataoka, Yoshimitsu Aoki, Yutaka Satoh

    Abstract: In this paper, we propose a novel approach for traffic accident anticipation through (i) Adaptive Loss for Early Anticipation (AdaLEA) and (ii) a large-scale self-annotated incident database for anticipation. The proposed AdaLEA allows a model to gradually learn an earlier anticipation as training progresses. The loss function adaptively assigns penalty weights depending on how early the model can…

    Submitted 8 April, 2018; originally announced April 2018.

    Comments: Accepted to CVPR 2018

  31. arXiv:1804.02555 [pdf, ps, other]

    cs.CV cs.RO

    Drive Video Analysis for the Detection of Traffic Near-Miss Incidents

    Authors: Hirokatsu Kataoka, Teppei Suzuki, Shoko Oikawa, Yasuhiro Matsui, Yutaka Satoh

    Abstract: Because of their recent introduction, self-driving cars and advanced driver assistance system (ADAS)-equipped vehicles have had little opportunity to learn the dangerous traffic (including near-miss incident) scenarios that provide normal drivers with strong motivation to drive safely. Accordingly, as a means of providing learning depth, this paper presents a novel traffic database that contains…

    Submitted 7 April, 2018; originally announced April 2018.

    Comments: Accepted to ICRA 2018

  32. arXiv:1711.09577 [pdf, other]

    cs.CV

    Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

    Authors: Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

    Abstract: The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow 3D archi…

    Submitted 1 April, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: Accepted to CVPR 2018

  33. arXiv:1708.07632 [pdf, other]

    cs.CV

    Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

    Authors: Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

    Abstract: Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) have the ability to directly extract spatio-temporal features from videos for action recognition. Although the 3D kernels tend to overfit because of their large number of parameters, 3D CNNs are greatly improved by using recent huge video databases. However, the architecture of 3D CNNs is relatively shallow compared to the…

    Submitted 25 August, 2017; originally announced August 2017.

    Comments: To appear in ICCV 2017 Workshop (Chalearn)

  34. arXiv:1707.06436 [pdf, ps, other]

    cs.CV

    cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey

    Authors: Hirokatsu Kataoka, Soma Shirakabe, Yun He, Shunya Ueta, Teppei Suzuki, Kaori Abe, Asako Kanezaki, Shin'ichiro Morita, Toshiyuki Yabe, Yoshihiro Kanehara, Hiroya Yatsuyanagi, Shinya Maruyama, Ryosuke Takasawa, Masataka Fuchida, Yudai Miyashita, Kazushige Okayasu, Yuta Matsuzaki

    Abstract: The paper presents futuristic challenges discussed in cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.

    Submitted 20 July, 2017; originally announced July 2017.

  35. arXiv:1705.03595 [pdf, ps, other]

    cs.CV cs.LG

    Collaborative Descriptors: Convolutional Maps for Preprocessing

    Authors: Hirokatsu Kataoka, Kaori Abe, Akio Nakamura, Yutaka Satoh

    Abstract: The paper presents a novel concept for collaborative descriptors between deeply learned and hand-crafted features. To achieve this concept, we apply convolutional maps for pre-processing; namely, the convolutional maps are used as input to hand-crafted features. We recorded an increase in the performance rate of +17.06% (multi-class object recognition) and +24.71% (car detection) from grayscale…

    Submitted 9 May, 2017; originally announced May 2017.

    Comments: CVPR 2017 Workshop Submission

  36. arXiv:1704.02199 [pdf, ps, other]

    cs.CV

    Could you guess an interesting movie from the posters?: An evaluation of vision-based features on movie poster database

    Authors: Yuta Matsuzaki, Kazushige Okayasu, Takaaki Imanari, Naomichi Kobayashi, Yoshihiro Kanehara, Ryousuke Takasawa, Akio Nakamura, Hirokatsu Kataoka

    Abstract: In this paper, we aim to estimate the winner of a worldwide film festival from the exhibited movie poster. The task is extremely challenging because the estimation must be done with only an exhibited movie poster, without any film ratings or box-office takings. In order to tackle this problem, we have created a new database which consists of all movie posters included in the four biggest film…

    Submitted 7 April, 2017; originally announced April 2017.

    Comments: 4 pages, 4 figures

  37. arXiv:1703.07920 [pdf, ps, other]

    cs.CV cs.DB cs.MM

    Changing Fashion Cultures

    Authors: Kaori Abe, Teppei Suzuki, Shunya Ueta, Akio Nakamura, Yutaka Satoh, Hirokatsu Kataoka

    Abstract: The paper presents a novel concept that analyzes and visualizes worldwide fashion trends. Our goal is to reveal cutting-edge fashion trends without displaying an ordinary fashion style. To achieve the fashion-based analysis, we created a new fashion culture database (FCDB), which consists of 76 million geo-tagged images in 16 cosmopolitan cities. By grasping a fashion trend of mixed fashion styles…

    Submitted 22 March, 2017; originally announced March 2017.

    Comments: 9 pages, 9 figures

  38. arXiv:1608.08395 [pdf, ps, other]

    cs.CV cs.RO

    Motion Representation with Acceleration Images

    Authors: Hirokatsu Kataoka, Yun He, Soma Shirakabe, Yutaka Satoh

    Abstract: Time-differential information is an extremely important cue for motion representation. We have applied first-order differential velocity from positional information; moreover, we believe that second-order differential acceleration is also a significant feature in motion representation. However, an acceleration image based on a typical optical flow includes motion noise. We have not employ…

    Submitted 30 August, 2016; originally announced August 2016.

  39. arXiv:1608.07876 [pdf, ps, other]

    cs.CV cs.MM

    Human Action Recognition without Human

    Authors: Yun He, Soma Shirakabe, Yutaka Satoh, Hirokatsu Kataoka

    Abstract: The objective of this paper is to evaluate "human action recognition without human". Motion representation is frequently discussed in human action recognition. We have examined several sophisticated options, such as dense trajectories (DT) and the two-stream convolutional neural network (CNN). However, some features from the background could be too strong, as shown in some recent studies on human…

    Submitted 28 August, 2016; originally announced August 2016.

  40. arXiv:1605.08247 [pdf, ps, other]

    cs.CV cs.LG cs.MM cs.RO

    cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

    Authors: Hirokatsu Kataoka, Yudai Miyashita, Tomoaki Yamabe, Soma Shirakabe, Shin'ichi Sato, Hironori Hoshino, Ryo Kato, Kaori Abe, Takaaki Imanari, Naomichi Kobayashi, Shinichiro Morita, Akio Nakamura

    Abstract: The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading all 602 conference papers presented at CVPR2015, the premier annual computer vision event held in June 2015, in order to gra…

    Submitted 26 May, 2016; originally announced May 2016.

    Comments: Survey Paper

  41. arXiv:1605.00324 [pdf, ps, other]

    cs.CV

    Dominant Codewords Selection with Topic Model for Action Recognition

    Authors: Hirokatsu Kataoka, Masaki Hayashi, Kenji Iwata, Yutaka Satoh, Yoshimitsu Aoki, Slobodan Ilic

    Abstract: In this paper, we propose a framework for recognizing human activities that uses only in-topic dominant codewords and a mixture of intertopic vectors. Latent Dirichlet allocation (LDA) is used to develop approximations of human motion primitives; these are mid-level representations, and they adaptively integrate dominant vectors when classifying human activities. In LDA topic modeling, action vide…

    Submitted 1 May, 2016; originally announced May 2016.

    Comments: in CVPRW16

  42. arXiv:1604.07513 [pdf, ps, other]

    cs.CV cs.AI

    Semantic Change Detection with Hypermaps

    Authors: Teppei Suzuki, Soma Shirakabe, Yudai Miyashita, Akio Nakamura, Yutaka Satoh, Hirokatsu Kataoka

    Abstract: Change detection is the study of detecting changes between two different images of a scene taken at different times. From the detected change areas alone, however, a human cannot understand how the two images differ. Therefore, semantic understanding is required in change detection research for applications such as disaster investigation. The paper proposes the concept of semantic change detection, which involves…

    Submitted 15 March, 2017; v1 submitted 26 April, 2016; originally announced April 2016.

  43. arXiv:1509.07627 [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

    Authors: Hirokatsu Kataoka, Kenji Iwata, Yutaka Satoh

    Abstract: In this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture and very deep convolutional network (VGGNet) architecture. To date, most CNN researchers have employed the last layers before output, which were extracted from the fully connected feature layers. However, since it is unlikely that feature representation effectiveness is dependent on the problem, t…

    Submitted 25 September, 2015; originally announced September 2015.

    Comments: 5 pages, 3 figures