Showing 1–50 of 55 results for author: Aizawa, K

Searching in archive cs.
  1. arXiv:2405.05924  [pdf]

    cs.HC

    Privacy Protection and Video Manipulation in Immersive Media

    Authors: Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: In comparison to traditional footage, 360° videos can convey engaging, immersive experiences and even be utilized to create interactive virtual environments. Like regular recordings, these videos need to consider the privacy of recorded people and could be targets for video manipulations. However, due to their properties like enhanced presence, the effects on users might differ from traditional, n…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  2. arXiv:2404.13993  [pdf, other]

    cs.MM cs.CV

    Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

    Authors: Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

    Abstract: Recognizing characters and predicting speakers of dialogue are critical for comic processing tasks, such as voice generation or translation. However, because characters vary by comic title, supervised learning approaches like training character classifiers which require specific annotations for each comic title are infeasible. This motivates us to propose a novel zero-shot approach, allowing machi…

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  3. arXiv:2403.20331  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

    Abstract: This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to withhold answers when faced with unsolvable problems in the context of Visual Question Answering (VQA) tasks. UPD encompasses three distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Inco…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/AtsuMiyai/UPD

  4. arXiv:2403.16141  [pdf, other]

    cs.CV

    Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

    Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: Recent advancements in the study of Neural Radiance Fields (NeRF) for dynamic scenes often involve explicit modeling of scene dynamics. However, this approach faces challenges in modeling scene dynamics in urban environments, where moving objects of various categories and scales are present. In such settings, it becomes crucial to effectively eliminate moving objects to accurately reconstruct stat…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Project website: https://otonari726.github.io/entitynerf/

  5. arXiv:2312.10806  [pdf, other]

    cs.CV

    Cross-Lingual Learning in Multilingual Scene Text Recognition

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages. To do so, we first examine if two general insights about CLL discussed in previous works are applied to m…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at ICASSP2024, 5 pages, 2 figures

  6. arXiv:2312.08872  [pdf, other]

    cs.CV

    The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

    Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

    Abstract: Text-to-image diffusion models allow users control over the content of generated images. Still, text-to-image generation occasionally leads to generation failure requiring users to generate dozens of images under the same text prompt before they obtain a satisfying result. We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel bloc…

    Submitted 9 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  7. arXiv:2311.13602  [pdf, other]

    cs.CV

    Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

    Authors: Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa

    Abstract: Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our…

    Submitted 15 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024 (Oral), Project website: https://udonda.github.io/RALF/ , GitHub: https://github.com/CyberAgentAILab/RALF

  8. arXiv:2310.00847  [pdf, other]

    cs.CV

    Can Pre-trained Networks Detect Familiar Out-of-Distribution Data?

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Out-of-distribution (OOD) detection is critical for safety-sensitive machine learning applications and has been extensively studied, yielding a plethora of methods developed in the literature. However, most studies for OOD detection did not use pre-trained models and trained a backbone from scratch. In recent years, transferring knowledge from large pre-trained models to downstream tasks by lightw…

    Submitted 12 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  9. arXiv:2307.16204  [pdf, other]

    cs.CV

    Open-Set Domain Adaptation with Visual-Language Foundation Models

    Authors: Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge obtained from a source domain with labeled data to a target domain with unlabeled data. Owing to the lack of labeled data in the target domain and the possible presence of unknown classes, open-set domain adaptation (ODA) has emerged as a potential solution to identify these classes during the training p…

    Submitted 30 July, 2023; originally announced July 2023.

  10. arXiv:2306.17469  [pdf, other]

    cs.CV

    Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

    Authors: Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

    Abstract: The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characte…

    Submitted 22 April, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to ICME2024

  11. arXiv:2306.01293  [pdf, other]

    cs.CV

    LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: We present a novel vision-language prompt learning approach for few-shot out-of-distribution (OOD) detection. Few-shot OOD detection aims to detect OOD images from classes that are unseen during training using only a few labeled in-distribution (ID) images. While prompt learning methods such as CoOp have shown effectiveness and efficiency in few-shot ID classification, they still face limitations…

    Submitted 25 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  12. Guided Image Synthesis via Initial Image Editing in Diffusion Model

    Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

    Abstract: Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process, we propose a novel direction of manipulating the initial noise to control the generated image. Through experiments on stable diffusion, we show that blocks of pi…

    Submitted 6 August, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: ACM MM 23

  13. arXiv:2304.04521  [pdf, other]

    cs.CV

    Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

    Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

    Abstract: Extracting in-distribution (ID) images from noisy images scraped from the Internet is an important preprocessing step for constructing datasets, which has traditionally been done manually. Automating this preprocessing with deep learning techniques presents two key challenges. First, images should be collected using only the name of the ID class without training on the ID data. Second, as we can see wh…

    Submitted 23 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: v3: I fixed some typos from v2

  14. arXiv:2212.03635  [pdf, other]

    cs.CV cs.GR

    Non-uniform Sampling Strategies for NeRF on 360° images

    Authors: Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: In recent years, the performance of novel view synthesis using perspective images has dramatically improved with the advent of neural radiance fields (NeRF). This study proposes two novel techniques that effectively build NeRF for 360° omnidirectional images. Due to the characteristics of a 360° image of ERP format that has spatial distortion in their high latitude regions…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Accepted at the 33rd British Machine Vision Conference (BMVC) 2022

  15. arXiv:2211.10437  [pdf, other]

    cs.CV

    A Structure-Guided Diffusion Model for Large-Hole Image Completion

    Authors: Daichi Horita, Jiaolong Yang, Dong Chen, Yuki Koyama, Kiyoharu Aizawa, Nicu Sebe

    Abstract: Image completion techniques have made significant progress in filling missing regions (i.e., holes) in images. However, large-hole completion remains challenging due to limited structural information. In this paper, we address this problem by integrating explicit structural guidance into diffusion-based image completion, forming our structure-guided diffusion model (SGDM). It consists of two casca…

    Submitted 6 September, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: BMVC2023. Code: https://github.com/UdonDa/Structure_Guided_Diffusion_Model

  16. arXiv:2211.00918  [pdf, other]

    eess.IV cs.CV

    Universal Deep Image Compression via Content-Adaptive Optimization with Adapters

    Authors: Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

    Abstract: Deep image compression performs better than conventional codecs, such as JPEG, on natural images. However, deep image compression is learning-based and encounters a problem: the compression performance deteriorates significantly for out-of-domain images. In this study, we highlight this problem and address a novel task: universal deep image compression. This task aims to compress images belonging…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  17. arXiv:2210.12681  [pdf, other]

    cs.CV

    Rethinking Rotation in Self-Supervised Contrastive Learning: Adaptive Positive or Negative Data Augmentation

    Authors: Atsuyuki Miyai, Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa

    Abstract: Rotation is frequently listed as a candidate for data augmentation in contrastive learning but seldom provides satisfactory improvements. We argue that this is because the rotated image is always treated as either positive or negative. The semantics of an image can be rotation-invariant or rotation-variant, so whether the rotated image is treated as positive or negative should be determined based…

    Submitted 24 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  18. Saliency-based Multiple Region of Interest Detection from a Single 360° image

    Authors: Yuuki Sawabe, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: 360° images are informative: they contain omnidirectional visual information around the camera. However, the area covered by a 360° image is much larger than the human field of view, so important information in different view directions is easily overlooked. To tackle this issue, we propose a method for predicting the optimal set of Region of Interest (RoI) from a single 360° image usin…

    Submitted 8 September, 2022; originally announced September 2022.

    Journal ref: in IEEE Access, vol. 10, pp. 89124-89133, 2022

  19. Evaluating the Stability of Deep Image Quality Assessment With Respect to Image Scaling

    Authors: Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

    Abstract: Image quality assessment (IQA) is a fundamental metric for image processing tasks (e.g., compression). With full-reference IQAs, traditional IQAs, such as PSNR and SSIM, have been used. Recently, IQAs based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have also been used. It is known that image scaling is inconsistent among deep IQAs, as some perform down-scaling as pre-processing…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: IEICE Transactions on Information and Systems (Letter)

  20. arXiv:2207.04675  [pdf, other]

    cs.CV

    COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: Recognizing irregular texts has been a challenging topic in text recognition. To encourage research on this topic, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics. COO has many arbitrary texts, such as extremely curved, partially shrunk texts, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each par…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022. 25 pages, 16 figures

  21. arXiv:2206.10329  [pdf, other]

    cs.CV

    SVG Vector Font Generation for Chinese Characters with Transformer

    Authors: Haruka Aoki, Kiyoharu Aizawa

    Abstract: Designing fonts for Chinese characters is highly labor-intensive and time-consuming. While the latest methods successfully generate the English alphabet vector font, despite the high demand for automatic font generation, Chinese vector font generation has been an unsolved problem owing to its complex shape and numerous characters. This study addressed the problem of automatically generating Chines…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to ICIP 2022

  22. arXiv:2204.04634  [pdf, other]

    cs.CV cs.MM

    Intersection Prediction from Single 360° Image via Deep Detection of Possible Direction of Travel

    Authors: Naoki Sugimoto, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: Movie-Map, an interactive first-person-view map that engages the user in a simulated walking experience, comprises short 360° video segments separated by traffic intersections that are seamlessly connected according to the viewer's direction of travel. However, in wide urban-scale areas with numerous intersecting roads, manual intersection segmentation requires significant human effort. Therefore,…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in BMVC

  23. arXiv:2204.01027  [pdf, other]

    cs.CV

    Distortion-Aware Self-Supervised 360° Depth Estimation from A Single Equirectangular Projection Image

    Authors: Yuya Hasegawa, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: 360° images have become widely available over the last few years. This paper proposes a new technique for single 360° image depth prediction under open environments. Depth prediction from a single 360° image is not easy for two reasons. One is the limitation of supervision datasets: the currently available dataset is limited to indoor scenes. The other is the problems caused by Equirectangular Projection…

    Submitted 3 April, 2022; originally announced April 2022.

  24. arXiv:2202.03176  [pdf, other]

    cs.CV

    Field-of-View IoU for Object Detection in 360° Images

    Authors: Miao Cao, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: 360° cameras have gained popularity over the last few years. In this paper, we propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation for object detection in 360° images. Although most object detection neural networks designed for the perspective images are applicable to 360° images in equirectangular projection (ERP) format, their performance deteriorates owing to t…

    Submitted 22 September, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  25. arXiv:2110.10456  [pdf, other]

    cs.CV

    Noisy Annotation Refinement for Object Detection

    Authors: Jiafeng Mao, Qing Yu, Yoko Yamakata, Kiyoharu Aizawa

    Abstract: Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as crowdsourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem s…

    Submitted 7 December, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

  26. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  27. arXiv:2103.04685  [pdf, other]

    cs.LG

    A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels

    Authors: Daiki Tanaka, Daiki Ikami, Kiyoharu Aizawa

    Abstract: Positive-unlabeled learning refers to the process of training a binary classifier using only positive and unlabeled data. Although unlabeled data can contain positive data, all unlabeled data are regarded as negative data in existing positive-unlabeled learning methods, which results in diminished performance. We provide a new perspective on this problem -- considering unlabeled data as noisy-l…

    Submitted 8 March, 2021; originally announced March 2021.

  28. arXiv:2103.04400  [pdf, other]

    cs.CV

    What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: Scene text recognition (STR) task has a common practice: All state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically and for languages other t…

    Submitted 5 June, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  29. Building Movie Map -- A Tool for Exploring Areas in a City -- and its Evaluation

    Authors: Naoki Sugimoto, Yoshihito Ebine, Kiyoharu Aizawa

    Abstract: We propose a new Movie Map system, with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. In the acquisition stage, omnidirectional videos are taken along streets in target areas. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are subsequently…

    Submitted 17 November, 2020; originally announced November 2020.

    Journal ref: ACM Multimedia 2020

  30. arXiv:2011.02206  [pdf, other]

    cs.CV

    Few-Shot Font Generation with Deep Metric Learning

    Authors: Haruka Aoki, Koki Tsubota, Hikaru Ikuta, Kiyoharu Aizawa

    Abstract: Designing fonts for languages with a large number of characters, such as Japanese and Chinese, is an extremely labor-intensive and time-consuming task. In this study, we addressed the problem of automatically generating Japanese typographic fonts from only a few font samples, where the synthesized glyphs are expected to have coherent characteristics, such as skeletons, contours, and serifs. Existi…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted to ICPR 2020

  31. arXiv:2011.01655  [pdf, other]

    cs.CV

    The Aleatoric Uncertainty Estimation Using a Separate Formulation with Virtual Residuals

    Authors: Takumi Kawashima, Qing Yu, Akari Asai, Daiki Ikami, Kiyoharu Aizawa

    Abstract: We propose a new optimization framework for aleatoric uncertainty estimation in regression problems. Existing methods can quantify the error in the target estimation, but they tend to underestimate it. To obtain the predictive uncertainty inherent in an observation, we propose a new separable formulation for the estimation of a signal and of its uncertainty, avoiding the effect of overfitting. By…

    Submitted 3 November, 2020; originally announced November 2020.

    Journal ref: ICPR2020

  32. arXiv:2009.07557  [pdf, other]

    cs.CV cs.MM

    SLGAN: Style- and Latent-guided Generative Adversarial Network for Desirable Makeup Transfer and Removal

    Authors: Daichi Horita, Kiyoharu Aizawa

    Abstract: There are five features to consider when using generative adversarial networks to apply makeup to photos of the human face. These features include (1) facial components, (2) interactive color adjustments, (3) makeup variations, (4) robustness to poses and expressions, and (5) the use of multiple reference images. Several related works have been proposed, mainly using generative adversarial network…

    Submitted 24 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: 9 pages, 9 figures

  33. arXiv:2007.12619  [pdf, other]

    eess.IV cs.CV

    Channel-Level Variable Quantization Network for Deep Image Compression

    Authors: Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa

    Abstract: Deep image compression systems mainly contain four components: encoder, quantizer, entropy model, and decoder. To optimize these four components, a joint rate-distortion framework was proposed, and many deep neural network-based methods achieved great success in image compression. However, almost all convolutional neural network-based methods treat channel-wise feature maps equally, reducing the f…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2020

  34. arXiv:2007.11330  [pdf, other]

    cs.CV

    Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning

    Authors: Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa

    Abstract: Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available. While existing SSL methods assume that samples in the labeled and unlabeled data share the classes of their samples, we address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.…

    Submitted 22 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  35. Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

    Authors: Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta

    Abstract: Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the fra…

    Submitted 12 May, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: 10 pages, 8 figures

    ACM Class: I.4

    Journal ref: IEEE MultiMedia 2020

  36. arXiv:1908.04951  [pdf, other]

    cs.CV

    Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy

    Authors: Qing Yu, Kiyoharu Aizawa

    Abstract: Since deep learning models have been implemented in many commercial applications, it is important to detect out-of-distribution (OOD) inputs correctly to maintain the performance of the models, ensure the quality of the collected data, and prevent the applications from being used for other-than-intended purposes. In this work, we propose a two-head deep convolutional neural network (CNN) and maxim…

    Submitted 14 August, 2019; originally announced August 2019.

    Journal ref: ICCV2019

  37. arXiv:1908.03792  [pdf, other]

    cs.CV

    Object-Aware Instance Labeling for Weakly Supervised Object Detection

    Authors: Satoshi Kosugi, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Weakly supervised object detection (WSOD), where a detector is trained with only image-level annotations, is attracting more and more attention. As a method to obtain a well-performing detector, the detector and the instance labels are updated iteratively. In this study, for more efficient iterative updating, we focus on the instance labeling problem, a problem of which label should be annotated t…

    Submitted 10 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019 (oral)

  38. arXiv:1905.01312  [pdf, other]

    cs.CV

    TriDepth: Triangular Patch-based Deep Depth Prediction

    Authors: Masaya Kaneko, Ken Sakurada, Kiyoharu Aizawa

    Abstract: We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs). Point-cloud is generally used for CNN-based 3D scene reconstruction; however, it has some drawbacks: (1) it is redundant as a representation for planar surfaces, and (2) no spatial relationships between points are available (e.g., texture and surface). As a more efficient repr…

    Submitted 11 March, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: Project webpage: https://meshdepth.github.io/

  39. arXiv:1904.12628  [pdf, other]

    cs.CV cs.MM

    Computational Attention System for Children, Adults and Elderly

    Authors: Onkar Krishna, Kiyoharu Aizawa, Go Irie

    Abstract: Existing computational visual attention systems have focused on simulating and understanding the visual attention system in adults. Consequently, the impact of the observer's age on scene viewing behavior has rarely been considered. This study quantitatively analyzed the age-related differences in gaze landings during scene viewing for three different classes of imag…

    Submitted 18 April, 2019; originally announced April 2019.

  40. Recognition of Multiple Food Items in a Single Photo for Use in a Buffet-Style Restaurant

    Authors: Masashi Anzawa, Sosuke Amano, Yoko Yamakata, Keiko Motonaga, Akiko Kamei, Kiyoharu Aizawa

    Abstract: We investigate image recognition of multiple food items in a single photo, focusing on a buffet restaurant application, where the menu changes at every meal and only a few images per class are available. After detecting food areas, we perform hierarchical recognition. We evaluate our results by comparing them to two baseline methods.

    Submitted 3 March, 2019; originally announced March 2019.

    Comments: 5 pages, 7 figures

    Report number: Vol.E102-D No.2 pp.410-414

    Journal ref: IEICE TRANSACTIONS on Information and Systems, 2019

  41. arXiv:1809.00665  [pdf, other]

    cs.CV

    Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning

    Authors: Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Akiko Aizawa, Kiyoharu Aizawa

    Abstract: Face hallucination is a technique that reconstructs high-resolution (HR) faces from low-resolution (LR) faces by using the prior knowledge learned from HR/LR face pairs. Most state-of-the-art methods leverage position-patch prior knowledge of the human face to estimate the optimal representation coefficients for each image patch. However, they focus only on the position information and usually ignore the contex…

    Submitted 14 September, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

    Comments: 13 pages, 15 figures, Accepted by IEEE TCYB

  42. arXiv:1808.08544  [pdf, other]

    cs.CV

    Scale Drift Correction of Camera Geo-Localization using Geo-Tagged Images

    Authors: Kazuya Iwami, Satoshi Ikehata, Kiyoharu Aizawa

    Abstract: Camera geo-localization from a monocular video is a fundamental task for video analysis and autonomous navigation. Although 3D reconstruction is a key technique to obtain camera poses, monocular 3D reconstruction in a large environment tends to result in the accumulation of errors in rotation, translation, and especially in scale: a problem known as scale drift. To overcome these errors, we propos…

    Submitted 26 August, 2018; originally announced August 2018.

    Comments: ECCV Workshop CVRSUAD

  43. arXiv:1805.02997  [pdf, other]

    cs.CV

    Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data

    Authors: Yi Yu, Suhua Tang, Kiyoharu Aizawa, Akiko Aizawa

    Abstract: In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photo is very important for context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as venue photos generated by users. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning mod…

    Submitted 8 May, 2018; originally announced May 2018.

  44. Personalized Classifier for Food Image Recognition

    Authors: Shota Horiguchi, Sosuke Amano, Makoto Ogawa, Kiyoharu Aizawa

    Abstract: Currently, food image recognition tasks are evaluated against fixed datasets. However, in real-world conditions, there are cases in which the number of samples in each class continues to increase and samples from novel classes appear. In particular, dynamic datasets in which each individual user creates samples and continues the updating process often have content that varies considerably between…

    Submitted 8 April, 2018; originally announced April 2018.

    Comments: Accepted to IEEE Transactions on Multimedia. http://ieeexplore.ieee.org/document/8316919/

    Journal ref: IEEE Transactions on Multimedia 20.10 (2018): 2836-2848

  45. arXiv:1803.11370  [pdf, other]

    cs.CV

    Parallel Grid Pooling for Data Augmentation

    Authors: Akito Takeki, Daiki Ikami, Go Irie, Kiyoharu Aizawa

    Abstract: Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicab…

    Submitted 30 March, 2018; originally announced March 2018.

  46. arXiv:1803.11365  [pdf, other]

    cs.CV

    Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

    Authors: Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Can we detect common objects in a variety of image domains without instance-level annotations? In this paper, we present a framework for a novel task, cross-domain weakly supervised object detection, which addresses this question. For this paper, we have access to images with instance-level annotations in a source domain (e.g., natural image) and images with image-level annotations in a target dom…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: To appear at CVPR 2018 (poster), including supplementary materials

  47. arXiv:1803.11364  [pdf, other]

    cs.CV cs.LG stat.ML

    Joint Optimization Framework for Learning with Noisy Labels

    Authors: Daiki Tanaka, Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels, which are termed noisy labels. Training on such noisy labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem,…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: To appear at CVPR 2018 (poster), including supplementary material

    Journal ref: CVPR 2018, pp. 5552-5560

  48. arXiv:1803.08670  [pdf, ps, other]

    cs.CV cs.MM

    Object Detection for Comics using Manga109 Annotations

    Authors: Toru Ogawa, Atsushi Otsubo, Rei Narita, Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, which is a fundamental task of image understanding. Although convolutional neural network (CNN)-based methods have achieved good performance in object detection for naturalistic images, there are two problems in applying these methods to the comic object detection ta…

    Submitted 26 March, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: http://www.manga109.org/en/

  49. Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features

    Authors: Shota Horiguchi, Daiki Ikami, Kiyoharu Aizawa

    Abstract: The extraction of useful deep features is important for many computer vision tasks. Deep features extracted from classification networks have proved to perform well in those tasks. To obtain features of greater usefulness, end-to-end distance metric learning (DML) has been applied to train the feature extractor directly. However, in these DML studies, there were no equitable comparisons between fe…

    Submitted 13 April, 2019; v1 submitted 29 December, 2017; originally announced December 2017.

    Comments: 6 pages

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

  50. arXiv:1709.03708  [pdf, other]

    cs.CV cs.MM

    PQk-means: Billion-scale Clustering for Product-quantized Codes

    Authors: Yusuke Matsui, Keisuke Ogaki, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Data clustering is a fundamental operation in data analysis. For handling large-scale data, the standard k-means clustering method is not only slow, but also memory-inefficient. We propose an efficient clustering method for billion-scale feature vectors, called PQk-means. By first compressing input vectors into short product-quantized (PQ) codes, PQk-means achieves fast and memory-efficient cluste…

    Submitted 12 September, 2017; originally announced September 2017.

    Comments: To appear in ACMMM 2017