Showing 1–50 of 55 results for author: Okatani, T

  1. arXiv:2407.05312  [pdf, other]

    cs.CV

    An Improved Method for Personalizing Diffusion Models

    Authors: Yan Zeng, Masanori Suganuma, Takayuki Okatani

    Abstract: Diffusion models have demonstrated impressive image generation capabilities. Personalized approaches, such as textual inversion and Dreambooth, enhance model individualization using specific images. These methods enable generating images of specific objects based on diverse textual contexts. Our proposed approach aims to retain the model's original knowledge during new information integration, res…

    Submitted 7 July, 2024; originally announced July 2024.

  2. arXiv:2401.09861  [pdf, other]

    cs.CV cs.AI

    Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models

    Authors: Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced the comprehension of multimedia content, bringing together diverse modalities such as text, images, and videos. However, a critical challenge faced by these models, especially when processing video inputs, is the occurrence of hallucinations - erroneous perceptions or interpretations, particularly at the ev…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 7 pages, 7 figures

  3. arXiv:2311.03747  [pdf]

    cs.CV

    SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers

    Authors: Xiangyong Lu, Masanori Suganuma, Takayuki Okatani

    Abstract: Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although many lightweight networks have been developed for mobile/edge devices, they primarily…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: 11 pages, 2 figures, WACV2024

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2024)

  4. arXiv:2310.04671  [pdf, other]

    cs.CV

    Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction

    Authors: Korawat Charoenpitaks, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, Takayuki Okatani

    Abstract: This paper addresses the problem of predicting hazards that drivers may encounter while driving a car. We formulate it as a task of anticipating impending accidents using a single input image captured by car dashcams. Unlike existing approaches to driving hazard prediction that rely on computational simulations or anomaly detection from videos, this study focuses on high-level inference from stati…

    Submitted 1 July, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Main Paper: 11 pages, Supplementary Materials: 25 pages

    Journal ref: IEEE Trans. Intell. Veh. (2024) 1-11

  5. arXiv:2307.03243  [pdf, other]

    cs.CV

    That's BAD: Blind Anomaly Detection by Implicit Local Feature Clustering

    Authors: Jie Zhang, Masanori Suganuma, Takayuki Okatani

    Abstract: Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalie…

    Submitted 6 July, 2023; originally announced July 2023.

  6. arXiv:2307.03101  [pdf, other]

    cs.CV

    Contextual Affinity Distillation for Image Anomaly Detection

    Authors: Jie Zhang, Masanori Suganuma, Takayuki Okatani

    Abstract: Previous works on unsupervised industrial anomaly detection mainly focus on local structural anomalies such as cracks and color contamination. While achieving significantly high detection performance on this kind of anomaly, they are faced with logical anomalies that violate the long-range dependencies such as a normal object placed in the wrong position. In this paper, based on previous knowledge…

    Submitted 6 July, 2023; originally announced July 2023.

  7. arXiv:2307.02897  [pdf, other]

    cs.CV

    RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution

    Authors: Han Zou, Masanori Suganuma, Takayuki Okatani

    Abstract: Smartphones equipped with a multi-camera system comprising multiple cameras with different fields-of-view (FoVs) are becoming more prevalent. These camera configurations are compatible with reference-based SR and video SR, which can be executed simultaneously while recording video on the device. Thus, combining these two SR methods can improve image quality. Recently, Lee et al. have presented such…

    Submitted 6 July, 2023; originally announced July 2023.

  8. arXiv:2307.02875  [pdf, other]

    cs.CV

    Reference-based Motion Blur Removal: Learning to Utilize Sharpness in the Reference Image

    Authors: Han Zou, Masanori Suganuma, Takayuki Okatani

    Abstract: Despite the recent advancement in the study of removing motion blur in an image, it is still hard to deal with strong blurs. While there are limits in removing blurs from a single image, it has more potential to use multiple images, e.g., using an additional image as a reference to deblur a blurry image. A typical setting is deblurring an image using a nearby sharp image(s) in a video sequence, as…

    Submitted 6 July, 2023; originally announced July 2023.

  9. arXiv:2302.09208  [pdf, other]

    cs.CV

    Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering

    Authors: Tatsuro Yamane, Pang-jo Chun, Ji Dang, Takayuki Okatani

    Abstract: In this paper, a bridge member damage cause estimation framework is proposed by calculating the image position using Structure from Motion (SfM) and acquiring its information via Visual Question Answering (VQA). For this, a VQA model was developed that uses bridge images for dataset creation and outputs the damage or member name and its existence based on the images and questions. In the developed…

    Submitted 17 February, 2023; originally announced February 2023.

  10. arXiv:2212.13105  [pdf, other]

    cs.CV

    SuperGF: Unifying Local and Global Features for Visual Localization

    Authors: Wenzheng Song, Ran Yan, Boshu Lei, Takayuki Okatani

    Abstract: Advanced visual localization techniques encompass image retrieval challenges and 6 Degree-of-Freedom (DoF) camera pose estimation, such as hierarchical localization. Thus, they must extract global and local features from input images. Previous methods have achieved this through resource-intensive or accuracy-reducing means, such as combinatorial pipelines or multi-task distillation. In this study,…

    Submitted 23 December, 2022; originally announced December 2022.

  11. arXiv:2207.09775  [pdf, other]

    cs.CV

    Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation

    Authors: Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani

    Abstract: Open-set object detection (OSOD) has recently gained attention. It is to detect unknown objects while correctly detecting known objects. In this paper, we first point out that the recent studies' formalization of OSOD, which generalizes open-set recognition (OSR) and thus considers an unlimited variety of unknown objects, has a fundamental issue. This issue emerges from the difference between imag…

    Submitted 29 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 17 pages, 7 figures

  12. arXiv:2207.09666  [pdf, other]

    cs.CV cs.AI cs.CL

    GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

    Authors: Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

    Abstract: Current state-of-the-art methods for image captioning employ region-based features, as they provide object-level information that is essential to describe the content of images; they are usually extracted by an object detector such as Faster R-CNN. However, they have several issues, such as lack of contextual information, the risk of inaccurate detection, and the high computational cost. The first…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022; 14 pages with appendix; Code: https://github.com/davidnvq/grit

  13. arXiv:2207.03047  [pdf, other]

    cs.CV eess.IV

    Single-image Defocus Deblurring by Integration of Defocus Map Prediction Tracing the Inverse Problem Computation

    Authors: Qian Ye, Masanori Suganuma, Takayuki Okatani

    Abstract: In this paper, we consider the problem in defocus image deblurring. Previous classical methods follow two-steps approaches, i.e., first defocus map estimation and then the non-blind deblurring. In the era of deep learning, some researchers have tried to address these two problems by CNN. However, the simple concatenation of defocus map, which represents the blur level, leads to suboptimal performa…

    Submitted 6 July, 2022; originally announced July 2022.

  14. arXiv:2207.02539  [pdf, other]

    cs.CV

    Learning Regularized Multi-Scale Feature Flow for High Dynamic Range Imaging

    Authors: Qian Ye, Masanori Suganuma, Jun Xiao, Takayuki Okatani

    Abstract: Reconstructing ghosting-free high dynamic range (HDR) images of dynamic scenes from a set of multi-exposure images is a challenging task, especially with large object motion and occlusions, leading to visible artifacts using existing methods. To address this problem, we propose a deep network that tries to learn multi-scale feature flow guided by the regularized loss. It first extracts multi-scale…

    Submitted 6 July, 2022; originally announced July 2022.

  15. arXiv:2207.00067  [pdf, other]

    cs.CV cs.AI

    Rethinking Unsupervised Domain Adaptation for Semantic Segmentation

    Authors: Zhijie Wang, Masanori Suganuma, Takayuki Okatani

    Abstract: Unsupervised domain adaptation (UDA) adapts a model trained on one domain (called source) to a novel domain (called target) using only unlabeled data. Due to its high annotation cost, researchers have developed many UDA methods for semantic segmentation, which assume no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after t…

    Submitted 22 January, 2024; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: Under review in Pattern Recognition Letters

  16. arXiv:2112.09515  [pdf, other]

    cs.CV cs.AI

    Symmetry-aware Neural Architecture for Embodied Visual Navigation

    Authors: Shuang Liu, Takayuki Okatani

    Abstract: Visual exploration is a task that seeks to visit all the navigable areas of an environment as quickly as possible. The existing methods employ deep reinforcement learning (RL) as the standard tool for the task. However, they tend to be vulnerable to statistical shifts between the training and test data, resulting in poor generalization over novel environments that are out-of-distribution (OOD) fro…

    Submitted 17 December, 2021; originally announced December 2021.

  17. arXiv:2109.06432  [pdf, other]

    cs.CV cs.AI

    Improved Few-shot Segmentation by Redefinition of the Roles of Multi-level CNN Features

    Authors: Zhijie Wang, Masanori Suganuma, Takayuki Okatani

    Abstract: This study is concerned with few-shot segmentation, i.e., segmenting the region of an unseen object class in a query image, given support image(s) of its instances. The current methods rely on the pretrained CNN features of the support and query images. The key to good performance depends on the proper fusion of their mid-level and high-level features; the former contains shape-oriented informatio…

    Submitted 14 September, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  18. arXiv:2109.06422  [pdf, other]

    cs.CV cs.AI

    Cross-Region Domain Adaptation for Class-level Alignment

    Authors: Zhijie Wang, Xing Liu, Masanori Suganuma, Takayuki Okatani

    Abstract: Semantic segmentation requires a lot of training data, which necessitates costly annotation. There have been many studies on unsupervised domain adaptation (UDA) from one domain to another, e.g., from computer graphics to real images. However, there is still a gap in accuracy between UDA and supervised training on native domain data. It is arguably attributable to class-level misalignment between…

    Submitted 6 October, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: Under review in Computer Vision and Image Understanding

  19. arXiv:2109.03585  [pdf, other]

    cs.CV

    Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes

    Authors: Wenzheng Song, Masanori Suganuma, Xing Liu, Noriyuki Shimobayashi, Daisuke Maruta, Takayuki Okatani

    Abstract: This paper considers matching images of low-light scenes, aiming to widen the frontier of SfM and visual SLAM applications. Recent image sensors can record the brightness of scenes with more than eight-bit precision, available in their RAW-format image. We are interested in making full use of such high-precision information to match extremely low-light scene images that conventional methods cannot…

    Submitted 14 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 15 pages, 14 figures, ICCV2021

    MSC Class: 68T40; 68T07

  20. arXiv:2108.08585  [pdf, other]

    cs.CV

    Progressive and Selective Fusion Network for High Dynamic Range Imaging

    Authors: Qian Ye, Jun Xiao, Kin-man Lam, Takayuki Okatani

    Abstract: This paper considers the problem of generating an HDR image of a scene from its LDR images. Recent studies employ deep learning and solve the problem in an end-to-end fashion, leading to significant performance improvements. However, it is still hard to generate a good quality image from LDR images of a dynamic scene captured by a hand-held camera, e.g., occlusion due to the large motion of foregr…

    Submitted 19 August, 2021; originally announced August 2021.

  21. arXiv:2106.00596  [pdf, other]

    cs.CV

    Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks

    Authors: Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

    Abstract: There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin.…

    Submitted 6 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: To appear in IJCAI2021. 8-page main paper and Appendix following. Appendix E for details of entry submission to EAI 2021. Github: https://github.com/davidnvq/lwit-alfred

  22. arXiv:2101.03326  [pdf, other]

    cs.CV

    Pushing the Envelope of Thin Crack Detection

    Authors: Liang Xu, Taro Hatsutani, Xing Liu, Engkarat Techapanurak, Han Zou, Takayuki Okatani

    Abstract: In this study, we consider the problem of detecting cracks from the image of a concrete surface for automated inspection of infrastructure, such as bridges. Its overall accuracy is determined by how accurately thin cracks with sub-pixel widths can be detected. Our interest is in making it possible to detect cracks close to the limit of thinness if it can be defined. Toward this end, we first propo…

    Submitted 9 January, 2021; originally announced January 2021.

  23. arXiv:2101.02500  [pdf, other]

    cs.CV cs.AI

    Bridging In- and Out-of-distribution Samples for Their Better Discriminability

    Authors: Engkarat Techapanurak, Anh-Chuong Dang, Takayuki Okatani

    Abstract: This paper proposes a method for OOD detection. Questioning the premise of previous studies that ID and OOD samples are separated distinctly, we consider samples lying in the intermediate of the two and use them for training a network. We generate such samples using multiple image transformations that corrupt inputs in various ways and with different severity levels. We estimate where the generate…

    Submitted 7 January, 2021; originally announced January 2021.

  24. arXiv:2101.02447  [pdf, other]

    cs.CV cs.AI

    Practical Evaluation of Out-of-Distribution Detection Methods for Image Classification

    Authors: Engkarat Techapanurak, Takayuki Okatani

    Abstract: We reconsider the evaluation of OOD detection methods for image recognition. Although many studies have been conducted so far to build better OOD detection methods, most of them follow Hendrycks and Gimpel's work for the method of experimental evaluation. While the unified evaluation method is necessary for a fair comparison, there is a question of if its choice of tasks and datasets reflect real-…

    Submitted 7 January, 2021; originally announced January 2021.

  25. arXiv:2005.03463  [pdf, other]

    eess.IV cs.CV

    How Can CNNs Use Image Position for Segmentation?

    Authors: Rito Murase, Masanori Suganuma, Takayuki Okatani

    Abstract: Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue w…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: 11 pages

  26. arXiv:1911.11390  [pdf, other]

    cs.CV

    Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs

    Authors: Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani

    Abstract: It has been a primary concern in recent studies of vision and language tasks to design an effective attention mechanism dealing with interactions between the two modalities. The Transformer has recently been extended and applied to several bi-modal tasks, yielding promising results. For visual dialog, it becomes necessary to consider interactions between three or more inputs, i.e., an image, a que…

    Submitted 17 July, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Accepted to ECCV 2020, 14 pages. Slight change in title

  27. arXiv:1911.08790  [pdf, other]

    cs.CV

    Analysis of Deep Networks for Monocular Depth Estimation Through Adversarial Attacks with Proposal of a Defense Method

    Authors: Junjie Hu, Takayuki Okatani

    Abstract: In this paper, we consider adversarial attacks against a system of monocular depth estimation (MDE) based on convolutional neural networks (CNNs). The motivation is two-fold. One is to study the security of MDE systems, which has not been actively considered in the community. The other is to improve our understanding of the computational mechanism of CNNs performing MDE. Toward this end, we apply…

    Submitted 20 November, 2019; originally announced November 2019.

  28. arXiv:1910.09212  [pdf, other]

    cs.CV

    Analysis and a Solution of Momentarily Missed Detection for Anchor-based Object Detectors

    Authors: Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani

    Abstract: The employment of convolutional neural networks has led to significant performance improvement on the task of object detection. However, when applying existing detectors to continuous frames in a video, we often encounter momentary miss-detection of objects, that is, objects are undetected exceptionally at a few frames, although they are correctly detected at all other frames. In this paper, we an…

    Submitted 16 January, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted to WACV 2020, 9 pages

  29. arXiv:1907.04508  [pdf, other]

    cs.CV

    Restoring Images with Unknown Degradation Factors by Recurrent Use of a Multi-branch Network

    Authors: Xing Liu, Masanori Suganuma, Xiyang Luo, Takayuki Okatani

    Abstract: The employment of convolutional neural networks has achieved unprecedented performance in the task of image restoration for a variety of degradation factors. However, high-performance networks have been specifically designed for a single degradation factor. In this paper, we tackle a harder problem, restoring a clean image from its degraded version with an unknown degradation factor, subject to th…

    Submitted 21 January, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

  30. arXiv:1905.13560  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Evaluating Artificial Systems for Pairwise Ranking Tasks Sensitive to Individual Differences

    Authors: Xing Liu, Takayuki Okatani

    Abstract: Owing to the advancement of deep learning, artificial systems are now rival to humans in several pattern recognition tasks, such as visual recognition of object categories. However, this is only the case with the tasks for which correct answers exist independent of human perception. There is another type of tasks for which what to predict is human perception itself, in which there are often indivi…

    Submitted 29 May, 2019; originally announced May 2019.

  31. arXiv:1905.10628  [pdf, other]

    cs.CV

    Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity

    Authors: Engkarat Techapanurak, Masanori Suganuma, Takayuki Okatani

    Abstract: The ability to detect out-of-distribution (OOD) samples is vital to secure the reliability of deep neural networks in real-world applications. Considering the nature of OOD samples, detection methods should not have hyperparameters that need to be tuned depending on incoming OOD samples. However, most of the recently proposed methods do not meet this requirement, leading to compromised performance…

    Submitted 25 November, 2019; v1 submitted 25 May, 2019; originally announced May 2019.

    Comments: Extend the supplementary material

  32. arXiv:1905.08609  [pdf, other]

    cs.CV

    Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment

    Authors: Mingzhen Shao, Zhun Sun, Mete Ozay, Takayuki Okatani

    Abstract: We address a problem of estimating pose of a person's head from its RGB image. The employment of CNNs for the problem has contributed to significant improvement in accuracy in recent works. However, we show that the following two methods, despite their simplicity, can attain further improvement: (i) proper adjustment of the margin of bounding box of a detected face, and (ii) choice of loss functio…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: IEEE International Conference on Automatic Face & Gesture Recognition (FG2019)

  33. arXiv:1904.03380  [pdf, other]

    cs.CV

    Visualization of Convolutional Neural Networks for Monocular Depth Estimation

    Authors: Junjie Hu, Yan Zhang, Takayuki Okatani

    Abstract: Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is: how CNNs can infer depth from a single image. Toward answering this question, we consider visualization of inference of a CNN by identifying relevant pixels of an input image to depth estimation. We formulate it as an optimization problem of i…

    Submitted 6 April, 2019; originally announced April 2019.

  34. arXiv:1903.08817  [pdf, other]

    cs.CV

    Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration

    Authors: Xing Liu, Masanori Suganuma, Zhun Sun, Takayuki Okatani

    Abstract: In this paper, we study design of deep neural networks for tasks of image restoration. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers to…

    Submitted 7 April, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: i) Accepted to CVPR 2019 ii) Code, trained models and additional results for visual comparison will be provided at https://github.com/liu-vis/DualResidualNetworks

  35. arXiv:1901.04870  [pdf, other]

    cs.CV

    Toward Explainable Fashion Recommendation

    Authors: Pongsate Tangseng, Takayuki Okatani

    Abstract: Many studies have been conducted so far to build systems for recommending fashion items and outfits. Although they achieve good performances in their respective tasks, most of them cannot explain their judgments to the users, which compromises their usefulness. Toward explainable fashion recommendation, this study proposes a system that is able not only to provide a goodness score for an outfit bu…

    Submitted 27 July, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

  36. arXiv:1812.00733  [pdf, other]

    cs.CV

    Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions

    Authors: Masanori Suganuma, Xing Liu, Takayuki Okatani

    Abstract: Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion which affect image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applic…

    Submitted 7 April, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: CVPR 2019

  37. arXiv:1812.00500  [pdf, other]

    cs.CV

    Multi-task Learning of Hierarchical Vision-Language Representation

    Authors: Duy-Kien Nguyen, Takayuki Okatani

    Abstract: It is still challenging to build an AI system that can perform tasks that involve vision and language at human level. So far, researchers have singled out individual tasks separately, for each of which they have designed networks and trained them on its dedicated datasets. Although this approach has seen a certain degree of success, it comes with difficulties of understanding relations among diffe…

    Submitted 2 December, 2018; originally announced December 2018.

  38. arXiv:1804.09979  [pdf, other]

    cs.CV

    Recommending Outfits from Personal Closet

    Authors: Pongsate Tangseng, Kota Yamaguchi, Takayuki Okatani

    Abstract: We consider grading a fashion outfit for recommendation, where we assume that users have a closet of items and we aim at producing a score for an arbitrary combination of items in the closet. The challenge in outfit grading is that the input to the system is a bag of item pictures that are unordered and vary in size. We build a deep neural network-based system that can take variable-length items a…

    Submitted 26 April, 2018; originally announced April 2018.

  39. arXiv:1804.00775  [pdf, other]

    cs.CV

    Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

    Authors: Duy-Kien Nguyen, Takayuki Okatani

    Abstract: A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boost accuracy of prediction of answers. Specifically, we present a simple architecture that is fully symmetric between visual an…

    Submitted 1 December, 2018; v1 submitted 2 April, 2018; originally announced April 2018.

    Comments: In Proceeding of CVPR'2018

  40. arXiv:1803.08673  [pdf, other]

    cs.CV

    Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries

    Authors: Junjie Hu, Mete Ozay, Yan Zhang, Takayuki Okatani

    Abstract: This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in the research of this problem. However, most existing methods suffer from loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper,…

    Submitted 22 September, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

  41. arXiv:1803.00370  [pdf, other]

    cs.NE

    Exploiting the Potential of Standard Convolutional Autoencoders for Image Restoration by Evolutionary Search

    Authors: Masanori Suganuma, Mete Ozay, Takayuki Okatani

    Abstract: Researchers have applied deep neural networks to image restoration tasks, in which they proposed various network architectures, loss functions, and training methods. In particular, adversarial training, which is employed in recent studies, seems to be a key ingredient to success. In this paper, we show that simple convolutional autoencoders (CAEs) built upon only standard network components, i.e.,…

    Submitted 1 March, 2018; originally announced March 2018.

    Comments: Our code is available at https://github.com/sg-nm/Evolutionary-Autoencoders

  42. arXiv:1801.07939  [pdf, ps, other]

    cs.CV

    Deep Structured Energy-Based Image Inpainting

    Authors: Fazil Altinel, Mete Ozay, Takayuki Okatani

    Abstract: In this paper, we propose a structured image inpainting method employing an energy based model. In order to learn structural relationship between patterns observed in images and missing regions of the images, we employ an energy-based structured prediction method. The structural relationship is learned by minimizing an energy function which is defined by a simple convolutional neural network. The…

    Submitted 30 August, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: Accepted to 24th International Conference on Pattern Recognition (ICPR 2018). 6 pages, 7 figures

  43. arXiv:1712.04138  [pdf, other]

    cs.CV

    A vision based system for underwater docking

    Authors: Shuang Liu, Mete Ozay, Takayuki Okatani, Hongli Xu, Kai Sun, Yang Lin

    Abstract: Autonomous underwater vehicles (AUVs) have been deployed for underwater exploration. However, its potential is confined by its limited on-board battery energy and data storage capacity. This problem has been addressed using docking systems by underwater recharging and data transfer for AUVs. In this work, we propose a vision based framework for underwater docking following these systems. The propo…

    Submitted 12 December, 2017; originally announced December 2017.

  44. arXiv:1711.01791  [pdf, other]

    cs.CV

    HyperNetworks with statistical filtering for defending adversarial examples

    Authors: Zhun Sun, Mete Ozay, Takayuki Okatani

    Abstract: Deep learning algorithms have been known to be vulnerable to adversarial perturbations in various tasks such as image classification. This problem was addressed by employing several defense methods for detection and rejection of particular types of attacks. However, training and manipulating networks according to particular defense schemes increases computational complexity of the learning algorit…

    Submitted 6 November, 2017; originally announced November 2017.

  45. arXiv:1708.01892  [pdf, other]

    cs.CV

    End-to-end learning potentials for structured attribute prediction

    Authors: Kota Yamaguchi, Takayuki Okatani, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo

    Abstract: We present a structured inference approach in deep neural networks for multiple attribute prediction. In attribute prediction, a common approach is to learn independent classifiers on top of a good feature representation. However, such classifiers assume conditional independence on features and do not explicitly consider the dependency between attributes in the inference process. We propose to for…

    Submitted 6 August, 2017; originally announced August 2017.

  46. arXiv:1707.07831  [pdf, other]

    stat.ML cs.LG

    Linear Discriminant Generative Adversarial Networks

    Authors: Zhun Sun, Mete Ozay, Takayuki Okatani

    Abstract: We develop a novel method for training of GANs for unsupervised and class conditional generation of images, called Linear Discriminant GAN (LD-GAN). The discriminator of an LD-GAN is trained to maximize the linear separability between distributions of hidden representations of generated and targeted samples, while the generator is updated based on the decision hyper-planes computed by performing L…

    Submitted 25 July, 2017; originally announced July 2017.

  47. arXiv:1707.07830  [pdf, other]

    cs.CV

    Improving Robustness of Feature Representations to Image Deformations using Powered Convolution in CNNs

    Authors: Zhun Sun, Mete Ozay, Takayuki Okatani

    Abstract: In this work, we address the problem of improving the robustness of feature representations learned using convolutional neural networks (CNNs) to image deformation. We argue that higher moment statistics of feature distributions could be shifted due to image deformations, and that this shift degrades performance and cannot be reduced by ordinary normalization methods as observed in experimen…

    Submitted 25 July, 2017; originally announced July 2017.

  48. arXiv:1706.04635  [pdf, other]

    cs.LG cs.IT stat.ML

    Information Potential Auto-Encoders

    Authors: Yan Zhang, Mete Ozay, Zhun Sun, Takayuki Okatani

    Abstract: In this paper, we suggest a framework to make use of mutual information as a regularization criterion to train Auto-Encoders (AEs). In the proposed framework, AEs are regularized by minimization of the mutual information between input and encoding variables of AEs during the training phase. In order to estimate the entropy of the encoding variables and the mutual information, we propose a non-para…

    Submitted 6 August, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Information Theory

  49. arXiv:1704.00509  [pdf, other]

    cs.CV

    Truncating Wide Networks using Binary Tree Architectures

    Authors: Yan Zhang, Mete Ozay, Shuohao Li, Takayuki Okatani

    Abstract: A recent study shows that a wide deep network can obtain accuracy comparable to a deeper but narrower network. Compared to narrower and deeper networks, wide networks employ relatively fewer layers and have various important benefits: they have shorter running time on parallel computing devices, and they are less affected by gradient vanishing problems. However, the parameter size of…

    Submitted 3 April, 2017; originally announced April 2017.

    Comments: 10 pages

  50. arXiv:1701.06123  [pdf, ps, other]

    cs.CV cs.LG cs.NE

    Optimization on Product Submanifolds of Convolution Kernels

    Authors: Mete Ozay, Takayuki Okatani

    Abstract: Recent advances in optimization methods used for training convolutional neural networks (CNNs) with kernels, which are normalized according to particular constraints, have shown remarkable success. This work introduces an approach for training CNNs using ensembles of joint spaces of kernels constructed using different constraints. For this purpose, we address a problem of optimization on ensembles…

    Submitted 27 November, 2017; v1 submitted 22 January, 2017; originally announced January 2017.

    Comments: 7 pages