Search | arXiv e-print repository

CLRerNet: Improving Confidence of Lane Detection with LaneIoU

Abstract: Lane marker detection is a crucial component of the autonomous driving and driver assistance systems. Modern deep lane detection methods with row-based lane representation exhibit excellent performance on lane detection benchmarks. Through preliminary oracle experiments, we firstly disentangle the lane representation components to determine the direction of our approach. We show that correct lane positions are already among the predictions of an existing row-based detector, and the confidence scores that accurately represent intersection-over-union (IoU) with ground truths are the most beneficial. Based on the finding, we propose LaneIoU that better correlates with the metric, by taking the local lane angles into consideration. We develop a novel detector coined CLRerNet featuring LaneIoU for the target assignment cost and loss functions aiming at the improved quality of confidence scores. Through careful and fair benchmark including cross validation, we demonstrate that CLRerNet outperforms the state-of-the-art by a large margin - enjoying F1 score of 81.43% compared with 80.47% of the existing method on CULane, and 86.47% compared with 86.10% on CurveLanes.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2108.13699 [pdf, other]

End-to-End Monocular Vanishing Point Detection Exploiting Lane Annotations

Authors: Hiroto Honda, Motoki Kimura, Takumi Karasawa, Yusuke Uchida

Abstract: Vanishing points (VPs) play a vital role in various computer vision tasks, especially for recognizing the 3D scenes from an image. In the real-world scenario of automobile applications, it is costly to manually obtain the external camera parameters when the camera is attached to the vehicle or the attachment is accidentally perturbed. In this paper we introduce a simple but effective end-to-end vanishing point detection. By automatically calculating intersection of the extrapolated lane marker annotations, we obtain geometrically consistent VP labels and mitigate human annotation errors caused by manual VP labeling. With the calculated VP labels we train end-to-end VP Detector via heatmap estimation. The VP Detector realizes higher accuracy than the methods utilizing manual annotation or lane detection, paving the way for accurate online camera calibration.

Submitted 31 August, 2021; originally announced August 2021.
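
The label-generation idea in the abstract above (intersecting extrapolated lane marker annotations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_lane_line` and `vanishing_point`, the straight-line model x = m*y + q, and the least-squares intersection are all assumptions made for illustration.

```python
def fit_lane_line(points):
    """Least-squares fit of x = m*y + q to a list of (x, y) lane points.

    Fitting x as a function of y is an assumption that suits lanes which
    run roughly top-to-bottom in the image.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = cov_xy / var_y
    return m, mean_x - m * mean_y


def vanishing_point(lanes):
    """Point minimizing the summed squared distance to all fitted lane lines."""
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for points in lanes:
        m, q = fit_lane_line(points)
        # Rewrite x = m*y + q as a*x + b*y = c with a unit normal (a, b).
        norm = (1.0 + m * m) ** 0.5
        a, b, c = 1.0 / norm, -m / norm, q / norm
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("lane lines are parallel in the image; no intersection")
    return (s_ac * s_bb - s_ab * s_bc) / det, (s_aa * s_bc - s_ab * s_ac) / det


# Hypothetical annotations: two straight lanes that meet at (100, 50).
left_lane = [(75.0, 100.0), (50.0, 150.0), (25.0, 200.0)]
right_lane = [(150.0, 100.0), (200.0, 150.0), (250.0, 200.0)]
vp_x, vp_y = vanishing_point([left_lane, right_lane])
```

With more than two lanes the same routine returns a compromise point, which is one plausible way to obtain a geometrically consistent label from noisy annotations.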

arXiv:2012.02604 [pdf, other]

Prediction of Lane Number Using Results From Lane Detection

Authors: Panumate Chetprayoon, Fumihiko Takahashi, Yusuke Uchida

Abstract: The lane number that the vehicle is traveling in is a key factor in intelligent vehicle fields. Many lane detection algorithms have been proposed, and if we can perfectly detect the lanes, we can directly calculate the lane number from the lane detection results. However, in fact, lane detection algorithms sometimes underperform. Therefore, we propose a new approach for predicting the lane number, where we combine the drive recorder image with the lane detection results to predict the lane number. Experiments on our own dataset confirmed that our approach delivered outstanding results without significantly increasing computational cost.

Submitted 4 December, 2020; originally announced December 2020.

Comments: GCCE 2020

arXiv:2011.02172 [pdf, other]

Leveraging Temporal Joint Depths for Improving 3D Human Pose Estimation in Video

Authors: Naoki Kato, Hiroto Honda, Yusuke Uchida

Abstract: The effectiveness of the approaches to predict 3D poses from 2D poses estimated in each frame of a video has been demonstrated for 3D human pose estimation. However, 2D poses without appearance information of persons have much ambiguity with respect to the joint depths. In this paper, we propose to estimate a 3D pose in each frame of a video and refine it considering temporal information. The proposed approach reduces the ambiguity of the joint depths and improves the 3D pose estimation accuracy.

Submitted 4 November, 2020; originally announced November 2020.

arXiv:1902.01206 [pdf, ps, other]

Recycling Solutions for Vertex Coloring Heuristics

Authors: Yasutaka Uchida, Kaito Yajima, Kazuya Haraguchi

Abstract: The vertex coloring problem is a well-known NP-hard problem and has many applications in operations research and in scheduling. A conventional approach to the problem solves the k-colorability problem iteratively, decreasing k one by one. Whether a heuristic algorithm finds a legal k-coloring quickly or not is largely affected by an initial solution. We highlight a simple initial solution generator, which we call the recycle method, which makes use of the legal (k+1)-coloring that has been found. An initial solution generated by the method is expected to guide a heuristic algorithm to find a legal k-coloring more quickly than conventional methods, as demonstrated by experimental studies. The results suggest that the recycle method should be used as the standard initial solution generator for both local search algorithms and modern hybrid methods.

Submitted 6 April, 2021; v1 submitted 26 January, 2019; originally announced February 2019.

Comments: A preliminary version of the paper is accepted at ISS2019 (International Symposium on Scheduling 2019). The final version appears in the Journal of the Operations Research Society of Japan
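
The abstract above does not spell out how the recycle method turns a legal (k+1)-coloring into an initial k-coloring, so the following is only one plausible sketch of the general idea. The choice to dissolve the smallest color class and to recolor its vertices greedily by fewest conflicts is an assumption, and the names `recycle_initial_solution` and `count_conflicts` are hypothetical.

```python
def count_conflicts(adj, coloring):
    """Number of edges whose two endpoints share a color."""
    return sum(
        1
        for u in adj
        for v in adj[u]
        if u < v and coloring[u] == coloring[v]
    )


def recycle_initial_solution(adj, coloring, k):
    """Build a (possibly conflicting) k-coloring from a legal (k+1)-coloring.

    adj maps each vertex to its neighbors; coloring maps each vertex to a
    color in range(k + 1).
    """
    sizes = [0] * (k + 1)
    for color in coloring.values():
        sizes[color] += 1
    dropped = sizes.index(min(sizes))  # assumption: dissolve the smallest class
    # Relabel the surviving colors to 0..k-1.
    survivors = [c for c in range(k + 1) if c != dropped]
    remap = {old: new for new, old in enumerate(survivors)}
    result = {v: remap[c] for v, c in coloring.items() if c != dropped}
    # Give each displaced vertex the color used least among its neighbors.
    for v, c in coloring.items():
        if c == dropped:
            counts = [0] * k
            for u in adj[v]:
                if u in result:
                    counts[result[u]] += 1
            result[v] = counts.index(min(counts))
    return result


# Hypothetical example: a 5-cycle with a legal 3-coloring, recycled for k = 2.
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
legal_3 = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
start_2 = recycle_initial_solution(cycle, legal_3, 2)
```

The result would then be handed to a local search or hybrid heuristic, which tries to remove the remaining conflicts; an odd cycle has no legal 2-coloring, so at least one conflict must remain in this example.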

arXiv:1811.03331 [pdf, other]

Improving Multi-Person Pose Estimation using Label Correction

Authors: Naoki Kato, Tianqi Li, Kohei Nishino, Yusuke Uchida

Abstract: Significant attention is being paid to multi-person pose estimation methods recently, as there has been rapid progress in the field owing to convolutional neural networks. In particular, a recent method that exploits part confidence maps and Part Affinity Fields (PAFs) has achieved accurate real-time prediction of multi-person keypoints. However, human annotated labels are sometimes inappropriate for learning models. For example, if there is a limb that extends outside an image, a keypoint for the limb may not have annotations because it exists outside of the image, and thus the labels for the limb cannot be generated. If a model is trained with data including such missing labels, the output of the model for the location, even though it is correct, is penalized as a false positive, which is likely to cause negative effects on the performance of the model. In this paper, we point out the existence of some patterns of inappropriate labels, and propose a novel method for correcting such labels with a teacher model trained on such incomplete data. Experiments on the COCO dataset show that training with the corrected labels improves the performance of the model and also speeds up training.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1809.01890 [pdf, other]

Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks

Authors: Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto Honda, Yusuke Uchida

Abstract: We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models.

Submitted 6 September, 2018; originally announced September 2018.

Comments: Accepted to ECCV 2018 Workshop: Computer Vision for Fashion, Art and Design. Project page is at https://dena.com/intl/anime-generation

arXiv:1802.02601 [pdf, ps, other]

doi 10.1007/s13735-018-0147-1

Digital Watermarking for Deep Neural Networks

Authors: Yuki Nagai, Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: Although deep neural networks have made tremendous progress in the area of multimedia representation, training neural models requires a large amount of data and time. It is well-known that utilizing trained models as initial weights often achieves lower training error than neural networks that are not pre-trained. A fine-tuning step helps to reduce both the computational cost and improve performance. Therefore, sharing trained models has been very important for the rapid progress of research and development. In addition, trained models could be important assets for the owner(s) who trained them, hence we regard trained models as intellectual property. In this paper, we propose a digital watermarking technology for ownership authorization of deep neural networks. First, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types on watermarking in deep neural networks. Second, we propose a general framework for embedding a watermark in model parameters, using a parameter regularizer. Our approach does not impair the performance of networks into which a watermark is placed because the watermark is embedded while training the host network. Finally, we perform comprehensive experiments to reveal the potential of watermarking deep neural networks as the basis of this new research effort. We show that our framework can embed a watermark during the training of a deep neural network from scratch, and during fine-tuning and distilling, without impairing its performance. The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains complete even after 65% of parameters are pruned.

Submitted 6 February, 2018; originally announced February 2018.

Comments: This is a pre-print of an article published in International Journal of Multimedia Information Retrieval. The final authenticated version is available online at: https://doi.org/10.1007/s13735-018-0147-1 . arXiv admin note: substantial text overlap with arXiv:1701.04082

arXiv:1701.04082 [pdf, other]

doi 10.1145/3078971.3078974

Embedding Watermarks into Deep Neural Networks

Authors: Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: Deep neural networks have recently achieved significant progress. Sharing trained models of these deep neural networks is very important in the rapid progress of researching or developing deep neural network systems. At the same time, it is necessary to protect the rights of shared trained models. To this end, we propose to use a digital watermarking technology to protect intellectual property or detect intellectual property infringement of trained models. Firstly, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types for watermarking to deep neural networks. Secondly, we propose a general framework to embed a watermark into model parameters using a parameter regularizer. Our approach does not hurt the performance of networks into which a watermark is embedded. Finally, we perform comprehensive experiments to reveal the potential of watermarking to deep neural networks as a basis of this new problem. We show that our framework can embed a watermark in the situations of training a network from scratch, fine-tuning, and distilling without hurting the performance of a deep neural network. The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains completely intact even after 65% of parameters were pruned. The implementation of this research is: https://github.com/yu4u/dnn-watermark

Submitted 20 April, 2017; v1 submitted 15 January, 2017; originally announced January 2017.

Journal ref: ICMR '17 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pages 269-277
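
The abstract above describes embedding a watermark into model parameters through a parameter regularizer without giving its form. The sketch below shows one way such a regularizer can work: a binary cross-entropy penalty that pushes a secret linear projection of the weights toward the watermark bits. The secret matrix, sizes, learning rate, and function names are all assumptions for illustration, and the task loss that a real training run would minimize alongside the regularizer is omitted here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def project(secret, weights):
    """Secret linear projection of the weight vector, one value per bit."""
    return [sum(x * w for x, w in zip(row, weights)) for row in secret]


def embed_watermark(weights, secret, bits, lr=0.05, steps=500):
    """Gradient descent on the cross-entropy between sigmoid(secret @ w) and bits."""
    w = list(weights)
    for _ in range(steps):
        errors = [sigmoid(z) - b for z, b in zip(project(secret, w), bits)]
        for i in range(len(w)):
            # d(loss)/d(w_i) = sum_j (y_j - b_j) * secret[j][i]
            w[i] -= lr * sum(e * row[i] for e, row in zip(errors, secret))
    return w


def extract_watermark(weights, secret):
    """Recover the bits by thresholding the secret projection at zero."""
    return [1 if z >= 0 else 0 for z in project(secret, weights)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
bits = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit watermark
secret = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in bits]
weights = [rng.gauss(0.0, 0.1) for _ in range(32)]  # stand-in for layer weights
marked = embed_watermark(weights, secret, bits)
recovered = extract_watermark(marked, secret)
```

In an actual network this penalty would be added to the training loss with a weighting factor, so that the weights satisfy both the task and the watermark constraint; extraction needs only the weights and the secret matrix.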

arXiv:1610.06266 [pdf, other]

Adaptive Substring Extraction and Modified Local NBNN Scoring for Binary Feature-based Local Mobile Visual Search without False Positives

Authors: Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: In this paper, we propose a stand-alone mobile visual search system based on binary features and the bag-of-visual words framework. The contribution of this study is three-fold: (1) We propose an adaptive substring extraction method that adaptively extracts informative bits from the original binary vector and stores them in the inverted index. These substrings are used to refine visual word-based matching. (2) A modified local NBNN scoring method is proposed in the context of image retrieval, which considers the density of binary features in scoring each feature matching. (3) In order to suppress false positives, we introduce a convexity check step that imposes a convexity constraint on the configuration of a transformed reference image. The proposed system improves retrieval accuracy by 11% compared with a conventional method without increasing the database size. Furthermore, our system with the convexity check does not lead to false positive results.

Submitted 19 October, 2016; originally announced October 2016.

arXiv:1609.08291 [pdf, other]

Image Retrieval with Fisher Vectors of Binary Features

Authors: Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: Recently, the Fisher vector representation of local features has attracted much attention because of its effectiveness in both image classification and image retrieval. Another trend in the area of image retrieval is the use of binary features such as ORB, FREAK, and BRISK. Considering the significant performance improvement for accuracy in both image classification and retrieval by the Fisher vector of continuous feature descriptors, if the Fisher vector were also to be applied to binary features, we would receive similar benefits in binary feature based image retrieval and classification. In this paper, we derive the closed-form approximation of the Fisher vector of binary features modeled by the Bernoulli mixture model. We also propose accelerating the Fisher vector by using the approximate value of posterior probability. Experiments show that the Fisher vector representation significantly improves the accuracy of image retrieval compared with a bag of binary words approach.

Submitted 27 September, 2016; originally announced September 2016.

arXiv:1607.08368 [pdf, ps, other]

Local Feature Detectors, Descriptors, and Image Representations: A Survey

Authors: Yusuke Uchida

Abstract: With the advances in both stable interest region detectors and robust and distinctive descriptors, local feature-based image or object retrieval has become a popular research topic. The other key technology for image retrieval systems is image representation such as the bag-of-visual words (BoVW), Fisher vector, or Vector of Locally Aggregated Descriptors (VLAD) framework. In this paper, we review local features and image representations for image retrieval. Because a great many methods have been proposed in this area, these methods are grouped into several classes and summarized. In addition, recent deep learning-based approaches for image retrieval are briefly reviewed.

Submitted 28 July, 2016; originally announced July 2016.

Comments: 20 pages

Showing 1–12 of 12 results for author: Uchida, Y