CoKe: Localized Contrastive Learning for Robust Keypoint Detection

Bai, Yutong; Wang, Angtian; Kortylewski, Adam; Yuille, Alan


arXiv:2009.14115v3 (cs)

[Submitted on 29 Sep 2020 (v1), revised 23 Nov 2020 (this version, v3), latest version 5 Dec 2022 (v4)]


Abstract: Today's most popular approaches to keypoint detection involve very complex network architectures that aim to learn holistic representations of all keypoints. In this work, we take a step back and ask: Can we simply learn a local keypoint representation from the output of a standard backbone architecture? Doing so would make the network simpler and more robust, particularly when large parts of the object are occluded. We demonstrate that this is possible by looking at the problem from the perspective of representation learning. Specifically, the keypoint kernels need to be chosen to optimize three types of distances in the feature space: features of the same keypoint should be similar to each other, differ from those of other keypoints, and be distinct from features of the background clutter. We formulate this optimization process within a framework, which we call CoKe, that includes supervised contrastive learning. CoKe needs to make several approximations to enable the representation learning process on large datasets. In particular, we introduce a clutter bank to approximate non-keypoint features, and a momentum update to compute the keypoint representation while training the feature extractor. Our experiments show that CoKe achieves state-of-the-art results compared to approaches that jointly represent all keypoints holistically (Stacked Hourglass Networks, MSS-Net) as well as to approaches that are supervised by detailed 3D object geometry (StarMap). Moreover, CoKe is robust, performs exceptionally well when objects are partially occluded, and significantly outperforms related work on a range of diverse datasets (PASCAL3D+, MPII, ObjectNet3D).
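
The three distances, the clutter bank, and the momentum update described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the temperature of 0.1, the momentum of 0.9, the bank size of 4, and the softmax cross-entropy form of the loss are all assumptions chosen for illustration, and the features are toy 2-D vectors rather than backbone outputs.

```python
import math
from collections import deque

def normalize(v):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def momentum_update(prototype, feature, momentum=0.9):
    """Move a keypoint prototype toward a new feature with an exponential moving average."""
    mixed = [momentum * p + (1.0 - momentum) * f for p, f in zip(prototype, feature)]
    return normalize(mixed)

def contrastive_loss(feature, target, prototypes, clutter_bank, temperature=0.1):
    """Cross-entropy over similarities (assumed loss form): the feature should match
    prototypes[target] and differ from other keypoint prototypes and clutter features."""
    f = normalize(feature)
    logits = [dot(f, p) / temperature for p in prototypes]
    logits += [dot(f, c) / temperature for c in clutter_bank]
    m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

# Toy setup: two keypoint prototypes and a fixed-size clutter bank (hypothetical sizes).
prototypes = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
clutter_bank = deque(maxlen=4)  # the oldest clutter feature is dropped once the bank is full
clutter_bank.append(normalize([-1.0, -1.0]))

feature = [0.9, 0.1]  # stands in for a feature extracted at an annotated keypoint 0
loss_correct = contrastive_loss(feature, 0, prototypes, clutter_bank)
loss_wrong = contrastive_loss(feature, 1, prototypes, clutter_bank)

# Update the prototype of keypoint 0 while the (here absent) feature extractor trains.
prototypes[0] = momentum_update(prototypes[0], normalize(feature))
```

In this toy case the loss is lower when the feature is assigned to keypoint 0 than to keypoint 1, which is the behavior the three distance constraints are meant to encourage. The bounded `deque` is one simple way to keep a fixed number of background features without storing every non-keypoint location.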

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2009.14115 [cs.CV]
	(or arXiv:2009.14115v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2009.14115

Submission history

From: Yutong Bai
[v1] Tue, 29 Sep 2020 16:00:43 UTC (5,413 KB)
[v2] Wed, 30 Sep 2020 01:32:46 UTC (5,414 KB)
[v3] Mon, 23 Nov 2020 16:22:35 UTC (7,991 KB)
[v4] Mon, 5 Dec 2022 08:56:16 UTC (3,567 KB)
