MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves

Pathre, Pranjali; Sahu, Anurag; Rao, Ashwin; Prabhu, Avinash; Nigam, Meher Shashwat; Karandikar, Tanvi; Pandya, Harit; Krishna, K. Madhava

arXiv:2211.16882 (cs)

[Submitted on 30 Nov 2022]

Abstract: In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks along with the front-view and top-view layouts of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects on each shelf, varying numbers of shelves, and other racks present in the background. Further, MVRackLay shows superior performance vis-a-vis its single-view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics. We also showcase multi-view stitching of the 3D layouts, resulting in a representation of the warehouse scene with respect to a global reference frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera.
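
The abstract reports layout accuracy in terms of mean IoU. As a rough illustration of that metric only, the sketch below computes intersection over union between predicted and ground-truth binary layout grids and averages it over shelves. It assumes each shelf layout is a 2D grid of 0/1 occupancy values; the names `iou` and `mean_iou`, the grid representation, and the handling of empty masks are assumptions made for this example and are not taken from the paper or its code.

```python
from typing import Sequence

# Assumed representation: a 2D binary grid, 1 = occupied cell, 0 = free cell.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two completely empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average IoU over paired per-shelf layout masks."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("at least one mask pair is required")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    shelf_pred = [[1, 1], [0, 0]]
    shelf_true = [[1, 0], [0, 0]]
    empty = [[0, 0], [0, 0]]
    print(mean_iou([shelf_pred, empty], [shelf_true, empty]))
```

In this toy case the first shelf overlaps on one of two occupied cells and the second pair is empty on both sides, so the average reflects one partial and one perfect match. How the paper aggregates over shelves, views, and classes may differ from this simple mean.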

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2211.16882 [cs.CV]
	(or arXiv:2211.16882v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2211.16882
Journal reference:	IEEE International Conference on Robotics and Biomimetics (ROBIO) 2022

Submission history

From: Ashwin Rao
[v1] Wed, 30 Nov 2022 10:32:04 UTC (16,720 KB)