-
Natural Language to Code Using Transformers
Authors:
Uday Kusupati,
Venkata Ravi Teja Ailavarapu
Abstract:
We tackle the problem of generating code snippets from natural language descriptions using the CoNaLa dataset. We use the self-attention-based transformer architecture and show that it performs better than a recurrent attention-based encoder-decoder. Furthermore, we develop a modified form of back translation and use cycle-consistent losses to train the model in an end-to-end fashion. We achieve a BLEU score of 16.99, beating the previously reported baseline of the CoNaLa challenge.
Submitted 1 February, 2022;
originally announced February 2022.
-
Normal Assisted Stereo Depth Estimation
Authors:
Uday Kusupati,
Shuo Cheng,
Rui Chen,
Hao Su
Abstract:
Accurate stereo depth estimation plays a critical role in various 3D tasks in both indoor and outdoor environments. Recently, learning-based multi-view stereo methods have demonstrated competitive performance with a limited number of views. However, in challenging scenarios, especially when building cross-view correspondences is hard, these methods still cannot produce satisfactory results. In this paper, we study how to leverage a normal estimation model and the predicted normal maps to improve the depth quality. We couple the learning of a multi-view normal estimation module and a multi-view depth estimation module. In addition, we propose a novel consistency loss to train an independent consistency module that refines the depths from depth/normal pairs. We find that joint learning improves both normal and depth prediction, and that accuracy and smoothness can be further improved by enforcing consistency. Experiments on MVS, SUN3D, RGBD, and Scenes11 demonstrate the effectiveness of our method and its state-of-the-art performance.
Submitted 31 May, 2020; v1 submitted 23 November, 2019;
originally announced November 2019.
-
Learning 3D Human Pose from Structure and Motion
Authors:
Rishabh Dabral,
Anurag Mundhada,
Uday Kusupati,
Safeer Afaque,
Abhishek Sharma,
Arjun Jain
Abstract:
3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings, due to the lack of 3D annotated data. We propose two anatomically inspired loss functions and use them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimations. We carefully analyze the proposed contributions through loss surface visualizations and sensitivity analysis to facilitate a deeper understanding of their working mechanism. Our complete pipeline improves the state of the art by 11.8% and 12% on Human3.6M and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics card.
Submitted 3 July, 2018; v1 submitted 25 November, 2017;
originally announced November 2017.