Showing 1–3 of 3 results for author: Kota, B U

  1. arXiv:2106.11539  [pdf, other]

    cs.CV

    DocFormer: End-to-End Transformer for Document Understanding

    Authors: Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha

    Abstract: We present DocFormer -- a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts etc.) and layouts. In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which encourage multi-modal interaction. DocFormer uses te…

    Submitted 20 September, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted to ICCV 2021 main conference

  2. arXiv:1603.01431  [pdf, other]

    stat.ML cs.LG

    Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

    Authors: Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju

    Abstract: While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks-- Internal Covariate Shift-- the current solution has certain drawbacks. Specifically, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers inaccurate…

    Submitted 12 July, 2016; v1 submitted 4 March, 2016; originally announced March 2016.

    Comments: 11 pages, ICML 2016, appendix added to the last version

  3. arXiv:1512.01691  [pdf, other]

    cs.CV

    Maximum Entropy Binary Encoding for Face Template Protection

    Authors: Rohit Kumar Pandey, Yingbo Zhou, Bhargava Urala Kota, Venu Govindaraju

    Abstract: In this paper we present a framework for secure identification using deep neural networks, and apply it to the task of template protection for face authentication. We use deep convolutional neural networks (CNNs) to learn a mapping from face images to maximum entropy binary (MEB) codes. The mapping is robust enough to tackle the problem of exact matching, yielding the same code for new samples of…

    Submitted 5 December, 2015; originally announced December 2015.

    Comments: arXiv admin note: text overlap with arXiv:1506.04340