
Showing 1–27 of 27 results for author: Morariu, V I

Searching in archive cs.
  1. arXiv:2406.08354  [pdf, other]

    cs.CV cs.AI cs.LG

    DocSynthv2: A Practical Autoregressive Modeling for Document Generation

    Authors: Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

    Abstract: While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both la…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Spotlight (Oral) Acceptance to CVPR 2024 Workshop for Graphic Design Understanding and Generation (GDUG)

  2. arXiv:2403.08049  [pdf, other]

    cs.HC cs.AI cs.LG

    TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks

    Authors: Yuexi Chen, Vlad I. Morariu, Anh Truong, Zhicheng Liu

    Abstract: Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach procedural skills, offer more browsable alternatives than timeline-based videos. However, manually creating such tutorials is tedious, and existing automated solutions are often restricted to a particular domain. While AI models hold promise, it is unclear how to effectively harness their powers, given the multi-mod…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: CHI 2024, supplementary materials: https://hdi.cs.umd.edu/papers/TutoAI_CHI24_Supp.pdf

  3. arXiv:2310.16790  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

    Authors: Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Ani Nenkova

    Abstract: To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate. In contrast, real-world applications often resort to massive low-quality labeled data through non-expert annotators via crowdsourcing and external knowledge bases via distant supervision as a cost-effective alternat…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 14 pages

  4. arXiv:2211.14958  [pdf, other]

    cs.CV

    MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

    Authors: Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

    Abstract: Document images are a ubiquitous source of data where the text is organized in a complex hierarchical structure ranging from fine granularity (e.g., words), medium granularity (e.g., regions such as paragraphs or figures), to coarse granularity (e.g., the whole page). The spatial hierarchical relationships between content at different levels of granularity are crucial for document image understand…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  5. arXiv:2204.10939  [pdf, other]

    cs.CL cs.CV

    Unified Pretraining Framework for Document Understanding

    Authors: Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

    Abstract: Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards reducing annotation efforts by training models with self-supervised objectives. However, most of the existing document pretraining methods are still langua…

    Submitted 28 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures, NeurIPS 2021 (Updated Camera Ready)

  6. arXiv:2106.03331  [pdf, other]

    cs.CV cs.CL

    SelfDoc: Self-Supervised Document Representation Learning

    Authors: Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

    Abstract: We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of every semantically meaningful component in a document, and it models the contextualization between each block of content. Unlike existing document pre-training…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: To appear in CVPR'2021

  7. arXiv:2104.08689  [pdf, other]

    cs.CV

    RPCL: A Framework for Improving Cross-Domain Detection with Auxiliary Tasks

    Authors: Kai Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Yun Fu

    Abstract: Cross-Domain Detection (XDD) aims to train an object detector using labeled images from a source domain so that it performs well in a target domain where only unlabeled images are available. Existing approaches achieve this either by aligning the feature maps or the region proposals from the two domains, or by transferring the style of source images to that of target images. In contrast to prior work, this pap…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 10 pages, 5 figures

  8. arXiv:2006.03204  [pdf, other]

    cs.CV cs.AI cs.LG

    Black-box Explanation of Object Detectors via Saliency Maps

    Authors: Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, Kate Saenko

    Abstract: We propose D-RISE, a method for generating visual explanations for the predictions of object detectors. Utilizing the proposed similarity metric that accounts for both localization and categorization aspects of object detection allows our method to produce saliency maps that show image areas that most affect the prediction. D-RISE can be considered "black-box" in the software testing sense, as it…

    Submitted 10 June, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: CVPR 2021 (oral). Project page https://cs-people.bu.edu/vpetsiuk/drise/

  9. arXiv:2003.13197  [pdf, other]

    cs.CV

    Cross-Domain Document Object Detection: Benchmark Suite and Method

    Authors: Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu

    Abstract: Decomposing images of document pages into high-level semantic regions (e.g., figures, tables, paragraphs), document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding. DOD remains a challenging problem as document objects vary significantly in layout, size, aspect ratio, texture, etc. An additional challenge arises in practice because lar…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: To appear in CVPR 2020

  10. arXiv:1904.03885  [pdf, other]

    cs.CV cs.CL cs.LG

    Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions

    Authors: Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad I. Morariu, Larry S. Davis

    Abstract: This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema to better model linguistic structure. We introduce a new data collection scheme based on grammatical constraints for surface realization to enable us to investigate the problem of groun…

    Submitted 8 April, 2019; originally announced April 2019.

  11. arXiv:1805.08688  [pdf, other]

    cs.CV cs.LG

    Fused Deep Neural Networks for Efficient Pedestrian Detection

    Authors: Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis

    Abstract: In this paper, we present an efficient pedestrian detection system, designed by fusion of multiple deep neural network (DNN) systems. Pedestrian candidates are first generated by a single shot convolutional multi-box detector at different locations with various scales and aspect ratios. The candidate generator is designed to provide the majority of ground truth pedestrian annotations at the cost o…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: 11 pages

  12. arXiv:1805.04953  [pdf, other]

    cs.CV

    Learning Rich Features for Image Manipulation Detection

    Authors: Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

    Abstract: Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned. We propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose i…

    Submitted 13 May, 2018; originally announced May 2018.

    Comments: CVPR 2018 Camera Ready

  13. arXiv:1804.01429  [pdf, other]

    cs.CV

    Layout-induced Video Representation for Recognizing Agent-in-Place Actions

    Authors: Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis

    Abstract: We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Repres…

    Submitted 1 April, 2019; v1 submitted 4 April, 2018; originally announced April 2018.

  14. arXiv:1803.11276  [pdf, other]

    cs.CV

    Two-Stream Neural Networks for Tampered Face Detection

    Authors: Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

    Abstract: We propose a two-stream network for face tampering detection. We train GoogLeNet to detect tampering artifacts in a face classification stream, and train a patch-based triplet network to leverage features capturing local noise residuals and camera characteristics as a second stream. In addition, we use two different online face-swapping applications to create a new dataset that consists of 2010 ta…

    Submitted 29 March, 2018; originally announced March 2018.

    Journal ref: 2017 CVPR workshop

  15. arXiv:1711.05908  [pdf, other]

    cs.CV

    NISP: Pruning Networks using Neuron Importance Score Propagation

    Authors: Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis

    Abstract: To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering statistics of an individual layer or two consecutive layers (e.g., prune one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons in…

    Submitted 21 March, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

  16. arXiv:1711.05282  [pdf, other]

    cs.CV

    C-WSL: Count-guided Weakly Supervised Localization

    Authors: Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis

    Abstract: We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL). C-WSL uses a simple count-based region selection algorithm to select high-quality regions, each of which covers a single object instance during training, and improves existing WSL methods by training with the se…

    Submitted 25 July, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: ECCV2018

  17. arXiv:1711.05187  [pdf, other]

    cs.CV

    Dynamic Zoom-in Network for Fast Object Detection in Large Images

    Authors: Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

    Abstract: We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Buil…

    Submitted 27 March, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: CVPR2018

  18. arXiv:1707.09423  [pdf, other]

    cs.CV

    Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

    Authors: Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

    Abstract: Understanding visual relationships involves identifying the subject, the object, and a predicate relating them. We leverage the strong correlations between the predicate and the (subj,obj) pair (both semantically and spatially) to predict the predicates conditioned on the subjects and the objects. Modeling the three entities jointly more accurately reflects their relationships, but complicates lea…

    Submitted 2 August, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

    Comments: ICCV 2017

  19. arXiv:1612.03268  [pdf, other]

    cs.CV cs.LG cs.NE

    Generalized Deep Image to Image Regression

    Authors: Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis

    Abstract: We present a Deep Convolutional Neural Network architecture which serves as a generic image-to-image regressor that can be trained end-to-end without any further machinery. Our proposed architecture: the Recursively Branched Deconvolutional Network (RBDN) develops a cheap multi-context image representation very early on using an efficient recursive branching scheme with extensive parameter sharing…

    Submitted 10 December, 2016; originally announced December 2016.

    Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon

  20. arXiv:1611.09932  [pdf, other]

    cs.CV

    Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition

    Authors: Yaming Wang, Vlad I. Morariu, Larry S. Davis

    Abstract: Compared to earlier multistage frameworks using CNN features, recent end-to-end deep approaches for fine-grained recognition essentially enhance the mid-level learning capability of CNNs. Previous approaches achieve this by introducing an auxiliary network to infuse localization information into the main classification network, or a sophisticated feature encoding method to capture higher order fea…

    Submitted 11 June, 2018; v1 submitted 29 November, 2016; originally announced November 2016.

  21. arXiv:1611.09392  [pdf, other]

    cs.CV cs.CG cs.IR

    Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

    Authors: Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis

    Abstract: Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users' text descriptions. However, dire…

    Submitted 11 April, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: CVPR 2017

  22. arXiv:1609.02948  [pdf, other]

    cs.CV

    The Role of Context Selection in Object Detection

    Authors: Ruichi Yu, Xi Chen, Vlad I. Morariu, Larry S. Davis

    Abstract: We investigate the reasons why context in object detection has limited utility by isolating and evaluating the predictive power of different context cues under ideal conditions in which context is provided by an oracle. Based on this study, we propose a region-based context re-scoring method with dynamic context selection to remove noise and emphasize informative context. We introduce latent indicato…

    Submitted 9 September, 2016; originally announced September 2016.

  23. arXiv:1608.00525  [pdf, other]

    cs.CV

    Modeling Context Between Objects for Referring Expression Understanding

    Authors: Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis

    Abstract: Referring expressions usually describe an object using properties of the object and relationships of the object with other objects. We propose a technique that integrates context between objects to understand referring expressions. Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region. The context regions are discovered…

    Submitted 1 August, 2016; originally announced August 2016.

    Comments: To appear at ECCV 16

  24. arXiv:1605.01130  [pdf, other]

    cs.CV

    Mining Discriminative Triplets of Patches for Fine-Grained Classification

    Authors: Yaming Wang, Jonghyun Choi, Vlad I. Morariu, Larry S. Davis

    Abstract: Fine-grained classification involves distinguishing between similar sub-categories based on subtle differences in highly localized regions; therefore, accurate localization of discriminative regions remains a major challenge. We describe a patch-based framework to address this problem. We introduce triplets of patches with geometric constraints to improve the accuracy of patch localization, and au…

    Submitted 3 May, 2016; originally announced May 2016.

  25. arXiv:1512.03384  [pdf, other]

    cs.CV

    VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products

    Authors: Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis

    Abstract: VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known. The retrieved web images representing the query and each database video are treated as unordered collections of images, and each collection is represented using a single Fisher Vector built on CNN features. Our experiments show that a…

    Submitted 10 April, 2017; v1 submitted 10 December, 2015; originally announced December 2015.

  26. arXiv:1511.07710  [pdf, other]

    cs.CV cs.AI

    Searching for Objects using Structure in Indoor Scenes

    Authors: Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis

    Abstract: To identify the location of objects of a particular class, a passive computer vision system generally processes all the regions in an image to finally output only a few regions. However, we can use structure in the scene to search for objects without processing the entire image. We propose a search technique that sequentially processes image regions such that the regions that are more likely to correspon…

    Submitted 24 November, 2015; originally announced November 2015.

    Comments: Appeared in British Machine Vision Conference (BMVC) 2015

  27. arXiv:1509.07845  [pdf, other]

    cs.CV cs.CL cs.IR

    Selecting Relevant Web Trained Concepts for Automated Event Retrieval

    Authors: Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis

    Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. Howeve…

    Submitted 25 September, 2015; originally announced September 2015.