Showing 1–33 of 33 results for author: Berg, A C

Searching in archive cs.
  1. arXiv:2403.13804  [pdf, other]

    cs.CV cs.CL cs.LG

    Learning from Models and Data for Visual Grounding

    Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

    Abstract: We introduce SynGround, a novel framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models to enhance the visual grounding capabilities of a pretrained vision-and-language model. The knowledge transfer from the models initiates the generation of image descriptions through an image description generator. These descriptions serve dual purposes: the…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Project Page: https://catherine-r-he.github.io/SynGround/

  2. arXiv:2312.04554  [pdf, other]

    cs.CV cs.CL cs.LG

    Improved Visual Grounding through Self-Consistent Explanations

    Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

    Abstract: Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grounding"-- abilities of these models can be further improved by finetuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with par…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project Page: https://catherine-r-he.github.io/SelfEQ/

  3. arXiv:2311.00134  [pdf, other]

    cs.CV

    Joint Depth Prediction and Semantic Segmentation with Multi-View SAM

    Authors: Mykhailo Shvets, Dongxu Zhao, Marc Niethammer, Roni Sengupta, Alexander C. Berg

    Abstract: Multi-task approaches to joint depth and segmentation prediction are well-studied for monocular images. Yet, predictions from a single view are inherently limited, while multiple views are available in many robotics applications. On the other end of the spectrum, video-based and full 3D methods require numerous frames to perform reconstruction and segmentation. With this work we propose a Multi-Vi…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: To appear in the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision

  4. arXiv:2304.02643  [pdf, other]

    cs.CV cs.AI cs.LG

    Segment Anything

    Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

    Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project web-page: https://segment-anything.com

  5. arXiv:2202.04639  [pdf, other]

    cs.CV

    Point-Level Region Contrast for Object Detection Pre-Training

    Authors: Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg

    Abstract: In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. This approach is motivated by the two key factors in detection: localization and recognition. While accurate localization favors models that operate at the pixel- or point-level, correct recognition typically relies on a more holistic, region-level view of objects. Incorpo…

    Submitted 18 April, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: CVPR 2022 (Oral)

  6. arXiv:2112.02185  [pdf, other]

    cs.LG

    Neural Pseudo-Label Optimism for the Bank Loan Problem

    Authors: Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster

    Abstract: We study a class of classification problems best exemplified by the bank loan problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions. As a result, it is possible for the lender's algorithm to `…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: 10 pages main, 14 pages appendix

  7. arXiv:2103.16562  [pdf, other]

    cs.CV

    Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

    Authors: Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

    Abstract: We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects. The new quality me…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR 2021, project page: https://bowenc0221.github.io/boundary-iou

  8. arXiv:2012.09854  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG stat.ML

    Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

    Authors: Ronghang Hu, Nikhila Ravi, Alexander C. Berg, Deepak Pathak

    Abstract: We present Worldsheet, a method for novel view synthesis using just a single RGB image as input. The main insight is that simply shrink-wrapping a planar mesh sheet onto the input image, consistent with the learned intermediate depth, captures underlying geometry sufficient to generate photorealistic unseen views with large viewpoint changes. To operationalize this, we propose a novel differentiab…

    Submitted 18 August, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: ICCV 2021; 17 pages

  9. arXiv:2007.00077  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Similarity Search for Efficient Active Learning and Search of Rare Concepts

    Authors: Cody Coleman, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter Bailis, Alexander C. Berg, Robert Nowak, Roshan Sumbaly, Matei Zaharia, I. Zeki Yalniz

    Abstract: Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for la…

    Submitted 22 July, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

  10. arXiv:1908.03621  [pdf, other]

    cs.CV

    A Mask-RCNN Baseline for Probabilistic Object Detection

    Authors: Phil Ammirato, Alexander C. Berg

    Abstract: The Probabilistic Object Detection Challenge evaluates object detection methods using a new evaluation measure, Probability-based Detection Quality (PDQ), on a new synthetic image dataset. We present our submission to the challenge, a fine-tuned version of Mask-RCNN with some additional post-processing. Our method, submitted under username pammirato, is currently second on the leaderboard with a s…

    Submitted 14 October, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

    Comments: 2nd place in 1st PODC at CVPR 2019

  11. arXiv:1906.06597  [pdf, other]

    cs.CV

    IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things

    Authors: Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg

    Abstract: In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted Instance Segmentation as a new feature for semantic segmentation. It also supports back propagation so is trainable end-to-end. Our experiments show the effectiveness of IMP on both Clothing Parsing (with complex layering, large deformations, and non-convex objects), and on Street Scene Segme…

    Submitted 15 June, 2019; originally announced June 2019.

  12. arXiv:1904.07714  [pdf, other]

    cs.CV cs.AI cs.PF

    Low-Power Computer Vision: Status, Challenges, Opportunities

    Authors: Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, et al. (19 additional authors not shown)

    Abstract: Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batte…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Preprint, Accepted by IEEE Journal on Emerging and Selected Topics in Circuits and Systems. arXiv admin note: substantial text overlap with arXiv:1810.01732

  13. arXiv:1903.06791  [pdf, other]

    cs.CV

    Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution

    Authors: Chen Feng, Tao Sheng, Zhiyu Liang, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Matthew Ardi, Alexander C. Berg, Yiran Chen, Bo Chen, Kent Gauen, Yung-Hsiang Lu

    Abstract: The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015 that encourages joint hardware and software solutions for computer vision systems with low latency and power. Track 1 of the competition in 2018 focused on the innovation of software solutions with fixed inference engine and hardware. This decision allows participants to submit models online and not wor…

    Submitted 12 March, 2019; originally announced March 2019.

    Comments: Accepted At The 2nd Workshop on Machine Learning on the Phone and other Consumer Devices (MLPCD 2)

  14. arXiv:1901.03353  [pdf, other]

    cs.CV

    RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

    Authors: Cheng-Yang Fu, Mykhailo Shvets, Alexander C. Berg

    Abstract: Recently two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless single-shot detectors are immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: in…

    Submitted 10 January, 2019; originally announced January 2019.

  15. arXiv:1810.01732  [pdf]

    cs.CV

    2018 Low-Power Image Recognition Challenge

    Authors: Sergei Alyamkin, Matthew Ardi, Achille Brighton, Alexander C. Berg, Yiran Chen, Hsin-Pai Cheng, Bo Chen, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Jongkook Go, Alexander Goncharenko, Xuyang Guo, Hong Hanh Nguyen, Andrew Howard, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Alexander Kondratyev, Seungjae Lee, Suwoong Lee, Junhyeok Lee, Zhiyu Liang, Xin Liu, et al. (16 additional authors not shown)

    Abstract: The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times.…

    Submitted 3 October, 2018; originally announced October 2018.

    Comments: 13 pages, workshop in 2018 CVPR, competition, low-power, image recognition

  16. arXiv:1803.04610  [pdf, other]

    cs.CV

    Target Driven Instance Detection

    Authors: Phil Ammirato, Cheng-Yang Fu, Mykhailo Shvets, Jana Kosecka, Alexander C. Berg

    Abstract: While state-of-the-art general object detectors are getting better and better, there are not many systems specifically designed to take advantage of the instance detection problem. For many applications, such as household robotics, a system may need to recognize a few very specific instances at a time. Speed can be critical in these applications, as can the need to recognize previously unseen inst…

    Submitted 1 October, 2019; v1 submitted 12 March, 2018; originally announced March 2018.

  17. arXiv:1801.03049  [pdf, other]

    cs.CV cs.LG

    Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers

    Authors: Eunbyung Park, Alexander C. Berg

    Abstract: This paper improves state-of-the-art visual object trackers that use online adaptation. Our core contribution is an offline meta-learning-based method to adjust the initial deep networks used in online adaptation-based tracking. The meta learning is driven by the goal of deep networks that can quickly be adapted to robustly model a particular target in future frames. Ideally the resulting models f…

    Submitted 19 March, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

    Comments: Code: https://github.com/silverbottlep/meta_trackers

  18. arXiv:1707.08559  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Video Highlight Prediction Using Audience Chat Reactions

    Authors: Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg

    Abstract: Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis. We present methods addressing the problem of automatic video highlight prediction based on joint visual features and textual analysis of the real-world audience discourse with complex slang, in both English and traditional Chinese. We present a novel dataset based on League of Legends champions…

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: EMNLP 2017

  19. arXiv:1703.02921  [pdf, other]

    cs.CV

    Transformation-Grounded Image Generation Network for Novel 3D View Synthesis

    Authors: Eunbyung Park, Jimei Yang, Ersin Yumer, Duygu Ceylan, Alexander C. Berg

    Abstract: We present a transformation-grounded image generation network for novel 3D view synthesis from a single image. Instead of taking a 'blank slate' approach, we first explicitly infer the parts of the geometry visible both in the input and novel views and then re-cast the remaining synthesis problem as image completion. Specifically, we both predict a flow to move the pixels from the input to the nov…

    Submitted 8 March, 2017; originally announced March 2017.

    Comments: To appear in CVPR 2017

  20. arXiv:1702.08272  [pdf, other]

    cs.CV

    A Dataset for Developing and Benchmarking Active Vision

    Authors: Phil Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. Berg

    Abstract: We present a new public dataset with a focus on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes. We train a fast object category detector for instance detection on our data. Using the dataset we show that, although increasingly accurate…

    Submitted 3 March, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: To appear at ICRA 2017

  21. arXiv:1702.07836  [pdf, other]

    cs.CV cs.RO

    Synthesizing Training Data for Object Detection in Indoor Scenes

    Authors: Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka

    Abstract: Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training dat…

    Submitted 7 September, 2017; v1 submitted 25 February, 2017; originally announced February 2017.

    Comments: Added more experiments and link to project webpage

  22. arXiv:1701.06659  [pdf, other]

    cs.CV

    DSSD : Deconvolutional Single Shot Detector

    Authors: Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg

    Abstract: The main contribution of this paper is an approach for introducing additional context into state-of-the-art general object detection. To achieve this we first combine a state-of-the-art classifier (Residual-101[14]) with a fast detection framework (SSD[18]). We then augment SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accura…

    Submitted 23 January, 2017; originally announced January 2017.

  23. arXiv:1611.00393  [pdf, other]

    cs.CV

    Combining Multiple Cues for Visual Madlibs Question Answering

    Authors: Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

    Abstract: This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification, and attribute prediction. We also present a…

    Submitted 7 February, 2018; v1 submitted 1 November, 2016; originally announced November 2016.

    Comments: submitted to IJCV -- under review

  24. arXiv:1609.05590  [pdf, other]

    cs.CV

    Fast Single Shot Detection and Pose Estimation

    Authors: Patrick Poirson, Phil Ammirato, Cheng-Yang Fu, Wei Liu, Jana Kosecka, Alexander C. Berg

    Abstract: For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stag…

    Submitted 18 September, 2016; originally announced September 2016.

  25. arXiv:1608.03914  [pdf, other]

    cs.CV

    When was that made?

    Authors: Sirion Vittayakorn, Alexander C. Berg, Tamara L. Berg

    Abstract: In this paper, we explore deep learning methods for estimating when objects were made. Automatic methods for this task could potentially be useful for historians, collectors, or any individual interested in estimating when their artifact was created. Direct applications include large-scale data organization or retrieval. Toward this goal, we utilize features from existing deep networks and also fi…

    Submitted 12 August, 2016; originally announced August 2016.

  26. arXiv:1608.03410  [pdf, other]

    cs.CV

    Solving Visual Madlibs with Multiple Cues

    Authors: Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

    Abstract: This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the ImageNet dataset, despite the wide scope of questions. In contrast, our approach employs features derived from networks trained for specialized tasks of scene cl…

    Submitted 11 August, 2016; originally announced August 2016.

    Comments: accepted at BMVC 2016

  27. arXiv:1608.00272  [pdf, other]

    cs.CV cs.CL

    Modeling Context in Referring Expressions

    Authors: Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg

    Abstract: Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performan…

    Submitted 10 August, 2016; v1 submitted 31 July, 2016; originally announced August 2016.

    Comments: 19 pages, 6 figures, in ECCV 2016; authors, references and acknowledgement updated

  28. SSD: Single Shot MultiBox Detector

    Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

    Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box…

    Submitted 29 December, 2016; v1 submitted 7 December, 2015; originally announced December 2015.

    Comments: ECCV 2016

  29. arXiv:1511.06449  [pdf, other]

    cs.CV cs.LG

    Learning to decompose for object detection and instance segmentation

    Authors: Eunbyung Park, Alexander C. Berg

    Abstract: Although deep convolutional neural networks (CNNs) have achieved remarkable results on object detection and segmentation, pre- and post-processing steps such as region proposals and non-maximum suppression (NMS) have been required. These steps result in high computational complexity and sensitivity to hyperparameters, e.g. thresholds for NMS. In this work, we propose a novel end-to-end trainable de…

    Submitted 10 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR 2016 Workshop

  30. arXiv:1511.03650   

    cs.CV

    Piecewise Linear Activation Functions For More Efficient Deep Networks

    Authors: Cheng-Yang Fu, Alexander C. Berg

    Abstract: This submission has been withdrawn by arXiv administrators because it is intentionally incomplete, which is in violation of our policies.

    Submitted 7 December, 2015; v1 submitted 11 November, 2015; originally announced November 2015.

    Comments: Withdrawn by arXiv admins

  31. arXiv:1506.04579  [pdf, other]

    cs.CV

    ParseNet: Looking Wider to See Better

    Authors: Wei Liu, Andrew Rabinovich, Alexander C. Berg

    Abstract: We present a technique for adding global context to deep convolutional networks for semantic segmentation. The approach is simple, using the average feature for a layer to augment the features at each location. In addition, we study several idiosyncrasies of training, significantly increasing the performance of baseline networks (e.g. from FCN). When we add our proposed global feature, and a techn…

    Submitted 19 November, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

    Comments: ICLR 2016 submission

  32. arXiv:1506.00278  [pdf, other]

    cs.CV cs.CL

    Visual Madlibs: Fill in the blank Image Generation and Question Answering

    Authors: Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg

    Abstract: In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or i…

    Submitted 31 May, 2015; originally announced June 2015.

    Comments: 10 pages; 8 figures; 4 tables

  33. arXiv:1409.0575  [pdf, other]

    cs.CV

    ImageNet Large Scale Visual Recognition Challenge

    Authors: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei

    Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that ha…

    Submitted 29 January, 2015; v1 submitted 1 September, 2014; originally announced September 2014.

    Comments: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references

    ACM Class: I.4.8; I.5.2