Showing 1–2 of 2 results for author: Poland, D

Search v0.5.6 released 2020-02-24

arXiv:2402.01126 [pdf]

cs.CV

Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video

Authors: Douglas Poland, Amar Saini

Abstract: Perception of the visually disjoint surfaces of our cluttered world as whole objects, physically distinct from those overlapping them, is a cognitive phenomenon called objectness that forms the basis of our visual perception. Shared by all vertebrates and present at birth in humans, it enables object-centric representation and reasoning about the visual world. We present a computational approach to objectness that leverages motion cues and spatio-temporal attention using a pair of supervised spatio-temporal R(2+1)U-Nets. The first network detects motion boundaries and classifies the pixels at those boundaries in terms of their local foreground-background sense. This motion boundary sense (MBS) information is passed, along with a spatio-temporal object attention cue, to an attentional surface perception (ASP) module, which infers the form of the attended object over a sequence of frames and classifies its 'pixels' as visible or obscured. The spatial form of the attention cue is flexible, but it must loosely track the attended object, which need not be visible. We demonstrate the ability of this simple but novel approach to infer objectness from phenomenology without object models, and show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake. We show that our data diversity and augmentation minimize bias and facilitate transfer to real video. Finally, we describe how this computational objectness capability can grow in sophistication and anchor a robust modular video object perception framework.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 10 pages, 11 figures, plus 18 pages of Supplemental Information

Report number: LLNL-JRNL-859920

ACM Class: I.4

arXiv:1503.01817 [pdf, other]

cs.MM cs.CY

doi 10.1145/2812802

YFCC100M: The New Data in Multimedia Research

Authors: Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, Li-Jia Li

Abstract: We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset.

Submitted 25 April, 2016; v1 submitted 5 March, 2015; originally announced March 2015.

ACM Class: H.3.7

Journal ref: Communications of the ACM, 59(2), pp. 64-73, 2016
