Showing 1–50 of 54 results for author: Fowlkes, C

Searching in archive cs.
  1. arXiv:2404.16972

    cs.CV

    CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching

    Authors: Samia Shafique, Shu Kong, Charless Fowlkes

    Abstract: Shoeprints are a common type of evidence found at crime scenes and are used regularly in forensic investigations. However, existing methods cannot effectively employ deep learning techniques to match noisy and occluded crime-scene shoeprints to a shoe database due to a lack of training data. Moreover, all existing methods match crime-scene shoeprints to clean reference prints, yet our analysis sho…

    Submitted 25 April, 2024; originally announced April 2024.

  2. arXiv:2312.04117

    cs.CV

    Instance Tracking in 3D Scenes from Egocentric Videos

    Authors: Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes

    Abstract: Egocentric sensors such as AR/VR devices capture human-object interactions and offer the potential to provide task-assistance by recalling 3D locations of objects of interest in the surrounding environment. This capability requires instance tracking in real-world 3D scenes from egocentric videos (IT3DEgo). We explore this problem by first introducing a new benchmark dataset, consisting of RGB and…

    Submitted 6 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024. Also presented at First Joint Egocentric Vision (EgoVis) Workshop @ CVPR 2024

  3. arXiv:2303.04105

    cs.LG cs.CV

    Your representations are in the network: composable and parallel adaptation for large scale models

    Authors: Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca Zancato, Benjamin Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters, trained anew and combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieve…

    Submitted 31 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  4. arXiv:2205.02361

    cs.CV

    Creating a Forensic Database of Shoeprints from Online Shoe Tread Photos

    Authors: Samia Shafique, Bailey Kong, Shu Kong, Charless C. Fowlkes

    Abstract: Shoe tread impressions are one of the most common types of evidence left at crime scenes. However, the utility of such evidence is limited by the lack of databases of footwear prints that cover the large and growing number of distinct shoe models. Moreover, the database is preferred to contain the 3D shape, or depth, of shoe-tread photos so as to allow for extracting shoeprints to match a query (c…

    Submitted 20 October, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: published in WACV 2023; 8 pages including 11 figures and 3 tables; contains reference and appendix

  5. arXiv:2205.00508

    cs.CV

    The Best of Both Worlds: Combining Model-based and Nonparametric Approaches for 3D Human Body Estimation

    Authors: Zhe Wang, Jimei Yang, Charless Fowlkes

    Abstract: Nonparametric based methods have recently shown promising results in reconstructing human bodies from monocular images while model-based methods can help correct these estimates and improve prediction. However, estimating model parameters from global image features may lead to noticeable misalignment between the estimated meshes and image evidence. To address this issue and leverage the best of bo…

    Submitted 1 May, 2022; originally announced May 2022.

    Journal ref: CVPR ABAW 2022

  6. arXiv:2203.16708

    cs.LG cs.CV

    Task Adaptive Parameter Sharing for Multi-Task Learning

    Authors: Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto

    Abstract: Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning different models for each task is performant, but incurs a substantial memory cost. To efficiently learn multiple downstream tasks we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new ta…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 Camera Ready. 15 pages, 11 figures

  7. arXiv:2201.08131

    cs.CV

    GeoFill: Reference-Based Image Inpainting with Better Geometric Understanding

    Authors: Yunhan Zhao, Connelly Barnes, Yuqian Zhou, Eli Shechtman, Sohrab Amirghodsi, Charless Fowlkes

    Abstract: Reference-guided image inpainting restores image pixels by leveraging the content from another single reference image. The primary challenge is how to precisely place the pixels from the reference image into the hole region. Therefore, understanding the 3D geometry that relates pixels between two views is a crucial step towards building a better model. Given the complexity of handling various type…

    Submitted 8 October, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Accepted to WACV 2023

  8. arXiv:2109.02161

    cs.AI

    Modular Framework for Visuomotor Language Grounding

    Authors: Kolby Nottingham, Litian Liang, Daeyun Shin, Charless C. Fowlkes, Roy Fox, Sameer Singh

    Abstract: Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research. However, data collection for these tasks is expensive and end-to-end approaches suffer from data inefficiency. We propose the structuring of language, acting, and visual tasks into separate modules that can be trained independently. Using a Language, Action, and Vision (LAV) frame…

    Submitted 5 September, 2021; originally announced September 2021.

  9. arXiv:2107.08039

    cs.CV cs.LG

    Representation Consolidation for Training Expert Students

    Authors: Zhizhong Li, Avinash Ravichandran, Charless Fowlkes, Marzia Polito, Rahul Bhotika, Stefano Soatto

    Abstract: Traditionally, distillation has been used to train a student model to emulate the input/output functionality of a teacher. A more useful goal than emulation, yet under-explored, is for the student to learn feature representations that transfer well to future tasks. However, we observe that standard distillation of task-specific teachers actually *reduces* the transferability of student representat…

    Submitted 16 July, 2021; originally announced July 2021.

  10. arXiv:2105.14158

    cs.CV

    SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation

    Authors: Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes

    Abstract: Temporal action segmentation is a task to classify each frame in the video with an action label. However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset. Thus in this work we propose an unsupervised method, namely SSCAP, that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments ac…

    Submitted 25 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: WACV 2022 camera ready

  11. arXiv:2102.00084

    cs.CV cs.LG

    A linearized framework and a new benchmark for model selection for fine-tuning

    Authors: Aditya Deshpande, Alessandro Achille, Avinash Ravichandran, Hao Li, Luca Zancato, Charless Fowlkes, Rahul Bhotika, Stefano Soatto, Pietro Perona

    Abstract: Fine-tuning from a collection of models pre-trained on different domains (a "model zoo") is emerging as a technique to improve test accuracy in the low-data regime. However, model selection, i.e. how to pre-select the right model to fine-tune from a model zoo without performing any training, remains an open topic. We use a linearized framework to approximate fine-tuning, and introduce two new base…

    Submitted 29 January, 2021; originally announced February 2021.

    Comments: 14 pages

  12. arXiv:2101.08482

    cs.LG cs.AI cs.CV

    Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning

    Authors: Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto

    Abstract: We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN s…

    Submitted 18 June, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: accepted by CVPR21 as Oral presentation
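    A minimal NumPy sketch of the contrast drawn in the abstract: instead of computing statistics within each batch, the teacher's normalization statistics are updated as an exponential moving average and then used as fixed values. The abstract is truncated before naming the exact source of the statistics being averaged, so they are simply passed in as `source_mean`/`source_var`; the function names and momentum values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def ema_update(stat, source_stat, momentum=0.999):
    """Exponential moving average: keep `momentum` of the old value."""
    return momentum * stat + (1.0 - momentum) * source_stat

def normalize(x, mean, var, eps=1e-5):
    """Normalize features with fixed (not per-batch) statistics."""
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical teacher statistics for a 3-channel feature.
teacher_mean = np.zeros(3)
teacher_var = np.ones(3)

# Statistics supplied from elsewhere (the truncated abstract does not
# name the source, so these values are purely illustrative).
source_mean = np.array([1.0, 2.0, 3.0])
source_var = np.array([4.0, 4.0, 4.0])

teacher_mean = ema_update(teacher_mean, source_mean, momentum=0.9)
teacher_var = ema_update(teacher_var, source_var, momentum=0.9)

batch = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
out = normalize(batch, teacher_mean, teacher_var)
```

    Because the statistics are fixed at normalization time, each sample's output no longer depends on which other samples share its batch, unlike standard per-batch BN.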

  13. arXiv:2007.03887

    cs.CV

    Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias

    Authors: Yunhan Zhao, Shu Kong, Charless Fowlkes

    Abstract: Monocular depth predictors are typically trained on large-scale training sets which are naturally biased w.r.t the distribution of camera poses. As a result, trained predictors fail to make reliable depth predictions for testing examples captured under uncommon camera poses. To address this issue, we propose two novel techniques that exploit the camera pose during training and prediction. First, w…

    Submitted 28 March, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted at CVPR2021, Oral

  14. arXiv:2006.11747

    cs.CV

    Weak Supervision and Referring Attention for Temporal-Textual Association Learning

    Authors: Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

    Abstract: A system capturing the association between video frames and textual queries offers great potential for better video analysis. However, training such a system in a fully supervised way inevitably demands a meticulously curated video dataset with temporal-textual annotations. Therefore we provide a Weak-Supervised alternative with our proposed Referring Attention mechanism to learn temporal-textual a…

    Submitted 27 June, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: 12 pages, 6 figures

  15. arXiv:2005.04884

    cs.CV

    Celeganser: Automated Analysis of Nematode Morphology and Age

    Authors: Linfeng Wang, Shu Kong, Zachary Pincus, Charless Fowlkes

    Abstract: The nematode Caenorhabditis elegans (C. elegans) serves as an important model organism in a wide variety of biological studies. In this paper we introduce a pipeline for automated analysis of C. elegans imagery for the purpose of studying life-span, health-span and the underlying genetic determinants of aging. Our system detects and segments the worm, and predicts body coordinates at each pixel lo…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: Computer Vision for Microscopy Image Analysis (CVMI) 2020

  16. arXiv:2004.03143

    cs.CV

    Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation

    Authors: Zhe Wang, Daeyun Shin, Charless C. Fowlkes

    Abstract: Monocular estimation of 3d human pose has attracted increased attention with the availability of large ground-truth motion capture datasets. However, the diversity of training data available is limited and it is not clear to what extent methods generalize outside the specific datasets they are trained on. In this work we carry out a systematic study of the diversity and biases present in specific…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: http://wangzheallen.github.io/cross-dataset-generalization

  17. arXiv:2002.12114

    cs.CV

    Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

    Authors: Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes

    Abstract: Leveraging synthetically rendered data offers great potential to improve monocular depth estimation and other geometric estimation tasks, but closing the synthetic-real domain gap is a non-trivial and important task. While much recent work has focused on unsupervised domain adaptation, we consider a more realistic scenario where a large amount of synthetic training data is supplemented by a small…

    Submitted 25 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: camera-ready version, CVPR2020

  18. arXiv:1908.06552

    cs.CV

    Weakly-supervised Action Localization with Background Modeling

    Authors: Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes

    Abstract: We describe a latent approach that learns to detect actions in long sequences given training videos with only whole-video class labels. Our approach makes use of two innovations to attention-modeling in weakly-supervised learning. First, and most notably, our framework uses an attention model to extract both foreground and background frames whose appearance is explicitly modeled. Most prior works…

    Submitted 18 August, 2019; originally announced August 2019.

    Comments: To appear at ICCV 2019

  19. arXiv:1905.07718

    cs.CV

    Geometric Pose Affordance: 3D Human Pose with Scene Constraints

    Authors: Zhe Wang, Liyan Chen, Shaurya Rathore, Daeyun Shin, Charless Fowlkes

    Abstract: Full 3D estimation of human pose from a single image remains a challenging task despite many recent advances. In this paper, we explore the hypothesis that strong prior information about scene geometry can be used to improve pose estimation accuracy. To tackle this question empirically, we have assembled a novel Geometric Pose Affordance dataset, consisting of multi-view imagery of peop…

    Submitted 8 December, 2021; v1 submitted 19 May, 2019; originally announced May 2019.

    Comments: Project page: https://wangzheallen.github.io/GPA.html; in submission to CVIU

  20. arXiv:1904.03589

    cs.CV

    Modularized Textual Grounding for Counterfactual Resilience

    Authors: Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

    Abstract: Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries. To achieve high grounding precision, current textual grounding methods heavily rely on large-scale training data with manual annotations at the pixel level. Such annotations are expensive to obtain and thus severely narrow the model's scope of rea…

    Submitted 1 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: 13 pages, 12 figures, IEEE Conference on Computer Vision and Pattern Recognition, 2019

  21. arXiv:1904.01693

    cs.CV

    Multigrid Predictive Filter Flow for Unsupervised Learning on Videos

    Authors: Shu Kong, Charless Fowlkes

    Abstract: We introduce multigrid Predictive Filter Flow (mgPFF), a framework for unsupervised learning on videos. The mgPFF takes as input a pair of frames and outputs per-pixel filters to warp one frame to the other. Compared to optical flow used for warping frames, mgPFF is more powerful in modeling sub-pixel movement and dealing with corruption (e.g., motion blur). We develop a multigrid coarse-to-fine m…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: webpage (https://www.ics.uci.edu/~skong2/mgpff.html)

  22. Sparse Representations for Object and Ego-motion Estimation in Dynamic Scenes

    Authors: Hirak J Kashyap, Charless Fowlkes, Jeffrey L Krichmar

    Abstract: Dynamic scenes that contain both object motion and egomotion are a challenge for monocular visual odometry (VO). Another issue with monocular VO is the scale ambiguity, i.e. these methods cannot estimate scene depth and camera motion in real scale. Here, we propose a learning based approach to predict camera motion parameters directly from optic flow, by marginalizing depthmap variations and outli…

    Submitted 8 March, 2019; originally announced March 2019.

    Comments: With supplementary material

  23. arXiv:1902.06729

    cs.CV

    3D Scene Reconstruction with Multi-layer Depth and Epipolar Transformers

    Authors: Daeyun Shin, Zhile Ren, Erik B. Sudderth, Charless C. Fowlkes

    Abstract: We tackle the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes viewer-centered, multi-layer representation of scene geometry adapted from recent methods for single object shape completion. To improve the accuracy of view-centered representat…

    Submitted 27 August, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: Accepted at ICCV 2019. Paper title changed. Project web page: https://research.dshin.org/iccv19/multi-layer-depth

  24. arXiv:1902.03545

    cs.LG cs.AI stat.ML

    Task2Vec: Task Embedding for Meta-Learning

    Authors: Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona

    Abstract: We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined over those labels, we process images through a "probe network" and compute an embedding based on estimates of the Fisher information matrix associated with the…

    Submitted 10 February, 2019; originally announced February 2019.
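    The abstract describes embedding a task through estimates of the Fisher information matrix of a probe network. As a rough sketch under strong assumptions (a linear softmax probe on fixed features, the diagonal of the empirical Fisher computed from ground-truth labels, and the hypothetical name `diag_fisher_embedding`), the per-parameter Fisher diagonal can be estimated as the mean squared gradient of the log-likelihood. The truncated abstract does not specify the paper's actual estimator, so this should not be read as the Task2Vec implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def diag_fisher_embedding(features, labels, W):
    """Diagonal empirical Fisher of a linear softmax probe.

    features: (n, d) fixed features from a probe network (assumed given)
    labels:   (n,) integer class labels
    W:        (k, d) probe weights
    Returns a flattened (k*d,) vector used here as the task embedding.
    """
    k = W.shape[0]
    probs = softmax(features @ W.T)        # (n, k) predicted class probabilities
    onehot = np.eye(k)[labels]             # (n, k) ground-truth indicators
    # Per-sample gradient of -log p(y|x) w.r.t. W is (p - y) x^T, shape (n, k, d).
    grads = (probs - onehot)[:, :, None] * features[:, None, :]
    return (grads ** 2).mean(axis=0).reshape(-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.integers(0, 3, size=20)
emb = diag_fisher_embedding(X, y, np.zeros((3, 4)))
```

    With a single sample x = [1, 0], label 0, and zero weights over two classes, the probabilities are [0.5, 0.5], the gradient rows are -0.5·x and 0.5·x, and the embedding is [0.25, 0, 0.25, 0].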

  25. arXiv:1811.11482

    eess.IV cs.CV cs.GR cs.LG

    Image Reconstruction with Predictive Filter Flow

    Authors: Shu Kong, Charless Fowlkes

    Abstract: We propose a simple, interpretable framework for solving a wide range of image reconstruction problems such as denoising and deconvolution. Given a corrupted input image, the model synthesizes a spatially varying linear filter which, when applied to the input image, reconstructs the desired output. The model parameters are learned using supervised or self-supervised training. We test this model on…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: https://www.ics.uci.edu/~skong2/pff.html
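    The core operation named in the abstract, applying a spatially varying linear filter to the input image, can be sketched as follows. Here the per-pixel filters are simply given (in the paper they are synthesized by a learned model); the function name `apply_per_pixel_filters`, the odd k×k filter size, and zero padding at the borders are assumptions for illustration.

```python
import numpy as np

def apply_per_pixel_filters(image, filters):
    """Apply a different k x k linear filter at every pixel.

    image:   (H, W) grayscale image
    filters: (H, W, k, k) one filter per output pixel, k odd
    Returns an (H, W) image with out[y, x] = sum(filters[y, x] * patch(y, x)).
    """
    H, W = image.shape
    k = filters.shape[-1]
    r = k // 2
    padded = np.pad(image, r, mode="constant")  # zero padding at the borders
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]    # k x k neighborhood centered at (y, x)
            out[y, x] = np.sum(filters[y, x] * patch)
    return out

img = np.arange(12, dtype=float).reshape(3, 4)

# Identity filters (1 at the center tap) reproduce the input exactly.
ident = np.zeros((3, 4, 3, 3))
ident[:, :, 1, 1] = 1.0
restored = apply_per_pixel_filters(img, ident)

# A filter whose only nonzero tap is right of center pulls in the right neighbor.
shift = np.zeros((3, 4, 3, 3))
shift[:, :, 1, 2] = 1.0
shifted = apply_per_pixel_filters(img, shift)
```

    Since every output pixel is an explicit linear combination of its input neighborhood, the same routine covers denoising-style smoothing and warping-style shifts, depending only on the filter values.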

  26. arXiv:1807.00493

    cs.CV

    Active Testing: An Efficient and Robust Framework for Estimating Accuracy

    Authors: Phuc Nguyen, Deva Ramanan, Charless Fowlkes

    Abstract: Much recent work on visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are f…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: accepted to ICML 2018

  27. arXiv:1806.10309

    cs.CG

    CeMNet: Self-supervised learning for accurate continuous ego-motion estimation

    Authors: Minhaeng Lee, Charless C. Fowlkes

    Abstract: In this paper, we propose a novel self-supervised learning model for estimating continuous ego-motion from video. Our model learns to estimate camera motion by watching RGBD or RGB video streams and determining translational and rotation velocities that correctly predict the appearance of future frames. Our approach differs from other recent work on self-supervised structure-from-motion in its use…

    Submitted 27 June, 2018; originally announced June 2018.

  28. arXiv:1805.06447

    cs.CV

    Resisting Large Data Variations via Introspective Transformation Network

    Authors: Yunhan Zhao, Ye Tian, Charless Fowlkes, Wei Shen, Alan Yuille

    Abstract: Training deep networks that generalize to a wide range of variations in test data is essential to building accurate and robust image classifiers. One standard strategy is to apply data augmentation to synthetically enlarge the training set. However, data augmentation is essentially a brute-force method which generates uniform samples from some pre-defined set of transformations. In this paper, we…

    Submitted 26 June, 2020; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: camera-ready version, WACV 2020

  29. arXiv:1805.01556

    cs.CV

    Pixel-wise Attentional Gating for Parsimonious Pixel Labeling

    Authors: Shu Kong, Charless Fowlkes

    Abstract: To achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget, we propose a Pixel-wise Attentional Gating unit (PAG) that learns to selectively process a subset of spatial locations at each layer of a deep convolutional network. PAG is a generic, architecture-independent, problem-agnostic mechanism that can be readily "plugged in" to an existing mo…

    Submitted 18 December, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: https://www.ics.uci.edu/~skong2/PAG.html

  30. arXiv:1805.01024

    cs.CV cs.MM

    Fine-Grained Facial Expression Analysis Using Dimensional Emotion Model

    Authors: Feng Zhou, Shu Kong, Charless Fowlkes, Tao Chen, Baiying Lei

    Abstract: Automated facial expression analysis has a variety of applications in human-computer interaction. Traditional methods mainly analyze prototypical facial expressions of no more than eight discrete emotions as a classification task. However, in practice, spontaneous facial expressions in naturalistic environment can represent not only a wide range of emotions, but also different intensities within a…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: code: http://www.ics.uci.edu/~skong2/DimensionalEmotionModel.html

  31. arXiv:1804.06032

    cs.CV

    Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction

    Authors: Daeyun Shin, Charless C. Fowlkes, Derek Hoiem

    Abstract: The goal of this paper is to compare surface-based and volumetric 3D object shape representations, as well as viewer-centered and object-centered reference frames for single-view 3D shape prediction. We propose a new algorithm for predicting depth maps from multiple viewpoints, with a single depth or RGB image as input. By modifying the network and the way models are evaluated, we can directly com…

    Submitted 11 June, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

    Comments: CVPR 2018

  32. Cross-Domain Image Matching with Deep Feature Maps

    Authors: Bailey Kong, James Supancic, Deva Ramanan, Charless C. Fowlkes

    Abstract: We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features ex…

    Submitted 1 October, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

  33. arXiv:1801.07853

    cs.CV

    Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

    Authors: Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes

    Abstract: Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can captu…

    Submitted 23 January, 2018; originally announced January 2018.

    Comments: 8 pages, 5 figures, state-of-the-art VQA system; https://github.com/wangzheallen/STL-VQA

  34. arXiv:1712.08273

    cs.CV cs.LG cs.MM

    Recurrent Pixel Embedding for Instance Grouping

    Authors: Shu Kong, Charless Fowlkes

    Abstract: We introduce a differentiable, end-to-end trainable framework for solving pixel-level grouping problems such as instance segmentation consisting of two novel components. First, we regress pixels into a hyper-spherical embedding space so that pixels from the same group have high cosine similarity while those from different groups have similarity below a specified margin. We analyze the choice of em…

    Submitted 21 December, 2017; originally announced December 2017.
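    The first component in the abstract asks that pixel embeddings on the unit hypersphere have high cosine similarity within a group and similarity below a margin across groups. A minimal hinge-style pairwise loss with exactly that property is sketched below. The paper's actual loss, pair weighting, and margin are not given in the truncated abstract, so `pairwise_spherical_loss` and `margin=0.5` are assumptions.

```python
import numpy as np

def pairwise_spherical_loss(embeddings, groups, margin=0.5):
    """Hinge-style pairwise loss on L2-normalized embeddings.

    embeddings: (n, d) raw pixel embeddings
    groups:     (n,) integer instance id per pixel
    Same-group pairs are pulled toward cosine similarity 1; different-group
    pairs are penalized only when their similarity exceeds `margin`.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T                                  # cosine similarity matrix
    same = groups[:, None] == groups[None, :]
    off_diag = ~np.eye(len(groups), dtype=bool)    # ignore self-pairs
    pull = (1.0 - sim)[same & off_diag]
    push = np.maximum(0.0, sim - margin)[~same]
    return (pull.sum() + push.sum()) / off_diag.sum()

# Two well-separated groups give zero loss; a wrong grouping does not.
emb = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
loss_good = pairwise_spherical_loss(emb, np.array([0, 0, 1, 1]))
loss_bad = pairwise_spherical_loss(emb, np.array([0, 1, 0, 1]))
```

    With the wrong labels, the four ordered same-group pairs are orthogonal (pull of 1 each) and four ordered cross-group pairs are identical directions (push of 0.5 each), so the loss is (4 + 2) / 12 = 0.5.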

  35. arXiv:1710.01820

    cs.CV

    Energy-Based Spherical Sparse Coding

    Authors: Bailey Kong, Charless C. Fowlkes

    Abstract: In this paper, we explore an efficient variant of convolutional sparse coding with unit norm code vectors where reconstruction quality is evaluated using an inner product (cosine distance). To use these codes for discriminative classification, we describe a model we term Energy-Based Spherical Sparse Coding (EB-SSC) in which the hypothesized class label introduces a learned linear bias into the co…

    Submitted 4 October, 2017; originally announced October 2017.

  36. arXiv:1705.07238

    cs.CV

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Authors: Shu Kong, Charless Fowlkes

    Abstract: Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant…

    Submitted 5 December, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

  37. arXiv:1612.01689

    cs.CV

    Cluster-Wise Ratio Tests for Fast Camera Localization

    Authors: Raúl Díaz, Charless C. Fowlkes

    Abstract: Feature point matching for camera localization suffers from scalability problems. Even when feature descriptors associated with 3D scene points are locally unique, as coverage grows, similar or repeated features become increasingly common. As a result, the standard distance ratio-test used to identify reliable image feature points is overly restrictive and rejects many good candidate matches. We p…

    Submitted 20 May, 2017; v1 submitted 6 December, 2016; originally announced December 2016.
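    For reference, the standard distance ratio test that the abstract calls overly restrictive can be written in a few lines: a query descriptor's nearest neighbor is accepted only if it is sufficiently closer than the second-nearest. This sketch is the baseline, not the proposed cluster-wise test; the threshold 0.8 and the name `ratio_test_matches` are illustrative assumptions.

```python
import numpy as np

def ratio_test_matches(queries, database, ratio=0.8):
    """Standard nearest-neighbor distance ratio test.

    queries:  (m, d) query descriptors
    database: (n, d) database descriptors, n >= 2
    Returns a list of (query_index, database_index) accepted matches.
    """
    # Pairwise Euclidean distances, shape (m, n)
    dists = np.linalg.norm(queries[:, None, :] - database[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(len(queries)):
        best, second = order[i, 0], order[i, 1]
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches

db = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 0.1]])
qs = np.array([[0.1, 0.0],      # clearly closest to db[0]
               [10.0, 0.05]])   # equidistant from two near-duplicate features
print(ratio_test_matches(qs, db))  # [(0, 0)]
```

    The second query is rejected even though either of its two neighbors would be a plausible match, because a near-duplicate feature sits just as close. This is the kind of rejection the abstract attributes to similar or repeated features becoming common as coverage grows.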

  38. arXiv:1611.05109

    cs.CV

    Low-rank Bilinear Pooling for Fine-Grained Classification

    Authors: Shu Kong, Charless Fowlkes

    Abstract: Pooling second-order local feature statistics to form a high-dimensional bilinear feature has been shown to achieve state-of-the-art performance on a variety of fine-grained classification tasks. To address the computational demands of high feature dimensionality, we propose to represent the covariance features as a matrix and apply a low-rank bilinear classifier. The resulting classifier can be e…

    Submitted 29 November, 2016; v1 submitted 15 November, 2016; originally announced November 2016.
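    The computational benefit of a low-rank bilinear classifier can be illustrated with a trace identity. If the second-order feature is X = F^T F for local features F (n×d) and the classifier matrix is factored as W = U V^T with rank r much smaller than d, then the score tr(W^T X) equals tr((F U)^T (F V)) = sum((F U) * (F V)), so X never has to be formed. The truncated abstract does not state the paper's exact parameterization; the generic factorization and the function names below are assumptions.

```python
import numpy as np

def bilinear_score_full(F, W):
    """Score <W, X> = tr(W^T X) with X = F^T F formed explicitly (d x d)."""
    X = F.T @ F
    return np.sum(W * X)

def bilinear_score_low_rank(F, U, V):
    """Same score for W = U V^T without forming any d x d matrix.

    tr((U V^T)^T F^T F) = tr((F U)^T (F V)) = sum((F U) * (F V))
    """
    return np.sum((F @ U) * (F @ V))

rng = np.random.default_rng(0)
n, d, r = 50, 64, 4               # number of local features, feature dim, rank
F = rng.normal(size=(n, d))       # local features of one image
U = rng.normal(size=(d, r))
V = rng.normal(size=(d, r))

full = bilinear_score_full(F, U @ V.T)
low = bilinear_score_low_rank(F, U, V)
```

    Building X costs O(n·d²) and a full W stores d² parameters, while the factored path costs O(n·d·r) and stores 2·d·r parameters.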

  39. Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions

    Authors: Shaofei Wang, Charless C. Fowlkes

    Abstract: We describe an end-to-end framework for learning parameters of min-cost flow multi-target tracking problem with quadratic trajectory interactions including suppression of overlapping tracks and contextual cues about cooccurrence of different objects. Our approach utilizes structured prediction with a tracking-specific loss function to learn the complete set of model parameters. In this learning fr…

    Submitted 5 October, 2016; originally announced October 2016.

    Comments: arXiv admin note: text overlap with arXiv:1412.2066

  40. arXiv:1606.01621

    cs.CV cs.IR cs.MM

    Photo Aesthetics Ranking Network with Attributes and Content Adaptation

    Authors: Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes

    Abstract: Real-world applications could benefit from the ability to automatically generate a fine-grained ranking of photo aesthetics. However, previous methods for image aesthetics analysis have primarily focused on the coarse, binary categorization of images into high- or low-aesthetic categories. In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics in which the r…

    Submitted 26 July, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

  41. arXiv:1605.02264

    cs.CV

    Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

    Authors: Golnaz Ghiasi, Charless C. Fowlkes

    Abstract: CNN architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization…

    Submitted 30 July, 2016; v1 submitted 7 May, 2016; originally announced May 2016.

  42. arXiv:1605.02240

    cs.CV

    On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference

    Authors: Anish Acharya, Uddipan Mukherjee, Charless Fowlkes

    Abstract: Estimates of image gradients play a ubiquitous role in image segmentation and classification problems since gradients directly relate to the boundaries or the edges of a scene. This paper proposes a unified approach to gradient estimation based on fractional calculus that is computationally cheap and readily applicable to any existing algorithm that relies on image gradients. We show experiments…

    Submitted 7 May, 2016; originally announced May 2016.
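    The abstract does not give its fractional-gradient estimator. One standard discretization from fractional calculus is the Grünwald-Letnikov approximation, D^α f(x) ≈ h^(-α) Σ_k w_k f(x - k·h), with weights w_0 = 1 and w_k = w_(k-1)·(1 - (α + 1)/k). The sketch below applies it along image rows; using this particular scheme, the window length, the zero boundary, and the function names are all assumptions rather than the paper's method. For α = 1 the weights reduce to (1, -1, 0, ...), the ordinary backward difference, and for α = 0 to the identity.

```python
import numpy as np

def gl_weights(alpha, n_terms):
    """Grunwald-Letnikov weights w_k = (-1)^k * binom(alpha, k)."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def fractional_gradient_x(image, alpha, n_terms=5, h=1.0):
    """Left-sided fractional derivative of each row (backward in x).

    Pixels beyond the left border are treated as zero (an assumption).
    """
    w = gl_weights(alpha, n_terms)
    H, W = image.shape
    padded = np.pad(image, ((0, 0), (n_terms - 1, 0)), mode="constant")
    out = np.zeros((H, W))
    for k in range(n_terms):
        # shifted[:, x] == image[:, x - k] (zero where x - k < 0)
        start = n_terms - 1 - k
        out += w[k] * padded[:, start:start + W]
    return out / h ** alpha

img = np.array([[0.0, 1.0, 3.0, 6.0]])
first = fractional_gradient_x(img, alpha=1.0)   # ordinary backward difference
half = fractional_gradient_x(img, alpha=0.5)    # fractional order in between
```

    Intermediate orders such as α = 0.5 weight a longer history of neighboring pixels, which is what distinguishes a fractional gradient from the usual two-tap difference.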

  43. arXiv:1605.00775

    cs.CV q-bio.PE q-bio.QM

    Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification

    Authors: Shu Kong, Surangi Punyasena, Charless Fowlkes

    Abstract: We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology. We introduce a novel criterion for selecting meaningful and discriminative exemplar patches. We optimize this function during training using a greedy submodular function o…

    Submitted 3 May, 2016; originally announced May 2016.

    Comments: CVMI 2016

  44. arXiv:1603.09439

    cs.CV

    The Open World of Micro-Videos

    Authors: Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan

    Abstract: Micro-videos are six-second videos popular on social media networks with several unique properties. Firstly, because of the authoring process, they contain significantly more diversity and narrative structure than existing collections of video "snippets". Secondly, because they are often captured by hand-held mobile cameras, they contain specialized viewpoints including third-person, egocentric, a…

    Submitted 31 March, 2016; v1 submitted 30 March, 2016; originally announced March 2016.

  45. arXiv:1512.02413

    cs.CV

    Tracking Objects with Higher Order Interactions using Delayed Column Generation

    Authors: Shaofei Wang, Steffen Wolf, Charless Fowlkes, Julian Yarkony

    Abstract: We study the problem of multi-target tracking and data association in video. We formulate this in terms of selecting a subset of high-quality tracks subject to the constraint that no pair of selected tracks is associated with a common detection (of an object). This objective is equivalent to the classic NP-hard problem of finding a maximum-weight set packing (MWSP) where tracks correspond to sets…

    Submitted 9 August, 2016; v1 submitted 8 December, 2015; originally announced December 2015.

  46. arXiv:1507.03698

    cs.CV

    Lifting GIS Maps into Strong Geometric Context for Scene Understanding

    Authors: Raúl Díaz, Minhaeng Lee, Jochen Schubert, Charless C. Fowlkes

    Abstract: Contextual information can have a substantial impact on the performance of visual tasks such as semantic segmentation, object detection, and geometric estimation. Data stored in Geographic Information Systems (GIS) offers a rich source of contextual information that has been largely untapped by computer vision. We propose to leverage such information for scene understanding by combining GIS reso…

    Submitted 8 January, 2016; v1 submitted 13 July, 2015; originally announced July 2015.

  47. arXiv:1507.02407

    cs.DS cs.CG cs.CV

    Planar Ultrametric Rounding for Image Segmentation

    Authors: Julian Yarkony, Charless C. Fowlkes

    Abstract: We study the problem of hierarchical clustering on planar graphs. We formulate this in terms of an LP relaxation of ultrametric rounding. To solve this LP efficiently we introduce a dual cutting plane scheme that uses minimum cost perfect matching as a subroutine in order to efficiently explore the space of planar partitions. We apply our algorithm to the problem of hierarchical image segmentation…

    Submitted 9 September, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

    MSC Class: 68T45

  48. arXiv:1506.08347

    cs.CV

    Occlusion Coherence: Detecting and Localizing Occluded Faces

    Authors: Golnaz Ghiasi, Charless C. Fowlkes

    Abstract: The presence of occluders significantly impacts object recognition accuracy. However, occlusion is typically treated as an unstructured source of noise and explicit models for occluders have lagged behind those for object appearance and shape. In this paper we describe a hierarchical deformable part model for face detection and landmark localization that explicitly models part occlusion. The propo…

    Submitted 24 August, 2016; v1 submitted 27 June, 2015; originally announced June 2015.

  49. Do We Need More Training Data?

    Authors: Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan

    Abstract: Datasets for training object recognition systems are steadily increasing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or saturate in performance due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defi…

    Submitted 4 March, 2015; originally announced March 2015.

  50. arXiv:1412.4181

    cs.CV

    Oriented Edge Forests for Boundary Detection

    Authors: Sam Hallman, Charless C. Fowlkes

    Abstract: We present a simple, efficient model for learning boundary detection based on a random forest classifier. Our approach combines (1) efficient clustering of training examples based on simple partitioning of the space of local edge orientations and (2) scale-dependent calibration of individual tree output probabilities prior to multiscale combination. The resulting model outperforms published result…

    Submitted 28 June, 2015; v1 submitted 12 December, 2014; originally announced December 2014.

    Comments: updated to include contents of CVPR version + new figure showing example segmentation results