Showing 1–50 of 65 results for author: Perona, P

Searching in archive cs.
  1. arXiv:2406.07320  [pdf, other]

    cs.CV stat.AP

    A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation

    Authors: Riccardo Fogliato, Pratik Patil, Mathew Monfort, Pietro Perona

    Abstract: Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time random selection of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statistical framew…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2405.15243  [pdf, other]

    cs.CV

    Less is More: Discovering Concise Network Explanations

    Authors: Neehar Kondapaneni, Markus Marks, Oisin MacAodha, Pietro Perona

    Abstract: We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations to enhance the interpretability of deep neural image classifiers. Our method automatically finds visual explanations that are critical for discriminating between classes. This is achieved by simultaneously optimizing three criteria: the explanations should be few,…

    Submitted 13 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures; ICLR Re-Align Workshop 2024; Project Page: https://www.vision.caltech.edu/dcne/ Github: https://github.com/nkondapa/DiscoveringConciseNetworkExplanations

  3. arXiv:2403.12029  [pdf, other]

    cs.CV cs.AI cs.LG

    Align and Distill: Unifying and Improving Domain Adaptive Object Detection

    Authors: Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, Grant Van Horn

    Abstract: Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inc…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 30 pages, 10 figures

  4. arXiv:2402.10795  [pdf, other]

    cs.LG cs.CY cs.HC

    Diversified Ensembling: An Experiment in Crowdsourced Machine Learning

    Authors: Ira Globus-Harris, Declan Harrison, Michael Kearns, Pietro Perona, Aaron Roth

    Abstract: Crowdsourced machine learning on competition platforms such as Kaggle is a popular and often effective method for generating accurate models. Typically, teams vie for the most accurate model, as measured by overall error on a holdout set, and it is common towards the end of such competitions for teams at the top of the leaderboard to ensemble or average their models outside the platform mechanism…

    Submitted 16 February, 2024; originally announced February 2024.

  5. arXiv:2402.05398  [pdf, other]

    cs.CV

    On the Effect of Image Resolution on Semantic Segmentation

    Authors: Ritambhara Singh, Abhishek Jain, Pietro Perona, Shivani Agarwal, Junfeng Yang

    Abstract: High-resolution semantic segmentation requires substantial computational resources. Traditional approaches in the field typically downscale the input images before processing and then upscale the low-resolution outputs back to their original dimensions. While this strategy effectively identifies broad regions, it often misses finer details. In this study, we demonstrate that a streamlined model ca…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2209.08667 by other authors

  6. arXiv:2310.00031  [pdf, other]

    cs.CV

    Text-image Alignment for Diffusion-based Perception

    Authors: Neehar Kondapaneni, Markus Marks, Manuel Knott, Rogerio Guimaraes, Pietro Perona

    Abstract: Diffusion models are generative models with impressive text-to-image synthesis capabilities and have spurred a new wave of creative methods for classical machine learning tasks. However, the best way to harness the perceptual knowledge of these generative models for visual tasks is still an open question. Specifically, it is unclear how to use the prompting interface when applying diffusion backbo…

    Submitted 1 April, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Project page: https://www.vision.caltech.edu/tadp/, Code page: github.com/damaggu/TADP

  7. arXiv:2308.05441  [pdf, other]

    cs.CV

    Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation

    Authors: Hao Liang, Pietro Perona, Guha Balakrishnan

    Abstract: We propose an experimental method for measuring bias in face recognition systems. Existing methods to measure bias depend on benchmark datasets that are collected in the wild and annotated for protected (e.g., race, gender) and non-protected (e.g., pose, lighting) attributes. Such observational datasets only permit correlational conclusions, e.g., "Algorithm A's accuracy is different on female and…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023; 18 figures

  8. arXiv:2306.02564  [pdf, other]

    cs.LG cs.CV

    Spatial Implicit Neural Representations for Global-Scale Species Mapping

    Authors: Elijah Cole, Grant Van Horn, Christian Lange, Alexander Shepard, Patrick Leary, Pietro Perona, Scott Loarie, Oisin Mac Aodha

    Abstract: Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging l…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  9. arXiv:2306.01198  [pdf, other]

    stat.ME cs.CV stat.ML

    Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations

    Authors: Riccardo Fogliato, Pratik Patil, Pietro Perona

    Abstract: Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have been often overlooked in the l…

    Submitted 26 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  10. arXiv:2305.15584  [pdf, ps, other]

    cs.LG cs.CV

    Understanding Label Bias in Single Positive Multi-Label Learning

    Authors: Julio Arroyo, Pietro Perona, Elijah Cole

    Abstract: Annotating data for multi-label classification is prohibitively expensive because every category of interest must be confirmed to be present or absent. Recent work on single positive multi-label (SPML) learning shows that it is possible to train effective multi-label classifiers using only one positive label per image. However, the standard benchmarks for SPML are derived from traditional multi-la…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2023, Tiny Papers Track

  11. arXiv:2212.07401  [pdf, other]

    cs.CV cs.AI

    BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos

    Authors: Jennifer J. Sun, Lili Karashchuk, Amil Dravid, Serim Ryou, Sonia Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona

    Abstract: Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a ne…

    Submitted 2 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Project page: https://sites.google.com/view/b-kind/3d Code: https://github.com/neuroethology/BKinD-3D

  12. arXiv:2207.10553  [pdf, other]

    cs.LG cs.AI cs.CV cs.MA

    MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior

    Authors: Jennifer J. Sun, Markus Marks, Andrew Ulmer, Dipam Chakraborty, Brian Geuther, Edward Hayes, Heng Jia, Vivek Kumar, Sebastian Oleszko, Zachary Partridge, Milan Peelman, Alice Robie, Catherine E. Schretter, Keith Sheppard, Chao Sun, Param Uttarwar, Julian M. Wagner, Eric Werner, Joseph Parker, Pietro Perona, Yisong Yue, Kristin Branson, Ann Kennedy

    Abstract: We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. This dataset is collected from a variety of biology experiments, and includes triplets of interacting mice (4.7 million frames video+pose tracking data, 10 million frames pose only), symbiotic beetle-ant interactions (10 million frames video data), and groups of…

    Submitted 30 June, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

    Comments: To appear in ICML 2023, Project website: https://sites.google.com/view/computational-behavior/our-datasets/mabe2022-dataset

  13. arXiv:2207.10225  [pdf, other]

    cs.CV cs.LG

    On Label Granularity and Object Localization

    Authors: Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha

    Abstract: Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation w…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  14. arXiv:2207.10157  [pdf, other]

    cs.CV cs.HC

    Visual Knowledge Tracing

    Authors: Neehar Kondapaneni, Pietro Perona, Oisin Mac Aodha

    Abstract: Each year, thousands of people learn new visual categorization tasks -- radiologists learn to recognize tumors, birdwatchers learn to distinguish similar species, and crowd workers learn how to annotate valuable data for applications like autonomous driving. As humans learn, their brain updates the visual features it extracts and attends to, which ultimately informs their final classification decis…

    Submitted 21 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 14 pages, 4 figures, 14 supplemental pages, 11 supplemental figures, accepted to European Conference on Computer Vision (ECCV) 2022

  15. arXiv:2207.09295  [pdf, other]

    cs.CV cs.LG

    The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

    Authors: Justin Kay, Peter Kulits, Suzanne Stathatos, Siqi Deng, Erik Young, Sara Beery, Grant Van Horn, Pietro Perona

    Abstract: We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. 33 pages, 12 figures

  16. arXiv:2202.11833  [pdf, other]

    cs.CV cs.AI

    Near Perfect GAN Inversion

    Authors: Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, Aleix Martinez

    Abstract: To edit a real photo using Generative Adversarial Networks (GANs), we need a GAN inversion algorithm to identify the latent vector that perfectly reproduces it. Unfortunately, whereas existing inversion algorithms can synthesize images similar to real photos, they cannot generate the identical clones needed in most applications. Here, we derive an algorithm that achieves near perfect reconstructio…

    Submitted 23 February, 2022; originally announced February 2022.

  17. arXiv:2202.05508  [pdf, other]

    cs.CV cs.CL cs.LG

    Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

    Authors: Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona

    Abstract: Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting…

    Submitted 14 February, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  18. arXiv:2201.10423  [pdf, other]

    cs.CV cs.GR cs.LG

    Rayleigh EigenDirections (REDs): GAN latent space traversals for multidimensional features

    Authors: Guha Balakrishnan, Raghudeep Gadde, Aleix Martinez, Pietro Perona

    Abstract: We present a method for finding paths in a deep generative model's latent space that can maximally vary one set of image features while holding others constant. Crucially, unlike past traversal approaches, ours can manipulate multidimensional features of an image such as facial identity and pixels within a specified region. Our method is principled and conceptually simple: optimal traversal direct…

    Submitted 25 January, 2022; originally announced January 2022.

  19. arXiv:2112.05121  [pdf, other]

    cs.CV

    Self-Supervised Keypoint Discovery in Behavioral Videos

    Authors: Jennifer J. Sun, Serim Ryou, Roni Goldshmid, Brandon Weissbourd, John Dabiri, David J. Anderson, Ann Kennedy, Yisong Yue, Pietro Perona

    Abstract: We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video fram…

    Submitted 27 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Code: https://github.com/neuroethology/BKinD Project page: https://sites.google.com/view/b-kind

  20. arXiv:2109.13423  [pdf, other]

    cs.CV

    Weakly Supervised Keypoint Discovery

    Authors: Serim Ryou, Pietro Perona

    Abstract: In this paper, we propose a method for keypoint discovery from a 2D image using image-level supervision. Recent works on unsupervised keypoint discovery reliably discover keypoints of aligned instances. However, when the target instances have high viewpoint or appearance variation, the discovered keypoints do not match the semantic correspondences over different images. Our work aims to discover k…

    Submitted 27 September, 2021; originally announced September 2021.

  21. arXiv:2107.10400  [pdf, other]

    cs.LG stat.AP

    Species Distribution Modeling for Machine Learning Practitioners: A Review

    Authors: Sara Beery, Elijah Cole, Joseph Parker, Pietro Perona, Kevin Winner

    Abstract: Conservation science depends on an accurate understanding of what's happening in a given ecosystem. How many species live there? What is the makeup of the population? How is that changing over time? Species Distribution Modeling (SDM) seeks to predict the spatial (and sometimes temporal) patterns of species occurrence, i.e. where a species is likely to be found. The last few years have seen a surg…

    Submitted 3 July, 2021; originally announced July 2021.

    Comments: ACM COMPASS 2021

  22. arXiv:2106.09708  [pdf, other]

    cs.CV cs.LG

    Multi-Label Learning from Single Positive Labels

    Authors: Elijah Cole, Oisin Mac Aodha, Titouan Lorieul, Pietro Perona, Dan Morris, Nebojsa Jojic

    Abstract: Predicting all applicable labels for a given image is known as multi-label classification. Compared to the standard multi-class case (where each image has only one label), it is considerably more challenging to annotate training data for multi-label classification. When the number of potential labels is large, human annotators find it difficult to mention all applicable labels for each training im…

    Submitted 22 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: CVPR 2021. Supplementary material included

  23. arXiv:2104.02710  [pdf, other]

    cs.LG cs.CV

    The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions

    Authors: Jennifer J. Sun, Tomomi Karigo, Dipam Chakraborty, Sharada P. Mohanty, Benjamin Wild, Quan Sun, Chen Chen, David J. Anderson, Pietro Perona, Yisong Yue, Ann Kennedy

    Abstract: Multi-agent behavior modeling aims to understand the interactions that occur between agents. We present a multi-agent dataset from behavioral neuroscience, the Caltech Mouse Social Interactions (CalMS21) Dataset. Our dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. To help accelerate behavioral studies,…

    Submitted 18 November, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: NeurIPS2021 Datasets & Benchmarks. Dataset: https://data.caltech.edu/records/1991, Website: https://sites.google.com/view/computational-behavior/our-datasets/calms21-dataset

  24. arXiv:2103.13455  [pdf, other]

    cs.CV cs.AI cs.LG stat.AP

    Matched sample selection with GANs for mitigating attribute confounding

    Authors: Chandan Singh, Guha Balakrishnan, Pietro Perona

    Abstract: Measuring biases of vision systems with respect to protected attributes like gender and age is critical as these systems gain widespread use in society. However, significant correlations between attributes in benchmark datasets make it difficult to separate algorithmic bias from dataset bias. To mitigate such attribute confounding during bias analysis, we propose a matching approach that selects a…

    Submitted 24 March, 2021; originally announced March 2021.

  25. arXiv:2102.00084  [pdf, other]

    cs.CV cs.LG

    A linearized framework and a new benchmark for model selection for fine-tuning

    Authors: Aditya Deshpande, Alessandro Achille, Avinash Ravichandran, Hao Li, Luca Zancato, Charless Fowlkes, Rahul Bhotika, Stefano Soatto, Pietro Perona

    Abstract: Fine-tuning from a collection of models pre-trained on different domains (a "model zoo") is emerging as a technique to improve test accuracy in the low-data regime. However, model selection, i.e. how to pre-select the right model to fine-tune from a model zoo without performing any training, remains an open topic. We use a linearized framework to approximate fine-tuning, and introduce two new base…

    Submitted 29 January, 2021; originally announced February 2021.

    Comments: 14 pages

  26. arXiv:2012.10873  [pdf, other]

    cs.CV

    Sequence-to-Sequence Contrastive Learning for Text Recognition

    Authors: Aviad Aberdam, Ron Litman, Shahar Tsiper, Oron Anschel, Ron Slossberg, Shai Mazor, R. Manmatha, Pietro Perona

    Abstract: We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast in a sub-word level, where from each image we extract several positive p…

    Submitted 20 December, 2020; originally announced December 2020.

  27. arXiv:2012.04132  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.RO

    A Number Sense as an Emergent Property of the Manipulating Brain

    Authors: Neehar Kondapaneni, Pietro Perona

    Abstract: The ability to understand and manipulate numbers and quantities emerges during childhood, but the mechanism through which humans acquire and develop this ability is still poorly understood. We explore this question through a model, assuming that the learner is able to pick up and place small objects from, and to, locations of its choosing, and will spontaneously engage in such undirected manipulat…

    Submitted 23 March, 2024; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: 16 pages, 5 figures, 15 supplemental figures

    Journal ref: Scientific reports, 14(6858) 2024

  28. arXiv:2011.13917  [pdf, other]

    cs.CV cs.LG

    Task Programming: Learning Data Efficient Behavior Representations

    Authors: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Yue, Pietro Perona

    Abstract: Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annot…

    Submitted 29 March, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: To appear as an Oral at CVPR 2021. Code: https://github.com/neuroethology/TREBA. Project page: https://sites.google.com/view/task-programming

  29. arXiv:2007.06570  [pdf, other]

    cs.CV cs.CY cs.LG

    Towards causal benchmarking of bias in face analysis algorithms

    Authors: Guha Balakrishnan, Yuanjun Xiong, Wei Xia, Pietro Perona

    Abstract: Measuring algorithmic bias is crucial both to assess algorithmic fairness, and to guide the improvement of algorithms. Current methods to measure algorithmic bias in computer vision, which are based on observational datasets, are inadequate for this task because they conflate algorithmic bias with dataset bias. To address this problem we develop an experimental method for measuring algorithmic b…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Long-form version of ECCV 2020 paper

  30. arXiv:2003.01455  [pdf, other]

    cs.CV

    Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

    Authors: Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka

    Abstract: Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classificat…

    Submitted 20 June, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at CVPR 2020

  31. arXiv:2002.01708  [pdf, other]

    cs.CV

    Geocoding of trees from street addresses and street-level images

    Authors: Daniel Laumer, Nico Lang, Natalie van Doorn, Oisin Mac Aodha, Pietro Perona, Jan Dirk Wegner

    Abstract: We introduce an approach for updating older tree inventories with geographic coordinates using street-level panorama images and a global optimization framework for tree instance matching. Geolocations of trees in inventories until the early 2000s were recorded using street addresses whereas newer inventories use GPS. Our method retrofits older inventories with geographic coordinates to allow conn…

    Submitted 5 February, 2020; originally announced February 2020.

    Comments: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing

  32. arXiv:2001.00057  [pdf, other]

    cs.CV cs.LG

    HMM-guided frame querying for bandwidth-constrained video search

    Authors: Bhairav Chidambaram, Mason McGill, Pietro Perona

    Abstract: We design an agent to search for frames of interest in video stored on a remote server, under bandwidth constraints. Using a convolutional neural network to score individual frames and a hidden Markov model to propagate predictions across frames, our agent accurately identifies temporal regions of interest based on sparse, strategically sampled frames. On a subset of the ImageNet-VID dataset, we d…

    Submitted 31 December, 2019; originally announced January 2020.

    Comments: 4 pages, 5 figures

  33. arXiv:1911.12317  [pdf, other]

    cs.CV

    PanDA: Panoptic Data Augmentation

    Authors: Yang Liu, Pietro Perona, Markus Meister

    Abstract: The recently proposed panoptic segmentation task presents a significant challenge of image understanding with computer vision by unifying semantic segmentation and instance segmentation tasks. In this paper we present an efficient and novel panoptic data augmentation (PanDA) method which operates exclusively in pixel space, requires no additional data or training, and is computationally cheap to i…

    Submitted 3 April, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

  34. From Google Maps to a Fine-Grained Catalog of Street trees

    Authors: Steve Branson, Jan Dirk Wegner, David Hall, Nico Lang, Konrad Schindler, Pietro Perona

    Abstract: Up-to-date catalogs of the urban tree population are important for municipalities to monitor and improve quality of life in cities. Despite much research on automation of tree mapping, mainly relying on dedicated airborne LiDAR or hyperspectral campaigns, trees are still mostly mapped manually in practice. We present a fully automated tree detection and species recognition pipeline to process thou…

    Submitted 7 October, 2019; originally announced October 2019.

    Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 135, January 2018, Pages 13-30

  35. arXiv:1909.11155  [pdf, other]

    cs.CV cs.LG

    Anchor Loss: Modulating Loss Scale based on Prediction Difficulty

    Authors: Serim Ryou, Seong-Gyun Jeong, Pietro Perona

    Abstract: We propose a novel loss function that dynamically rescales the cross entropy based on prediction difficulty regarding a sample. Deep neural network architectures in image classification tasks struggle to disambiguate visually similar objects. Likewise, in human pose estimation symmetric body parts often confuse the network with assigning indiscriminative scores to them. This is due to the output p…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: To appear in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2019. (oral)

  36. arXiv:1907.07617  [pdf, other]

    cs.CV

    The iWildCam 2019 Challenge Dataset

    Authors: Sara Beery, Dan Morris, Pietro Perona

    Abstract: Camera Traps (or Wild Cams) enable the automatic collection of large quantities of image data. Biologists all over the world use camera traps to monitor biodiversity and population density of animal species. The computer vision community has been making strides towards automating the species classification challenge in camera traps, but as we try to expand the scope of these models from specific r…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: From the Sixth Fine-Grained Visual Categorization Workshop at CVPR19. arXiv admin note: text overlap with arXiv:1904.05986

  37. arXiv:1906.05272  [pdf, other]

    cs.CV cs.LG

    Presence-Only Geographical Priors for Fine-Grained Image Classification

    Authors: Oisin Mac Aodha, Elijah Cole, Pietro Perona

    Abstract: Appearance information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Human experts make use of additional cues such as where, and when, a given image was taken in order to inform their final decision. This contextual information is readily available in many online image collections but has been underutilized by existing image classifiers that foc…

    Submitted 28 October, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: ICCV 2019

  38. arXiv:1904.05986  [pdf, other]

    cs.CV

    The iWildCam 2018 Challenge Dataset

    Authors: Sara Beery, Grant van Horn, Oisin Mac Aodha, Pietro Perona

    Abstract: Camera traps are a valuable tool for studying biodiversity, but research using this data is limited by the speed of human annotation. With the vast amounts of data now available it is imperative that we develop automatic solutions for annotating camera trap data in order to allow this research to scale. A promising approach is based on deep networks trained on human-annotated images. We provide a…

    Submitted 24 April, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: Challenge hosted at the fifth Fine-Grained Visual Categorization Workshop (FGVC5) at CVPR 2018

  39. arXiv:1904.05916  [pdf, other]

    cs.CV

    Synthetic Examples Improve Generalization for Rare Classes

    Authors: Sara Beery, Yang Liu, Dan Morris, Jim Piavis, Ashish Kapoor, Markus Meister, Neel Joshi, Pietro Perona

    Abstract: The ability to detect and classify rare occurrences in images has important applications - for example, counting rare and endangered species when studying biodiversity, or detecting infrequent traffic scenarios that pose a danger to self-driving cars. Few-shot learning is an open problem: current computer vision systems struggle to categorize objects they have seen only rarely during training, and…

    Submitted 14 May, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

  40. arXiv:1902.03545  [pdf, other]

    cs.LG cs.AI stat.ML

    Task2Vec: Task Embedding for Meta-Learning

    Authors: Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona

    Abstract: We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined over those labels, we process images through a "probe network" and compute an embedding based on estimates of the Fisher information matrix associated with the…

    Submitted 10 February, 2019; originally announced February 2019.

  41. arXiv:1807.04975  [pdf, other]

    cs.CV q-bio.PE

    Recognition in Terra Incognita

    Authors: Sara Beery, Grant van Horn, Pietro Perona

    Abstract: It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera trap…

    Submitted 24 July, 2018; v1 submitted 13 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV 2018

  42. arXiv:1805.08322  [pdf, other]

    cs.AI cs.LG

    Teaching Multiple Concepts to a Forgetful Learner

    Authors: Anette Hunziker, Yuxin Chen, Oisin Mac Aodha, Manuel Gomez Rodriguez, Andreas Krause, Pietro Perona, Yisong Yue, Adish Singla

    Abstract: How can we help a forgetful learner learn multiple concepts within a limited time frame? While there have been extensive studies in designing optimal schedules for teaching a single concept given a learner's memory model, existing approaches for teaching multiple concepts are typically based on heuristic scheduling techniques without theoretical guarantees. In this paper, we look at the problem fr…

    Submitted 25 October, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2019

  43. arXiv:1805.06880  [pdf, other]

    cs.CV

    It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

    Authors: Matteo Ruggero Ronchi, Oisin Mac Aodha, Robert Eng, Pietro Perona

    Abstract: We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms…

    Submitted 27 July, 2018; v1 submitted 17 May, 2018; originally announced May 2018.

    Comments: BMVC 2018. Project page available at http://www.vision.caltech.edu/~mronchi/projects/RelativePose

  44. arXiv:1804.02747  [pdf, other]

    stat.ML cs.AI cs.LG stat.OT

    Fast Conditional Independence Test for Vector Variables with Large Sample Sizes

    Authors: Krzysztof Chalupka, Pietro Perona, Frederick Eberhardt

    Abstract: We present and evaluate the Fast (conditional) Independence Test (FIT) -- a nonparametric conditional independence test. The test is based on the idea that when $P(X \mid Y, Z) = P(X \mid Y)$, $Z$ is not useful as a feature to predict $X$, as long as $Y$ is also a regressor. On the contrary, if $P(X \mid Y, Z) \neq P(X \mid Y)$, $Z$ might improve prediction results. FIT applies to thousand-dimensi…

    Submitted 8 April, 2018; originally announced April 2018.

  45. arXiv:1802.06924  [pdf, other]

    cs.CV cs.LG stat.ML

    Teaching Categories to Human Learners with Visual Explanations

    Authors: Oisin Mac Aodha, Shihan Su, Yuxin Chen, Pietro Perona, Yisong Yue

    Abstract: We study the problem of computer-assisted teaching with explanations. Conventional approaches for machine teaching typically only provide feedback at the instance level e.g., the category or label of the instance. However, it is intuitive that clear explanations from a knowledgeable teacher can significantly improve a student's ability to learn a new concept. To address these existing limitations,…

    Submitted 19 February, 2018; originally announced February 2018.

  46. arXiv:1802.05190  [pdf, other]

    cs.LG

    Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners

    Authors: Yuxin Chen, Adish Singla, Oisin Mac Aodha, Pietro Perona, Yisong Yue

    Abstract: In real-world applications of education, an effective teacher adaptively chooses the next example to teach based on the learner's current state. However, most existing work in algorithmic machine teaching focuses on the batch setting, where adaptivity plays no role. In this paper, we study the case of teaching consistent, version space learners in an interactive setting. At any time step, the teac…

    Submitted 8 December, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: NeurIPS 2018 (extended version)

  47. arXiv:1710.01691  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Context Embedding Networks

    Authors: Kun Ho Kim, Oisin Mac Aodha, Pietro Perona

    Abstract: Low dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. However, similarity is a multi-dimensional concept that varies from individual to individual. Existing models for learning embeddings from the crowd typically make simplifying…

    Submitted 29 March, 2018; v1 submitted 22 September, 2017; originally announced October 2017.

    Comments: CVPR 2018 spotlight

  48. arXiv:1709.01450  [pdf, other]

    cs.CV

    The Devil is in the Tails: Fine-grained Classification in the Wild

    Authors: Grant Van Horn, Pietro Perona

    Abstract: The world is long-tailed. What does this mean for computer vision and visual recognition? The main two implications are (1) the number of categories we need to consider in applications can be very large, and (2) the number of training examples for most categories can be very small. Current visual recognition algorithms have achieved excellent classification accuracy. However, they require many tra…

    Submitted 5 September, 2017; originally announced September 2017.

  49. arXiv:1707.06642  [pdf, other]

    cs.CV

    The iNaturalist Species Classification and Detection Dataset

    Authors: Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie

    Abstract: Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist species classification and detection dataset,…

    Submitted 10 April, 2018; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: CVPR 2018

  50. arXiv:1707.05388  [pdf, other]

    cs.CV

    Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation

    Authors: Matteo Ruggero Ronchi, Pietro Perona

    Abstract: We propose a new method to analyze the impact of errors in algorithms for multi-instance pose estimation and a principled benchmark that can be used to compare them. We define and characterize three classes of errors - localization, scoring, and background - study how they are influenced by instance attributes and their impact on an algorithm's performance. Our technique is applied to compare the…

    Submitted 4 August, 2017; v1 submitted 17 July, 2017; originally announced July 2017.

    Comments: Project page available at http://www.vision.caltech.edu/~mronchi/projects/PoseErrorDiagnosis/; Code available at https://github.com/matteorr/coco-analyze; published at ICCV 17