
Showing 1–50 of 64 results for author: Crandall, D

  1. arXiv:2403.12431  [pdf, other]

    cs.CV cs.AI

    Geometric Constraints in Deep Learning Frameworks: A Survey

    Authors: Vibhas K Vats, David J Crandall

    Abstract: Stereophotogrammetry is an emerging technique of scene understanding. Its origins go back to at least the 1800s when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric technique of Shape from Stereo is built on using geometry to define constraints on scene and camera g…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: A preprint

  2. arXiv:2401.06960  [pdf, other]

    cs.CV cs.AI

    Transformer for Object Re-Identification: A Survey

    Authors: Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du

    Abstract: Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from varying viewpoints. For a prolonged period, this field has been predominantly driven by deep convolutional neural networks. In recent years, the Transformer has witnessed remarkable advancements in computer vision, prompting an increasing body of research to delve into the application of Transformer in Re-ID. This…

    Submitted 12 January, 2024; originally announced January 2024.

  3. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  4. arXiv:2310.19583  [pdf, other]

    cs.CV cs.LG

    GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo

    Authors: Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza, Soon-heung Jung

    Abstract: Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at di…

    Submitted 21 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted in WACV 2024. Link: https://openaccess.thecvf.com/content/WACV2024/html/Vats_GC-MVSNet_Multi-View_Multi-Scale_Geometrically-Consistent_Multi-View_Stereo_WACV_2024_paper.html

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  5. arXiv:2307.00064  [pdf, other]

    cs.CV

    Situated Cameras, Situated Knowledges: Towards an Egocentric Epistemology for Computer Vision

    Authors: Samuel Goree, David Crandall

    Abstract: In her influential 1988 paper, Situated Knowledges, Donna Haraway uses vision and perspective as a metaphor to discuss scientific knowledge. Today, egocentric computer vision discusses many of the same issues, except in a literal vision context. In this short position paper, we collapse that metaphor, and explore the interactions between feminist epistemology and egocentric CV as "Egocentric Epist…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: Presented at the CVPR 2023 Ego4D workshop

  6. arXiv:2303.17061  [pdf, other]

    cs.CV cs.GT

    A Tensor-based Convolutional Neural Network for Small Dataset Classification

    Authors: Zhenhua Chen, David Crandall

    Abstract: Inspired by the ConvNets with structured hidden representations, we propose a Tensor-based Neural Network, TCNN. Different from ConvNets, TCNNs are composed of structured neurons rather than scalar neurons, and the basic operation is neuron tensor transformation. Unlike other structured ConvNets, where the part-whole relationships are modeled explicitly, the relationships are learned implicitly in…

    Submitted 29 March, 2023; originally announced March 2023.

  7. arXiv:2303.02737  [pdf, other]

    cs.CV

    SePaint: Semantic Map Inpainting via Multinomial Diffusion

    Authors: Zheng Chen, Deepak Duggirala, David Crandall, Lei Jiang, Lantao Liu

    Abstract: Prediction beyond partial observations is crucial for robots to navigate in unknown environments because it can provide extra information regarding the surroundings beyond the current sensing range or resolution. In this work, we consider the inpainting of semantic Bird's-Eye-View maps. We propose SePaint, an inpainting model for semantic data based on generative multinomial diffusion. To maintain…

    Submitted 5 March, 2023; originally announced March 2023.

  8. arXiv:2301.08237  [pdf, other]

    cs.CV

    LoCoNet: Long-Short Context Network for Active Speaker Detection

    Authors: Xizi Wang, Feng Cheng, Gedas Bertasius, David Crandall

    Abstract: Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a video. ASD reasons from audio and visual information from two contexts: long-term intra-speaker context and short-term inter-speaker context. Long-term intra-speaker context models the temporal dependencies of the same speaker, while short-term inter-speaker context models the interactions of speakers in the same sc…

    Submitted 29 March, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: accepted by CVPR 2024

  9. arXiv:2212.05051  [pdf, other]

    cs.CV

    VindLU: A Recipe for Effective Video-and-Language Pretraining

    Authors: Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

    Abstract: The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis and comparisons of these frameworks difficult. Hence, instead of proposing yet another new VidL model, this paper conducts a thorough e…

    Submitted 5 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Project page: https://klauscc.github.io/vindlu.html

  10. arXiv:2209.11200  [pdf, other]

    cs.CY cs.CV

    Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper

    Authors: Samuel Goree, Gabriel Appleby, David Crandall, Norman Su

    Abstract: Research papers, in addition to textual documents, are a designed interface through which researchers communicate. Recently, rapid growth has transformed that interface in many fields of computing. In this work, we examine the effects of this growth from a media archaeology perspective, through the changes to figures and tables in research papers. Specifically, we study these changes in computer v…

    Submitted 1 May, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

    ACM Class: K.7.m

  11. arXiv:2208.07344  [pdf, other]

    cs.CV

    Action Recognition based on Cross-Situational Action-object Statistics

    Authors: Satoshi Tsutsui, Xizi Wang, Guangyuan Weng, Yayun Zhang, David Crandall, Chen Yu

    Abstract: Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition mode…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: Accepted to International Conference on Development and Learning (ICDL) 2022

  12. Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds

    Authors: Junbo Yin, Jianbing Shen, Xin Gao, David Crandall, Ruigang Yang

    Abstract: Previous works for LiDAR-based 3D object detection mainly focus on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., the point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet),…

    Submitted 26 July, 2022; originally announced July 2022.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

  13. Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes. To alleviate this problem, off-the-shelf image generation techniques based on Generative Adversarial Networks (GANs) can potentially create additional training images. However, these GAN-generated images are often not helpful for actually improving the accuracy of o…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted to PAMI 2022. arXiv admin note: substantial text overlap with arXiv:1911.07164

  14. arXiv:2112.10047  [pdf, other]

    cs.CV cs.AI

    Controlling the Quality of Distillation in Response-Based Network Compression

    Authors: Vibhas Vats, David Crandall

    Abstract: The performance of a distillation-based compressed network is governed by the quality of distillation. The reason for the suboptimal distillation of a large network (teacher) to a smaller network (student) is largely attributed to the gap in the learning capacities of given teacher-student pair. While it is hard to distill all the knowledge of a teacher, the quality of distillation can be controll…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: AAAI22-Workshop: 1st International Workshop on Practical Deep Learning in the Wild

  15. arXiv:2112.05533  [pdf, other]

    cs.CV cs.RO

    Error Diagnosis of Deep Monocular Depth Estimation Models

    Authors: Jagpreet Chawla, Nikhil Thakurdesai, Anuj Godase, Md Reza, David Crandall, Soon-Heung Jung

    Abstract: Estimating depth from a monocular image is an ill-posed problem: when the camera projects a 3D scene onto a 2D plane, depth information is inherently and permanently lost. Nevertheless, recent work has shown impressive results in estimating 3D structure from 2D images using deep learning. In this paper, we put on an introspective hat and analyze state-of-the-art monocular depth estimation models i…

    Submitted 15 November, 2021; originally announced December 2021.

    Comments: Presented at IROS'21

  16. arXiv:2111.00063  [pdf, other]

    cs.CV cs.RO

    Polyline Based Generative Navigable Space Segmentation for Autonomous Visual Navigation

    Authors: Zheng Chen, Zhengming Ding, David Crandall, Lantao Liu

    Abstract: Detecting navigable space is a fundamental capability for mobile robots navigating in unknown or unmapped environments. In this work, we treat the visual navigable space segmentation as a scene decomposition problem and propose Polyline Segmentation Variational AutoEncoder Networks (PSV-Nets), a representation-learning-based framework to enable robots to learn the navigable space segmentation in a…

    Submitted 29 October, 2021; originally announced November 2021.

  17. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  18. arXiv:2107.07095  [pdf, other]

    cs.AI cs.CV

    Applying the Case Difference Heuristic to Learn Adaptations from Deep Network Features

    Authors: Xiaomeng Ye, Ziwei Zhao, David Leake, Xizi Wang, David Crandall

    Abstract: The case difference heuristic (CDH) approach is a knowledge-light method for learning case adaptation knowledge from the case base of a case-based reasoning system. Given a pair of cases, the CDH approach attributes the difference in their solutions to the difference in the problems they solve, and generates adaptation rules to adjust solutions accordingly when a retrieved case and new query have…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: 7 pages, 2 figures, 1 table. To be published in the IJCAI-21 Workshop on Deep Learning, Case-Based Reasoning, and AutoML: Present and Future Synergies

  19. arXiv:2107.01153  [pdf, other]

    cs.CV

    A Survey on Deep Learning Technique for Video Segmentation

    Authors: Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool, Wenguan Wang

    Abstract: Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to creating virtual background in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learnin…

    Submitted 29 November, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Survey

  20. arXiv:2106.06694  [pdf, other]

    cs.CV

    Reverse-engineer the Distributional Structure of Infant Egocentric Views for Training Generalizable Image Classifiers

    Authors: Satoshi Tsutsui, David Crandall, Chen Yu

    Abstract: We analyze egocentric views of attended objects from infants. This paper shows 1) empirical evidence that children's egocentric views have more diverse distributions compared to adults' views, 2) we can computationally simulate the infants' distribution, and 3) the distribution is beneficial for training more generalized image classifiers not only for infant egocentric vision but for third-person…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Accepted to 2021 CVPR Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  21. arXiv:2104.02621  [pdf, other]

    cs.AI

    How to Accelerate Capsule Convolutions in Capsule Networks

    Authors: Zhenhua Chen, Xiwen Li, Qian Lou, David Crandall

    Abstract: How to improve the efficiency of routing procedures in CapsNets has been studied a lot. However, the efficiency of capsule convolutions has largely been neglected. Capsule convolution, which uses capsules rather than neurons as the basic computation unit, makes it incompatible with current deep learning frameworks' optimization solution. As a result, capsule convolutions are usually very slow with…

    Submitted 6 April, 2021; originally announced April 2021.

  22. arXiv:2104.01732  [pdf, other]

    cs.CV

    Semantically Stealthy Adversarial Attacks against Segmentation Models

    Authors: Zhenhua Chen, Chuhua Wang, David J. Crandall

    Abstract: Segmentation models have been found to be vulnerable to targeted and non-targeted adversarial attacks. However, the resulting segmentation outputs are often so damaged that it is easy to spot an attack. In this paper, we propose semantically stealthy adversarial attacks which can manipulate targeted labels while preserving non-targeted labels at the same time. One challenge is making semantically…

    Submitted 7 January, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 4080-4089

  23. Stepwise Goal-Driven Networks for Trajectory Prediction

    Authors: Chuhua Wang, Yuchen Wang, Mingze Xu, David J. Crandall

    Abstract: We propose to predict the future trajectories of observed agents (e.g., pedestrians or vehicles) by estimating and using their goals at multiple time scales. We argue that the goal of a moving agent may change over time, and modeling goals continuously provides more accurate and detailed information for future trajectory estimation. To this end, we present a recurrent network for trajectory predic…

    Submitted 27 March, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted By RA-L and ICRA2022

    Journal ref: in IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2716-2723, April 2022

  24. arXiv:2011.11261  [pdf, other]

    cs.CV cs.AI cs.LG

    Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning

    Authors: Zehua Zhang, David Crandall

    Abstract: We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it hierarchically to encourage multi-scale understanding. Motivated by their effectiveness in supervised learning, we first introduce spatial-temporal feature learning decoup…

    Submitted 31 August, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

  25. arXiv:2011.08900  [pdf, other]

    cs.CV

    Whose hand is this? Person Identification from Egocentric Hand Gestures

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: Recognizing people by faces and other biometrics has been extensively studied in computer vision. But these techniques do not work for identifying the wearer of an egocentric (first-person) camera because that person rarely (if ever) appears in their own first-person view. But while one's own face is not frequently visible, their hands are: in fact, hands are among the most common objects in one's…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV) 2021 (First round acceptance)

  26. arXiv:2010.03712  [pdf, other]

    cs.CV

    Deep Tiered Image Segmentation For Detecting Internal Ice Layers in Radar Imagery

    Authors: Yuchen Wang, Mingze Xu, John Paden, Lora Koenig, Geoffrey Fox, David Crandall

    Abstract: Understanding the structure of Earth's polar ice sheets is important for modeling how global warming will impact polar ice and, in turn, the Earth's climate. Ground-penetrating radar is able to collect observations of the internal structure of snow and ice, but the process of manually labeling these observations is slow and laborious. Recent work has developed automatic techniques for finding the…

    Submitted 6 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: ICME version

  27. arXiv:2007.09314  [pdf, other]

    cs.CV

    Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

    Authors: Mang Ye, Jianbing Shen, David J. Crandall, Ling Shao, Jiebo Luo

    Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Due to the large intra-class variations and cross-modality discrepancy with large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods instead tend to learn global representations, which have limited discriminability and weak robustnes…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted by ECCV20

  28. arXiv:2006.02802  [pdf, other]

    cs.CV

    A Computational Model of Early Word Learning from the Infant's Point of View

    Authors: Satoshi Tsutsui, Arjun Chandrasekaran, Md Alimoor Reza, David Crandall, Chen Yu

    Abstract: Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences. Researchers in cognitive science and developmental psychology have built formal models that implement in-principle learning algorithms, and then used pre-selected and pre-cleaned datasets to test the abilities of the models to find statistical regularit…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted by Annual Conference of the Cognitive Science Society (CogSci) 2020. (Oral Acceptance Rate = 177/811 = 22%)

  29. arXiv:2004.03044  [pdf, other]

    cs.CV

    When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos

    Authors: Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Ella Atkins, David Crandall

    Abstract: Video anomaly detection (VAD) has been extensively studied. However, research on egocentric traffic videos with dynamic scenes lacks large-scale benchmark datasets as well as effective evaluation metrics. This paper proposes traffic anomaly detection with a when-where-what pipeline to detect, localize, and recognize anomalous events from egocentric videos. We introduce a new dataset calle…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: 23 pages, 11 figures, 6 tables

  30. arXiv:2004.00060  [pdf, other]

    cs.CV

    HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation

    Authors: Bardia Doosti, Shujon Naha, Majid Mirbagheri, David Crandall

    Abstract: Hand-object pose estimation (HOPE) aims to jointly detect the poses of both a hand and of a held object. In this paper, we propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time. Our network uses a cascade of two adaptive graph convolutional neural networks, one to estimate 2D coordinates of the hand joints and object corners, followed by…

    Submitted 31 March, 2020; originally announced April 2020.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  31. arXiv:2003.06045  [pdf, other]

    cs.CV cs.RO

    Interaction Graphs for Object Importance Estimation in On-road Driving Videos

    Authors: Zehua Zhang, Ashish Tawari, Sujitha Martin, David Crandall

    Abstract: A vehicle driving along the road is surrounded by many objects, but only a small subset of them influence the driver's decisions and actions. Learning to estimate the importance of each object on the driver's real-time decision-making may help better understand human driving behavior and lead to more reliable autonomous driving systems. Solving this problem requires models that understand the inte…

    Submitted 12 March, 2020; originally announced March 2020.

    Comments: Accepted by ICRA 2020

  32. arXiv:2003.05020  [pdf, other]

    cs.CV

    Learning Video Object Segmentation from Unlabeled Videos

    Authors: Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi

    Abstract: We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data. We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures intrinsic properties of VOS at multiple granularities. Our approach can help advance…

    Submitted 10 March, 2020; originally announced March 2020.

    Comments: Accepted to CVPR 2020. Code: https://github.com/carrierlxk/MuG

  33. arXiv:2001.06807  [pdf, other]

    cs.CV

    Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

    Authors: Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao

    Abstract: This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pair-wise relations are d…

    Submitted 19 January, 2020; originally announced January 2020.

    Comments: ICCV2019(Oral). Website: https://github.com/carrierlxk/AGNN

    Journal ref: ICCV2019(Oral)

  34. arXiv:1912.08367  [pdf, other]

    cs.CV

    P-CapsNets: a General Form of Convolutional Neural Networks

    Authors: Zhenhua Chen, Xiwen Li, Chuhua Wang, David Crandall

    Abstract: We propose Pure CapsNets (P-CapsNets), which are structurally a generalization of normal CNNs. Specifically, we make three modifications to current CapsNets. First, we remove routing procedures from CapsNets based on the observation that the coupling coefficients can be learned implicitly. Second, we replace the convolutional layers in CapsNets to improve efficiency. Third, we package the capsules into…

    Submitted 17 December, 2019; originally announced December 2019.

  35. arXiv:1911.07164  [pdf, other]

    cs.CV

    Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: One-shot fine-grained visual recognition often suffers from the problem of training data scarcity for new fine-grained classes. To alleviate this problem, an off-the-shelf image generator can be applied to synthesize additional training images, but these synthesized images are often not helpful for actually improving the accuracy of one-shot fine-grained recognition. This paper proposes a meta-lea…

    Submitted 17 November, 2019; originally announced November 2019.

    Comments: Accepted by Conference on Neural Information Processing Systems 2019

  36. arXiv:1910.14260  [pdf, other]

    cs.CV

    A Self Validation Network for Object-Level Human Attention Estimation

    Authors: Zehua Zhang, Chen Yu, David Crandall

    Abstract: Due to the foveated nature of the human vision system, people can focus their visual attention on a small region of their visual field at a time, which usually contains only a single object. Estimating this object of attention in first-person (egocentric) videos is useful for many human-centered real-world applications such as augmented reality applications and driver assistance systems. A straigh…

    Submitted 13 December, 2019; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted by NeurIPS 2019

  37. arXiv:1906.01415  [pdf]

    cs.CV

    Active Object Manipulation Facilitates Visual Object Learning: An Egocentric Vision Study

    Authors: Satoshi Tsutsui, Dian Zhi, Md Alimoor Reza, David Crandall, Chen Yu

    Abstract: Inspired by the remarkable ability of the infant visual learning system, a recent study collected first-person images from children to analyze the 'training data' that they receive. We conduct a follow-up study that investigates two additional directions. First, given that infants can quickly learn to recognize a new object without much supervision (i.e. few-shot learning), we limit the number of…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted at 2019 CVPR Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  38. arXiv:1904.06117  [pdf]

    cs.HC

    Conveying Situational Information to People with Visual Impairments

    Authors: Tousif Ahmed, Rakibul Hasan, Kay Connelly, David Crandall, Apu Kapadia

    Abstract: Knowing who is in one's vicinity is key to managing privacy in everyday environments, but is challenging for people with visual impairments. Wearable cameras and other sensors may be able to detect such information, but how should this complex visually-derived information be conveyed in a way that is discreet, intuitive, and unobtrusive? Motivated by previous studies on the specific information th…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: Presented at the CHI'19 Workshop: Addressing the Challenges of Situationally-Induced Impairments and Disabilities in Mobile Interaction, 2019 (arXiv:1904.05382)

    Report number: SIID/2019/no01

  39. arXiv:1904.04404  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Embodied Visual Recognition

    Authors: Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

    Abstract: Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing angle to better understand object shapes and semantics. In this work, we introduce the task of Embodied Visual Recognition (EVR): An agent is instantiated in a 3D…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: 14 pages, 13 figures, technical report

  40. arXiv:1903.00618  [pdf, other]

    cs.CV

    Unsupervised Traffic Accident Detection in First-Person Videos

    Authors: Yu Yao, Mingze Xu, Yuchen Wang, David J. Crandall, Ella M. Atkins

    Abstract: Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for successful autonomous driving and advanced driver assistance systems. However, most work on video anomaly detection suffers from two crucial drawbacks. First, they assume cameras are fixed and videos have static backgrounds, which is reasonable for surveillance applications but not for v…

    Submitted 25 July, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

    Comments: Accepted to IROS 2019

  41. arXiv:1812.00479  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation using Generative Models and Self-ensembling

    Authors: Eman T. Hassan, Xin Chen, David Crandall

    Abstract: Transferring knowledge across different datasets is an important approach to successfully train deep models with a small-scale target dataset or when few labeled instances are available. In this paper, we aim at developing a model that can generalize across multiple domain shifts, so that this model can adapt from a single source to multiple targets. This can be achieved by randomizing the generat…

    Submitted 2 December, 2018; originally announced December 2018.

  42. arXiv:1811.07391  [pdf, other]

    cs.CV

    Temporal Recurrent Networks for Online Action Detection

    Authors: Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall

    Abstract: Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this pape…

    Submitted 23 March, 2019; v1 submitted 18 November, 2018; originally announced November 2018.

  43. arXiv:1809.07408  [pdf, other]

    cs.CV cs.RO

    Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

    Authors: Yu Yao, Mingze Xu, Chiho Choi, David J. Crandall, Ella M. Atkins, Behzad Dariush

    Abstract: Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-dec…

    Submitted 3 March, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

    Comments: To appear in ICRA 2019

  44. arXiv:1808.08692  [pdf, other]

    cs.CV

    Generalized Capsule Networks with Trainable Routing Procedure

    Authors: Zhenhua Chen, David Crandall

    Abstract: CapsNet (Capsule Network) was first proposed by~\citet{capsule} and later another version of CapsNet was proposed by~\citet{emrouting}. CapsNet has proven effective in modeling spatial features with far fewer parameters. However, the routing procedures in both papers are not well incorporated into the whole training process. The optimal number of routing procedures is a mystery, which has to be f…

    Submitted 27 August, 2018; originally announced August 2018.

  45. arXiv:1803.11217  [pdf, other]

    cs.CV

    Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos

    Authors: Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S Ryoo, David J Crandall

    Abstract: In a world of pervasive cameras, public spaces are often captured from multiple perspectives by cameras of different types, both fixed and mobile. An important problem is to organize these heterogeneous collections of videos by finding connections between them, such as identifying correspondences between the people appearing in the videos and the people holding or wearing the cameras. In this pape…

    Submitted 25 July, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: To appear in ECCV 2018

  46. arXiv:1802.07845   

    cs.CV

    Detecting Small, Densely Distributed Objects with Filter-Amplifier Networks and Loss Boosting

    Authors: Zhenhua Chen, David Crandall, Robert Templeman

    Abstract: Detecting small, densely distributed objects is a significant challenge: small objects often contain less distinctive information compared to larger ones, and finer-grained precision of bounding box boundaries is required. In this paper, we propose two techniques for addressing this problem. First, we estimate the likelihood that each pixel belongs to an object boundary rather than predicting coo…

    Submitted 7 May, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: rejected by a conference

  47. arXiv:1801.03986  [pdf, other]

    cs.CV

    Multi-Task Spatiotemporal Neural Networks for Structured Surface Reconstruction

    Authors: Mingze Xu, Chenyou Fan, John D Paden, Geoffrey C Fox, David J Crandall

    Abstract: Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less clear how well these techniques may apply on structured prediction problems where fine-grained output with high precision is required, such as in scien…

    Submitted 20 July, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 10 pages, 7 figures, published in WACV 2018

  48. arXiv:1801.03983  [pdf, other]

    cs.CV

    Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

    Authors: Mingze Xu, Aidean Sharghi, Xin Chen, David J Crandall

    Abstract: A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable…

    Submitted 11 January, 2018; originally announced January 2018.

    Comments: 9 pages, 5 figures, published in WACV 2018

  49. arXiv:1712.07758  [pdf, ps, other]

    cs.CV

    Automatic Estimation of Ice Bottom Surfaces from Radar Imagery

    Authors: Mingze Xu, David J Crandall, Geoffrey C Fox, John D Paden

    Abstract: Ground-penetrating radar on planes and satellites now makes it practical to collect 3D observations of the subsurface structure of the polar ice sheets, providing crucial data for understanding and tracking global climate change. But converting these noisy readings into useful observations is generally done by hand, which is impractical at a continental scale. In this paper, we propose a computer…

    Submitted 20 December, 2017; originally announced December 2017.

    Comments: 5 pages, 3 figures, published in ICIP 2017

  50. arXiv:1711.05998  [pdf, other]

    cs.CV

    Minimizing Supervision for Free-space Segmentation

    Authors: Satoshi Tsutsui, Tommi Kerola, Shunta Saito, David J. Crandall

    Abstract: Identifying "free-space," or safely driveable regions in the scene ahead, is a fundamental task for autonomous navigation. While this task can be addressed using semantic segmentation, the manual labor involved in creating pixelwise annotations to train the segmentation model is very costly. Although weakly supervised segmentation addresses this issue, most methods are not designed for free-space.…

    Submitted 8 December, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: Link to source code added; Typo fixed from the version published in CVPR 2018 Workshop on Autonomous Driving (WAD)