Showing 1–40 of 40 results for author: Furnari, A

  1. arXiv:2406.08379

    cs.CV

    Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

    Authors: Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella

    Abstract: In this paper, we address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a critical component for advancing user assistance in smart glasses. Traditional supervised methods, reliant on manually labeled mistakes, suffer from domain-dependence and scalability issues. This research introduces an unsupervised method for detecting mistakes in v…

    Submitted 17 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.01486

    cs.CV

    Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

    Authors: Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Procedural activities are sequences of key-steps aimed at achieving specific goals. They are crucial to build intelligent agents able to assist users effectively. In this context, task graphs have emerged as a human-understandable representation of procedural activities, encoding a partial ordering over the key-steps. While previous works generally relied on hand-crafted procedures to extract task…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2406.01194

    cs.CV

    AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation

    Authors: Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Josechu Guerrero, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Short-Term object-interaction Anticipation consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human robot interaction to understand the user goals, but there is still room for improvement to perform STA in a precise an…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2404.01933

    cs.CV

    PREGO: online mistake detection in PRocedural EGOcentric videos

    Authors: Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso

    Abstract: Promptly identifying procedural errors from egocentric videos in an online setting is highly challenging and valuable for detecting mistakes as soon as they happen. This capability has a wide range of applications across various fields, such as manufacturing and healthcare. The nature of procedural mistakes is open-set since novel types of failures might occur, which calls for one-class classifier…

    Submitted 17 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  5. arXiv:2312.03391

    cs.CV

    Action Scene Graphs for Long-Form Understanding of Egocentric Videos

    Authors: Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

    Abstract: We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how a…

    Submitted 6 December, 2023; originally announced December 2023.

  6. arXiv:2312.02672

    cs.CV

    Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?

    Authors: Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella

    Abstract: In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction detection. Via extensive experiments and comparative analyses on three egocentric datasets, VISOR, EgoHOS, and ENIGMA-51, our findings reveal how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10%…

    Submitted 14 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  7. arXiv:2312.02638

    cs.CV

    Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs

    Authors: Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella

    Abstract: We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodolog…

    Submitted 14 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  8. arXiv:2311.18259

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  9. arXiv:2309.14809

    cs.CV

    ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

    Authors: Francesco Ragusa, Rosario Leonardi, Michele Mazzamuto, Claudia Bonanno, Rosario Scavo, Antonino Furnari, Giovanni Maria Farinella

    Abstract: ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., electric screwdriver) and equipments (e.g., oscilloscope). The 51 egocentric video sequences are densely annotated with a rich set of labels that enable the systematic study of human behavior in the industrial do…

    Submitted 27 November, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  10. arXiv:2308.07123

    cs.CV

    An Outlook into the Future of Egocentric Vision

    Authors: Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi

    Abstract: What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward facing cameras and digital overlays, is expected to be integrated in our every day lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through e…

    Submitted 7 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1w

  11. Streaming egocentric action anticipation: An evaluation scheme and approach

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation aims to predict the future actions the camera wearer will perform from the observation of the past. While predictions about the future should be available before the predicted events take place, most approaches do not pay attention to the computational time required to make such predictions. As a result, current evaluation schemes assume that predictions are availabl…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Published in Computer Vision and Image Understanding, 2023. arXiv admin note: text overlap with arXiv:2110.05386

  12. arXiv:2306.12152

    cs.CV

    Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario

    Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present EgoISM-HOI a new mu…

    Submitted 11 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  13. arXiv:2304.03959

    cs.CV

    StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation

    Authors: Francesco Ragusa, Giovanni Maria Farinella, Antonino Furnari

    Abstract: Anticipation problem has been studied considering different aspects such as predicting humans' locations, predicting hands and objects trajectories, and forecasting actions and human-object interactions. In this paper, we studied the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneo…

    Submitted 18 March, 2024; v1 submitted 8 April, 2023; originally announced April 2023.

  14. A Multi Camera Unsupervised Domain Adaptation Pipeline for Object Detection in Cultural Sites through Adversarial Learning and Self-Training

    Authors: Giovanni Pasqualino, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Object detection algorithms allow to enable many interesting applications which can be implemented in different devices, such as smartphones and wearable devices. In the context of a cultural site, implementing these algorithms in a wearable device, such as a pair of smart glasses, allow to enable the use of augmented reality (AR) to show extra information about the artworks and enrich the visitor…

    Submitted 3 October, 2022; originally announced October 2022.

  15. Visual Object Tracking in First Person Vision

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In the last years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:2108.13665

  16. MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain

    Authors: Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Wearable cameras allow to acquire images and videos from the user's perspective. These data can be processed to understand humans behavior. Despite human behavior analysis has been thoroughly investigated in third person vision, it is still understudied in egocentric settings and in particular in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.05654

    Journal ref: Computer Vision and Image Understanding 2023

  17. arXiv:2204.07090

    cs.CV

    Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

    Authors: Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is…

    Submitted 14 April, 2022; originally announced April 2022.

  18. arXiv:2204.07069

    cs.CV

    Panoptic Segmentation using Synthetic and Real Data

    Authors: Camillo Quattrocchi, Daniele Di Mauro, Antonino Furnari, Giovanni Maria Farinella

    Abstract: Being able to understand the relations between the user and the surrounding environment is instrumental to assist users in a worksite. For instance, understanding which objects a user is interacting with from images and video collected through a wearable device can be useful to inform the worker on the usage of specific objects in order to improve productivity and prevent accidents. Despite modern…

    Submitted 14 April, 2022; originally announced April 2022.

  19. arXiv:2204.07061

    cs.CV

    Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

    Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

    Abstract: We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection,…

    Submitted 14 April, 2022; originally announced April 2022.

  20. arXiv:2202.04132

    cs.CV

    Untrimmed Action Anticipation

    Authors: Ivan Rodin, Antonino Furnari, Dimitrios Mavroeidis, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation consists in predicting a future action the camera wearer will perform from egocentric video. While the task has recently attracted the attention of the research community, current approaches assume that the input videos are "trimmed", meaning that a short video sequence is sampled a fixed time before the beginning of the action. We argue that, despite the recent adva…

    Submitted 8 February, 2022; originally announced February 2022.

  21. arXiv:2202.01069

    cs.RO cs.CV

    Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

    Authors: Marco Rosano, Antonino Furnari, Luigi Gulino, Corrado Santoro, Giovanni Maria Farinella

    Abstract: Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting into to correctly inform the navigation process of the agent towards the goal location. In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously by collecting the required experience in simulation. Unfortunate…

    Submitted 4 October, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Paper accepted for submission in Autonomous Robots

  22. arXiv:2110.07058

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  23. arXiv:2110.05386

    cs.CV

    Towards Streaming Egocentric Action Anticipation

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation is the task of predicting the future actions a camera wearer will likely perform based on past video observations. While in a real-world system it is fundamental to output such predictions before the action begins, past works have not generally paid attention to model runtime during evaluation. Indeed, current evaluation schemes assume that predictions can be made of…

    Submitted 10 May, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to the 26th International Conference on Pattern Recognition (ICPR 2022)

  24. Is First Person Vision Challenging for Object Tracking?

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in the last years for a large variety of target objects a…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV) 2021, Visual Object Tracking Challenge VOT2021 workshop. arXiv admin note: text overlap with arXiv:2011.12263

  25. arXiv:2107.13411

    cs.CV

    Predicting the Future from First Person (Egocentric) Vision: A Survey

    Authors: Ivan Rodin, Antonino Furnari, Dimitrios Mavroeidis, Giovanni Maria Farinella

    Abstract: Egocentric videos can bring a lot of information about how humans perceive the world and interact with the environment, which can be beneficial for the analysis of human behaviour. The research in egocentric video analysis is developing rapidly thanks to the increasing availability of wearable devices and the opportunities offered by new large-scale egocentric datasets. As computer vision techniqu…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: Computer Vision and Image Understanding, 2021

  26. arXiv:2106.11650

    cs.RO cs.CV

    A Survey on Human-aware Robot Navigation

    Authors: Ronja Möller, Antonino Furnari, Sebastiano Battiato, Aki Härmä, Giovanni Maria Farinella

    Abstract: Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to functional roles (e.g. in the industry, entertainmen…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Robotics and Autonomous Systems, 2021

  27. arXiv:2011.12263

    cs.CV

    Is First Person Vision Challenging for Object Tracking?

    Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

    Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art visual trackers in this domain is still…

    Submitted 24 September, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Extended Abstract accepted by the EPIC workshop at ICCV 2021. The full version of this paper is available at arXiv:2108.13665

  28. arXiv:2010.13439

    cs.RO cs.CV cs.LG

    On Embodied Visual Navigation in Real Environments Through Habitat

    Authors: Marco Rosano, Antonino Furnari, Luigi Gulino, Giovanni Maria Farinella

    Abstract: Visual navigation models based on deep learning can learn effective policies when trained on large amounts of visual observations through reinforcement learning. Unfortunately, collecting the required experience in the real world requires the deployment of a robotic platform, which is expensive and time-consuming. To deal with this limitation, several simulation platforms have been proposed in ord…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Published in International Conference on Pattern Recognition (ICPR), 2020

  29. arXiv:2010.05654

    cs.CV

    The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain

    Authors: Francesco Ragusa, Antonino Furnari, Salvatore Livatino, Giovanni Maria Farinella

    Abstract: Wearable cameras allow to collect images and videos of humans interacting with the world. While human-object interactions have been thoroughly investigated in third person vision, the problem has been understudied in egocentric settings and in industrial scenarios. To fill this gap, we introduce MECCANO, the first dataset of egocentric videos to study human-object interactions in industrial-like s…

    Submitted 12 October, 2020; originally announced October 2020.

  30. arXiv:2008.01882

    cs.CV

    An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites

    Authors: Giovanni Pasqualino, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) allows to build interesting applications for both the visitors and the site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection requires a lot of times and high costs in o…

    Submitted 21 December, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

  31. Rescaling Egocentric Vision

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a nov…

    Submitted 17 September, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Accepted at the International Journal of Computer Vision (IJCV). Dataset available from: http://epic-kitchens.github.io/

  32. SceneAdapt: Scene-based domain adaptation for semantic segmentation using adversarial learning

    Authors: Daniele Di Mauro, Antonino Furnari, Giuseppe Patanè, Sebastiano Battiato, Giovanni Maria Farinella

    Abstract: Semantic segmentation methods have achieved outstanding performance thanks to deep learning. Nevertheless, when such algorithms are deployed to new contexts not seen during training, it is necessary to collect and label scene-specific data in order to adapt them to the new domain using fine-tuning. This process is required whenever an already installed camera is moved or a new camera is introduced…

    Submitted 18 June, 2020; originally announced June 2020.

    Journal ref: Pattern Recognition Letters, Volume 136, August 2020, Pages 175-182

  33. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: In this paper, we tackle the problem of egocentric action anticipation, i.e., predicting what actions the camera wearer will perform in the near future and which objects they will interact with. Specifically, we contribute Rolling-Unrolling LSTM, a learning architecture to anticipate actions from egocentric videos. The method is based on three components: 1) an architecture comprised of two LSTMs…

    Submitted 8 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.09035

    Journal ref: Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

  34. arXiv:2005.00343

    cs.CV

    The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions.…

    Submitted 29 April, 2020; originally announced May 2020.

    Comments: Preprint for paper at IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1804.02748

  35. arXiv:2004.07711

    cs.CV cs.LG

    Knowledge Distillation for Action Anticipation via Label Smoothing

    Authors: Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan

    Abstract: Human capability to anticipate near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living or autonomous driving need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic examples where action anticipati…

    Submitted 18 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted to ICPR 2020

  36. EGO-CH: Dataset and Fundamental Tasks for Visitors Behavioral Understanding using Egocentric Vision

    Authors: Francesco Ragusa, Antonino Furnari, Sebastiano Battiato, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: Equipping visitors of a cultural site with a wearable device allows to easily collect information about their preferences which can be exploited to improve the fruition of cultural goods with augmented reality. Moreover, egocentric video can be processed using computer vision and machine learning to enable an automated analysis of visitors' behavior. The inferred information can be used both onlin…

    Submitted 3 February, 2020; originally announced February 2020.

    Journal ref: Pattern Recognition Letters 2020

  37. arXiv:1905.09035

    cs.CV cs.AI

    What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

    Authors: Antonino Furnari, Giovanni Maria Farinella

    Abstract: Egocentric action anticipation consists in understanding which objects the camera wearer will interact with in the near future and which actions they will perform. We tackle the problem proposing an architecture able to anticipate actions at multiple temporal scales using two LSTMs to 1) summarize the past, and 2) formulate predictions about the future. The input video is processed considering thr…

    Submitted 5 August, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: Accepted as oral to ICCV [International Conference on Computer Vision] 2019

  38. arXiv:1904.05264

    cs.CV

    Egocentric Visitors Localization in Cultural Sites

    Authors: Francesco Ragusa, Antonino Furnari, Sebastiano Battiato, Giovanni Signorello, Giovanni Maria Farinella

    Abstract: We consider the problem of localizing visitors in a cultural site from egocentric (first person) images. Localization information can be useful both to assist the user during his visit (e.g., by suggesting where to go and what to see next) and to provide behavioral information to the manager of the cultural site (e.g., how much time has been spent by visitors at a given location? What has been lik…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: To appear in ACM Journal on Computing and Cultural Heritage (JOCCH), 2019

    Journal ref: ACM Journal on Computing and Cultural Heritage (JOCCH), 2019

  39. Next-Active-Object prediction from Egocentric Videos

    Authors: Antonino Furnari, Sebastiano Battiato, Kristen Grauman, Giovanni Maria Farinella

    Abstract: Although First Person Vision systems can sense the environment from the user's perspective, they are generally unable to predict his intentions and goals. Since human activities can be decomposed in terms of atomic actions and interactions with objects, intelligent wearable systems would benefit from the ability to anticipate user-object interactions. Even if this task is not trivial, the First Pe…

    Submitted 10 April, 2019; originally announced April 2019.

    Journal ref: Journal of Visual Communication and Image Representation, Volume 49, 2017, Pages 401-411, ISSN 1047-3203

  40. arXiv:1804.02748

    cs.CV

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen…

    Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: European Conference on Computer Vision (ECCV) 2018 Dataset and Project page: http://epic-kitchens.github.io