Search | arXiv e-print repository

Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?

Authors: Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella

Abstract: In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction detection. Via extensive experiments and comparative analyses on three egocentric datasets, VISOR, EgoHOS, and ENIGMA-51, our findings reveal how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10% of real labeled data, we achieve improvements in Overall AP compared to baselines trained exclusively on real data of: +5.67% on EPIC-KITCHENS VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Our analysis is supported by a novel data generation pipeline and the newly introduced HOI-Synth benchmark which augments existing datasets with synthetic images of hand-object interactions automatically labeled with hand-object contact states, bounding boxes, and pixel-wise segmentation masks. We publicly release the generated data, code, and data generation tools to support future research at the following link: https://iplab.dmi.unict.it/HOI-Synth/.

Submitted 14 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2309.14809 [pdf, other]

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

Authors: Francesco Ragusa, Rosario Leonardi, Michele Mazzamuto, Claudia Bonanno, Rosario Scavo, Antonino Furnari, Giovanni Maria Farinella

Abstract: ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., electric screwdriver) and equipment (e.g., oscilloscope). The 51 egocentric video sequences are densely annotated with a rich set of labels that enable the systematic study of human behavior in the industrial domain. We provide benchmarks on four tasks related to human behavior: 1) untrimmed temporal detection of human-object interactions, 2) egocentric human-object interaction detection, 3) short-term object interaction anticipation, and 4) natural language understanding of intents and entities. Baseline results show that the ENIGMA-51 dataset poses a challenging benchmark to study human behavior in industrial scenarios. We publicly release the dataset at https://iplab.dmi.unict.it/ENIGMA-51.

Submitted 27 November, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2306.12152 [pdf, other]

Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario

Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

Abstract: In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present EgoISM-HOI, a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility and effectiveness of synthetic EHOI data produced by the proposed tool, we designed a new method that predicts and combines different multimodal signals to detect EHOIs in RGB images. Our study shows that exploiting synthetic data to pre-train the proposed method significantly improves performance when tested on real-world data. Moreover, to fully understand the usefulness of our method, we conducted an in-depth analysis in which we compared and highlighted the superiority of the proposed approach over different state-of-the-art class-agnostic methods. To support research in this field, we publicly release the datasets, source code, and pre-trained models at https://iplab.dmi.unict.it/egoism-hoi.

Submitted 11 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2204.07061 [pdf, other]

Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella

Abstract: We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection, we propose a method that detects the hands, the objects in the scene, and determines which objects are currently involved in an interaction. We compare the performance of our method with a set of state-of-the-art baselines. Results show that using a synthetic dataset improves the performance of an EHOI detection system, especially when few real data are available. To encourage research on this topic, we publicly release the proposed dataset at the following url: https://iplab.dmi.unict.it/EHOI_SYNTH/.

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2006.04603 [pdf, other]

doi 10.1016/j.media.2021.102046

BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset

Authors: Alberto Signoroni, Mattia Savardi, Sergio Benini, Nicola Adami, Riccardo Leonardi, Paolo Gibellini, Filippo Vaccher, Marco Ravanelli, Andrea Borghesi, Roberto Maroldi, Davide Farina

Abstract: In this work we design an end-to-end deep learning architecture for predicting, on Chest X-ray (CXR) images, a multi-regional score conveying the degree of lung compromise in COVID-19 patients. Such a semi-quantitative scoring system, namely the Brixia score, is applied in serial monitoring of such patients, showing significant prognostic value, in one of the hospitals that experienced one of the highest pandemic peaks in Italy. To solve such a challenging visual task, we adopt a weakly supervised learning strategy structured to handle different tasks (segmentation, spatial alignment, and score estimation) trained with a "from-the-part-to-the-whole" procedure involving different datasets. In particular, we exploit a clinical dataset of almost 5,000 CXR annotated images collected in the same hospital. Our BS-Net demonstrates self-attentive behavior and a high degree of accuracy in all processing stages. Through inter-rater agreement tests and a gold standard comparison, we show that our solution outperforms single human annotators in rating accuracy and consistency, thus supporting the possibility of using this tool in contexts of computer-assisted monitoring. Highly resolved (super-pixel level) explainability maps are also generated, with an original technique, to visually help the understanding of the network activity on the lung areas. We also consider other scores proposed in literature and provide a comparison with a recently proposed non-specific approach. We eventually test the performance robustness of our model on an assorted public COVID-19 dataset, for which we also provide Brixia score annotations, observing good direct generalization and fine-tuning capabilities that highlight the portability of BS-Net in other clinical settings. The CXR dataset along with the source code and the trained model are publicly released for research purposes.

Submitted 3 April, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 28 pages, 11 figures, preprint of accepted paper to Medical Image Analysis, Project page with Code and Dataset Available at https://brixia.github.io/

MSC Class: 68T45 ACM Class: I.2.10; I.5; I.4; J.3

arXiv:1901.03547 [pdf, other]

Feature Fusion for Robust Patch Matching With Compact Binary Descriptors

Authors: Andrea Migliorati, Attilio Fiandrotti, Gianluca Francini, Skjalg Lepsoy, Riccardo Leonardi

Abstract: This work addresses the problem of learning compact yet discriminative patch descriptors within a deep learning framework. We observe that features extracted by convolutional layers in the pixel domain are largely complementary to features extracted in a transformed domain. We propose a convolutional network framework for learning binary patch descriptors where pixel domain features are fused with features extracted from the transformed domain. In our framework, while convolutional and transformed features are distinctly extracted, they are fused and provided to a single classifier which thus jointly operates on convolutional and transformed features. We experiment at matching patches from three different datasets, showing that our feature fusion approach outperforms multiple state-of-the-art approaches in terms of accuracy, rate, and complexity.

Submitted 11 January, 2019; originally announced January 2019.

Comments: MMSP 2018 - IEEE 20th International Workshop on Multimedia Signal Processing - August 29-31 2018, Vancouver, Canada

arXiv:1001.4737 [pdf, other]

doi 10.1088/1748-0221/4/12/T12018

Optimization of Planck/LFI on-board data handling

Authors: M. Maris, M. Tomasi, S. Galeotta, M. Miccolis, S. Hildebrandt, M. Frailis, R. Rohlfs, N. Morisset, A. Zacchei, M. Bersanelli, P. Binko, C. Burigana, R. C. Butler, F. Cuttaia, H. Chulani, O. D'Arcangelo, S. Fogliani, E. Franceschi, F. Gasparo, F. Gomez, A. Gregorio, J. M. Herreros, R. Leonardi, P. Leutenegger, G. Maggio, et al. (12 additional authors not shown)

Abstract: To assess stability against 1/f noise, the Low Frequency Instrument (LFI) onboard the Planck mission will acquire data at a rate much higher than the data rate allowed by its telemetry bandwidth of 35.5 kbps. The data are processed by an onboard pipeline, followed onground by a reversing step. This paper illustrates the LFI scientific onboard processing to fit the allowed data rate. This is a lossy process tuned by using a set of 5 parameters Naver, r1, r2, q, O for each of the 44 LFI detectors. The paper quantifies the level of distortion introduced by the onboard processing, EpsilonQ, as a function of these parameters. It describes the method of optimizing the onboard processing chain. The tuning procedure is based on an optimization algorithm applied to unprocessed and uncompressed raw data provided either by simulations, prelaunch tests or data taken from LFI operating in diagnostic mode. All the needed optimization steps are performed by an automated tool, OCA2, which ends with optimized parameters and produces a set of statistical indicators, among them the compression rate Cr and EpsilonQ. For Planck/LFI the requirements are Cr = 2.4 and EpsilonQ <= 10% of the rms of the instrumental white noise. To speed up the process, an analytical model is developed that is able to extract most of the relevant information on EpsilonQ and Cr as a function of the signal statistics and the processing parameters. This model will be of interest for the instrument data analysis. The method was applied during ground tests when the instrument was operating in conditions representative of flight. Optimized parameters were obtained and the performance has been verified: the required data rate of 35.5 kbps has been achieved while keeping EpsilonQ at a level of 3.8% of white noise rms, well within the requirements.

Submitted 26 January, 2010; originally announced January 2010.

Comments: 51 pages, 13 fig.s, 3 tables, pdflatex, needs JINST.csl, graphicx, txfonts, rotating; Issue 1.0 10 nov 2009; Sub. to JINST 23Jun09, Accepted 10Nov09, Pub.: 29Dec09; This is a preprint, not the final version

Journal ref: 2009 JINST 4 T12018

arXiv:0809.1043 [pdf, ps, other]

doi 10.1109/TIT.2008.929941

On Unique Decodability

Authors: Marco Dalai, Riccardo Leonardi

Abstract: In this paper we propose a revisitation of the topic of unique decodability and of some fundamental theorems of lossless coding. It is widely believed that, for any discrete source X, every "uniquely decodable" block code satisfies E[l(X_1 X_2 ... X_n)]>= H(X_1,X_2,...,X_n), where X_1, X_2,...,X_n are the first n symbols of the source, E[l(X_1 X_2 ... X_n)] is the expected length of the code for those symbols and H(X_1,X_2,...,X_n) is their joint entropy. We show that, for certain sources with memory, the above inequality only holds when a limiting definition of "uniquely decodable code" is considered. In particular, the above inequality is usually assumed to hold for any "practical code" due to a debatable application of McMillan's theorem to sources with memory. We thus propose a clarification of the topic, also providing an extended version of McMillan's theorem to be used for Markovian sources.

Submitted 5 September, 2008; originally announced September 2008.

Comments: Accepted for publication, IEEE Transactions on Information Theory

MSC Class: 94A45; 94A29; 94A15

arXiv:cs/0506036 [pdf, ps, other]

Non prefix-free codes for constrained sequences

Authors: Marco Dalai, Riccardo Leonardi

Abstract: In this paper we consider the use of variable length non prefix-free codes for coding constrained sequences of symbols. We assume a Markov source where some state transitions are impossible, i.e. the stochastic matrix associated with the Markov chain has some null entries. We show that the classic Kraft inequality is not a necessary condition, in general, for unique decodability under the above hypothesis and we propose a relaxed necessary inequality condition. This allows, in some cases, the use of non prefix-free codes that can give very good performance, both in terms of compression and computational efficiency. Some considerations are made on the relation between the proposed approach and other existing coding paradigms.

Submitted 10 June, 2005; originally announced June 2005.

Comments: 5 pages, 3 figures. To be presented at the 2005 IEEE International Symposium on Information Theory

ACM Class: E.4; H.1.1
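
Illustration (not from the paper): the claim in the abstract above, that the Kraft inequality need not hold for unique decodability once some source transitions are impossible, can be checked on a toy case. The sketch below is a minimal Python example under stated assumptions: the three-symbol code `CODE` and the forbidden transition in `FORBIDDEN` are hypothetical, chosen only for illustration, and the brute-force search only covers sequences up to a finite length, so it is a sanity check rather than a proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical non prefix-free binary code: "0" is a prefix of "01".
CODE = {"a": "0", "b": "1", "c": "01"}
# Hypothetical source constraint: symbol "a" is never followed by "b".
FORBIDDEN = {("a", "b")}


def kraft_sum(code, base=2):
    """Sum of base**(-length) over all codewords (exact arithmetic)."""
    return sum(Fraction(1, base ** len(word)) for word in code.values())


def allowed(seq):
    """True if no consecutive pair of symbols is a forbidden transition."""
    return all((x, y) not in FORBIDDEN for x, y in zip(seq, seq[1:]))


def encode(seq):
    """Concatenate the codewords of the symbols in seq."""
    return "".join(CODE[s] for s in seq)


def collisions(max_len, constrained):
    """Pairs of distinct source sequences (length <= max_len) that encode
    to the same bit string. If constrained, only sequences that respect
    FORBIDDEN are considered."""
    seen = {}
    clashes = []
    for n in range(1, max_len + 1):
        for seq in product(CODE, repeat=n):
            if constrained and not allowed(seq):
                continue
            bits = encode(seq)
            if bits in seen:
                clashes.append((seen[bits], seq))
            else:
                seen[bits] = seq
    return clashes


if __name__ == "__main__":
    print("Kraft sum:", kraft_sum(CODE))
    print("collisions without constraint:", len(collisions(6, False)))
    print("collisions with constraint:", len(collisions(6, True)))
```

In this toy case the Kraft sum exceeds 1, and the unconstrained source produces ambiguous encodings (for example, "c" and "a" followed by "b" both give "01"). With the transition from "a" to "b" removed, every "0" followed by "1" can only come from "c", so the search finds no collisions.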

Showing 1–9 of 9 results for author: Leonardi, R