Showing 1–20 of 20 results for author: Gangi, A

Searching in archive cs.
  1. arXiv:2108.11801  [pdf, other]

    cs.CV

    Unsupervised domain adaptation for clinician pose estimation and instance segmentation in the operating room

    Authors: Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

    Abstract: The fine-grained localization of clinicians in the operating room (OR) is a key component to design the new generation of OR support systems. Computer vision models for person pixel-based segmentation and body-keypoints detection are needed to better understand the clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditio…

    Submitted 30 June, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted at Elsevier Journal of Medical Image Analysis. Code is available at https://github.com/CAMMA-public/HPE-AdaptOR. Supplementary video is available at https://youtu.be/gqwPu9-nfGs

  2. arXiv:2012.04964  [pdf, ps, other]

    cs.CL

    On Knowledge Distillation for Direct Speech Translation

    Authors: Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi

    Abstract: Direct speech translation (ST) has shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyz…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Accepted at CLiC-IT 2020

  3. arXiv:2009.04707  [pdf, other]

    cs.CL

    On Target Segmentation for Direct Speech Translation

    Authors: Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Recent studies on direct speech translation show continuous improvements by means of data augmentation techniques and bigger deep learning models. While these methods are helping to close the gap between this new approach and the more traditional cascaded one, there are many incongruities among different studies that make it difficult to assess the state of the art. Surprisingly, one point of disc…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: 14 pages single column, 4 figures, accepted for presentation at the AMTA2020 research track

  4. arXiv:2008.02270  [pdf, other]

    cs.CL

    Contextualized Translation of Automatically Segmented Speech

    Authors: Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi

    Abstract: Direct speech-to-text translation (ST) models are usually trained on corpora segmented at sentence level, but at inference time they are commonly fed with audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sen…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: Interspeech 2020

  5. Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation

    Authors: Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

    Abstract: 2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room that can analyze and support the clinical activities. The lack of annotated data and the complexity of state-of-the-art pose estimation approaches limit, however, the deployment of such techniques inside the OR. In this work, we propose to use knowledge distillation in a teacher/student framework to har…

    Submitted 20 August, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at MICCAI 2020. Code is available at https://github.com/CAMMA-public/ORPose-Color

    Journal ref: Springer (2020) LNCS, volume 12261

  6. Human Pose Estimation on Privacy-Preserving Low-Resolution Depth Images

    Authors: Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

    Abstract: Human pose estimation (HPE) is a key building block for developing AI-based context-aware systems inside the operating room (OR). The 24/7 use of images coming from cameras mounted on the OR ceiling can however raise concerns for privacy, even in the case of depth images captured by RGB-D sensors. Being able to solely use low-resolution privacy-preserving images would address these concerns and he…

    Submitted 20 August, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at MICCAI-2019. Code is available at https://github.com/CAMMA-public/ORPose-depth

    Journal ref: Springer (2019) 583-591

  7. arXiv:2006.05754  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

    Authors: Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi

    Abstract: Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained b…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 9 pages of content, accepted at ACL 2020

  8. arXiv:2006.02965  [pdf, other]

    cs.CL cs.SD eess.AS

    End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

    Authors: Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

    Abstract: This paper describes FBK's participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems' ability to translate English TED talks audio into German texts. The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation. Participants can decide whether to work on custom…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted at IWSLT2020

  9. arXiv:1910.10663  [pdf, ps, other]

    cs.CL eess.AS

    Instance-Based Model Adaptation For Direct Speech Translation

    Authors: Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi

    Abstract: Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora. We tackle this limitation with a method to improve data exploitation and boost the system's performance at inference time. Our approach allows us to customize "on the fly" an existing model to each incoming t…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 6 pages, under review at ICASSP 2020

  10. arXiv:1910.10238  [pdf, ps, other]

    cs.CL cs.LG

    Robust Neural Machine Translation for Clean and Noisy Speech Transcripts

    Authors: Mattia Antonino Di Gangi, Robert Enyedi, Alessandra Brusadin, Marcello Federico

    Abstract: Neural machine translation models have shown to achieve high quality when trained and fed with well structured and punctuated input texts. Unfortunately, the latter condition is not met in spoken language translation, where the input is generated by an automatic speech recognition (ASR) system. In this paper, we study how to adapt a strong NMT system to make it robust to typical ASR errors. As in…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 6 pages, accepted at IWSLT 2019

  11. arXiv:1910.06753  [pdf, other]

    cs.CL

    On the Importance of Word Boundaries in Character-level Neural Machine Translation

    Authors: Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, Alexandra Birch

    Abstract: Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically using some external tools with arbitrary heuristics, resulting in vocabulary units not opt…

    Submitted 21 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: To appear at the 3rd Workshop on Neural Generation and Translation (WNGT 2019)

  12. arXiv:1910.03320  [pdf, other]

    cs.CL eess.AS

    One-To-Many Multilingual End-to-end Speech Translation

    Authors: Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

    Abstract: Nowadays, training end-to-end neural models for spoken language translation (SLT) still has to confront with extreme data scarcity conditions. The existing SLT parallel corpora are indeed orders of magnitude smaller than those available for the closely related tasks of automatic speech recognition (ASR) and machine translation (MT), which usually comprise tens of millions of instances. To cope wit…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 8 pages, one figure, version accepted at ASRU 2019

  13. Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors

    Authors: Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico

    Abstract: Machine translation systems are conventionally trained on textual resources that do not model phenomena that occur in spoken language. While the evaluation of neural machine translation systems on textual inputs is actively researched in the literature, little has been discovered about the complexities of translating spoken language data with neural models. We introduce and motivate interesting p…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: Interspeech 2017

  14. arXiv:1904.04019  [pdf, other]

    cs.CL cs.LG stat.ML

    Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection

    Authors: Mattia Antonino Di Gangi, Giosué Lo Bosco, Giovanni Pilato

    Abstract: Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language and especially over the social media, but they represent two serious issues for automated text understanding. Many labeled corpora have been extracted from several sources to accomplish this task, and it seems that sarcasm is conveyed in different ways for different domains. Nonetheless, very little wo…

    Submitted 6 December, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

    Comments: 37 pages, 7 figures, version 4

    Journal ref: Natural Language Engineering, 25(2), 257-285 (2019)

  15. Face Detection in the Operating Room: Comparison of State-of-the-art Methods and a Self-supervised Approach

    Authors: Thibaut Issenhuth, Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

    Abstract: Purpose: Face detection is a needed component for the automatic analysis and assistance of human activities during surgical procedures. Efficient face detection algorithms can indeed help to detect and identify the persons present in the room, and also be used to automatically anonymize the data. However, current algorithms trained on natural images do not generalize well to the operating room (OR…

    Submitted 3 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: 13 pages

  16. arXiv:1810.07652  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018

    Authors: Mattia Antonino Di Gangi, Roberto Dessì, Roldano Cattoni, Matteo Negri, Marco Turchi

    Abstract: This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018. Our system relies on a state-of-the-art model based on LSTMs and CNNs, where the CNNs are used to reduce the temporal dimension of the audio input, which is in general much higher than machine translation input. Our model was trained only on the audio-to-text parallel data released for the…

    Submitted 16 October, 2018; originally announced October 2018.

    Comments: 6 pages, 2 figures, system description at the 15th International Workshop on Spoken Language Translation (IWSLT) 2018

  17. arXiv:1808.08180  [pdf, other]

    cs.CV

    MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation

    Authors: Vinkle Srivastav, Thibaut Issenhuth, Abdolrahim Kadkhodamohammadi, Michel de Mathelin, Afshin Gangi, Nicolas Padoy

    Abstract: Person detection and pose estimation is a key requirement to develop intelligent context-aware assistance systems. To foster the development of human pose estimation methods and their applications in the Operating Room (OR), we release the Multi-View Operating Room (MVOR) dataset, the first public dataset recorded during real clinical interventions. It consists of 732 synchronized multi-view frame…

    Submitted 20 August, 2021; v1 submitted 24 August, 2018; originally announced August 2018.

    Comments: Dataset and code is available at https://github.com/camma-public/mvor. The paper was presented in the MICCAI-LABELS 2018 (https://labels.tue-image.nl/previous-editions/labels-2018/)

  18. arXiv:1805.04185  [pdf, other]

    cs.CL

    Deep Neural Machine Translation with Weakly-Recurrent Units

    Authors: Mattia Antonino Di Gangi, Marcello Federico

    Abstract: Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation. Recently, new architectures have been proposed, which can leverage parallel computation on GPUs better than classical RNNs. Faster training and inference combined with different sequence-to-sequence modeling also lead to performance improvements. While the new models completely depart fr…

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: 10 pages, 3 figures, accepted as a conference paper at the 21st Annual Conference of the European Association for Machine Translation (EAMT) 2018

  19. arXiv:1701.07372  [pdf, other]

    cs.CV

    A Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms

    Authors: Abdolrahim Kadkhodamohammadi, Afshin Gangi, Michel de Mathelin, Nicolas Padoy

    Abstract: Many approaches have been proposed for human pose estimation in single and multi-view RGB images. However, some environments, such as the operating room, are still very challenging for state-of-the-art RGB methods. In this paper, we propose an approach for multi-view 3D human pose estimation from RGB-D images and demonstrate the benefits of using the additional depth channel for pose refinement be…

    Submitted 25 January, 2017; originally announced January 2017.

    Comments: WACV 2017. Supplementary material video: https://youtu.be/L3A0BzT0FKQ

  20. Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data

    Authors: Abdolrahim Kadkhodamohammadi, Afshin Gangi, Michel de Mathelin, Nicolas Padoy

    Abstract: Reliable human pose estimation (HPE) is essential to many clinical applications, such as surgical workflow analysis, radiation safety monitoring and human-robot cooperation. Proposed methods for the operating room (OR) rely either on foreground estimation using a multi-camera system, which is a challenge in real ORs due to color similarities and frequent illumination changes, or on wearable sensor…

    Submitted 6 July, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

    Comments: The supplementary video is available at https://youtu.be/iabbGSqRSgE