Showing 1–8 of 8 results for author: Martínez-Hinarejos, C

  1. arXiv:2404.18664  [pdf, other]

    cs.CV

    Reading Order Independent Metrics for Information Extraction in Handwritten Documents

    Authors: David Villanova-Aparisi, Solène Tarride, Carlos-D. Martínez-Hinarejos, Verónica Romero, Christopher Kermorvant, Moisés Pastor-Gadea

    Abstract: Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors…

    Submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2402.13152  [pdf, other]

    cs.CV cs.CL

    AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

    Authors: José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

    Abstract: More than 7,000 known languages are spoken around the world. However, due to the lack of annotated resources, only a small fraction of them are currently covered by speech technologies. Although self-supervised speech representations, recent massive speech corpora collections, as well as the organization of challenges, have alleviated this inequality, most studies are mainly benchmarked on English.…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)

  3. arXiv:2402.13004  [pdf, other]

    cs.CV

    Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

    Authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

    Abstract: Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems are usually based on encoder-decoder architectures. While encoders are somewhat general, multiple decoding approaches have been explored, such as the conventional…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)

  4. arXiv:2401.02746  [pdf, other]

    cs.CV

    Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

    Authors: David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso

    Abstract: Depression, a prominent contributor to global disability, affects a substantial portion of the population. Efforts to detect depression from social media texts have been prevalent, yet only a few works explored depression detection from user-generated video content. In this work, we address this research gap by proposing a simple and flexible multi-modal temporal model capable of discerning non-ve…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted at 46th European Conference on Information Retrieval (ECIR 2024)

  5. arXiv:2311.12480  [pdf, ps, other]

    cs.CV cs.CL cs.SD eess.AS

    Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

    Authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

    Abstract: Different studies have shown the importance of visual cues throughout the speech perception process. In fact, the development of audiovisual approaches has led to advances in the field of speech technologies. However, although noticeable results have recently been achieved, visual speech recognition remains an open research problem. It is a task in which, by dispensing with the auditory sense, cha…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted in Proceedings of IberSpeech 2022 ( https://www.isca-speech.org/archive/iberspeech_2022/gimenogomez22_iberspeech.html )

  6. Analysis of Visual Features for Continuous Lipreading in Spanish

    Authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

    Abstract: During a conversation, our brain is responsible for combining information obtained from multiple senses in order to improve our ability to understand the message we are perceiving. Different studies have shown the importance of presenting visual information in these situations. Nevertheless, lipreading is a complex task whose objective is to interpret speech when audio is not available. By dispens…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted in Proceedings of IberSpeech 2020 ( https://www.isca-speech.org/archive/iberspeech_2021/gimenogomez21_iberspeech.html )

  7. arXiv:2311.12457  [pdf, other]

    cs.CV cs.CL

    LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

    Authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

    Abstract: Speech is considered a multi-modal process where hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted in Proceedings of LREC 2022 ( https://aclanthology.org/2022.lrec-1.294 )

  8. arXiv:2209.02403  [pdf]

    cs.HC

    Guidelines to Develop Trustworthy Conversational Agents for Children

    Authors: Marina Escobar-Planas, Emilia Gómez, Carlos-D Martínez-Hinarejos

    Abstract: Conversational agents (CAs) embodied in speakers or chatbots are becoming very popular in some countries, and despite their adult-centred design, they have become part of children's lives, generating a need for children-centric trustworthy systems. This paper presents a literature review to identify the main opportunities, challenges and risks brought by CAs when used by children. We then consider…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: 19 pages