Showing 1–12 of 12 results for author: Ortega, D

Searching in archive cs.

  1. Automatic UAV-based Airport Pavement Inspection Using Mixed Real and Virtual Scenarios

    Authors: Pablo Alonso, Jon Ander Iñiguez de Gordoa, Juan Diego Ortega, Sara García, Francisco Javier Iriarte, Marcos Nieto

    Abstract: Runway and taxiway pavements are exposed to high stress during their projected lifetime, which inevitably leads to a decrease in their condition over time. To ensure that airport pavements support uninterrupted and resilient operations, it is of utmost importance to monitor their condition and conduct regular inspections. UAV-based inspection has recently been gaining importance due to its wide ra…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, published in proceedings of 15th International Conference on Machine Vision (ICMV)

    Journal ref: Proc. SPIE 12701, Fifteenth International Conference on Machine Vision (ICMV 2022), 1270118

  2. arXiv:2304.04478  [pdf, other]

    cs.CL cs.SD eess.AS

    Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions

    Authors: Daniel Ortega, Chia-Yu Li, Ngoc Thang Vu

    Abstract: This paper presents our latest investigation on modeling backchannel in conversations. Motivated by a proactive backchanneling theory, we aim at developing a system which acts as a proactive listener by inserting backchannels, such as continuers and assessment, to influence speakers. Our model takes into account not only lexical and acoustic cues, but also introduces the simple and novel idea of u…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Published in ICASSP 2020

  3. arXiv:2304.04472  [pdf, other]

    cs.CL

    Modeling Speaker-Listener Interaction for Backchannel Prediction

    Authors: Daniel Ortega, Sarina Meyer, Antje Schweitzer, Ngoc Thang Vu

    Abstract: We present our latest findings on backchannel modeling novelly motivated by the canonical use of the minimal responses Yeah and Uh-huh in English and their correspondent tokens in German, and the effect of encoding the speaker-listener interaction. Backchanneling theories emphasize the active and continuous role of the listener in the course of the conversation, their effects on the speaker's subs…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Published in IWSDS 2023

  4. arXiv:2205.06556  [pdf]

    cs.CV

    Virtual passengers for real car solutions: synthetic datasets

    Authors: Paola Natalia Canas, Juan Diego Ortega, Marcos Nieto, Oihana Otaegui

    Abstract: Strategies that include the generation of synthetic data are beginning to be viable as obtaining real data can be logistically complicated, very expensive or slow. Not only the capture of the data can lead to complications, but also its annotation. To achieve high-fidelity data for training intelligent systems, we have built a 3D scenario and set-up to resemble reality as closely as possible. With…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 9 pages, 6 figures, 14th ITS European Congress

  5. arXiv:2110.13242  [pdf, other]

    cs.RO

    2D Grid Map Generation for Deep-Learning-based Navigation Approaches

    Authors: Gabriel O. Flores-Aquino, Jheison Duvier Díaz Ortega, Ricardo Yahir Almazan Arvizu, Raúl López Muñoz, O. Octavio Gutierrez-Frias, J. Irving Vasquez-Gomez

    Abstract: In the last decade, autonomous navigation for robotics has been leveraged by deep learning and other approaches based on machine learning. These approaches have demonstrated significant advantages in robotics performance. But they have the disadvantage that they require a lot of data to infer knowledge. In this paper, we present an algorithm for building 2D maps with attributes that make them useful f…

    Submitted 4 December, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: 6 pages, 4 figures, conference, dataset

  6. arXiv:2008.12085  [pdf, ps, other]

    cs.CV cs.LG eess.IV

    DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis

    Authors: Juan Diego Ortega, Neslihan Kose, Paola Cañas, Min-An Chao, Alexander Unnervik, Marcos Nieto, Oihana Otaegui, Luis Salgado

    Abstract: Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS), especially after the recent success of Deep Learning (DL) methods. The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development, crucial for the transition of automated driving from SAE Level-2 to SAE Level-3. In this paper, we introduce the Drive…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted to ECCV 2020 workshop - Assistive Computer Vision and Robotics

  7. arXiv:2005.01777  [pdf, other]

    cs.CL cs.AI

    ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents

    Authors: Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu

    Abstract: We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically exper…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: All authors contributed equally. Accepted to be presented at ACL - System demonstrations - 2020

  8. arXiv:1907.03196  [pdf, other]

    cs.CV eess.AS eess.IV

    Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

    Authors: Juan D. S. Ortega, Mohammed Senoussaoui, Eric Granger, Marco Pedersoli, Patrick Cardinal, Alessandro L. Koerich

    Abstract: This paper presents a novel deep neural network (DNN) for multimodal fusion of audio, video and text modalities for emotion recognition. The proposed DNN architecture has independent and shared layers which aim to learn the representation for each modality, as well as the best combined representation to achieve the best prediction. Experimental results on the AVEC Sentiment Analysis in the Wild da…

    Submitted 6 July, 2019; originally announced July 2019.

  9. arXiv:1906.10623  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Emotion Recognition Using Fusion of Audio and Video Features

    Authors: Juan D. S. Ortega, Patrick Cardinal, Alessandro L. Koerich

    Abstract: In this paper we propose a fusion approach to continuous emotion recognition that combines visual and auditory modalities in their representation spaces to predict the arousal and valence levels. The proposed approach employs a pre-trained convolution neural network and transfer learning to extract features from video frames that capture the emotional content. For the auditory content, a minimalis…

    Submitted 25 June, 2019; originally announced June 2019.

  10. arXiv:1902.11060  [pdf, other]

    cs.CL

    Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

    Authors: Daniel Ortega, Chia-Yu Li, Gisela Vallejo, Pavel Denisov, Ngoc Thang Vu

    Abstract: This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs) for context modeling in DA classification. We explore the impact of transcriptions generated from different automatic speech recognition systems such as hybrid T…

    Submitted 28 February, 2019; originally announced February 2019.

    Comments: 5 pages, 1 figure, ICASSP 2019, dialog act classification, automatic speech recognition

  11. arXiv:1803.00831  [pdf, other]

    cs.CL

    Lexico-acoustic Neural-based Models for Dialog Act Classification

    Authors: Daniel Ortega, Ngoc Thang Vu

    Abstract: Recent works have proposed neural models for dialog act classification in spoken dialogs. However, they have not explored the role and the usefulness of acoustic information. We propose a neural model that processes both lexical and acoustic features for classification. Our results on two benchmark datasets reveal that acoustic features are helpful in improving the overall accuracy. Finally, a dee…

    Submitted 2 March, 2018; originally announced March 2018.

    Comments: 5 pages, 1 figure, 2018 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018)

  12. arXiv:1708.02561  [pdf, other]

    cs.CL

    Neural-based Context Representation Learning for Dialog Act Classification

    Authors: Daniel Ortega, Ngoc Thang Vu

    Abstract: We explore context representation learning methods in neural-based models for dialog act classification. We propose and compare extensively different methods which combine recurrent neural network architectures and attention mechanisms (AMs) at different context levels. Our experimental results on two benchmark datasets show consistent improvements compared to the models without contextual informa…

    Submitted 8 August, 2017; originally announced August 2017.

    Comments: 5 pages, 1 figure, SIGDIAL 2017