Showing 1–39 of 39 results for author: Garcia, P

Searching in archive eess.

  1. arXiv:2406.02210  [pdf, other]

    cs.RO eess.SY

    An Open and Reconfigurable User Interface to Manage Complex ROS-based Robotic Systems

    Authors: Pablo Malvido Fresnillo, Saigopal Vasudevan, Jose A. Perez Garcia, Jose L. Martinez Lastra

    Abstract: The Robot Operating System (ROS) has significantly gained popularity among robotic engineers and researchers over the past five years, primarily due to its powerful infrastructure for node communication, which enables developers to build modular and large robotic applications. However, ROS presents a steep learning curve and lacks the intuitive usability of vendor-specific robotic Graphical User I…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 14 pages, 12 figures, 3 tables

  2. arXiv:2403.05887  [pdf, other]

    eess.AS

    Aligning Speech to Languages to Enhance Code-switching Speech Recognition

    Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

    Abstract: Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose the language alignment loss that performs frame-level language identification using pseudo language labels learned from the ASR decoder. This eliminates the need for frame-level language annotations. To f…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  3. arXiv:2402.10642  [pdf, other]

    eess.AS cs.AI

    Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

    Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

    Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches…

    Submitted 16 February, 2024; originally announced February 2024.

  4. arXiv:2402.06201  [pdf, other]

    cs.RO eess.SY

    Maximizing Consistent Force Output for Shape Memory Alloy Artificial Muscles in Soft Robots

    Authors: Meredith L. Anderson, Ran Jing, Juan C. Pacheco Garcia, Ilyoung Yang, Sarah Alizadeh-Shabdiz, Charles DeLorey, Andrew P. Sabelhaus

    Abstract: Soft robots have immense potential given their inherent safety and adaptability, but challenges in soft actuator forces and design constraints have limited scaling up soft robots to larger sizes. Electrothermal shape memory alloy (SMA) artificial muscles have the potential to create these large forces and high displacements, but consistently using these muscles under a well-defined model, in-situ…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 8 pages, 8 figures, accepted by 2024 IEEE International Conference on Soft Robotics (RoboSoft)

  5. arXiv:2312.08650  [pdf, other]

    cs.CV eess.SP

    PhyOT: Physics-informed object tracking in surveillance cameras

    Authors: Kawisorn Kamtue, Jose M. F. Moura, Orathai Sangpetch, Paulo Garcia

    Abstract: While deep learning has been very successful in computer vision, real world operating conditions such as lighting variation, background clutter, or occlusion hinder its accuracy across several tasks. Prior work has shown that hybrid models -- combining neural networks and heuristics/algorithms -- can outperform vanilla deep learning for several computer vision tasks, such as classification or trac…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted at IEEE ICASSP 2024 on December 13, 2023

  6. arXiv:2311.15954  [pdf, other]

    cs.CL eess.AS

    A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

    Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

    Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures, 4 tables

  7. arXiv:2309.16953  [pdf, other]

    eess.AS cs.SD

    Enhancing Code-switching Speech Recognition with Interactive Language Biases

    Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

    Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  8. arXiv:2306.13734  [pdf, other]

    eess.AS cs.CL cs.SD

    The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

    Authors: Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

    Abstract: The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate…

    Submitted 14 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  9. arXiv:2306.01031  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

    Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

    Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr…

    Submitted 1 June, 2023; originally announced June 2023.

  10. arXiv:2211.17196  [pdf, other]

    cs.CL cs.SD eess.AS

    EURO: ESPnet Unsupervised ASR Open-source Toolkit

    Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend…

    Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  11. arXiv:2211.03025  [pdf, other]

    cs.CL cs.SD eess.AS

    Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

    Authors: Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee

    Abstract: Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need comple…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: ICASSP2023 submission

  12. arXiv:2211.00482  [pdf, other]

    eess.AS cs.SD

    Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

    Authors: Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

    Abstract: Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance for multi-talker scenarios -- possibly due to the domain mismatch -- which severely limits their use for such applications. In this paper, we inve…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: submitted to ICASSP 2023

  13. arXiv:2210.14567  [pdf, other]

    eess.AS cs.SD

    Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

    Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

    Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  14. arXiv:2210.11658  [pdf, other]

    eess.SP

    A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters

    Authors: Yu Xuan, Xiangyu Zhang, Shuyue Stella Li, Zihan Shen, Xin Xie, Leibny Paola Garcia, Roberto Togneri

    Abstract: The detection of abnormal fetal heartbeats during pregnancy is important for monitoring the health conditions of the fetus. While adult ECG has made several advances in modern medicine, noninvasive fetal electrocardiography (FECG) remains a great challenge. In this paper, we introduce a new method based on affine combinations of adaptive filters to extract FECG signals. The affine combination of m…

    Submitted 26 February, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 3 tables

  15. arXiv:2210.07189  [pdf, other]

    cs.CL cs.SD eess.AS

    On Compressing Sequences for Self-Supervised Speech Models

    Authors: Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, Hao Tang

    Abstract: Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We explore how in…

    Submitted 25 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  16. arXiv:2210.03459  [pdf, other]

    eess.AS cs.CL cs.SD

    Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

    Authors: Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia

    Abstract: Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. To the contrary, it is also known that single-channel speech data can benefit multi-channel models by mixing it with multi-channel speech data during training or by using it for model pretraining. This…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  17. arXiv:2209.12702  [pdf, other]

    eess.AS cs.SD

    End-to-End Lyrics Recognition with Self-supervised Learning

    Authors: Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

    Abstract: Lyrics recognition is an important task in music processing. Despite traditional algorithms such as the hybrid HMM-TDNN model achieving good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on lyrics recognition task. We e…

    Submitted 26 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 4 pages, 2 figures, 3 tables

  18. arXiv:2209.06377  [pdf]

    eess.SY

    Implementation of an Energy Management System for Real-Time Power Flow Control in AC Microgrid

    Authors: Airin Rahman, Hafte Hayelom Adhena, Ramy Georgious, Pablo Garcia

    Abstract: Microgrid (MG) system, which is composed of renewable resources with the utility grid, energy storage unit, electric vehicles, and loads, acts as a single controllable entity. To get efficient and low-cost energy, need to manage power flow within MG depending on renewable resources and load demand. This paper proposes an energy management system (EMS) for grid-connected photovoltaic (PV) and energ…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 6 pages

  19. arXiv:2208.02693  [pdf, other]

    cs.CV eess.IV physics.data-an

    Relict landslide detection using Deep-Learning architectures for image segmentation in rainforest areas: A new framework

    Authors: Guilherme P. B. Garcia, Carlos H. Grohmann, Lucas P. Soares, Mateus Espadoto

    Abstract: Landslides are destructive and recurrent natural disasters on steep slopes and represent a risk to lives and properties. Knowledge of relict landslides location is vital to understand their mechanisms, update inventory maps and improve risk assessment. However, relict landslide mapping is complex in tropical regions covered with rainforest vegetation. A new CNN framework is proposed for semi-autom…

    Submitted 29 May, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

  20. arXiv:2207.00216  [pdf, other]

    eess.AS

    Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models

    Authors: Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Paola García, Yohei Kawaguchi

    Abstract: In this paper, we present an incremental domain adaptation technique to prevent catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model. Conventional approaches require extra parameters of the same size as the model for optimization, and it is difficult to apply these approaches to end-to-end ASR models because they have a huge amount of parameters. To solve this problem…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for Interspeech 2022

  21. arXiv:2206.02432  [pdf, other]

    eess.AS cs.CL cs.SD

    Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

    Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi

    Abstract: A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of spea…

    Submitted 22 December, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted to IEEE/ACM TASLP

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 706-720, 2023

  22. arXiv:2203.07960  [pdf, other]

    eess.AS

    Investigating self-supervised learning for speech enhancement and separation

    Authors: Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, Sanjeev Khudanpur

    Abstract: Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise while speech separation extracts target speech from interfering speakers. Despite a great number of supervised learning-based enhancement and separation methods having been proposed and achieving good performance, studies on applying self-supervised learning (SSL…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP 2022

  23. arXiv:2110.04694  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-Channel End-to-End Neural Diarization with Distributed Microphones

    Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

    Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of t…

    Submitted 28 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  24. arXiv:2107.01545  [pdf, other]

    eess.AS cs.CL cs.SD

    Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

    Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, Yohei Kawaguchi

    Abstract: Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number of speakers is larger than the one observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an un…

    Submitted 23 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021

  25. Encoder-Decoder Based Attractors for End-to-End Neural Diarization

    Authors: Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia

    Abstract: This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce encoder-decoder-based a…

    Submitted 28 March, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to IEEE/ACM TASLP. This article is based on our previous conference paper arxiv:2005.09921

  26. arXiv:2106.04764  [pdf, other]

    eess.AS cs.SD

    Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

    Authors: Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu

    Abstract: In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. However, to get a well-tuned model, EEND requires labeled data for all the joint speech activities of every speaker at each tim…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted for Interspeech 2021

  27. arXiv:2106.04078  [pdf, other]

    eess.AS cs.SD

    End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

    Authors: Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu

    Abstract: In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted for SLT 2021

    Journal ref: IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 849-856

  28. Swarm Robots in Agriculture

    Authors: Daniel Albiero, Angel Pontin Garcia, Claudio Kiyoshi Umezu, Rodrigo Leme de Paulo

    Abstract: Agricultural mechanization is an area of knowledge that has evolved a lot over the past century, its main actors being agricultural tractors that, in 100 years, have increased their powers by 3,300%. This evolution has resulted in an exponential increase in the field capacity of such machines. However, it has also generated negative results such as excessive consumption of fossil fuel, excessive w…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Paper published in Brazilian Congress of Automatic. Porto Alegre, 2020

  29. arXiv:2102.01363  [pdf, other]

    eess.AS cs.CL cs.SD

    The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

    Authors: Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each system and all five subsystems become competitive and complementary. After…

    Submitted 2 February, 2021; originally announced February 2021.

  30. arXiv:2101.08473  [pdf, other]

    cs.SD eess.AS

    Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

    Authors: Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

    Abstract: We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech. In our previous study, the speaker-tracing buffer (STB) mechanism was proposed to achieve a chunk-wise streaming diarization using a pre-trained EEND model. STB traces the speaker information in previous chunks to map the speakers in a…

    Submitted 6 April, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  31. arXiv:2012.10055  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Speaker Diarization as Post-Processing

    Authors: Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu

    Abstract: This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handl…

    Submitted 23 December, 2020; v1 submitted 18 December, 2020; originally announced December 2020.

  32. arXiv:2011.01093  [pdf, ps, other]

    eess.SY

    Predicting the future state of disturbed LTI systems: A solution based on high-order observers

    Authors: Alberto Castillo, Pedro Garcia

    Abstract: Predicting the state of a system in a relatively near future time instant is often needed for control purposes. However, when the system is affected by external disturbances, its future state is dependent on the forthcoming disturbance; which is, in most of the cases, unknown and impossible to measure. In this scenario, making predictions of the future system-state is not straightforward and, inde…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: 5 pages, 1 figure

    Journal ref: Automatica, Volume 57 (2021), Issue 2 (February)

  33. arXiv:2008.13213  [pdf, other]

    eess.AS

    Mixture of Speaker-type PLDAs for Children's Speech Diarization

    Authors: Jiamin Xie, Suzanna Sia, Paola Garcia, Daniel Povey, Sanjeev Khudanpur

    Abstract: In diarization, the PLDA is typically used to model an inference structure which assumes the variation in speech segments be induced by various speakers. The speaker variation is then learned from the training data. However, human perception can differentiate speakers by age, gender, among other characteristics. In this paper, we investigate a speaker-type informed model that explicitly captures t…

    Submitted 30 August, 2020; originally announced August 2020.

    Comments: submitted to Interspeech 2020

  34. CellEVAC: An adaptive guidance system for crowd evacuation through behavioral optimization

    Authors: Miguel A. Lopez-Carmona, Alvaro Paricio Garcia

    Abstract: A critical aspect of crowds' evacuation processes is the dynamism of individual decision making. Here, we investigate how to favor a coordinated group dynamic through optimal exit-choice instructions using behavioral strategy optimization. We propose and evaluate an adaptive guidance system (Cell-based Crowd Evacuation, CellEVAC) that dynamically allocates colors to cells in a cell-based pedestria…

    Submitted 18 May, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: 47 pages, 26 figures

    ACM Class: I.6.4

  35. arXiv:2006.07898  [pdf, other]

    eess.AS cs.SD

    The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

    Authors: Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola García, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each stage of the pipeline, such as multi-array guided source separation (GSS) for enhancement and acoustic model training data, posterior fusion for spee…

    Submitted 14 June, 2020; originally announced June 2020.

    Comments: Presented at the CHiME-6 workshop (colocated with ICASSP 2020)

  36. arXiv:2002.06220  [pdf, other]

    eess.AS cs.SD

    Speaker Diarization with Region Proposal Network

    Authors: Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

    Abstract: Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem. Although the standard diarization systems can achieve satisfactory results in various scenarios, they are composed of several independently-optimized modules and cannot deal with the overlapped speech. In this paper, we propose a novel speaker diarization method:…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  37. arXiv:1912.00938  [pdf]

    eess.AS cs.SD

    Speaker detection in the wild: Lessons learned from JSALT 2019

    Authors: Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

    Abstract: This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions that go from meetings to wild speech. We describe the research threads we explored and a set of modules that was successful for these scenarios. The ultimate goal was to explore speaker dete…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Submitted to ICASSP 2020

  38. arXiv:1910.11905  [pdf, ps, other]

    eess.AS cs.SD

    Feature Enhancement with Deep Feature Losses for Speaker Verification

    Authors: Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak

    Abstract: Speaker Verification still suffers from the challenge of generalization to novel adverse environments. We leverage on the recent advancements made by deep learning based speech enhancement and propose a feature-domain supervised denoising based solution. We propose to use Deep Feature Loss which optimizes the enhancement network in the hidden activation space of a pre-trained auxiliary speaker emb…

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages, accepted in ICASSP 2020

  39. arXiv:1801.05363  [pdf, other]

    eess.SP nlin.CD

    Non Intrusive Load Monitoring in Chaotic Switching Networks

    Authors: P. Garcia, X. Dominguez, D. Chiza

    Abstract: In this work, a non intrusive load disaggregation scheme is proposed. By using a kernel based nonlinear regression strategy, the switching dynamic of an electric network, simulated as a set of RLC circuits with chaotic switching, is approximated using a time series of the total power consumption. The results suggest that the employed methodology can be useful in the design of efficient load disagg…

    Submitted 11 January, 2018; originally announced January 2018.