
Showing 1–9 of 9 results for author: Fung, P

  1. arXiv:2306.14517

    cs.CL cs.SD eess.AS

    Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

    Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

    Abstract: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese,…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted in INTERSPEECH 2023

  2. arXiv:2306.06083

    cs.SD cs.CL cs.LG eess.AS

    Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

    Authors: Irina-Elena Veliche, Pascale Fung

    Abstract: The challenge of fairness arises when Automatic Speech Recognition (ASR) systems do not perform equally well for all sub-groups of the population. In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing Equality and Equity for all user groups for whom systems do not perform well. ASR fairness is therefore also a r…

    Submitted 6 June, 2023; originally announced June 2023.

    Journal ref: ICASSP 2023

  3. arXiv:2207.02663

    cs.CL cs.SD eess.AS

    Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

    Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

    Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major…

    Submitted 6 July, 2022; originally announced July 2022.

  4. arXiv:2201.02419

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

    Authors: Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech…

    Submitted 17 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  5. arXiv:2106.00410

    cs.CL cs.HC cs.SD eess.AS

    Nora: The Well-Being Coach

    Authors: Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung

    Abstract: The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions. In this paper we propose Nora, a virtual coaching platform designed to utilize natural language understanding in its dialogue system and suggest other recommendations based on user interactions. It is in…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: 7 pages

  6. arXiv:2004.14228

    cs.CL cs.SD eess.AS

    Meta-Transfer Learning for Code-Switched Speech Recognition

    Authors: Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

    Abstract: An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and the expense and significant effort required to collect mixed-language data. We therefore propose a new learning method, meta-transfer learning, to transfer lear…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted in ACL 2020. The first two authors contributed equally to this work

  7. arXiv:2003.01901

    eess.AS cs.SD

    Learning Fast Adaptation on Cross-Accented Speech Recognition

    Authors: Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

    Abstract: Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents creates a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the mo…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: The first three authors contributed equally to this work

  8. arXiv:1910.13923

    cs.CL cs.LG cs.SD eess.AS

    Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer

    Authors: Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Pascale Fung

    Abstract: Highly performing deep neural networks come at the cost of computational complexity that limits their practicality for deployment on portable devices. We propose the low-rank transformer (LRT), a memory-efficient and fast neural architecture that significantly reduces the parameters and boosts the speed of training and inference for end-to-end speech recognition. Our approach reduces the number of…

    Submitted 14 February, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

    Comments: The first two authors contributed equally to this work. Accepted as an oral presentation in ICASSP 2020

  9. arXiv:1901.06486

    cs.CL cs.LG eess.AS

    Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets

    Authors: Dario Bertero, Onno Kampman, Pascale Fung

    Abstract: We propose an end-to-end affect recognition approach using a Convolutional Neural Network (CNN) that handles multiple languages, with applications to emotion and personality recognition from speech. We lay the foundation of a universal model that is trained on multiple languages at once. As affect is shared across all languages, we are able to leverage shared information between languages and impr…

    Submitted 19 January, 2019; originally announced January 2019.