Showing 1–9 of 9 results for author: Frieske, R

  1. arXiv:2401.01572

    cs.CL cs.SD eess.AS

    Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models

    Authors: Rita Frieske, Bertram E. Shi

    Abstract: Hallucinations are a type of output error produced by deep neural networks. While this has been studied in natural language processing, they have not been researched previously in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of halluci…

    Submitted 3 January, 2024; originally announced January 2024.

  2. arXiv:2306.14517

    cs.CL cs.SD eess.AS

    Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

    Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

    Abstract: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese,…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted in INTERSPEECH 2023

  3. State-of-the-art generalisation research in NLP: A taxonomy and review

    Authors: Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

    Abstract: The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation…

    Submitted 12 January, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: This preprint was published as an Analysis article in Nature Machine Intelligence. Please refer to the published version when citing this work. 28 pages of content + 6 pages of appendix + 52 pages of references

    Journal ref: Nat Mach Intell 5, 1161-1174 (2023)

  4. arXiv:2209.03711

    cs.SD cs.AI eess.AS

    What Did I Just Hear? Detecting Pornographic Sounds in Adult Videos Using Neural Networks

    Authors: Holy Lovenia, Dessi Puji Lestari, Rita Frieske

    Abstract: Audio-based pornographic detection enables efficient adult content filtering without sacrificing performance by exploiting distinct spectral characteristics. To improve it, we explore pornographic sound modeling based on different neural architectures and acoustic features. We find that CNN trained on log mel spectrogram achieves the best performance on Pornography-800 dataset. Our experiment resu…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: Published in AudioMostly 2022, ACM

  5. arXiv:2203.00314

    cs.CL

    VScript: Controllable Script Generation with Visual Presentation

    Authors: Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

    Abstract: In order to offer a customized script tool and inspire professional scriptwriters, we present VScript. It is a controllable pipeline that generates complete scripts, including dialogues and scene descriptions, as well as presents visually using video retrieval. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of…

    Submitted 13 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Journal ref: AACL Demo (2022)

  6. Survey of Hallucination in Natural Language Generation

    Authors: Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Ho Shu Chan, Wenliang Dai, Andrea Madotto, Pascale Fung

    Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However,…

    Submitted 19 February, 2024; v1 submitted 7 February, 2022; originally announced February 2022.

    ACM Class: A.1

    Journal ref: ACM Computing Surveys (2022)

  7. arXiv:2201.03804

    cs.CL cs.AI

    CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

    Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource lan…

    Submitted 14 March, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: 6 pages

  8. arXiv:2201.02419

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

    Authors: Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech…

    Submitted 17 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  9. arXiv:2112.06223

    cs.CL

    ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

    Authors: Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus buil…

    Submitted 3 May, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Journal ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)