
Showing 1–4 of 4 results for author: Chevi, R

Searching in archive cs.
  1. arXiv:2406.05967

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2402.14523

    cs.CL cs.SD eess.AS

    Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

    Authors: Rendi Chevi, Alham Fikri Aji

    Abstract: We often verbally express emotions in a multifaceted manner, they may vary in their intensities and may be expressed not just as a single but as a mixture of emotions. This wide spectrum of emotions is well-studied in the structural model of emotions, which represents variety of emotions as derivative products of primary emotions with varying degrees of intensity. In this paper, we propose an emot…

    Submitted 27 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Project Page: https://rendchevi.github.io/daisy-tts; Updates: (1) Fixed typos, missing references, and layout, (2) Revise explanation on emotion classifier or discriminator

  3. arXiv:2203.15643

    cs.SD cs.CL cs.LG cs.NE eess.AS

    Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

    Authors: Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

    Abstract: Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches non-optimum size or use a neural architecture search but often suffer training costs. We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specif…

    Submitted 5 November, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted at SLT 2022 (https://slt2022.org/). Associated materials can be seen in https://github.com/rendchevi/nix-tts

    MSC Class: 68T50 (Primary) 68T07; 68T10; 68T99 (Secondary)

    ACM Class: I.2.7; I.2.6; H.5.5

  4. arXiv:2201.00558

    cs.CL

    Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

    Authors: Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

    Abstract: We perform knowledge distillation (KD) benchmark from task-specific BERT-base teacher models to various student models: BiLSTM, CNN, BERT-Tiny, BERT-Mini, and BERT-Small. Our experiment involves 12 datasets grouped in two tasks: text classification and sequence labeling in the Indonesian language. We also compare various aspects of distillations including the usage of word embeddings and unlabeled…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: 14 pages, 3 figures, submitted to Elsevier

    MSC Class: 68T50

    ACM Class: I.2.7; I.2.6