Showing 1–2 of 2 results for author: Santos, V G d

Search v0.5.6 released 2020-02-24

arXiv:2210.07852 [pdf, other]

cs.CL cs.SD eess.AS

Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models

Authors: Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio

Abstract: The NURC Project, which started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings made in the city of São Paulo. Although 47 inquiries have transcripts, the audio and transcriptions were not aligned, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation, based on WER and CER metrics computed on a manually aligned sample of NURC/SP, allowed us to choose the best model to automatically transcribe 284 hours.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2207.04476 [pdf, other]

cs.CL

doi 10.3897/jucs.70941

Myers-Briggs personality classification from social media text using pre-trained language models

Authors: Vitor Garcia dos Santos, Ivandré Paraboni

Abstract: In Natural Language Processing, the use of pre-trained language models has been shown to obtain state-of-the-art results in many downstream tasks such as sentiment analysis, author identification and others. In this work, we address the use of these methods for personality classification from text. Focusing on the Myers-Briggs (MBTI) personality model, we describe a series of experiments in which the well-known Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned to perform MBTI classification. Our main findings suggest that the current approach significantly outperforms well-known text classification models based on bag-of-words and static word embeddings alike across multiple evaluation scenarios, and generally outperforms previous work in the field.

Submitted 10 July, 2022; originally announced July 2022.

Comments: 19 pages

Journal ref: Journal of Universal Computer Science, vol. 28, no. 4 (2022), 378-395
