-
Non-additive Stochastic Model for Supercooled Liquids: New Perspectives for Glass Science
Authors:
Antonio Cesar do Prado Rosa Junior,
Elias Brito Alves Junior,
Wanisson Silva Santana,
Clebson Cruz
Abstract:
We present a review of the Non-additive Stochastic Model for supercooled liquids (NSM), an efficient approach for diffusive processes that provides a suitable interpretation for the non-Arrhenius dynamics in these materials. Based on a class of non-homogeneous continuity equations, the NSM provides functions able to model the thermal behavior of the viscosity and diffusivity in fragile liquids. The model defines a rigorous physical criterion that distinguishes super-Arrhenius from sub-Arrhenius processes, establishes a robust scale of fragility, describes fragile-to-strong curves in Angell's plot, and provides an accurate fitting equation for experimental viscosity data that is as effective as other classic viscosity models.
Submitted 30 March, 2024;
originally announced April 2024.
-
An Incremental MaxSAT-based Model to Learn Interpretable and Balanced Classification Rules
Authors:
Antônio Carlos Souza Ferreira Júnior,
Thiago Alves Rocha
Abstract:
The increasing advancements in the field of machine learning have led to the development of numerous applications that effectively address a wide range of problems with accurate predictions. However, in certain cases, accuracy alone may not be sufficient. Many real-world problems also demand explanations and interpretability behind the predictions. Classification rules are among the most popular interpretable models. This work proposes an incremental model for learning interpretable and balanced rules based on MaxSAT, called IMLIB. This new model is based on two other approaches, one based on SAT and the other on MaxSAT. The one based on SAT limits the size of each generated rule, making it possible to balance them. We suggest that such a set of rules seems more natural to understand than a mixture of large and small rules. The approach based on MaxSAT, called IMLI, presents a technique to increase performance that involves learning a set of rules by incrementally applying the model to a dataset. Finally, IMLIB and IMLI are compared using diverse databases. IMLIB obtained results comparable to IMLI in terms of accuracy, generating more balanced rules with smaller sizes.
Submitted 29 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Proof-of-concept for a nonadditive stochastic model of supercooled liquids
Authors:
Antonio Cesar do Prado Rosa Junior,
Elias Brito Alves Junior,
Wanisson Silva Santana,
Clebson Cruz
Abstract:
The recently proposed non-additive stochastic model (NSM) offers a coherent physical interpretation for diffusive phenomena in glass-forming systems. This model presents non-exponential relationships between viscosity, activation energy, and temperature, characterizing the non-Arrhenius behavior observed in supercooled liquids. In this work, we fit the NSM viscosity equation to experimental temperature-dependent viscosity data corresponding to twenty-five glass-forming liquids and compare the fit parameters with those obtained using the Vogel-Fulcher-Tammann (VFT), Avramov-Milchev (AM), and Mauro-Yue-Ellison-Gupta-Allan (MYEGA) models. The results demonstrate that the NSM provides an effective fitting equation for experimental viscosity data in comparison with other established models (VFT, AM and MYEGA), characterizes the activation energy in fragile liquids, and presents a reliable indicator of the degree of fragility of the glass-forming liquids.
Submitted 5 March, 2024;
originally announced March 2024.
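For reference, the three established models named in the abstract above are commonly written in the forms below. The notation is assumed here rather than taken from the paper: $\eta$ is the viscosity, $T$ the absolute temperature, and $A$, $B$, $C$, $T_0$, $\alpha$ are generic fitting parameters whose values differ from model to model. The NSM equation itself is not reproduced, since the abstract does not state it.

```latex
\begin{align}
\text{VFT:}\quad   & \log_{10}\eta(T) = A + \frac{B}{T - T_0} \\
\text{AM:}\quad    & \log_{10}\eta(T) = A + \left(\frac{B}{T}\right)^{\alpha} \\
\text{MYEGA:}\quad & \log_{10}\eta(T) = A + \frac{B}{T}\,\exp\!\left(\frac{C}{T}\right)
\end{align}
```

Each form reduces to the Arrhenius law $\log_{10}\eta = A + B/T$ in a limiting case ($T_0 = 0$, $\alpha = 1$, and $C = 0$, respectively), which is why departures of these parameters from those limits are read as measures of non-Arrhenius behavior.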
-
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Authors:
Gabriel Lino Garcia,
Pedro Henrique Paiola,
Luis Henrique Morelli,
Giovani Candido,
Arnaldo Cândido Júnior,
Danilo Samuel Jodas,
Luis C. S. Afonso,
Ivan Rizzo Guilherme,
Bruno Elias Penteado,
João Paulo Papa
Abstract:
Large Language Models (LLMs) are increasingly bringing advances to Natural Language Processing. However, low-resource languages, those lacking extensive prominence in datasets for various NLP tasks, or where existing datasets are not as substantial, such as Portuguese, already obtain several benefits from LLMs, but not to the same extent. LLMs trained on multilingual datasets normally struggle to respond to prompts in Portuguese satisfactorily, presenting, for example, code switching in their responses. This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode in two versions: 7B and 13B. We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning, and compare it with other LLMs. Our main contribution is to bring an LLM with satisfactory results in the Portuguese language, as well as to provide a model that is free for research or commercial purposes.
Submitted 5 January, 2024;
originally announced January 2024.
-
Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites
Authors:
Augusto Seben da Rosa,
Frederico Santos de Oliveira,
Anderson da Silva Soares,
Arnaldo Candido Junior
Abstract:
Computer vision in general has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision language models, and generative models, among others). These have improved performance in several tasks, such as classification. However, the majority of these models focus on modifications that move away from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts a visual manifold. Its blocks are intended to separate the analysis of colors and forms at its initial layers, simulating the occipital lobe's operations. Our results show that our architecture provides State-of-the-Art efficiency among low-parameter architectures on the CIFAR-10 dataset. Our first model reached 93.32% test accuracy, 0.8% more than the older SOTA in this category, while having 150k fewer parameters (726k in total). Our second model uses 52k parameters, losing only 3.86% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.
Submitted 24 October, 2023;
originally announced October 2023.
-
A Machine Learning Pressure Emulator for Hydrogen Embrittlement
Authors:
Minh Triet Chau,
João Lucas de Sousa Almeida,
Elie Alhajjar,
Alberto Costa Nogueira Junior
Abstract:
A recent alternative for hydrogen transportation is blending it into natural gas pipelines as a mixture with natural gas. However, hydrogen embrittlement of material is a major concern for scientists and gas installation designers seeking to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite their high-fidelity results, the current PDE-based simulators are time- and computationally-demanding. Using simulation data, we train an ML model to predict the pressure on the pipelines' inner walls, which is a first step for pipeline system surveillance. We found that the physics-based method outperformed the purely data-driven method and satisfied the physical constraints of the gas flow system.
Submitted 22 June, 2023;
originally announced June 2023.
-
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish. Additionally, we provide the YourTTS model, a multi-lingual TTS model, trained using 3,176.13 hours from CML-TTS and also with 245.07 hours from LibriTTS, in English. Our purpose in creating this dataset is to open up new research possibilities in the TTS area for multi-lingual models. The dataset is publicly available under the CC-BY 4.0 license.
Submitted 16 June, 2023;
originally announced June 2023.
-
Evaluation of Speech Representations for MOS prediction
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Lucas R. S. Gris,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the metric MOS. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was created for this work. The results show that the Whisper model is appropriate in all scenarios: with both the VCC2018 and BRSpeechMOS datasets. Among the supervised and self-supervised learning models using BRSpeechMOS, Whisper-Small achieved the best linear correlation of 0.6980, and the speaker verification model, SpeakerNet, had a linear correlation of 0.6963. Using VCC2018, the best supervised and self-supervised learning model, Whisper-Large, achieved a linear correlation of 0.7274, and the best speaker verification model, TitaNet, achieved a linear correlation of 0.6933. Although the results of the speaker verification models are slightly lower, the SpeakerNet model has only 5M parameters, making it suitable for real-time applications, and the TitaNet model produces an embedding of size 192, the smallest among all the evaluated models. The experiment results are reproducible with publicly available source code.
Submitted 16 June, 2023;
originally announced June 2023.
-
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
Authors:
Lucas Rafael Stefanel Gris,
Ricardo Marcacini,
Arnaldo Candido Junior,
Edresson Casanova,
Anderson Soares,
Sandra Maria Aluísio
Abstract:
Automatic speech recognition (ASR) systems play a key role in applications involving human-machine interactions. Despite their importance, ASR models for the Portuguese language proposed in the last decade have limitations in relation to the correct identification of punctuation marks in automatic transcriptions, which hinder the use of transcriptions by other systems, models, and even by humans. Recently, however, OpenAI proposed Whisper ASR, a general-purpose speech recognition model that has generated great expectations in dealing with such limitations. This chapter presents the first study on the performance of Whisper for punctuation prediction in the Portuguese language. We present an experimental evaluation considering both theoretical aspects involving pausing points (comma) and complete ideas (exclamation, question, and full stop), as well as practical aspects involving transcript-based topic modeling, an application dependent on punctuation marks for promising performance. We analyzed experimental results from videos of Museum of the Person, a virtual museum that aims to tell and preserve people's life histories, thus discussing the pros and cons of Whisper in a real-world scenario. Although our experiments indicate that Whisper achieves state-of-the-art results, we conclude that some punctuation marks require improvements, such as exclamation, semicolon and colon.
Submitted 26 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production
Authors:
Francisco Eron,
Muhammad Noman,
Raphael Ricon de Oliveira,
Deigo de Souza Marques,
Rafael Serapilha Durelli,
Andre Pimenta Freire,
Antonio Chalfun Junior
Abstract:
Coffee, which is prepared from the ground roasted seeds of harvested coffee cherries, is one of the most consumed beverages and traded commodities globally. Manually monitoring the coffee field regularly to report on plant and soil health, as well as to estimate yield and harvesting time, is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images, followed by its evaluation with 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models, which led to machine-generated color classes of coffee fruit and could thus characterize the informed objects in the image. Finally, we attempted to develop a handy AI-based mobile application that would not only efficiently predict harvest time and estimate coffee yield and quality, but also report on plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 for multi-class mode, surpassed the supervised method, with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code was named CoffeApp. It offers multiple features for analyzing fruit from images taken by a phone camera in the field and can thus track fruit ripening in real time.
Submitted 11 April, 2023;
originally announced April 2023.
-
Interpretability Analysis of Deep Models for COVID-19 Detection
Authors:
Daniel Peixoto Pinto da Silva,
Edresson Casanova,
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Marcelo Finger,
Flaviane Svartman,
Beatriz Raposo,
Marcus Vinícius Moreira Martins,
Sandra Maria Aluísio,
Larissa Cristina Berti,
João Paulo Teixeira
Abstract:
During the outbreak of the COVID-19 pandemic, several research areas joined efforts to mitigate the damages caused by SARS-CoV-2. In this paper we present an interpretability analysis of a convolutional neural network based model for COVID-19 detection in audio. We investigate which features are important for the model's decision process, examining spectrograms, F0, F0 standard deviation, sex and age. Next, we analyse model decisions by generating heat maps for the trained models to capture their attention during the decision process. Focusing on an explainable Artificial Intelligence approach, we show that the studied models can make unbiased decisions even in the presence of spurious data in the training set, given adequate preprocessing steps. Our best model has 94.44% accuracy in detection, with results indicating that models favor spectrograms for the decision process, particularly high energy areas in the spectrogram related to prosodic domains, while F0 also leads to efficient COVID-19 detection.
Submitted 25 November, 2022;
originally announced November 2022.
-
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Authors:
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Vinícius G. dos Santos,
Bruno A. Papa Dias,
Marli Quadros Leite,
Flaviane Romani Fernandes Svartman,
Sandra Aluísio
Abstract:
The NURC Project, which started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings taken in São Paulo capital. Although 47 inquiries have transcripts, there was no alignment between audio and transcription, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation allowed us to choose the best model, using WER and CER metrics, in a manually aligned sample of NURC/SP, to automatically transcribe 284 hours.
Submitted 14 October, 2022;
originally announced October 2022.
-
Non-Intrusive Reduced Models based on Operator Inference for Chaotic Systems
Authors:
João Lucas de Sousa Almeida,
Arthur Cancellieri Pires,
Klaus Feine Vaz Cid,
Alberto Costa Nogueira Junior
Abstract:
This work explores the physics-driven machine learning technique Operator Inference (OpInf) for predicting the state of chaotic dynamical systems. OpInf provides a non-intrusive approach to infer approximations of polynomial operators in reduced space without having access to the full order operators appearing in discretized models. Datasets for the physics systems are generated using conventional numerical solvers and then projected to a low-dimensional space via Principal Component Analysis (PCA). In latent space, a least-squares problem is set to fit a quadratic polynomial operator, which is subsequently employed in a time-integration scheme in order to produce extrapolations in the same space. Once solved, the inverse PCA operation is applied to reconstruct the extrapolations in the original space. The quality of the OpInf predictions is assessed via the Normalized Root Mean Squared Error (NRMSE) metric from which the Valid Prediction Time (VPT) is computed. Numerical experiments considering the chaotic systems Lorenz 96 and the Kuramoto-Sivashinsky equation show promising forecasting capabilities of the OpInf reduced order models with VPT ranges that outperform state-of-the-art machine learning methods such as backpropagation and reservoir computing recurrent neural networks [1], as well as Markov neural operators [2].
Submitted 21 September, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
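The pipeline described in the abstract above (PCA projection, least-squares fit of a quadratic operator, time integration in latent space, inverse PCA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the forward-difference derivative estimate, the forward Euler integrator, and the toy spiral data are all choices made here for brevity, and no regularization is applied.

```python
import numpy as np

def quadratic_features(q):
    """Stack constant, linear and unique quadratic terms for latent states q of shape (r, n)."""
    i, j = np.triu_indices(q.shape[0])
    return np.vstack([np.ones((1, q.shape[1])), q, q[i] * q[j]])

def fit_opinf(snapshots, dt, rank):
    """Fit a quadratic latent model to snapshots of shape (n_state, n_time)."""
    mean = snapshots.mean(axis=1, keepdims=True)
    # PCA via the SVD of the centered snapshot matrix.
    u, _, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    basis = u[:, :rank]
    q = basis.T @ (snapshots - mean)              # latent trajectory, (rank, n_time)
    # Assumption: forward differences approximate the latent time derivative.
    dq = (q[:, 1:] - q[:, :-1]) / dt
    feats = quadratic_features(q[:, :-1])
    # Least-squares problem: feats.T @ coeffs ~= dq.T
    coeffs, *_ = np.linalg.lstsq(feats.T, dq.T, rcond=None)
    return mean, basis, coeffs.T, q[:, 0]

def forecast(mean, basis, operator, q0, dt, n_steps):
    """Integrate the latent model with forward Euler, then apply the inverse PCA map."""
    q = q0.copy()
    states = [q.copy()]
    for _ in range(n_steps):
        q = q + dt * (operator @ quadratic_features(q[:, None]))[:, 0]
        states.append(q.copy())
    return mean + basis @ np.array(states).T

# Hypothetical toy data: a decaying 2-D spiral embedded linearly in 3-D.
dt = 0.01
t = np.arange(300) * dt
z = np.exp(-0.5 * t) * np.vstack([np.cos(3 * t), np.sin(3 * t)])
lift = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
snapshots = lift @ z

mean, basis, operator, q0 = fit_opinf(snapshots, dt, rank=2)
prediction = forecast(mean, basis, operator, q0, dt, n_steps=snapshots.shape[1] - 1)
```

Because the toy system is linear and the same step size is used for differencing and for integration, the fitted model reproduces the training trajectory almost exactly. Chaotic systems such as those studied in the paper would need a more careful derivative estimate, a higher-order integrator, and regularization.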
-
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Authors:
Edresson Casanova,
Christopher Shulby,
Alexander Korolev,
Arnaldo Candido Junior,
Anderson da Silva Soares,
Sandra Aluísio,
Moacir Antonelli Ponti
Abstract:
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
Submitted 20 May, 2023; v1 submitted 29 March, 2022;
originally announced April 2022.
-
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Authors:
Edresson Casanova,
Julian Weber,
Christopher Shulby,
Arnaldo Candido Junior,
Eren Gölge,
Moacir Antonelli Ponti
Abstract:
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.
Submitted 30 April, 2023; v1 submitted 4 December, 2021;
originally announced December 2021.
-
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
Authors:
Arnaldo Candido Junior,
Edresson Casanova,
Anderson Soares,
Frederico Santos de Oliveira,
Lucas Oliveira,
Ricardo Corso Fernandes Junior,
Daniel Peixoto Pinto da Silva,
Fernando Gorgulho Fayet,
Bruno Baldissera Carlotto,
Lucas Rafael Stefanel Gris,
Sandra Maria Aluísio
Abstract:
Automatic Speech Recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were about 376 hours publicly available for the ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 hours. The existing resources, however, are composed of audios containing only read and prepared speech. There is a lack of datasets including spontaneous speech, which are essential in different ASR applications. This paper presents CORAA (Corpus of Annotated Audios) v1, with 290.77 hours, a publicly available dataset for ASR in BP containing validated pairs (audio-transcription). CORAA also contains European Portuguese audios (4.69 hours). We also present a public ASR model based on Wav2Vec 2.0 XLSR-53 and fine-tuned over CORAA. Our model achieved a Word Error Rate of 24.18% on the CORAA test set and 20.08% on the Common Voice test set. When measuring the Character Error Rate, we obtained 11.02% and 6.34% for CORAA and Common Voice, respectively. CORAA corpora were assembled to both improve ASR models in BP with phenomena from spontaneous speech and motivate young researchers to start their studies on ASR for Portuguese. All the corpora are publicly available at https://github.com/nilc-nlp/CORAA under the CC BY-NC-ND 4.0 license.
Submitted 18 November, 2021; v1 submitted 14 October, 2021;
originally announced October 2021.
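The Word Error Rate and Character Error Rate reported in the abstract above are conventionally defined as the edit distance between the reference and the system output, divided by the reference length. The sketch below shows that standard definition in plain Python. The function names and the whitespace tokenization are assumptions made here, and the paper's own text normalization (casing, punctuation handling) is not known from the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("o gato preto", "o gato")` counts one deleted word out of three reference words. Both functions assume a non-empty reference, and either rate can exceed 1 when the hypothesis contains many insertions.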
-
Performance evaluation in the reconstruction of 2D images of computed tomography using massively parallel programming CUDA
Authors:
Alexssandro Ferreira Cordeiro,
Pedro Luiz de Paula Filho,
Hamilton Pereira da Silva,
Arnaldo Candido Junior,
Edresson Casanova,
Jandrei Sartori Spancerski
Abstract:
This work analyzes the processing time and the similarity of images generated across CPU and GPU architectures and across sequential and parallel programming. For image processing, a computer with an AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used, along with the CUDAFY library and the programming language C# with the Visual Studio IDE. The results of the comparisons indicate that sequential programming on a CPU generates reliable images at a high cost in time when compared to parallel programming on CPU and GPU. Parallel programming generates results faster, but with increased noise in the reconstructed image. For the float data type, the GPU obtained the best average time performance, equivalent to 1/3 of the processor's time; for the double data type, the parallel CPU approach obtained the best average time performance. Regarding image quality, the sequential approach obtained similar outputs, while the parallel approaches generated noise in their outputs.
Submitted 5 September, 2021;
originally announced September 2021.
-
Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
Authors:
Lucas Rafael Stefanel Gris,
Edresson Casanova,
Frederico Santos de Oliveira,
Anderson da Silva Soares,
Arnaldo Candido Junior
Abstract:
Deep learning techniques have been shown to be efficient in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence into a sequence of written words. Despite the progress in the area, speech recognition can still be considered difficult, especially for languages lacking available data, such as Brazilian Portuguese (BP). In this sense, this work presents the development of a public Automatic Speech Recognition (ASR) system using only openly available audio data, obtained by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained in many languages, over BP data. The final model presents an average word error rate of 12.4% over 7 different datasets (10.5% when applying a language model). To our knowledge, the obtained error is the lowest among open end-to-end (E2E) ASR models for BP.
Submitted 22 December, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Authors:
Edresson Casanova,
Christopher Shulby,
Eren Gölge,
Nicolas Michael Müller,
Frederico Santos de Oliveira,
Arnaldo Candido Junior,
Anderson da Silva Soares,
Sandra Maria Aluisio,
Moacir Antonelli Ponti
Abstract:
In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model converges using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.
Submitted 15 June, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
-
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
Authors:
Edresson Casanova,
Arnaldo Candido Junior,
Christopher Shulby,
Frederico Santos de Oliveira,
João Paulo Teixeira,
Moacir Antonelli Ponti,
Sandra Maria Aluisio
Abstract:
Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. The dataset has 10.5 hours from a single speaker, from which a Tacotron 2 model with the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. The obtained results are comparable to related works covering the English language and to the state-of-the-art in Portuguese.
Submitted 29 January, 2022; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
Authors:
Edresson Casanova,
Arnaldo Candido Junior,
Christopher Shulby,
Frederico Santos de Oliveira,
Lucas Rafael Stefanel Gris,
Hamilton Pereira da Silva,
Sandra Maria Aluisio,
Moacir Antonelli Ponti
Abstract:
In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice. For this purpose, a new dataset was built, composed of 40 male speakers, who read sentences in Portuguese, totaling approximately 3h. We compare the three best architectures trained using our method to select the best one, which is the one with a shallow architecture. Then, we compared this model with the SOTA method for the speaker recognition task: the Fast ResNet-34 trained with approximately 2,000 hours, using the loss functions Angular Prototypical and GE2E. Three experiments were carried out with datasets in different languages. Among these three experiments, our model achieved the second best result in two experiments and the best result in one of them. This highlights the importance of our method, which proved to be a great competitor to SOTA speaker recognition models, with 500x less data and a simpler approach.
Submitted 18 June, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Non-Arrhenius behavior and fragile-to-strong transition of glass-forming liquids
Authors:
Antonio Cesar do Prado Rosa Junior,
Clebson Cruz,
Wanisson Silva Santana,
Elias Brito Alves Junior,
Marcelo Albano Moret
Abstract:
Characterization of the non-Arrhenius behavior of glass-forming liquids is a broad avenue for research toward the understanding of the formation mechanisms of noncrystalline materials. In this context, this paper explores the main properties of the viscosity of glass-forming systems, considering super-Arrhenius diffusive processes. We establish the viscous activation energy as a function of the temperature, measure the degree of fragility of the system, and characterize the fragile-to-strong transition through the standard Angell's plot. Our results show that the non-Arrhenius behavior observed in fragile liquids can be understood through the non-Markovian dynamics that characterize the diffusive processes of these systems. Moreover, the fragile-to-strong transition corresponds to a change in the spatio-temporal range of correlations during the glass transition process.
Submitted 19 April, 2020; v1 submitted 27 January, 2020;
originally announced January 2020.
-
On evolutoids of regular surfaces in Euclidean 3-space
Authors:
Ady Cambraia Junior,
Abilio Lemos,
Mostafa Salarinoghabi
Abstract:
Inspired by the concept of evolutoids of planar curves, we present the concept of evolutoids for regular surfaces as an envelope of a two-parameter family of lines in Euclidean 3-space. We give an explicit parametrization for such evolutoids. In addition, we use the theory of singularities to study the local behavior of regular points of this object and present some relations between the geometry of the regular surface and its evolutoid.
Submitted 7 April, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Using Mapping Languages for Building Legal Knowledge Graphs from XML Files
Authors:
Ademar Crotti Junior,
Fabrizio Orlandi,
Declan O'Sullivan,
Christian Dirschl,
Quentin Reul
Abstract:
This paper presents our experience in building RDF knowledge graphs for an industrial use case in the legal domain. The information contained in legal information systems is often accessed through simple keyword interfaces and presented as a simple list of hits. In order to improve search accuracy one may avail of knowledge graphs, where the semantics of the data can be made explicit. Significant research effort has been invested in the area of building knowledge graphs from semi-structured text documents, such as XML, with the prevailing approach being the use of mapping languages. In this paper, we present a semantic model for representing legal documents together with an industrial use case. We also present a set of use case requirements based on the proposed semantic model, which are used to compare and discuss the use of state-of-the-art mapping languages for building knowledge graphs for legal data.
Submitted 18 November, 2019;
originally announced November 2019.
-
Characterization of the non-Arrhenius behavior of supercooled liquids by modeling non-additive stochastic systems
Authors:
Antonio Cesar do Prado Rosa Junior,
Clebson Cruz,
Wanisson Silva Santana,
Marcelo Albano Moret
Abstract:
The characterization of the formation mechanisms of amorphous solids is a large avenue for research, since understanding their non-Arrhenius behavior remains challenging. In this context, we present one path toward modeling the diffusive processes in supercooled liquids near the glass transition through a class of non-homogeneous continuity equations, providing a consistent theoretical basis for the physical interpretation of this non-Arrhenius behavior. More precisely, we obtain the generalized drag and diffusion coefficients that allow us to model a wide range of non-Arrhenius processes. This provides a reliable measurement of the degree of fragility of the system and an estimation of the fragile-to-strong transition in glass-forming liquids, as well as a generalized Stokes-Einstein equation, leading to a better understanding of the classical and quantum effects on the dynamics of non-additive stochastic systems.
Submitted 8 May, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
Evolving Affine Evolutoids
Authors:
Ady Cambraia Junior,
Abílio Lemos
Abstract:
The envelope of straight lines affine normal to a plane curve C is its affine evolute; the envelope of the affine lines tangent to C is the original curve, together with the entire affine tangent line at each inflexion of C. In this paper, we consider plane curves without inflexions. We use some techniques of singularity theory to explain how the first envelope turns into the second, as the (constant) slope between the set of lines forming the envelope and the set of affine tangents to C changes from 0 to 1. In particular, we guarantee the existence of the first slope for which singularities occur. Moreover, we explain how these singularities evolve in the discriminant surface.
Submitted 4 May, 2017;
originally announced May 2017.
-
Envelope of mid-planes of a surface and some classical notions of affine differential geometry
Authors:
Ady Cambraia Junior,
Marcos Craizer
Abstract:
For a pair of points in a smooth locally convex surface in 3-space, its mid-plane is the plane containing its mid-point and the intersection line of the corresponding pair of tangent planes. In this paper we show that the limit of mid-planes when one point tends to the other along a direction is the Transon plane of the direction. Moreover, the limit of the envelope of mid-planes is non-empty for at most six directions, and, in this case, it coincides with the center of the Moutard's quadric. These results establish an unexpected connection between these classical notions of affine differential geometry and the apparently unrelated concept of envelope of mid-planes. We call the limit of envelope of mid-planes the affine mid-planes evolute and prove that, under some generic conditions, it is a regular surface in 3-space.
Submitted 5 May, 2017; v1 submitted 10 March, 2015;
originally announced March 2015.