Del Visual al Auditivo: Sonorización de Escenas Guiada por Imagen

Authors: María Sánchez, Laura Fernández, Julián Arias, Mateo Cámara, Giulia Comini, Adam Gabrys, José Luis Blanco, Juan Ignacio Godino, Luis Alfonso Hernández

Abstract: Recent advances in image, video, text and audio generative techniques, and their use by the general public, are leading to new forms of content generation. Usually, each modality has been approached separately, which poses limitations. The automatic sonorization of visual sequences is one of the greatest challenges for the automatic generation of multimodal content. We present a processing flow that, starting from images extracted from videos, is able to add sound to them. We work with pre-trained models that employ complex encoders, contrastive learning, and multiple modalities, allowing complex representations of the sequences for their sonorization. The proposed scheme offers different possibilities for audio mapping and text guidance. We evaluated the scheme on a dataset of frames extracted from a commercial video game and sounds extracted from the Freesound platform. Subjective tests show that the proposed scheme is able to generate and assign audios automatically and conveniently to images. Moreover, it adapts well to user preferences, and the proposed objective metrics show a high correlation with the subjective ratings.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 10 pages, in Spanish, Tecniacústica

arXiv:2401.05462 [pdf, other]

doi 10.1145/3605098.3635989

Petri Nets for Smart Grids: The Story So Far

Authors: Mouzhi Ge, Bruno Rossi, Stanislav Chren, José Miguel Blanco

Abstract: Since the energy domain is in a transformative shift towards sustainability, the integration of new technologies and smart systems into traditional power grids has emerged. As an effective approach, Petri Nets (PNs) have been applied to model and analyze the complex dynamics in Smart Grid (SG) environments. However, we are currently missing an overview of the types of PNs applied to different areas and problems related to SGs. Therefore, this paper proposes four fundamental research questions related to the application areas of PNs in SGs, PN types, aspects modelled by PNs in the identified areas, and the validation methods in the evaluation. The answers to the research questions are derived from a comprehensive and interdisciplinary literature analysis. The results capture a valuable overview of PN applications in the global energy landscape and can offer indications for future research directions.

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2310.16140 [pdf]

IA Para el Mantenimiento Predictivo en Canteras: Modelado

Authors: Fernando Marcos, Rodrigo Tamaki, Mateo Cámara, Virginia Yagüe, José Luis Blanco

Abstract: Dependence on raw materials, especially in the mining sector, is a key part of today's economy. Aggregates are vital, being the second most used raw material after water. Digitally transforming this sector is key to optimizing operations. However, supervision and maintenance (predictive and corrective) are challenges little explored in this sector, due to the particularities of the sector, machinery and environmental conditions. All this, despite the successes achieved in other scenarios in monitoring with acoustic and contact sensors. We present an unsupervised learning scheme that trains a variational autoencoder model on a set of sound records. This is the first such dataset collected during processing plant operations, containing information from different points of the processing line. Our results demonstrate the model's ability to reconstruct and represent in latent space the recorded sounds, the differences in operating conditions and between different equipment. In the future, this should facilitate the classification of sounds, as well as the detection of anomalies and degradation patterns in the operation of the machinery.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 10 pages, in Spanish, 5 figures. Presented at the Tecniacústica 2023 conference (Cuenca, Spain)

arXiv:2310.15663 [pdf]

FOLEY-VAE: Generación de efectos de audio para cine con inteligencia artificial

Authors: Mateo Cámara, José Luis Blanco

Abstract: In this research, we present an interface based on Variational Autoencoders trained with a wide range of natural sounds for the innovative creation of Foley effects. The model can transfer new sound features to prerecorded audio or microphone-captured speech in real time. In addition, it allows interactive modification of latent variables, facilitating precise and customized artistic adjustments. Taking as a starting point our previous study on Variational Autoencoders presented at this same congress last year, we analyzed an existing implementation: RAVE [1]. This model has been specifically trained for audio effects production. Various audio effects have been successfully generated, including electromagnetic, science fiction, and water sounds, among others published with this work. This innovative approach has been the basis for the artistic creation of the first Spanish short film with sound effects assisted by artificial intelligence. This milestone palpably illustrates the transformative potential of this technology in the film industry, opening the door to new possibilities for sound creation and the improvement of artistic quality in film productions.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 9 pages, in Spanish, Tecniacústica

arXiv:2309.14761 [pdf, other]

Optimization Techniques for a Physical Model of Human Vocalisation

Authors: Mateo Cámara, Zhiyuan Xu, Yisu Zong, José Luis Blanco, Joshua D. Reiss

Abstract: We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals, specifically yawns. We selected and optimized the control parameters of the synthesizer to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature and a specifically designed neural network. We evaluated several popular quality metrics as error functions. These include both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and computational demand. Results show that genetic and swarm optimizers outperform least squares algorithms at the cost of slower execution, and that specific combinations of optimizers and audio representations offer significantly different results. The proposed methodology could be used in benchmarking other physical models and audio types.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted to DAFx 2023

arXiv:2207.02978 [pdf, other]

Extending Logical Neural Networks using First-Order Theories

Authors: Aidan Evans, Jorge Blanco

Abstract: Logical Neural Networks (LNNs) are a type of architecture which combines a neural network's ability to learn with formal logic systems' ability to perform symbolic reasoning. LNNs provide programmers the ability to implicitly modify the underlying structure of the neural network via logical formulae. In this paper, we take advantage of this abstraction to extend LNNs to support equality and function symbols via first-order theories. This extension improves the power of LNNs by significantly increasing the types of problems they can tackle. As a proof of concept, we add support for the first-order theory of equality to IBM's LNN library and demonstrate how this addition allows the LNN library to reason about expressions without needing to make the unique-names assumption.

Submitted 11 August, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: 7 pages, 4 figures

arXiv:2104.14614 [pdf, other]

doi 10.1016/j.jss.2021.110985

A holistic approach for cross-platform software development

Authors: Juliano Zanuzzio Blanco, Daniel Lucrédio

Abstract: Cross-platform development solutions can help to make software available on different devices and platforms. But these are normally restricted to preconfigured platforms and consider that each individual solution is equal or similar to each other. As a result, developers have to resort to native development and build individual solutions, one for each device/platform, that cooperate to deliver the desired global functionality. This article presents an approach that takes advantage of existing solutions and has support for extending and including new platforms, and distributing functionality across devices. The approach is based on a general-purpose language that raises the abstraction level in order to keep the software free from platform details. Automatic transformations produce executable code that can be properly divided and deployed separately into different platforms. The proposed approach was evaluated in four ways. In the first evaluation, an existing cross-platform system was recreated using the approach. The second and third evaluations were conducted with expert and novice developers, who tested the approach in practice. The fourth evaluation introduced support for cross-platform testing. Results have brought evidence supporting the following main contributions: use of a single environment, the ability to reuse similar concepts between platforms and the potential to reduce costs.

Submitted 29 April, 2021; originally announced April 2021.

Comments: 18 pages, 6 figures, Journal of Systems and Software

Journal ref: https://doi.org/10.1016/j.jss.2021.110985

arXiv:1805.11472 [pdf, other]

Comparison of 1D and 3D Models for the Estimation of Fractional Flow Reserve

Authors: P. J. Blanco, C. A. Bulant, L. O. Müller, G. D. Maso Talou, C. Guedes Bezerra, P. L. Lemos, R. A. Feijóo

Abstract: In this work we propose to validate the predictive capabilities of one-dimensional (1D) blood flow models with full three-dimensional (3D) models in the context of patient-specific coronary hemodynamics in hyperemic conditions. Such conditions mimic the state of coronary circulation during the acquisition of the Fractional Flow Reserve (FFR) index. Demonstrating that 1D models accurately reproduce FFR estimates obtained with 3D models has implications for the approach to computationally estimating FFR. To this end, a sample of 20 patients was employed from which 29 3D geometries of arterial trees were constructed, 9 obtained from coronary computed tomography angiography (CCTA) and 20 from intra-vascular ultrasound (IVUS). For each 3D arterial model, a 1D counterpart was generated. The same outflow and inlet pressure boundary conditions were applied to both (3D and 1D) models. In the 1D setting, pressure losses at stenoses and bifurcations were accounted for through specific lumped models. Comparisons between 1D models ($\text{FFR}_{\text{1D}}$) and 3D models ($\text{FFR}_{\text{3D}}$) were performed in terms of predicted $\text{FFR}$ value. Compared to $\text{FFR}_{\text{3D}}$, $\text{FFR}_{\text{1D}}$ showed a difference of $0.00 \pm 0.03$ and overall predictive capability AUC, Acc, Spe, Sen, PPV and NPV of 0.97, 0.98, 0.90, 0.99, 0.82, and 0.99, with an FFR threshold of 0.8. We conclude that inexpensive $\text{FFR}_{\text{1D}}$ simulations can be reliably used as a surrogate for demanding $\text{FFR}_{\text{3D}}$ computations.

Submitted 29 May, 2018; originally announced May 2018.

Comments: 11 pages, 2 figures, 2 tables, submitted to Scientific Reports

arXiv:1805.10273 [pdf, other]

Towards a glaucoma risk index based on simulated hemodynamics from fundus images

Authors: José Ignacio Orlando, João Barbosa Breda, Karel van Keer, Matthew B. Blaschko, Pablo J. Blanco, Carlos A. Bulant

Abstract: Glaucoma is the leading cause of irreversible but preventable blindness in the world. Its major treatable risk factor is the intra-ocular pressure, although other biomarkers are being explored to improve the understanding of the pathophysiology of the disease. It has been recently observed that glaucoma induces changes in the ocular hemodynamics. However, its effects on the functional behavior of the retinal arterioles have not been studied yet. In this paper we propose a first approach for characterizing those changes using computational hemodynamics. The retinal blood flow is simulated using a 0D model for a steady, incompressible non-Newtonian fluid in rigid domains. The simulation is performed on patient-specific arterial trees extracted from fundus images. We also propose a novel feature representation technique to condense the outcomes of the simulation stage into a fixed-length feature vector that can be used for classification studies. Our experiments on a new database of fundus images show that our approach is able to capture representative changes in the hemodynamics of glaucomatous patients. Code and data are publicly available at https://ignaciorlando.github.io.

Submitted 25 October, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: MICCAI 2018 paper

Showing 1–9 of 9 results for author: Blanco, J