Showing 1–11 of 11 results for author: Beltran, J R

Searching in archive cs.
  1. arXiv:2306.08510  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications

    Authors: David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen

    Abstract: Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs) or gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach re…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at Forum Acusticum 2023

  2. arXiv:2303.13881  [pdf, other]

    cs.SD cs.AI eess.AS

    Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods

    Authors: Carlos Hernandez-Olivan, Sonia Rubio Llamas, Jose R. Beltran

    Abstract: Music Structure Analysis is an open research task in Music Information Retrieval (MIR). Several past works have attempted to segment music in the audio and symbolic domains; however, the identification and segmentation of music structure at different levels is still an open research problem in this area. In this work we propose three methods, two of which are novel grap…

    Submitted 24 March, 2023; originally announced March 2023.

  3. arXiv:2210.13944  [pdf, other]

    cs.AI cs.SD eess.AS

    A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives

    Authors: Carlos Hernandez-Olivan, Javier Hernandez-Olivan, Jose R. Beltran

    Abstract: Music is one of the intelligences in Gardner's theory of multiple intelligences. How humans perceive and understand music is still being studied and is crucial to developing artificial intelligence models that imitate such processes. Music generation with Artificial Intelligence is an emerging field that has been gaining much attention in recent years. In this paper, we describe how humans compose…

    Submitted 3 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Under review

  4. arXiv:2209.07974  [pdf, other]

    cs.SD cs.MM eess.AS

    musicaiz: A Python Library for Symbolic Music Generation, Analysis and Visualization

    Authors: Carlos Hernandez-Olivan, Jose R. Beltran

    Abstract: In this article, we present musicaiz, an object-oriented library for analyzing, generating and evaluating symbolic music. The submodules of the package allow the user to create symbolic music data from scratch, build algorithms to analyze symbolic music, encode MIDI data as tokens to train deep learning sequence models, modify existing music data and evaluate music generation systems. The evaluati…

    Submitted 16 September, 2022; originally announced September 2022.

  5. arXiv:2203.16940  [pdf]

    eess.AS cs.LG cs.SD eess.SP

    Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

    Authors: David Diaz-Guerra, Antonio Miguel, Jose R. Beltran

    Abstract: In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of s…

    Submitted 6 December, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: The code to reproduce this work can be found in our GitHub repository: https://github.com/DavidDiazGuerra/icoDOA

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 313-321, 2023

  6. arXiv:2203.14641  [pdf, other]

    cs.SD cs.AI

    Subjective Evaluation of Deep Learning Models for Symbolic Music Composition

    Authors: Carlos Hernandez-Olivan, Jorge Abadias Puyuelo, Jose R. Beltran

    Abstract: Deep learning models are typically evaluated to measure and compare their performance on a given task. The metrics commonly used to evaluate these models are standard metrics shared across different tasks. In the field of music composition or generation, the standard metrics used in other fields have no clear meaning in terms of music theory. In this paper, we propose a subjective met…

    Submitted 3 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Workshop on Generative AI and HCI, CHI 2022

  7. arXiv:2108.12290  [pdf, other]

    cs.SD cs.AI eess.AS

    Music Composition with Deep Learning: A Review

    Authors: Carlos Hernandez-Olivan, Jose R. Beltran

    Abstract: Generating a complex work of art such as a musical composition requires exhibiting true creativity that depends on a variety of factors that are related to the hierarchy of musical language. Music generation has been approached with algorithmic methods and, more recently, with Deep Learning models that are also used in other fields such as Computer Vision. In this paper we want to put into context the exis…

    Submitted 7 September, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

  8. arXiv:2107.06231  [pdf, other]

    cs.SD cs.LG eess.AS

    Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model

    Authors: Carlos Hernandez-Olivan, Jose R. Beltran

    Abstract: The aim of this work is to define a model based on deep learning that is able to identify different instrument timbres with as few parameters as possible. For this purpose, we have worked with classical orchestral instruments played with different dynamics, which are part of a few instrument families and which play notes in the same pitch range. It has been possible to assess the ability to classi…

    Submitted 13 July, 2021; originally announced July 2021.

  9. arXiv:2008.07527  [pdf, other]

    eess.AS cs.LG cs.SD

    Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features

    Authors: Carlos Hernandez-Olivan, Jose R. Beltran, David Diaz-Guerra

    Abstract: The analysis of the structure of musical pieces is a task that remains a challenge for Artificial Intelligence, especially in the field of Deep Learning. It requires prior identification of structural boundaries of the music pieces. This structural boundary analysis has recently been studied with unsupervised methods and end-to-end techniques such as Convolutional Neural Networks (CNN) us…

    Submitted 1 December, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Journal ref: International Journal of Interactive Multimedia & Artificial Intelligence (2021), vol. 7, no. 2, pp. 78-88

  10. arXiv:2006.09006  [pdf]

    eess.AS cs.LG cs.SD eess.SP

    Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks

    Authors: David Diaz-Guerra, Antonio Miguel, Jose R. Beltran

    Abstract: In this paper, we present a new single sound source DOA estimation and tracking system based on the well-known SRP-PHAT algorithm and a three-dimensional Convolutional Neural Network. It uses SRP-PHAT power maps as input features of a fully convolutional causal architecture that uses 3D convolutional layers to accurately perform the tracking of a sound source even in highly reverberant scenarios w…

    Submitted 16 December, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: This is a pre-print of an article published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. The code to reproduce this work can be found in our GitHub repository: https://github.com/DavidDiazGuerra/Cross3D

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 300-311, 2021

  11. gpuRIR: A Python Library for Room Impulse Response Simulation with GPU Acceleration

    Authors: David Diaz-Guerra, Antonio Miguel, Jose R. Beltran

    Abstract: The Image Source Method (ISM) is one of the most widely used techniques to calculate acoustic Room Impulse Responses (RIRs); however, its computational complexity grows quickly with the reverberation time of the room, and its computation time can be prohibitive for some applications where a huge number of RIRs are needed. In this paper, we present a new implementation that dramatically improves the compu…

    Submitted 9 October, 2020; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: This is a pre-print of an article published in Multimedia Tools and Applications (2020)