
arXiv:2012.02643 [pdf, other]

Predicting Emotions Perceived from Sounds

Authors: Faranak Abri, Luis Felipe Gutiérrez, Akbar Siami Namin, David R. W. Sears, Keith S. Jones

Abstract: Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as a complement to visualization techniques. Through auditory perception it is possible to convey information related to temporal, spatial, or some other context-oriented information. An important research question is whether the emotions perceived from these auditory icons or earcons are predictable in order to build an automated sonification platform. This paper conducts an experiment through which several mainstream and conventional machine learning algorithms are developed to study the prediction of emotions perceived from sounds. To do so, the key features of sounds are captured and then are modeled using machine learning algorithms using feature reduction techniques. We observe that it is possible to predict perceived emotions with high accuracy. In particular, the regression based on Random Forest demonstrated its superiority compared to other machine learning algorithms.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 10 pages

arXiv:2006.15399 [pdf, other]

doi 10.1080/17459737.2020.1785568

Beneath (or beyond) the surface: Discovering voice-leading patterns with skip-grams

Authors: David R. W. Sears, Gerhard Widmer

Abstract: Recurrent voice-leading patterns like the Mi-Re-Do compound cadence (MRDCC) rarely appear on the musical surface in complex polyphonic textures, so finding these patterns using computational methods remains a tremendous challenge. The present study extends the canonical n-gram approach by using skip-grams, which include sub-sequences in an n-gram list if their constituent members occur within a certain number of skips. We compiled four data sets of Western tonal music consisting of symbolic encodings of the notated score and a recorded performance, created a model pipeline for defining, counting, filtering, and ranking skip-grams, and ranked the position of the MRDCC in every possible model configuration. We found that the MRDCC receives a higher rank in the list when the pipeline employs 5 skips, filters the list by excluding n-gram types that do not reflect a genuine harmonic change between adjacent members, and ranks the remaining types using a statistical association measure.

Submitted 27 June, 2020; originally announced June 2020.

Comments: This is an original manuscript / preprint of an article published by Taylor & Francis in the Journal of Mathematics and Music, available online: https://doi.org/10.1080/17459737.2020.1785568. 26 pages, 8 figures, 3 tables
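
The skip-gram idea summarized in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `skipgrams`, the parameter `max_skips`, and the reading of "within a certain number of skips" as "at most `max_skips` skipped events between adjacent members" are all assumptions made for illustration.

```python
from typing import Iterator, Sequence, Tuple


def skipgrams(events: Sequence[str], n: int, max_skips: int) -> Iterator[Tuple[str, ...]]:
    """Yield every length-n sub-sequence whose adjacent members are
    separated by at most `max_skips` skipped events (assumed definition)."""

    def extend(indices: Tuple[int, ...]) -> Iterator[Tuple[str, ...]]:
        # A complete n-gram: emit the events at the chosen positions.
        if len(indices) == n:
            yield tuple(events[i] for i in indices)
            return
        last = indices[-1]
        # The next member may sit 1 .. max_skips + 1 positions after the last one.
        upper = min(last + max_skips + 2, len(events))
        for nxt in range(last + 1, upper):
            yield from extend(indices + (nxt,))

    for start in range(len(events)):
        yield from extend((start,))


# Hypothetical toy sequence: "x" stands for an intervening surface event.
toy = ["Mi", "x", "Re", "Do"]
contiguous = list(skipgrams(toy, n=3, max_skips=0))
with_skips = list(skipgrams(toy, n=3, max_skips=1))
```

With `max_skips=0` the function reduces to ordinary contiguous n-grams, so the Mi-Re-Do pattern in the toy sequence is found only once a skip is allowed.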

arXiv:1807.06700 [pdf]

Psychological constraints on string-based methods for pattern discovery in polyphonic corpora

Authors: David R. W. Sears, Gerhard Widmer

Abstract: Researchers often divide symbolic music corpora into contiguous sequences of n events (called n-grams) for the purposes of pattern discovery, key finding, classification, and prediction. What is more, several studies have reported improved task performance when using psychologically motivated weighting functions, which adjust the count to privilege n-grams featuring more salient or memorable events (e.g., Krumhansl, 1990). However, these functions have yet to appear in harmonic pattern discovery algorithms, which attempt to discover the most recurrent chord progressions in complex polyphonic corpora. This study examines whether psychologically-motivated weighting functions can improve harmonic pattern discovery algorithms. Models using various n-gram selection methods, weighting functions, and ranking algorithms attempt to discover the most conventional closing harmonic progression in the common-practice period, ii6-"I64"-V7-I, with the progression's mean reciprocal rank serving as an evaluation metric for model comparison.

Submitted 17 July, 2018; originally announced July 2018.

Comments: Extended abstract
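
The evaluation metric named in the abstract above, mean reciprocal rank, can be sketched as follows. This is the standard definition (the average of 1/rank of a target item across several ranked lists), not code from the paper; the function name, the toy rankings, and the choice to score a missing target as 0 are assumptions for illustration.

```python
from typing import List, Sequence


def mean_reciprocal_rank(rankings: Sequence[List[str]], target: str) -> float:
    """Average of 1/rank of `target` over all ranked lists.
    A list that does not contain the target contributes 0 (assumed convention)."""
    total = 0.0
    for ranking in rankings:
        if target in ranking:
            # list.index is 0-based, ranks are 1-based.
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


# Hypothetical ranked outputs, one list per corpus or model run.
target = "ii6-I64-V7-I"
rankings = [
    ["I-V7-I", target],      # target at rank 2 -> 1/2
    [target, "I-IV-V7-I"],   # target at rank 1 -> 1
    ["I-V7-I", "I-IV-I"],    # target absent    -> 0
]
mrr = mean_reciprocal_rank(rankings, target)
```

A value closer to 1 means the target progression tends to appear near the top of the ranked lists.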

arXiv:1806.08724 [pdf, other]

Evaluating language models of tonal harmony

Authors: David R. W. Sears, Filip Korzeniowski, Gerhard Widmer

Abstract: This study borrows and extends probabilistic language models from natural language processing to discover the syntactic properties of tonal harmony. Language models come in many shapes and sizes, but their central purpose is always the same: to predict the next event in a sequence of letters, words, notes, or chords. However, few studies employing such models have evaluated the most state-of-the-art architectures using a large-scale corpus of Western tonal music, instead preferring to use relatively small datasets containing chord annotations from contemporary genres like jazz, pop, and rock. Using symbolic representations of prominent instrumental genres from the common-practice period, this study applies a flexible, data-driven encoding scheme to (1) evaluate Finite Context (or n-gram) models and Recurrent Neural Networks (RNNs) in a chord prediction task; (2) compare predictive accuracy from the best-performing models for chord onsets from each of the selected datasets; and (3) explain differences between the two model architectures in a regression analysis. We find that Finite Context models using the Prediction by Partial Match (PPM) algorithm outperform RNNs, particularly for the piano datasets, with the regression model suggesting that RNNs struggle with particularly rare chord types.

Submitted 22 June, 2018; originally announced June 2018.

Comments: 7 pages, 4 figures, 3 tables. To appear in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France

arXiv:1804.01849 [pdf, other]

A Large-Scale Study of Language Models for Chord Prediction

Authors: Filip Korzeniowski, David R. W. Sears, Gerhard Widmer

Abstract: We conduct a large-scale study of language models for chord prediction. Specifically, we compare N-gram models to various flavours of recurrent neural networks on a comprehensive dataset comprising all publicly available datasets of annotated chords known to us. This large amount of data allows us to systematically explore hyper-parameter settings for the recurrent neural networks---a crucial step in achieving good results with this model class. Our results show not only a quantitative difference between the models, but also a qualitative one: in contrast to static N-gram models, certain RNN configurations adapt to the songs at test time. This finding constitutes a further step towards the development of chord recognition systems that are more aware of local musical context than what was previously possible.

Submitted 5 April, 2018; originally announced April 2018.

Comments: Accepted at ICASSP 2018

Showing 1–5 of 5 results for author: Sears, D R W