-
Matrix Profile based Anomaly Detection in Streaming Gait Data for Fall Prevention
Authors:
Branislav Gerazov,
Elena Hadzieva,
Andrei Krivosei,
Fiorella Ines Soto Sanchez,
Jakob Rostovski,
Alar Kuusik,
Mahtab Alam
Abstract:
The automatic detection of gait anomalies can lead to systems that can be used for fall detection and prevention. In this paper, we present a gait anomaly detection system based on the Matrix Profile (MP) algorithm. The MP algorithm is exact, parameter-free, simple and efficient, making it a perfect candidate for on-the-edge deployment. The proposed system is able to adapt to an individual's gait pattern and successfully detect anomalous steps with short latency. To evaluate the system, we record a small database of enacted anomalous steps. The results show that the system outperforms a more complex Neural Network baseline.
Submitted 18 July, 2023;
originally announced July 2023.
-
A System for Differentiation of Schizophrenia and Bipolar Disorder based on rsfMRI
Authors:
Daniela Janeva,
Stefan Krsteski,
Matea Tashkovska,
Nikola Jovanovski,
Tomislav Kartalov,
Dimitar Taskovski,
Zoran Ivanovski,
Branislav Gerazov
Abstract:
Schizophrenia and bipolar disorder are debilitating psychiatric illnesses that can be challenging to diagnose accurately. The similarities between the diseases make it difficult to differentiate between them using traditional diagnostic tools. Recently, resting-state functional magnetic resonance imaging (rsfMRI) has emerged as a promising tool for the diagnosis of psychiatric disorders. This paper presents several methods for differentiating schizophrenia and bipolar disorder based on features extracted from rsfMRI data. The system that achieved the best results uses 1D Convolutional Neural Networks to analyze patterns of Intrinsic Connectivity time courses obtained from rsfMRI and potentially identify biomarkers that distinguish between the two disorders. We evaluate the system's performance on a large dataset of patients with schizophrenia and bipolar disorder and demonstrate that the system achieves an Area Under the Curve (AUC) score of 0.7078 in differentiating patients with these disorders. Our results suggest that rsfMRI-based classification systems have great potential for improving the accuracy of psychiatric diagnoses and may ultimately lead to more effective treatments for patients with these disorders.
Submitted 1 July, 2023;
originally announced July 2023.
-
Macedonian Speech Synthesis for Assistive Technology Applications
Authors:
Bojan Sofronievski,
Elena Velovska,
Martin Velichkovski,
Violeta Argirova,
Tea Veljkovikj,
Risto Chavdarov,
Stefan Janev,
Kristijan Lazarev,
Toni Bachvarovski,
Zoran Ivanovski,
Dimitar Tashkovski,
Branislav Gerazov
Abstract:
Speech technology is becoming ever more ubiquitous with the advance of speech-enabled devices and services. The use of speech synthesis in Augmentative and Alternative Communication tools has facilitated the inclusion of individuals with speech impediments, allowing them to communicate with their surroundings using speech. Although there are numerous speech synthesis systems for the most widely spoken world languages, the offering for smaller languages is still limited. We propose and compare three models for Macedonian, built using parametric and deep learning techniques and trained on a newly recorded corpus. We target low-resource edge deployment for Augmentative and Alternative Communication and assistive technologies, such as communication boards and screen readers. The listening test results show that parametric speech synthesis performs on par with the more advanced deep learning models. Since it also requires fewer resources and offers full speech rate and pitch control, it is the preferred choice for building a Macedonian TTS system for this application scenario.
Submitted 18 June, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Exploration strategies for articulatory synthesis of complex syllable onsets
Authors:
Daniel R. van Niekerk,
Anqi Xu,
Branislav Gerazov,
Paul K. Krug,
Peter Birkholz,
Yi Xu
Abstract:
High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time-consuming. In this paper, we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of syllables with complex onsets and discuss the quality of the articulatory gestures with reference to coarticulation.
Submitted 30 June, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Scorpiano -- A System for Automatic Music Transcription for Monophonic Piano Music
Authors:
Bojan Sofronievski,
Branislav Gerazov
Abstract:
Music transcription is the process of transcribing music audio into music notation. It is a field in which machines still cannot beat human performance. The main motivation for automatic music transcription is to make it possible for anyone playing a musical instrument to generate the music notes for a piece of music quickly and accurately. It does not matter if the person is a beginner who simply struggles to find the music score by searching, or an expert who heard a live jazz improvisation and would like to reproduce it without losing time on manual transcription. We propose Scorpiano -- a system that can automatically generate a music score for simple monophonic piano melody tracks using digital signal processing. The system integrates multiple digital audio processing methods: note onset detection, tempo estimation, beat detection, pitch detection and, finally, generation of the music score. The system has proven to give good results for simple piano melodies, comparable to commercially available neural-network-based systems.
Submitted 24 August, 2021;
originally announced August 2021.
-
ProsoBeast Prosody Annotation Tool
Authors:
Branislav Gerazov,
Michael Wagner
Abstract:
The labelling of speech corpora is a laborious and time-consuming process. The ProsoBeast Annotation Tool seeks to ease and accelerate this process by providing an interactive 2D representation of the prosodic landscape of the data, in which contours are distributed based on their similarity. This interactive map allows the user to inspect and label the utterances. The tool integrates several state-of-the-art methods for dimensionality reduction and feature embedding, including variational autoencoders. The user can use these to find a good representation for their data. In addition, as most of these methods are stochastic, each can be used to generate an unlimited number of different prosodic maps. The web app then allows the user to seamlessly switch between these alternative representations in the annotation process. Experiments with a sample prosodically rich dataset have shown that the tool manages to find good representations of varied data and is helpful both for annotation and label correction. The tool is released as free software for use by the community.
Submitted 15 June, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels
Authors:
Branislav Gerazov,
Daniel van Niekerk,
Anqi Xu,
Paul Konstantin Krug,
Peter Birkholz,
Yi Xu
Abstract:
The way infants use auditory cues to learn to speak despite the acoustic mismatch of their vocal apparatus is a hot topic of scientific debate. The simulation of early vocal learning using articulatory speech synthesis offers a way towards gaining a deeper understanding of this process. One of the crucial parameters in these simulations is the choice of features and a metric to evaluate the acoustic error between the synthesised sound and the reference target. We contribute by evaluating the performance of a set of 40 feature-metric combinations for the task of optimising the production of static vowels with a high-quality articulatory synthesiser. To this end, we assess the usability of formant error and the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics and also to offer insight into perceptual results.
Submitted 2 April, 2021; v1 submitted 20 May, 2020;
originally announced May 2020.
-
A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes
Authors:
Branislav Gerazov,
Gérard Bailly,
Omar Mohammed,
Yi Xu,
Philip N. Garner
Abstract:
The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general-purpose end-to-end mappings using millions of tunable parameters. The shift towards black-box machine learning models has nonetheless posed the reverse problem -- a compelling need to discover knowledge, to explain, visualise and interpret. Our work bridges the gap between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the context-sensitive variation of the constituent elementary prosodic contours. We show that the VPM can give insight into the intrinsic variability of these prosodic prototypes through learning a meaningful prosodic latent space representation structure. We also show that the VPM is able to capture prosodic phenomena that have multiple dimensions of context-based variability. Since it is based on the principle of superposition, the VPM does not necessitate the use of specially crafted corpora for the analysis, opening up the possibilities of using big data for prosody analysis. In a speech synthesis scenario, the model can be used to generate a dynamic and natural prosody contour that is devoid of averaging effects.
Submitted 18 March, 2019; v1 submitted 22 June, 2018;
originally announced June 2018.
-
A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours
Authors:
Branislav Gerazov,
Gérard Bailly,
Yi Xu
Abstract:
The way speech prosody encodes linguistic, paralinguistic and non-linguistic information via multiparametric representations of the speech signals is still an open issue. The Superposition of Functional Contours (SFC) model proposes to decompose prosody into elementary multiparametric functional contours through the iterative training of neural network contour generators using analysis-by-synthesis. Each generator is responsible for computing multiparametric contours that encode one given piece of linguistic, paralinguistic or non-linguistic information on a variable scope of rhythmic units. The contributions of all generators' outputs are then overlapped and added to produce the prosody of the utterance. We propose an extension of the contour generators that allows them to model the prominence of the elementary contours based on contextual information. The proposed weighted SFC (WSFC) model jointly learns the patterns of the elementary multiparametric functional contours and their weights, which depend on the contours' contexts. The experimental results show that the WSFC model can successfully capture contour prominence and thus improve SFC modelling performance. The WSFC is also shown to be effective at modelling the impact of attitudes on the prominence of functional contours cueing syntactic relations in French, and that of emphasis on the prominence of tone contours in Chinese.
Submitted 18 June, 2018;
originally announced June 2018.