Search | arXiv e-print repository

EKM: An exact, polynomial-time algorithm for the $K$-medoids problem

Abstract: The $K$-medoids problem is a challenging combinatorial clustering task, widely used in data analysis applications. While numerous algorithms have been proposed to solve this problem, none of these are able to obtain an exact (globally optimal) solution for the problem in polynomial time. In this paper, we present EKM: a novel algorithm for solving this problem exactly with worst-case… ▽ More The $K$-medoids problem is a challenging combinatorial clustering task, widely used in data analysis applications. While numerous algorithms have been proposed to solve this problem, none of these are able to obtain an exact (globally optimal) solution for the problem in polynomial time. In this paper, we present EKM: a novel algorithm for solving this problem exactly with worst-case $O\left(N^{K+1}\right)$ time complexity. EKM is developed according to recent advances in transformational programming and combinatorial generation, using formal program derivation steps. The derived algorithm is provably correct by construction. We demonstrate the effectiveness of our algorithm by comparing it against various approximate methods on numerous real-world datasets. We show that the wall-clock run time of our algorithm matches the worst-case time complexity analysis on synthetic datasets, clearly outperforming the exponential time complexity of benchmark branch-and-bound based MIP solvers. To our knowledge, this is the first, rigorously-proven polynomial time, practical algorithm for this ubiquitous problem. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2403.09580 [pdf, ps, other]

Algorithmic syntactic causal identification

Authors: Dhurim Cakiqi, Max A. Little

Abstract: Causal identification in causal Bayes nets (CBNs) is an important tool in causal inference allowing the derivation of interventional distributions from observational distributions where this is possible in principle. However, most existing formulations of causal identification using techniques such as d-separation and do-calculus are expressed within the mathematical language of classical probabil… ▽ More Causal identification in causal Bayes nets (CBNs) is an important tool in causal inference allowing the derivation of interventional distributions from observational distributions where this is possible in principle. However, most existing formulations of causal identification using techniques such as d-separation and do-calculus are expressed within the mathematical language of classical probability theory on CBNs. However, there are many causal settings where probability theory and hence current causal identification techniques are inapplicable such as relational databases, dataflow programs such as hardware description languages, distributed systems and most modern machine learning algorithms. We show that this restriction can be lifted by replacing the use of classical probability theory with the alternative axiomatic foundation of symmetric monoidal categories. In this alternative axiomatization, we show how an unambiguous and clean distinction can be drawn between the general syntax of causal models and any specific semantic implementation of that causal model. This allows a purely syntactic algorithmic description of general causal identification by a translation of recent formulations of the general ID algorithm through fixing. Our description is given entirely in terms of the non-parametric ADMG structure specifying a causal model and the algebraic signature of the corresponding monoidal category, to which a sequence of manipulations is then applied so as to arrive at a modified monoidal category in which the desired, purely syntactic interventional causal model, is obtained. We use this idea to derive purely syntactic analogues of classical back-door and front-door causal adjustment, and illustrate an application to a more complex causal model. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 TikZ figures

arXiv:2401.05159 [pdf, other]

Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN

Authors: Muhammad Ali Farooq, Wang Yao, Michael Schukat, Mark A Little, Peter Corcoran

Abstract: This study explores the utilization of Dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we aim to incorporate enhanc… ▽ More This study explores the utilization of Dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning and a small amount of data representation in text-to-image latent diffusion models. The optimally tuned model is further used for rendering high-quality skin lesion synthetic data with diverse and realistic characteristics, providing a valuable supplement and diversity to the existing training data. We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-art machine learning models, assessing its effectiveness in enhancing model performance and generalization to unseen real-world data. Our experimental results demonstrate the efficacy of the synthetic data generated through stable diffusion models helps in improving the robustness and adaptability of end-to-end CNN and vision transformer models on two different real-world skin lesion datasets. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: Paper is submitted in EMBC 2024 Conference

arXiv:2306.12344 [pdf, other]

An efficient, provably exact, practical algorithm for the 0-1 loss linear classification problem

Authors: Xi He, Waheed Ul Rahman, Max A. Little

Abstract: Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Al… ▽ More Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, including the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss) or approximate combinatorial search, none of which can be guaranteed to solve the problem exactly. Finding efficient algorithms to obtain an exact i.e. globally optimal solution for the 0-1 loss linear classification problem with fixed dimension, remains an open problem. In research we report here, we detail the rigorous construction of a new algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in polynomial time. We prove correctness using concepts from the theory of hyperplane arrangements and oriented matroids. We demonstrate the effectiveness of this algorithm on synthetic and real-world datasets, showing optimal accuracy both in and out-of-sample, in practical computational time. We also empirically demonstrate how the use of approximate upper bound leads to polynomial time run-time improvements to the algorithm whilst retaining exactness. To our knowledge, this is the first, rigorously-proven polynomial time, practical algorithm for this long-standing problem. △ Less

Submitted 2 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: 19 pages, 3 figures

arXiv:2211.11294 [pdf, other]

TSDF: A simple yet comprehensive, unified data storage and exchange format standard for digital biosensor data in health applications

Authors: Kasper Claes, Valentina Ticcinelli, Reham Badawy, Yordan P. Raykov, Luc J. W. Evers, Max A. Little

Abstract: Digital sensors are increasingly being used to monitor the change over time of physiological processes in biological health and disease, often using wearable devices. This generates very large amounts of digital sensor data, for which, a consensus on a common storage, exchange and archival data format standard, has yet to be reached. To address this gap, we propose Time Series Data Format (TSDF):… ▽ More Digital sensors are increasingly being used to monitor the change over time of physiological processes in biological health and disease, often using wearable devices. This generates very large amounts of digital sensor data, for which, a consensus on a common storage, exchange and archival data format standard, has yet to be reached. To address this gap, we propose Time Series Data Format (TSDF): a unified, standardized format for storing all types of physiological sensor data, across diverse disease areas. We pose a series of format design criteria and review in detail current storage and exchange formats. When judged against these criteria, we find these current formats lacking, and propose a very simple, intuitive standard for both numerical sensor data and metadata, based on raw binary data and JSON-format text files, for sensor measurements/timestamps and metadata, respectively. By focusing on the common characteristics of diverse biosensor data, we define a set of necessary and sufficient metadata fields for storing, processing, exchanging, archiving and reliably interpreting, multi-channel biological time series data. Our aim is for this standardized format to increase the interpretability and exchangeability of data, thereby contributing to scientific reproducibility in studies where digital biosensor data forms a key evidence base. △ Less

Submitted 22 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2208.04315 [pdf]

Patient-Specific Game-Based Transfer Method for Parkinson's Disease Severity Prediction

Authors: Zaifa Xue, Huibin Lu, Tao Zhang, Max A. Little

Abstract: Dysphonia is one of the early symptoms of Parkinson's disease (PD). Most existing methods use feature selection methods to find the optimal subset of voice features for all PD patients. Few have considered the heterogeneity between patients, which implies the need to provide specific prediction models for different patients. However, building the specific model faces the challenge of small sample… ▽ More Dysphonia is one of the early symptoms of Parkinson's disease (PD). Most existing methods use feature selection methods to find the optimal subset of voice features for all PD patients. Few have considered the heterogeneity between patients, which implies the need to provide specific prediction models for different patients. However, building the specific model faces the challenge of small sample size, which makes it lack generalization ability. Instance transfer is an effective way to solve this problem. Therefore, this paper proposes a patient-specific game-based transfer (PSGT) method for PD severity prediction. First, a selection mechanism is used to select PD patients with similar disease trends to the target patient from the source domain, which greatly reduces the risk of negative transfer. Then, the contribution of the transferred subjects and their instances to the disease estimation of the target subject is fairly evaluated by the Shapley value, which improves the interpretability of the method. Next, the proportion of valid instances in the transferred subjects is determined, and the instances with higher contribution are transferred to further reduce the difference between the transferred instance subset and the target subject. Finally, the selected subset of instances is added to the training set of the target subject, and the extended data is fed into the random forest to improve the performance of the method. Parkinson's telemonitoring dataset is used to evaluate the feasibility and effectiveness. Experiment results show that the PSGT has better performance in both prediction error and stability over compared methods. △ Less

Submitted 12 August, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

arXiv:2107.01752 [pdf, other]

doi 10.1145/3664828

Dynamic programming by polymorphic semiring algebraic shortcut fusion

Authors: Max A. Little, Xi He, Ugur Kayas

Abstract: Dynamic programming (DP) is an algorithmic design paradigm for the efficient, exact solution of otherwise intractable, combinatorial problems. However, DP algorithm design is often presented in an ad-hoc manner. It is sometimes difficult to justify algorithm correctness. To address this issue, this paper presents a rigorous algebraic formalism for systematically deriving DP algorithms, based on se… ▽ More Dynamic programming (DP) is an algorithmic design paradigm for the efficient, exact solution of otherwise intractable, combinatorial problems. However, DP algorithm design is often presented in an ad-hoc manner. It is sometimes difficult to justify algorithm correctness. To address this issue, this paper presents a rigorous algebraic formalism for systematically deriving DP algorithms, based on semiring polymorphism. We start with a specification, construct an algorithm to compute the required solution which is self-evidently correct because it exhaustively generates and evaluates all possible solutions meeting the specification. We then derive, through the use of shortcut fusion, an implementation of this algorithm which is both efficient and correct. We also demonstrate how, with the use of semiring lifting, the specification can be augmented with combinatorial constraints, showing how these constraints can be fused with the algorithm. We furthermore demonstrate how existing DP algorithms for a given combinatorial problem can be abstracted from their original context and re-purposed. This approach can be applied to the full scope of combinatorial problems expressible in terms of semirings. This includes, for example: optimal probability and Viterbi decoding, probabilistic marginalization, logical inference, fuzzy sets, differentiable softmax, relational and provenance queries. The approach, building on ideas from the existing literature on constructive algorithmics, exploits generic properties of polymorphic functions, tupling and formal sums and algebraic simplifications arising from constraint algebras. We demonstrate the effectiveness of this formalism for some example applications arising in signal processing, bioinformatics and reliability engineering. Python software implementing these algorithms can be downloaded from: http://www.maxlittle.net/software/dppolyalg.zip. △ Less

Submitted 4 January, 2024; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: Updated v22 with revised text

Journal ref: Formal Aspects of Computing, May 2024

arXiv:2102.03885 [pdf, other]

Few-shot time series segmentation using prototype-defined infinite hidden Markov models

Authors: Yazan Qarout, Yordan P. Raykov, Max A. Little

Abstract: We propose a robust framework for interpretable, few-shot analysis of non-stationary sequential data based on flexible graphical models to express the structured distribution of sequential events, using prototype radial basis function (RBF) neural network emissions. A motivational link is demonstrated between prototypical neural network architectures for few-shot learning and the proposed RBF netw… ▽ More We propose a robust framework for interpretable, few-shot analysis of non-stationary sequential data based on flexible graphical models to express the structured distribution of sequential events, using prototype radial basis function (RBF) neural network emissions. A motivational link is demonstrated between prototypical neural network architectures for few-shot learning and the proposed RBF network infinite hidden Markov model (RBF-iHMM). We show that RBF networks can be efficiently specified via prototypes allowing us to express complex nonstationary patterns, while hidden Markov models are used to infer principled high-level Markov dynamics. The utility of the framework is demonstrated on biomedical signal processing applications such as automated seizure detection from EEG data where RBF networks achieve state-of-the-art performance using a fraction of the data needed to train long-short-term memory variational autoencoders. △ Less

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2009.01231 [pdf, other]

Detecting Parkinson's Disease From an Online Speech-task

Authors: Wasifur Rahman, Sangwu Lee, Md. Saiful Islam, Victor Nikhil Antony, Harshil Ratnu, Mohammad Rafayet Ali, Abdullah Al Mamun, Ellen Wagner, Stella Jensen-Roberts, Max A. Little, Ray Dorsey, Ehsan Hoque

Abstract: In this paper, we envision a web-based framework that can help anyone, anywhere around the world record a short speech task, and analyze the recorded data to screen for Parkinson's disease (PD). We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) -- from all over the US and beyond. A small portion of the data was collected in a lab setting t… ▽ More In this paper, we envision a web-based framework that can help anyone, anywhere around the world record a short speech task, and analyze the recorded data to screen for Parkinson's disease (PD). We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) -- from all over the US and beyond. A small portion of the data was collected in a lab setting to compare quality. The participants were instructed to utter a popular pangram containing all the letters in the English alphabet "the quick brown fox jumps over the lazy dog..". We extracted both standard acoustic features (Mel Frequency Cepstral Coefficients (MFCC), jitter and shimmer variants) and deep learning based features from the speech data. Using these features, we trained several machine learning algorithms. We achieved 0.75 AUC (Area Under The Curve) performance on determining presence of self-reported Parkinson's disease by modeling the standard acoustic features through the XGBoost -- a gradient-boosted decision tree model. Further analysis reveal that the widely used MFCC features and a subset of previously validated dysphonia features designed for detecting Parkinson's from verbal phonation task (pronouncing 'ahh') contains the most distinct information. Our model performed equally well on data collected in controlled lab environment as well as 'in the wild' across different gender and age groups. Using this tool, we can collect data from almost anyone anywhere with a video/audio enabled device, contributing to equity and access in neurological care. △ Less

Submitted 15 December, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

arXiv:2006.12369 [pdf, other]

Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction

Authors: Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little

Abstract: Ubiquitous linear Gaussian exploratory tools such as principle component analysis (PCA) and factor analysis (FA) remain widely used as tools for: exploratory analysis, pre-processing, data visualization and related tasks. However, due to their rigid assumptions including crowding of high dimensional data, they have been replaced in many settings by more flexible and still interpretable latent feat… ▽ More Ubiquitous linear Gaussian exploratory tools such as principle component analysis (PCA) and factor analysis (FA) remain widely used as tools for: exploratory analysis, pre-processing, data visualization and related tasks. However, due to their rigid assumptions including crowding of high dimensional data, they have been replaced in many settings by more flexible and still interpretable latent feature models. The Feature allocation is usually modelled using discrete latent variables assumed to follow either parametric Beta-Bernoulli distribution or Bayesian nonparametric prior. In this work we propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques. The new framework allows for explicit control over the number of features used to express each point and enables a more flexible set of allocation distributions including feature allocations with different sparsity levels. This approach is used to derive a novel adaptive Factor analysis (aFA), as well as, an adaptive probabilistic principle component analysis (aPPCA) capable of flexible structure discovery and dimensionality reduction in a wide case of scenarios. We derive both standard Gibbs sampler, as well as, an expectation-maximization inference algorithms that converge orders of magnitude faster to a reasonable point estimate solution. The utility of the proposed aPPCA model is demonstrated for standard PCA tasks such as feature learning, data visualization and data whitening. We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST and when applied for interpreting autoencoder features. We also demonstrate an application of the aPPCA to more robust blind source separation for functional magnetic resonance imaging (fMRI). △ Less

Submitted 28 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Interactive demo available at https://colab.research.google.com/drive/1KrrHmAu6mV7tutZtYnpEbVibxs4GCwIo?usp=sharing

ACM Class: I.5.1

arXiv:2004.03047 [pdf, other]

Probabilistic modelling of gait for robust passive monitoring in daily life

Authors: Yordan P. Raykov, Luc J. W. Evers, Reham Badawy, Bastiaan Bloem, Tom M. Heskes, Marjan Meinders, Kasper Claes, Max A. Little

Abstract: Passive monitoring in daily life may provide invaluable insights about a person's health throughout the day. Wearable sensor devices are likely to play a key role in enabling such monitoring in a non-obtrusive fashion. However, sensor data collected in daily life reflects multiple health and behavior related factors together. This creates the need for structured principled analysis to produce reli… ▽ More Passive monitoring in daily life may provide invaluable insights about a person's health throughout the day. Wearable sensor devices are likely to play a key role in enabling such monitoring in a non-obtrusive fashion. However, sensor data collected in daily life reflects multiple health and behavior related factors together. This creates the need for structured principled analysis to produce reliable and interpretable predictions that can be used to support clinical diagnosis and treatment. In this work we develop a principled modelling approach for free-living gait (walking) analysis. Gait is a promising target for non-obtrusive monitoring because it is common and indicative of various movement disorders such as Parkinson's disease (PD), yet its analysis has largely been limited to experimentally controlled lab settings. To locate and characterize stationary gait segments in free living using accelerometers, we present an unsupervised statistical framework designed to segment signals into differing gait and non-gait patterns. Our flexible probabilistic framework combines empirical assumptions about gait into a principled graphical model with all of its merits. We demonstrate the approach on a new video-referenced dataset including unscripted daily living activities of 25 PD patients and 25 controls, in and around their own houses. We evaluate our ability to detect gait and predict medication induced fluctuations in PD patients based on modelled gait. Our evaluation includes a comparison between sensors attached at multiple body locations including wrist, ankle, trouser pocket and lower back. △ Less

Submitted 6 April, 2020; originally announced April 2020.

arXiv:1910.09648 [pdf, other]

Causal bootstrap**

Authors: Max A. Little, Reham Badawy

Abstract: To draw scientifically meaningful conclusions and build reliable models of quantitative phenomena, cause and effect must be taken into consideration (either implicitly or explicitly). This is particularly challenging when the measurements are not from controlled experimental (interventional) settings, since cause and effect can be obscured by spurious, indirect influences. Modern predictive techni… ▽ More To draw scientifically meaningful conclusions and build reliable models of quantitative phenomena, cause and effect must be taken into consideration (either implicitly or explicitly). This is particularly challenging when the measurements are not from controlled experimental (interventional) settings, since cause and effect can be obscured by spurious, indirect influences. Modern predictive techniques from machine learning are capable of capturing high-dimensional, nonlinear relationships between variables while relying on few parametric or probabilistic model assumptions. However, since these techniques are associational, applied to observational data they are prone to picking up spurious influences from non-experimental (observational) data, making their predictions unreliable. Techniques from causal inference, such as probabilistic causal diagrams and do-calculus, provide powerful (nonparametric) tools for drawing causal inferences from such observational data. However, these techniques are often incompatible with modern, nonparametric machine learning algorithms since they typically require explicit probabilistic models. Here, we develop causal bootstrap** for augmenting classical nonparametric bootstrap resampling with information on the causal relationship between variables. This makes it possible to resample observational data such that, if it is possible to identify an interventional relationship from that data, new data representing that relationship can be simulated from the original observational data. In this way, we can use modern machine learning algorithms unaltered to make statistically powerful, yet causally-robust, predictions. We develop several causal bootstrap** algorithms for drawing interventional inferences from observational data, for classification and regression problems, and demonstrate, using synthetic and real-world examples, the value of this approach. △ Less

Submitted 9 December, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: 18 pages, 3 figures

arXiv:1905.11785 [pdf, other]

Automatic Quality Control and Enhancement for Voice-Based Remote Parkinson's Disease Detection

Authors: Amir Hossein Poorjam, Mathew Shaji Kavalekalam, Liming Shi, Yordan P. Raykov, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen

Abstract: The performance of voice-based Parkinson's disease (PD) detection systems degrades when there is an acoustic mismatch between training and operating conditions caused mainly by degradation in test signals. In this paper, we address this mismatch by considering three types of degradation commonly encountered in remote voice analysis, namely background noise, reverberation and nonlinear distortion,… ▽ More The performance of voice-based Parkinson's disease (PD) detection systems degrades when there is an acoustic mismatch between training and operating conditions caused mainly by degradation in test signals. In this paper, we address this mismatch by considering three types of degradation commonly encountered in remote voice analysis, namely background noise, reverberation and nonlinear distortion, and investigate how these degradations influence the performance of a PD detection system. Given that the specific degradation is known, we explore the effectiveness of a variety of enhancement algorithms in compensating this mismatch and improving the PD detection accuracy. Then, we propose two approaches to automatically control the quality of recordings by identifying the presence and type of short-term and long-term degradations and protocol violations in voice signals. Finally, we experiment with using the proposed quality control methods to inform the choice of enhancement algorithm. Experimental results using the voice recordings of the mPower mobile PD data set under different degradation conditions show the effectiveness of the quality control approaches in selecting an appropriate enhancement method and, consequently, in improving the PD detection accuracy. This study is a step towards the development of a remote PD detection system capable of operating in unseen acoustic environments. △ Less

Submitted 31 May, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

Comments: Preprint, 12 pages, 6 figures

arXiv:1905.11010 [pdf, ps, other]

Adaptive probabilistic principal component analysis

Authors: Adam Farooq, Yordan P. Raykov, Luc Evers, Max A. Little

Abstract: Using the linear Gaussian latent variable model as a starting point we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and project each data point… ▽ More Using the linear Gaussian latent variable model as a starting point we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables make the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring. △ Less

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1905.08557 [pdf, other]

Bayesian Pitch Tracking Based on the Harmonic Model

Authors: Liming Shi, Jesper Kjaer Nielsen, Jesper Rindom Jensen, Max A. Little, Mads Graesboll Christensen

Abstract: Fundamental frequency is one of the most important characteristics of speech and audio signals. Harmonic model-based fundamental frequency estimators offer a higher estimation accuracy and robustness against noise than the widely used autocorrelation-based methods. However, the traditional harmonic model-based estimators do not take the temporal smoothness of the fundamental frequency, the model o… ▽ More Fundamental frequency is one of the most important characteristics of speech and audio signals. Harmonic model-based fundamental frequency estimators offer a higher estimation accuracy and robustness against noise than the widely used autocorrelation-based methods. However, the traditional harmonic model-based estimators do not take the temporal smoothness of the fundamental frequency, the model order, and the voicing into account as they process each data segment independently. In this paper, a fully Bayesian fundamental frequency tracking algorithm based on the harmonic model and a first-order Markov process model is proposed. Smoothness priors are imposed on the fundamental frequencies, model orders, and voicing using first-order Markov process models. Using these Markov models, fundamental frequency estimation and voicing detection errors can be reduced. Using the harmonic model, the proposed fundamental frequency tracker has an improved robustness to noise. An analytical form of the likelihood function, which can be computed efficiently, is derived. Compared to the state-of-the-art neural network and non-parametric approaches, the proposed fundamental frequency tracking algorithm reduces the mean absolute errors and gross errors by 15\% and 20\% on the Keele pitch database and 36\% and 26\% on sustained /a/ sounds from a database of Parkinson's disease voices under 0 dB white Gaussian noise. A MATLAB version of the proposed algorithm is made freely available for reproduction of the results\footnote{An implementation of the proposed algorithm using MATLAB may be found in \url{https://tinyurl.com/yxn4a543} △ Less

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1601.00960 [pdf, other]

High Frequency Remote Monitoring of Parkinson's Disease via Smartphone: Platform Overview and Medication Response Detection

Authors: Andong Zhan, Max A. Little, Denzil A. Harris, Solomon O. Abiola, E. Ray Dorsey, Suchi Saria, Andreas Terzis

Abstract: Objective: The aim of this study is to develop a smartphone-based high-frequency remote monitoring platform, assess its feasibility for remote monitoring of symptoms in Parkinson's disease, and demonstrate the value of data collected using the platform by detecting dopaminergic medication response. Methods: We have developed HopkinsPD, a novel smartphone-based monitoring platform, which measures s… ▽ More Objective: The aim of this study is to develop a smartphone-based high-frequency remote monitoring platform, assess its feasibility for remote monitoring of symptoms in Parkinson's disease, and demonstrate the value of data collected using the platform by detecting dopaminergic medication response. Methods: We have developed HopkinsPD, a novel smartphone-based monitoring platform, which measures symptoms actively (i.e. data are collected when a suite of tests is initiated by the individual at specific times during the day), and passively (i.e. data are collected continuously in the background). After data collection, we extract features to assess measures of five key behaviors related to PD symptoms -- voice, balance, gait, dexterity, and reaction time. A random forest classifier is used to discriminate measurements taken after a dose of medication (treatment) versus before the medication dose (baseline). Results: A worldwide study for remote PD monitoring was established using HopkinsPD in July, 2014. This study used entirely remote, online recruitment and installation, demonstrating highly cost-effective scalability. In six months, 226 individuals (121 PD and 105 controls) contributed over 46,000 hours of passive monitoring data and approximately 8,000 instances of structured tests of voice, balance, gait, reaction, and dexterity. To the best of our knowledge, this is the first study to have collected data at such a scale for remote PD monitoring. Moreover, we demonstrate the initial ability to discriminate treatment from baseline with 71.0(+-0.4)% accuracy, which suggests medication response can be monitored remotely via smartphone-based measures. △ Less

Submitted 5 January, 2016; originally announced January 2016.

arXiv:1304.1209 [pdf, other]

doi 10.1098/rsif.2013.0048

Highly comparative time-series analysis: The empirical structure of time series and their methods

Authors: Ben D. Fulcher, Max A. Little, Nick S. Jones

Abstract: The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 re… ▽ More The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines. △ Less

Submitted 3 April, 2013; originally announced April 2013.

Journal ref: J. R. Soc. Interface vol. 10 no. 83 20130048 (2013)

Showing 1–17 of 17 results for author: Little, M A