Search | arXiv e-print repository

Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

Authors: Chengzhe Piao, Taiyu Zhu, Yu Wang, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li

Abstract: Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant "cold start" problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models in a privacy-conscious manner is challenging, especially given that such data is often stored on personal devices. Considering the privacy protection and addressing the "cold start" problem in diabetes care, we propose "GluADFL", blood Glucose prediction by Asynchronous Decentralized Federated Learning. We compared GluADFL with eight baseline methods using four distinct T1D datasets, comprising 298 participants, which demonstrated its superior performance in accurately predicting BG levels for cross-patient analysis. Furthermore, patients' data might be stored and shared across various communication networks in GluADFL, ranging from highly interconnected (e.g., random, performs the best among others) to more structured topologies (e.g., cluster and ring), suitable for various social networks. The asynchronous training framework supports flexible participation. By adjusting the ratios of inactive participants, we found it remains stable if less than 70% are inactive. Our results confirm that GluADFL offers a practical, privacy-preserving solution for BG prediction in T1D, significantly enhancing the quality of diabetes management.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2403.14438 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446224

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

Abstract: Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users must begin each command with a trigger phrase. We explore this task in three ways: First, we train classifiers using only acoustic information obtained from the audio waveform. Second, we take the decoder outputs of an automatic speech recognition (ASR) system, such as 1-best hypotheses, as input features to a large language model (LLM). Finally, we explore a multimodal system that combines acoustic and lexical features, as well as ASR decoder signals in an LLM. Using multimodal information yields relative equal-error-rate improvements over text-only and audio-only models of up to 39% and 61%. Increasing the size of the LLM and training with low-rank adaption leads to further relative EER reductions of up to 18% on our dataset.

Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.03632

arXiv:2403.03712 [pdf, ps, other]

Saturating Sorting without Sorts

Authors: Pamina Georgiou, Márton Hajdu, Laura Kovács

Abstract: We present a first-order theorem proving framework for establishing the correctness of functional programs implementing sorting algorithms with recursive data structures. We formalize the semantics of recursive programs in many-sorted first-order logic and integrate sortedness/permutation properties within our first-order formalization. Rather than focusing on sorting lists of elements of specific first-order theories, such as integer arithmetic, our list formalization relies on a sort parameter abstracting (arithmetic) theories and hence concrete sorts. We formalize the permutation property of lists in first-order logic so that we automatically prove verification conditions of such algorithms purely by superposition-based first-order reasoning. Doing so, we adjust recent efforts for automating induction in saturation. We advocate a compositional approach for automating proofs by induction required to verify functional programs implementing and preserving sorting and permutation properties over parameterized list structures. Our work turns saturation-based first-order theorem proving into an automated verification engine by (i) guiding automated inductive reasoning with manual proof splits and (ii) fully automating inductive reasoning in saturation. We showcase the applicability of our framework over recursive sorting algorithms, including Mergesort and Quicksort.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.16230 [pdf, other]

GARNN: An Interpretable Graph Attentive Recurrent Neural Network for Predicting Blood Glucose Levels via Multivariate Time Series

Authors: Chengzhe Piao, Taiyu Zhu, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li

Abstract: Accurate prediction of future blood glucose (BG) levels can effectively improve BG management for people living with diabetes, thereby reducing complications and improving quality of life. The state of the art of BG prediction has been achieved by leveraging advanced deep learning methods to model multi-modal data, i.e., sensor data and self-reported event data, organised as multi-variate time series (MTS). However, these methods are mostly regarded as "black boxes" and not entirely trusted by clinicians and patients. In this paper, we propose interpretable graph attentive recurrent neural networks (GARNNs) to model MTS, explaining variable contributions via summarizing variable importance and generating feature maps by graph attention mechanisms instead of post-hoc analysis. We evaluate GARNNs on four datasets, representing diverse clinical scenarios. Upon comparison with twelve well-established baseline methods, GARNNs not only achieve the best prediction accuracy but also provide high-quality temporal interpretability, in particular for postprandial glucose levels as a result of corresponding meal intake and insulin injection. These findings underline the potential of GARNN as a robust tool for improving diabetes care, bridging the gap between deep learning technology and real-world healthcare solutions.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.00176 [pdf, other]

Adversarial Quantum Machine Learning: An Information-Theoretic Generalization Analysis

Authors: Petros Georgiou, Sharu Theresa Jose, Osvaldo Simeone

Abstract: In a manner analogous to their classical counterparts, quantum classifiers are vulnerable to adversarial attacks that perturb their inputs. A promising countermeasure is to train the quantum classifier by adopting an attack-aware, or adversarial, loss function. This paper studies the generalization properties of quantum classifiers that are adversarially trained against bounded-norm white-box attacks. Specifically, a quantum adversary maximizes the classifier's loss by transforming an input state $ρ(x)$ into a state $λ$ that is $ε$-close to the original state $ρ(x)$ in $p$-Schatten distance. Under suitable assumptions on the quantum embedding $ρ(x)$, we derive novel information-theoretic upper bounds on the generalization error of adversarially trained quantum classifiers for $p = 1$ and $p = \infty$. The derived upper bounds consist of two terms: the first is an exponential function of the 2-Rényi mutual information between classical data and quantum embedding, while the second term scales linearly with the adversarial perturbation size $ε$. Both terms are shown to decrease as $1/\sqrt{T}$ over the training set size $T$. An extension is also considered in which the adversary assumed during training has different parameters $p$ and $ε$ as compared to the adversary affecting the test inputs. Finally, we validate our theoretical findings with numerical experiments for a synthetic setting.

Submitted 15 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: 10 pages, 2 figures. Fixed a typo (wrong inequality sign) in lemma 2 and extended to cover the whole range of values of p. Added reference on inequalities in trace norms

arXiv:2312.03632 [pdf, other]

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

Abstract: Interactions with virtual assistants typically start with a trigger phrase followed by a command. In this work, we explore the possibility of making these interactions more natural by eliminating the need for a trigger phrase. Our goal is to determine whether a user addressed the virtual assistant based on signals obtained from the streaming audio recorded by the device microphone. We address this task by combining 1-best hypotheses and decoder signals from an automatic speech recognition system with acoustic representations from an audio encoder as input features to a large language model (LLM). In particular, we are interested in data and resource efficient systems that require only a small amount of training data and can operate in scenarios with only a single frozen LLM available on a device. For this reason, our model is trained on 80k or less examples of multimodal data using a combination of low-rank adaptation and prefix tuning. We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal-error-rates (EERs), while using only a fraction of the training data. We also show that low-dimensional specialized audio representations lead to lower EERs than high-dimensional general audio representations.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.01146 [pdf]

Bayesian models are better than frequentist models in identifying differences in small datasets comprising phonetic data

Authors: Georgios P. Georgiou

Abstract: While many studies have previously conducted direct comparisons between results obtained from frequentist and Bayesian models, our research introduces a novel perspective by examining these models in the context of a small dataset comprising phonetic data. Specifically, we employed mixed-effects models and Bayesian regression models to explore differences between monolingual and bilingual populations in the acoustic values of produced vowels. Our findings revealed that Bayesian hypothesis testing exhibited superior accuracy in identifying evidence for differences compared to the posthoc test, which tended to underestimate the existence of such differences. These results align with a substantial body of previous research highlighting the advantages of Bayesian over frequentist models, thereby emphasizing the need for methodological reform. In conclusion, our study supports the assertion that Bayesian models are more suitable for investigating differences in small datasets of phonetic and/or linguistic data, suggesting that researchers in these fields may find greater reliability in utilizing such models for their analyses.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 15 pages, 3 figures

arXiv:2311.15054 [pdf]

Detection of developmental language disorder in Cypriot Greek children using a neural network algorithm

Authors: Georgios P. Georgiou, Elena Theodorou

Abstract: Children with developmental language disorder (DLD) encounter difficulties in acquiring various language structures. Early identification and intervention are crucial to prevent negative long-term outcomes impacting the academic, social, and emotional development of children. The study aims to develop an automated method for the identification of DLD using artificial intelligence, specifically a neural network machine learning algorithm. This protocol is applied for the first time in a Cypriot Greek child population with DLD. The neural network model was trained using perceptual and production data elicited from 15 children with DLD and 15 healthy controls in the age range of 7;10 until 10;4. The k-fold technique was used to crossvalidate the algorithm. The performance of the model was evaluated using metrics such as accuracy, precision, recall, F1 score, and ROC/AUC curve to assess its ability to make accurate predictions on a set of unseen data. The results demonstrated high classification values for all metrics, indicating the high accuracy of the neural model in classifying children with DLD. Additionally, the variable importance analysis revealed that the language production skills of children had a more significant impact on the performance of the model compared to perception skills. Machine learning paradigms provide effective discrimination between children with DLD and those with TD, with the potential to enhance clinical assessment and facilitate earlier and more efficient detection of the disorder.

Submitted 10 February, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

Comments: 15 pages, 3 figures, journal article

arXiv:2302.09044 [pdf, other]

From User Perceptions to Technical Improvement: Enabling People Who Stutter to Better Use Speech Recognition

Authors: Colin Lea, Zifang Huang, Lauren Tooley, Jaya Narain, Dianna Yee, Panayiotis Georgiou, Tien Dung Tran, Jeffrey P. Bigham, Leah Findlater

Abstract: Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, relative to the rest of the general population. However, what is not clear is the degree to which these systems do not work, how they can be improved, or how much people want to use them. In this paper, we first address these questions using results from a 61-person survey from people who stutter and find participants want to use speech recognition but are frequently cut off, misunderstood, or speech predictions do not represent intent. In a second study, where 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.

Submitted 27 February, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: CHI 2023

arXiv:2212.09351 [pdf, other]

doi 10.1109/JQE.2023.3296732

Polarization Modulation in Quantum-Dot Spin-VCSELs for Ultrafast Data Transmission

Authors: Christos Tselios, Panagiotis Georgiou, Christina Politi, Antonio Hurtado, Dimitris Alexandropoulos

Abstract: Spin-Vertical Cavity Surface Emitting Lasers (spin-VCSELs) are undergoing increasing research effort for new paradigms in high-speed optical communications and photon-enabled computing. To date research in spin-VCSELs has mostly focused on Quantum-Well (QW) devices. However, novel Quantum-Dot (QD) spin-VCSELs, offer enhanced parameter controls permitting the effective, dynamical and ultrafast manipulation of their light emissions polarization. In the present contribution we investigate in detail the operation of QD spin-VCSELs subject to polarization modulation for their use as ultrafast light sources in optical communication systems. We reveal that QD spin-VCSELs outperform their QW counterparts in terms of modulation efficiency, yielding a nearly two-fold improvement. We also analyse the impact of key device parameters in QD spin-VCSELs (e.g. photon decay rate and intra-dot relaxation rate) on the large signal modulation performance with regard to optical modulation amplitude and eye-diagram opening penalty. We show that in addition to exhibiting enhanced polarization modulation performance for data rates up to 250Gbps, QD spin-VCSELs enable operation in dual (ground and excited state) emission thus allowing future exciting routes for multiplexing of information in optical communication links.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2208.06327 [pdf, other]

doi 10.1038/s42256-022-00558-5

Developing moral AI to support antimicrobial decision making

Authors: William J Bolton, Cosmin Badea, Pantelis Georgiou, Alison Holmes, Timothy M Rawson

Abstract: Artificial intelligence (AI) assisting with antimicrobial prescribing raises significant moral questions. Utilising ethical frameworks alongside AI-driven systems, while considering infection specific complexities, can support moral decision making to tackle antimicrobial resistance.

Submitted 12 August, 2022; originally announced August 2022.

ACM Class: I.2.1

arXiv:2202.03587 [pdf, other]

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

Authors: Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou

Abstract: Deriving multimodal representations of audio and lexical inputs is a central problem in Natural Language Understanding (NLU). In this paper, we present Contrastive Aligned Audio-Language Multirate and Multimodal Representations (CALM), an approach for learning multimodal representations using contrastive and multirate information inherent in audio and lexical inputs. The proposed model aligns acoustic and lexical information in the input embedding space of a pretrained language-only contextual embedding model. By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embedding competitive with existing audio representation models in only a few hours of training time. Operationally, audio spectrograms are processed using linearized patches through a Spectral Transformer (SpecTran) which is trained using a Contrastive Audio-Language Pretraining objective to align audio and language from similar queries. Subsequently, the derived acoustic and lexical tokens representations are input into a multimodal transformer to incorporate utterance level context and derive the proposed CALM representations. We show that these pretrained embeddings can subsequently be used in multimodal supervised tasks and demonstrate the benefits of the proposed pretraining steps in terms of the alignment of the two embedding spaces and the multirate nature of the pretraining. Our system shows 10-25% improvement over existing emotion recognition systems including state-of-the-art three-modality systems under various evaluation objectives.

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2106.11759 [pdf, other]

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Authors: Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jefferey Bigham

Abstract: Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.

Submitted 18 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 page reference, 2 figures

arXiv:2104.03899 [pdf, other]

Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks

Authors: Haoqi Li, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

Abstract: Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging including due to inadequate training data resources stemming from the often low occurrence frequencies of specific behavioral patterns. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and corresponding manually-annotated data, rendering generalizing across domains challenging. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework to capture the behavioral context and derive a manifold representation, where speech frames with similar behaviors are closer while frames of different behaviors maintain larger distances. The models are trained on movie audio data and validated on diverse domains including on a couples therapy corpus and other publicly collected data (e.g., stand-up comedy). With encouraging results, our proposed framework shows the feasibility of unsupervised learning within cross-domain behavioral modeling.

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2102.11265 [pdf, other]

doi 10.3758/s13428-021-01623-4

Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

Authors: Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

Abstract: With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is however a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called Motivational Interviewing, our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5,000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.

Submitted 27 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: new version has an updated title

arXiv:2102.02270 [pdf, other]

doi 10.1371/journal.pone.0264488

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Authors: Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated from human speech production and perception, is a word vector representation which encodes ambiguities present in human spoken language in addition to semantics and syntactic information. Confusion2vec provides a robust spoken language representation by considering inherent human language ambiguities. In this paper, we propose a novel word vector space estimation by unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in confusion2vec vector space by its constituent subword character n-grams. We show the subword encoding helps better represent the acoustic perceptual ambiguities in human spoken language via information modeled on lattice structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using semantic, syntactic and acoustic analogy and word similarity tasks. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining/adapting the natural language understanding models on ASR transcripts.

Submitted 19 February, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

arXiv:2008.03550 [pdf, other]

A novel hand-held interface supporting the self-management of Type 1 diabetes

Authors: Robert Spence, Chukwuma Uduku, Kezhi Li, Nick Oliver, Pantelis Georgiou

Abstract: The paper describes the interaction design of a hand-held interface supporting the self-management of Type 1 diabetes. It addresses well-established clinical and human-computer interaction requirements. The design exploits three opportunities. One is associated with visible context, whether conspicuous or inconspicuous. A second arises from the design freedom made possible by the user's anticipated focus of attention during certain interactions. A third opportunity to provide valuable functionality arises from wearable sensors and machine learning algorithms. The resulting interface permits "What if?" questions: it allows a user to dynamically and manually explore predicted short-term (e.g., 2 hours) relationships between an intended meal, blood glucose level and recommended insulin dosage, and thereby readily make informed food and exercise decisions. Design activity has been informed throughout by focus groups comprising people with Type 1 diabetes in addition to experts in diabetes, interaction design and machine learning. The design is being implemented prior to a clinical trial.

Submitted 8 August, 2020; originally announced August 2020.

arXiv:2008.01387 [pdf, ps, other]

Trace Logic for Inductive Loop Reasoning

Authors: Pamina Georgiou, Bernhard Gleiss, Laura Kovács

Abstract: We propose trace logic, an instance of many-sorted first-order logic, to automate the partial correctness verification of programs containing loops. Trace logic generalizes semantics of program locations and captures loop semantics by encoding properties at arbitrary timepoints and loop iterations. We guide and automate inductive loop reasoning in trace logic by using generic trace lemmas capturing inductive loop invariants. Our work is implemented in the RAPID framework, by extending and integrating superposition-based first-order reasoning within RAPID. We successfully used RAPID to prove correctness of many programs whose functional behavior is best summarized in the first-order theories of linear integer arithmetic, arrays and inductive data types.

Submitted 6 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: Related Version: A compact, peer-reviewed version of this paper will be available in the conference proceedings of Formal Methods of Computer-Aided Design (FMCAD) 2020

arXiv:2005.09059 [pdf, other]

doi 10.1109/JBHI.2020.3014556

Basal Glucose Control in Type 1 Diabetes using Deep Reinforcement Learning: An In Silico Validation

Authors: Taiyu Zhu, Kezhi Li, Pau Herrero, Pantelis Georgiou

Abstract: People with Type 1 diabetes (T1D) require regular exogenous infusion of insulin to maintain their blood glucose concentration in a therapeutically adequate target range. Although the artificial pancreas and continuous glucose monitoring have been proven to be effective in achieving closed-loop control, significant challenges still remain due to the high complexity of glucose dynamics and limitations in the technology. In this work, we propose a novel deep reinforcement learning model for single-hormone (insulin) and dual-hormone (insulin and glucagon) delivery. In particular, the delivery strategies are developed by double Q-learning with dilated recurrent neural networks. For designing and testing purposes, the FDA-accepted UVA/Padova Type 1 simulator was employed. First, we performed long-term generalized training to obtain a population model. Then, this model was personalized with a small dataset of subject-specific data. In silico results show that the single and dual-hormone delivery strategies achieve good glucose control when compared to a standard basal-bolus therapy with low-glucose insulin suspension. Specifically, in the adult cohort (n=10), percentage time in target range [70, 180] mg/dL improved from 77.6% to 80.9% with single-hormone control, and to 85.6% with dual-hormone control. In the adolescent cohort (n=10), percentage time in target range improved from 55.5% to 65.9% with single-hormone control, and to 78.8% with dual-hormone control. In all scenarios, a significant decrease in hypoglycemia was observed. These results show that the use of deep reinforcement learning is a viable approach for closed-loop glucose control in T1D.

Submitted 18 May, 2020; originally announced May 2020.

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2020

arXiv:2004.06756 [pdf, other]

doi 10.21437/Interspeech.2019-1947

Speaker Diarization with Lexical Information

Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.

Submitted 13 April, 2020; originally announced April 2020.

Journal ref: Interspeech 2019, 391-395

arXiv:1911.11927 [pdf, ps, other]

Automatic prediction of suicidal risk in military couples using multimodal interaction cues from couples conversations

Authors: Sandeep Nallan Chakravarthula, Md Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis Georgiou

Abstract: Suicide is a major societal challenge globally, with a wide range of risk factors, from individual health, psychological and behavioral elements to socio-economic aspects. Military personnel, in particular, are at especially high risk. Crisis resources, while helpful, are often constrained by access to clinical visits or therapist availability, especially when needed in a timely manner. There have hence been efforts to identify whether communication patterns between couples at home can provide preliminary information about potential suicidal behaviors, prior to intervention. In this work, we investigate whether acoustic, lexical, behavior and turn-taking cues from military couples' conversations can provide meaningful markers of suicidal risk. We test their effectiveness in real-world noisy conditions by extracting these cues through an automatic diarization and speech recognition front-end. Evaluation is performed by classifying 3 degrees of suicidal risk: none, ideation, attempt. Our automatic system performs significantly better than chance in all classification scenarios and we find that behavior and turn-taking cues are the most informative ones. We also observe that conditioning on factors such as speaker gender and topic of discussion tends to improve classification performance.

Submitted 26 November, 2019; originally announced November 2019.

Comments: submitted to ICASSP 2020

arXiv:1911.09515 [pdf, other]

An analysis of observation length requirements for machine understanding of human behaviors from spoken language

Authors: Sandeep Nallan Chakravarthula, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

Abstract: The task of quantifying human behavior by observing interaction cues is an important and useful one across a range of domains in psychological research and practice. Machine learning-based approaches typically perform this task by first estimating behavior based on cues within an observation window, such as a fixed number of words, and then aggregating the behavior over all the windows in that interaction. The length of this window directly impacts the accuracy of estimation by controlling the amount of information being used. The exact link between window length and accuracy, however, has not been well studied, especially in spoken language. In this paper, we investigate this link and present an analysis framework that determines appropriate window lengths for the task of behavior estimation. Our proposed framework utilizes a two-pronged evaluation approach: (a) extrinsic similarity between machine predictions and human expert annotations, and (b) intrinsic consistency between intra-machine and intra-human behavior relations. We apply our analysis to real-life conversations that are annotated for a large and diverse set of behavior codes and examine the relation between the nature of a behavior and how long it should be observed. We find that behaviors describing negative and positive affect can be accurately estimated from short to medium-length expressions whereas behaviors related to problem-solving and dysphoria require much longer observations and are difficult to quantify from language alone. These findings are found to be generally consistent across different behavior modeling approaches.

Submitted 26 August, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: converted to CSL format, restructured presentation of analysis and methodology, moved finer details to Appendix, enlarged figures and text, fixed typos and notational inconsistency

arXiv:1911.07994 [pdf, other]

doi 10.21437/Odyssey.2020-17

Linguistically Aided Speaker Diarization Using Speaker Role Information

Authors: Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space; a region which represents that speaker's identity. These identities are not known a priori, so a clustering algorithm is typically employed, which is traditionally based solely on audio. Under noisy conditions, however, such an approach poses the risk of generating unreliable speaker clusters. In this work we aim to utilize linguistic information as a supplemental modality to identify the various speakers in a more robust way. We are focused on conversational scenarios where the speakers assume distinct roles and are expected to follow different linguistic patterns. This distinct linguistic variability can be exploited to help us construct the speaker identities. That way, we are able to boost the diarization performance by converting the clustering task to a classification one. The proposed method is applied in real-world dyadic psychotherapy interactions between a provider and a patient and demonstrated to show improved results.

Submitted 5 February, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: from v1: restructured Introduction and Background, added experimental results with ASR text and language-only baseline

arXiv:1911.01533 [pdf, other]

Speaker-invariant Affective Representation Learning via Adversarial Training

Authors: Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou

Abstract: Representation learning for speech emotion recognition is challenging due to the sparsity of labeled data and the lack of gold-standard references. In addition, there is much variability from input speech signals, human subjective perception of the signals and emotion label ambiguity. In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect of speaker variability in the speech signals. Specifically, we propose to disentangle the speaker characteristics from emotion through an adversarial training network in order to better represent emotion. Our method combines the gradient reversal technique with an entropy loss function to remove such speaker information. Our approach is evaluated on both IEMOCAP and CMU-MOSEI datasets. We show that our method improves speech emotion classification and increases generalization to unseen speakers.

Submitted 12 August, 2021; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: Accepted by ICASSP 2020; 5 pages

arXiv:1910.10287 [pdf, other]

RNN based Incremental Online Spoken Language Understanding

Authors: Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Spoken Language Understanding (SLU) typically comprises an automatic speech recognition (ASR) module followed by a natural language understanding (NLU) module. The two modules process signals in a blocking sequential fashion, i.e., the NLU often has to wait for the ASR to finish processing on an utterance basis, potentially leading to high latencies that render the spoken interaction less natural. In this paper, we propose recurrent neural network (RNN) based incremental processing towards the SLU task of intent detection. The proposed methodology offers lower latencies than a typical SLU system, without any significant reduction in system accuracy. We introduce and analyze different recurrent neural network architectures for incremental and online processing of the ASR transcripts and compare them to the existing offline systems. A lexical End-of-Sentence (EOS) detector is proposed for segmenting the stream of transcript into sentences for intent classification. Intent detection experiments are conducted on benchmark ATIS, Snips and Facebook's multilingual task oriented dialog datasets modified to emulate a continuous incremental stream of words with no utterance demarcation. We also analyze the prospects of early intent detection, before EOS, with our proposed system.

Submitted 30 November, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

Comments: Accepted for publication at IEEE Spoken Language Technology Workshop 2021

arXiv:1910.04059 [pdf, other]

A Dual-Hormone Closed-Loop Delivery System for Type 1 Diabetes Using Deep Reinforcement Learning

Authors: Taiyu Zhu, Kezhi Li, Pantelis Georgiou

Abstract: We propose a dual-hormone delivery strategy by exploiting deep reinforcement learning (RL) for people with Type 1 Diabetes (T1D). Specifically, double dilated recurrent neural networks (RNN) are used to learn the hormone delivery strategy, trained by a variant of Q-learning, whose inputs are raw data of glucose and meal carbohydrate and outputs are dual-hormone (insulin and glucagon) delivery. Without prior knowledge of the glucose-insulin metabolism, we run the method on the UVA/Padova simulator. Hundreds of days of self-play are performed to obtain a generalized model, then importance sampling is adopted to customize the model for personal use. In silico, the proposed strategy achieves glucose time in target range (TIR) of 93% for adults and 83% for adolescents given standard bolus, outperforming previous approaches significantly. The results indicate that deep RL is effective in building personalized hormone delivery strategies for people with T1D.

Submitted 9 October, 2019; originally announced October 2019.

arXiv:1910.03641 [pdf, other]

Linking emotions to behaviors through deep transfer learning

Authors: Haoqi Li, Brian Baucom, Panayiotis Georgiou

Abstract: Human behavior refers to the way humans act and interact. Understanding human behavior is a cornerstone of observational practice, especially in psychotherapy. An important cue for behavior analysis is the dynamic change of emotions during the conversation. Domain experts integrate emotional information in a highly nonlinear manner; thus, it is challenging to explicitly quantify the relationship between emotions and behaviors. In this work, we employ deep transfer learning to analyze their inferential capacity and contextual importance. We first train a network to quantify emotions from acoustic signals and then use information from the emotion recognition network as features for behavior recognition. We treat this emotion-related information as behavioral primitives and further train higher level layers towards behavior quantification. Through our analysis, we find that emotion-related information is an important cue for behavior recognition. Further, we investigate the importance of emotional context in the expression of behavior by constraining (or not) the neural networks' contextual view of the data. This demonstrates that the sequence of emotions is critical in behavior expression. To achieve these frameworks we employ hybrid architectures of convolutional networks and recurrent networks to extract emotion-related behavior primitives and facilitate automatic behavior recognition from speech.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 23 pages, 8 figures

arXiv:1909.04302 [pdf, other]

Multimodal Embeddings from Language Models

Authors: Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Word embeddings such as ELMo have recently been shown to model word semantics with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant improvement in state of the art across many natural language tasks. In this work we integrate acoustic information into contextualized lexical embeddings through the addition of multimodal inputs to a pretrained bidirectional language model. The language model is trained on spoken language that includes text and audio modalities. The resulting representations from this model are multimodal and contain paralinguistic information which can modify word meanings and provide affective information. We show that these multimodal embeddings can be used to improve over previous state of the art multimodal models in emotion recognition on the CMU-MOSEI dataset.

Submitted 10 September, 2019; originally announced September 2019.

arXiv:1909.00107 [pdf, other]

Behavior Gated Language Models

Authors: Prashanth Gurunath Shivakumar, Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Most current language modeling techniques only exploit co-occurrence, semantic and syntactic information from the sequence of words. However, a range of information such as the state of the speaker and dynamics of the interaction might be useful. In this work we derive motivation from psycholinguistics and propose the addition of behavioral information into the context of language modeling. We propose the augmentation of language models with an additional module which analyzes the behavioral state of the current context. This behavioral information is used to gate the outputs of the language model before the final word prediction output. We show that the addition of behavioral context in language models achieves lower perplexities on behavior-rich datasets. We also confirm the validity of the proposed models on a variety of model architectures and improve on previous state-of-the-art models with generic domain Penn Treebank Corpus.

Submitted 30 August, 2019; originally announced September 2019.

arXiv:1908.00908 [pdf, other]

Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions using Speech and Language

Authors: Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis Georgiou

Abstract: Cancer impacts the quality of life of those diagnosed as well as their spouse caregivers, in addition to potentially influencing their day-to-day behaviors. There is evidence that effective communication between spouses can improve well-being related to cancer but it is difficult to efficiently evaluate the quality of daily life interactions using manual annotation frameworks. Automated recognition of behaviors based on the interaction cues of speakers can help analyze interactions in such couples and identify behaviors which are beneficial for effective communication. In this paper, we present and detail a dataset of dyadic interactions in 85 real-life cancer-afflicted couples and a set of observational behavior codes pertaining to interpersonal communication attributes. We describe and employ neural network-based systems for classifying these behaviors based on turn-level acoustic and lexical speech patterns. Furthermore, we investigate the effect of controlling for factors such as gender, patient/caregiver role and conversation content on behavior classification. Analysis of our preliminary results indicates the challenges in this task due to the nature of the targeted behaviors and suggests that techniques incorporating contextual processing might be better suited to tackle this problem.

Submitted 2 August, 2019; originally announced August 2019.

arXiv:1906.09899 [pdf, ps, other]

Verifying Relational Properties using Trace Logic

Authors: Gilles Barthe, Renate Eilers, Pamina Georgiou, Bernhard Gleiss, Laura Kovacs, Matteo Maffei

Abstract: We present a logical framework for the verification of relational properties in imperative programs. Our work is motivated by relational properties which come from security applications and often require reasoning about formulas with quantifier alternations. Our framework reduces verification of relational properties of imperative programs to a validity problem in trace logic, an expressive instance of first-order predicate logic. Trace logic draws its expressiveness from its syntax, which allows expressing properties over computation traces. Its axiomatization supports fine-grained reasoning about intermediate steps in program execution, notably loop iterations. We present an algorithm to encode the semantics of programs as well as their relational properties in trace logic, and then show how first-order theorem proving can be used to reason about the resulting trace logic formulas. Our work is implemented in the tool Rapid and evaluated with examples coming from the security field.

Submitted 12 August, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

arXiv:1904.06002 [pdf, other]

Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance

Authors: Md Nasir, Sandeep Nallan Chakravarthula, Brian Baucom, David C. Atkins, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: Linguistic coordination is a well-established phenomenon in spoken conversations and often associated with positive social behaviors and outcomes. While there have been many attempts to measure lexical coordination or entrainment in literature, only a few have explored coordination in syntactic or semantic space. In this work, we attempt to combine these different aspects of coordination into a single measure by leveraging distances in a neural word representation space. In particular, we adopt the recently proposed Word Mover's Distance with word2vec embeddings and extend it to measure the dissimilarity in language used in multiple consecutive speaker turns. To validate our approach, we apply this measure for two case studies in the clinical psychology domain. We find that our proposed measure is correlated with the therapist's empathy towards their patient in Motivational Interviewing and with affective behaviors in Couples Therapy. In both case studies, our proposed metric exhibits higher correlation than previously proposed measures. When applied to the couples with relationship improvement, we also notice a significant decrease in the proposed measure over the course of therapy, indicating higher linguistic coordination.

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1904.03576 [pdf, other]

doi 10.21437/Interspeech.2019-2226

Spoken Language Intent Detection using Confusion2Vec

Authors: Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou

Abstract: Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions in real-life scenarios makes the task more challenging. In this paper, we address spoken language intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors relating to acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for the errors. We demonstrate through experiments on the ATIS benchmark dataset the robustness of the proposed model, which achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state-of-the-art going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.

Submitted 1 July, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

Report number: 2226

Journal ref: Proceedings of Interspeech 2019

arXiv:1901.07467 [pdf, ps, other]

Enhancing Blood Glucose Prediction with Meal Absorption and Physical Exercise Information

Authors: Chengyuan Liu, Josep Vehi, Nick Oliver, Pantelis Georgiou, Pau Herrero

Abstract: Objective: Numerous glucose prediction algorithms have been proposed to empower type 1 diabetes (T1D) management. Most of these algorithms only account for inputs such as glucose, insulin and carbohydrate, which limits their performance. Here, we present a novel glucose prediction algorithm which, in addition to standard inputs, accounts for meal absorption and physical exercise information to enhance prediction accuracy. Methods: A compartmental model of glucose-insulin dynamics combined with a deconvolution technique for state estimation is employed for glucose prediction. In silico data from the 10 adult subjects of the UVa-Padova simulator and clinical data from 10 adults with T1D were used. Finally, a comparison against a validated glucose prediction algorithm based on a latent variable with exogenous input (LVX) model is provided. Results: For a prediction horizon of 60 minutes, accounting for meal absorption and physical exercise improved glucose forecasting accuracy. In particular, root mean square error (mg/dL) went from 26.68 to 23.89, p<0.001 (in silico data); and from 37.02 to 35.96, p<0.001 (clinical data - only meal information). Such improvement in accuracy was translated into significant improvements on hypoglycaemia and hyperglycaemia prediction. Finally, the performance of the proposed algorithm is statistically superior to that of the LVX algorithm (26.68 vs. 32.80, p<0.001 (in silico data); 37.02 vs. 49.17, p<0.01 (clinical data)).
Conclusion: Taking into account meal absorption and physical exercise information improves glucose prediction accuracy.

Submitted 13 December, 2018; originally announced January 2019.

Comments: 10 pages, 5 figures, 8 tables and one appendix

arXiv:1811.10761

Speaker Diarization With Lexical Information

Authors: Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

Abstract: This work presents a novel approach to leverage lexical information for speaker diarization. We introduce a speaker diarization system that can directly integrate lexical as well as acoustic information into a speaker clustering process. Thus, we propose an adjacency matrix integration technique to integrate word-level speaker turn probabilities with speaker embeddings in a comprehensive way. Our proposed method works without any reference transcript. Words and word boundary information are provided by an ASR system. We show that our proposed method improves a baseline speaker diarization system solely based on speaker embeddings, achieving a meaningful improvement on the CALLHOME American English Speech dataset.

Submitted 28 November, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: This version removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

arXiv:1811.03199 [pdf, other]

doi 10.7717/peerj-cs.195

Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities

Authors: Prashanth Gurunath Shivakumar, Panayiotis Georgiou

Abstract: Word vector representations are a crucial part of Natural Language Processing (NLP) and Human Computer Interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated by human speech production and perception, that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information and we focus on a model that incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by both humans and machines through contextual cues. A range of representational ambiguities can emerge in various domains further to acoustic perception, such as morphological transformations, paraphrasing for NLP tasks like machine translation etc. In this work, we present a case study in application to Automatic Speech Recognition (ASR), where the word confusions are related to acoustic similarity. We present several techniques to train an acoustic perceptual similarity representation ambiguity. We term this Confusion2Vec and learn on unsupervised-generated data from ASR confusion networks or lattice-like structures. Appropriate evaluations for the Confusion2Vec are formulated for gauging acoustic similarity in addition to semantic-syntactic and word similarity evaluations. The Confusion2Vec is able to model word confusions efficiently, without compromising on the semantic-syntactic word relations, thus effectively enriching the word vector space with extra task-relevant ambiguity information.
We provide an intuitive exploration of the 2-dimensional Confusion2Vec space using Principal Component Analysis of the embedding and relate it to semantic, syntactic and acoustic relationships. The potential of Confusion2Vec in the utilization of uncertainty present in lattices is demonstrated through small examples relating to ASR error correction.

Submitted 28 March, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

Journal ref: PeerJ Computer Science 5:e195, 2019

arXiv:1810.12349 [pdf, other]

doi 10.1109/TAFFC.2019.2952113

Multi-label Multi-task Deep Learning for Behavioral Coding

Authors: James Gibson, David C. Atkins, Torrey Creed, Zac Imel, Panayiotis Georgiou, Shrikanth Narayanan

Abstract: We propose a methodology for estimating human behaviors in psychotherapy sessions using multi-label and multi-task learning paradigms. We discuss the problem of behavioral coding, in which data of human interactions is annotated with labels to describe relevant human behaviors of interest. We describe two related, yet distinct, corpora consisting of therapist-client interactions in psychotherapy sessions. We experimentally compare the proposed learning approaches for estimating behaviors of interest in these datasets. Specifically, we compare single and multiple label learning approaches, single and multiple task learning approaches, and evaluate the performance of these approaches when incorporating turn context. We demonstrate the prediction performance gains which can be achieved by using the proposed paradigms and discuss the insights these models provide into these complex interactions.

Submitted 5 November, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

arXiv:1807.06792 [pdf, other]

Unsupervised Online Multitask Learning of Behavioral Sentence Embeddings

Authors: Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou

Abstract: Unsupervised learning has been an attractive method for easily deriving meaningful data representations from vast amounts of unlabeled data. These representations, or embeddings, often yield superior results in many tasks, whether used directly or as features in subsequent training stages. However, the quality of the embeddings is highly dependent on the assumed knowledge in the unlabeled data and how the system extracts information without supervision. Domain portability is also very limited in unsupervised learning, often requiring re-training on other in-domain corpora to achieve robustness. In this work we present a multitask paradigm for unsupervised contextual learning of behavioral interactions which addresses unsupervised domain adaptation. We introduce an online multitask objective into unsupervised learning and show that sentence embeddings generated through this process increase the performance of affective tasks.

Submitted 1 November, 2018; v1 submitted 18 July, 2018; originally announced July 2018.

arXiv:1807.03043 [pdf, other]

Convolutional Recurrent Neural Networks for Glucose Prediction

Authors: Kezhi Li, John Daniels, Chengyuan Liu, Pau Herrero, Pantelis Georgiou

Abstract: Control of blood glucose is essential for diabetes management. Current digital therapeutic approaches for subjects with Type 1 diabetes mellitus (T1DM) such as the artificial pancreas and insulin bolus calculators leverage machine learning techniques for predicting subcutaneous glucose for improved control. Deep learning has recently been applied in healthcare and medical research to achieve state-of-the-art results in a range of tasks including disease diagnosis, and patient state prediction among others. In this work, we present a deep learning model that is capable of forecasting glucose levels with leading accuracy for simulated patient cases (RMSE = 9.38$\pm$0.71 [mg/dL] over a 30-minute horizon, RMSE = 18.87$\pm$2.25 [mg/dL] over a 60-minute horizon) and real patient cases (RMSE = 21.07$\pm$2.35 [mg/dL] for 30-minute, RMSE = 33.27$\pm$4.79\% for 60-minute). In addition, the model provides competitive performance in providing effective prediction horizon ($PH_{eff}$) with minimal time lag both in a simulated patient dataset ($PH_{eff}$ = 29.0$\pm$0.7 for 30-min and $PH_{eff}$ = 49.8$\pm$2.9 for 60-min) and in a real patient dataset ($PH_{eff}$ = 19.3$\pm$3.1 for 30-min and $PH_{eff}$ = 29.3$\pm$9.4 for 60-min). This approach is evaluated on a dataset of 10 simulated cases generated from the UVa/Padova simulator and a clinical dataset of 10 real cases each containing glucose readings, insulin bolus, and meal (carbohydrate) data. Performance of the recurrent convolutional neural network is benchmarked against four algorithms.
The proposed algorithm is implemented on an Android mobile phone, with an execution time of $6$ms on a phone compared to an execution time of $780$ms on a laptop.

Submitted 25 February, 2019; v1 submitted 9 July, 2018; originally announced July 2018.

Comments: 10 pages, 7 figures

Journal ref: IEEE journal of biomedical and health informatics 2019

arXiv:1805.10731 [pdf, other]

Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

Authors: Tae Jin Park, Panayiotis Georgiou

Abstract: While there has been a substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system using a sequence-to-sequence neural network trained on both lexical and acoustic features. We also propose a loss function that allows for selecting not only the speaker change points but also the best speaker at any time by allowing for different speaker groupings. We incorporate Mel Frequency Cepstral Coefficients (MFCC) as an acoustic feature alongside lexical information that is obtained from conversations from the Fisher dataset. Thus, we show that acoustics provide complementary information to the lexical modality. The experimental results show that a sequence-to-sequence system trained on both word sequences and MFCC can improve on the speaker diarization result compared to the system that only relies on the lexical modality or the baseline MFCC-based system. In addition, we test the performance of our proposed method with Automatic Speech Recognition (ASR) transcripts. While the performance on ASR transcripts drops, the Diarization Error Rate (DER) of our proposed method still outperforms the traditional method based on Bayesian Information Criterion (BIC).

Submitted 27 May, 2018; originally announced May 2018.

arXiv:1805.09436 [pdf, other]

Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions

Authors: Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou

Abstract: Dyadic interactions among humans are marked by speakers continuously influencing and reacting to each other in terms of responses and behaviors, among others. Understanding how interpersonal dynamics affect behavior is important for successful treatment in psychotherapy domains. Traditional schemes that automatically identify behavior for this purpose have often looked at only the target speaker. In this work, we propose a Markov model of how a target speaker's behavior is influenced by their own past behavior as well as their perception of their partner's behavior, based on lexical features. Apart from incorporating additional potentially useful information, our model can also control the degree to which the partner affects the target speaker. We evaluate our proposed model on the task of classifying Negative behavior in Couples Therapy and show that it is more accurate than the single-speaker model. Furthermore, we investigate the degree to which the optimal influence relates to how well a couple does in the long term, by relating it to relationship outcomes.

Submitted 23 May, 2018; originally announced May 2018.

arXiv:1805.05840 [pdf]

Body Dust: Miniaturized Highly-integrated Low Power Sensing for Remotely Powered Drinkable CMOS Bioelectronics

Authors: Sandro Carrara, Pantelis Georgiou

Abstract: The aim of this paper is to introduce current advances in technology that could enable the development of fully drinkable and autonomous bio-electronic CMOS sensors in the form of dust particles, capable of identifying the source of a disease by targeting a specific region in organs and tissue such as a tumor mass and automatically sending diagnostic information wirelessly outside the body. We call this swarm of sensing dust particles Body Dust. A diagnostic system in the form of Body Dust would need to be small enough to support free circulation in human tissues, which requires a total size of less than 10 μm³, in order to mimic the typical sizes of a blood cell (e.g., red cells have a diameter of around 7 μm). Whilst, with the present state-of-the-art in CMOS technology, this size requirement is currently unfeasible, recent research has advanced technology such that we can begin to work towards such an approach. Therefore, we present here the current limits of CMOS technology as well as the challenges related to the development of such a system. Towards this goal, this article presents the theoretical feasibility of obtaining the first ever-conceived sub-10-μm Bio/CMOS integrated circuit with biosensing capability to provide diagnostic telemetry once self-located in human tissue.

Submitted 30 April, 2018; originally announced May 2018.

Comments: 9 pages, 14 figures

arXiv:1805.03322 [pdf, other]

Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations

Authors: Prashanth Gurunath Shivakumar, Panayiotis Georgiou

Abstract: Children's speech recognition is challenging mainly due to the inherent high variability in children's physical and articulatory characteristics and expressions. This variability manifests in both acoustic constructs and linguistic usage due to the rapidly changing developmental stage in children's life. Part of the challenge is due to the lack of large amounts of available children's speech data for efficient modeling. This work attempts to address the key challenges using transfer learning from adults' models to children's models in a Deep Neural Network (DNN) framework for the children's Automatic Speech Recognition (ASR) task, evaluating on multiple children's speech corpora with a large vocabulary. The paper presents a systematic and extensive analysis of the proposed transfer learning technique considering the key factors affecting children's speech recognition from prior literature. Evaluations are presented on (i) comparisons of earlier GMM-HMM and the newer DNN models, (ii) effectiveness of standard adaptation techniques versus transfer learning, (iii) various adaptation configurations in tackling the variabilities present in children's speech, in terms of (a) acoustic spectral variability, and (b) pronunciation variability and linguistic constraints. Our analysis spans over (i) number of DNN model parameters (for adaptation), (ii) amount of adaptation data, (iii) ages of children, (iv) age dependent-independent adaptation.
Finally, we provide recommendations on (i) the favorable strategies over the various aforementioned analyzed parameters, and (ii) potential future research directions and relevant challenges/problems persisting in DNN-based ASR for children's speech.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1804.08782 [pdf, other]

doi 10.21437/Interspeech.2018-1395

Towards an Unsupervised Entrainment Distance in Conversational Speech using Deep Neural Networks

Authors: Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

Abstract: Entrainment is a known adaptation mechanism that causes interaction participants to adapt or synchronize their acoustic characteristics. Understanding how interlocutors tend to adapt to each other's speaking style through entrainment involves measuring a range of acoustic features and comparing those via multiple signal comparison methods. In this work, we present a turn-level distance measure obtained in an unsupervised manner using a Deep Neural Network (DNN) model, which we call Neural Entrainment Distance (NED). This metric establishes a framework that learns an embedding from the population-wide entrainment in an unlabeled training corpus. We use the framework for a set of acoustic features and validate the measure experimentally by showing its efficacy in distinguishing real conversations from fake ones created by randomly shuffling speaker turns. Moreover, we show real-world evidence of the validity of the proposed measure. We find that a high value of NED is associated with high ratings of emotional bond in suicide assessment interviews, which is consistent with prior studies.

Submitted 23 April, 2018; originally announced April 2018.

Comments: submitted to Interspeech 2018

arXiv:1802.07860 [pdf, other]

doi 10.1109/TASLP.2019.2921890

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Authors: Arindam Jati, Panayiotis Georgiou

Abstract: Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain many non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker, and thus a common representation that can encode the commonalities of both segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce "speaker embeddings" by learning to separate 'same' vs 'different' speaker pairs which are generated from unlabeled audio streams. Two sets of experiments are done in different scenarios to evaluate the strength of NPC embeddings and compare with state-of-the-art in-domain supervised methods. First, two speaker identification experiments with different context lengths are performed in a scenario with comparatively limited within-speaker channel variability. NPC embeddings are found to perform the best in the short-duration experiment, and they provide complementary information to i-vectors for the full-utterance experiments.
Second, a large-scale speaker verification task having a wide range of within-speaker channel variability is adopted as an upper-bound experiment where comparisons are drawn with in-domain supervised methods.

Submitted 25 April, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1577-1589, Oct. 2019

arXiv:1802.02607 [pdf, other]

doi 10.1017/ATSIP.2018.31

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Authors: Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Abstract: Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words due to acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading any accurate transcriptions; such corrections are greater on top of out-of-domain and mismatched data ASR. Our system consistently provides improvements over the baseline ASR, even when the baseline is further optimized through recurrent neural network language model rescoring. This demonstrates that any ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the type of errors corrected by our system.

Submitted 28 March, 2019; v1 submitted 7 February, 2018; originally announced February 2018.

Journal ref: APSIPA Transactions on Signal and Information Processing 8. Cambridge University Press: e8, 2019

arXiv:1701.03198 [pdf, other]

Unsupervised Latent Behavior Manifold Learning from Acoustic Features: audio2behavior

Authors: Haoqi Li, Brian Baucom, Panayiotis Georgiou

Abstract: Behavioral annotation using signal processing and machine learning is highly dependent on training data and manual annotations of behavioral labels. Previous studies have shown that speech information encodes significant behavioral information and can be used in a variety of automated behavior recognition tasks. However, extracting behavior information from speech is still a difficult task due to the sparseness of training data coupled with the complexity and high dimensionality of speech, and the complex and multiple information streams it encodes. In this work we exploit the slowly varying properties of human behavior. We hypothesize that nearby segments of speech share the same behavioral context and hence share a similar underlying representation in a latent space. Specifically, we propose a Deep Neural Network (DNN) model to connect behavioral context and derive the behavioral manifold in an unsupervised manner. We evaluate the proposed manifold in the couples therapy domain and also provide examples from publicly available data (e.g. stand-up comedy). We further investigate training within the couples' therapy domain and from movie data. The results are extremely encouraging, promise improved behavioral quantification in an unsupervised manner, and warrant further investigation in a range of applications.

Submitted 11 January, 2017; originally announced January 2017.

Comments: Accepted by ICASSP 2017

arXiv:1606.04518 [pdf, other]

Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples' Therapy

Authors: Haoqi Li, Brian Baucom, Panayiotis Georgiou

Abstract: Observational studies are based on accurate assessment of human state. A behavior recognition system that models interlocutors' state in real-time can significantly aid the mental health domain. However, behavior recognition from speech remains a challenging task since it is difficult to find generalizable and representative features because of noisy and high-dimensional data, especially when data is limited and annotated coarsely and subjectively. Deep Neural Networks (DNN) have shown promise in a wide range of machine learning tasks, but for Behavioral Signal Processing (BSP) tasks their application has been constrained due to the limited quantity of data. We propose a Sparsely-Connected and Disjointly-Trained DNN (SD-DNN) framework to deal with limited data. First, we break the acoustic feature set into subsets and train multiple distinct classifiers. Then, the hidden layers of these classifiers become parts of a deeper network that integrates all feature streams. The overall system allows for full connectivity while limiting the number of parameters trained at any time, and makes convergence possible even with limited data. We present results on multiple behavior codes in the couples' therapy domain and demonstrate the benefits in behavior classification accuracy. We also show the viability of this system towards live behavior annotations.

Submitted 14 June, 2016; originally announced June 2016.

arXiv:1605.02021 [pdf, other]

Window functions and sigmoidal behaviour of memristive systems

Authors: Panayiotis S. Georgiou, Sophia N. Yaliraki, Emmanuel M. Drakakis, Mauricio Barahona

Abstract: A common approach to model memristive systems is to include empirical window functions to describe edge effects and non-linearities in the change of the memristance. We demonstrate that under quite general conditions, each window function can be associated with a sigmoidal curve relating the normalised time-dependent memristance to the time integral of the input. Conversely, this explicit relation allows us to derive window functions suitable for the mesoscopic modelling of memristive systems from a variety of well-known sigmoidals. Such sigmoidal curves are defined in terms of measured variables and can thus be extracted from input and output signals of a device and then transformed to its corresponding window. We also introduce a new generalised window function that allows the flexible modelling of asymmetric edge effects in a simple manner.

Submitted 14 January, 2016; originally announced May 2016.

Comments: 12 pages, 5 figures, 1 table. To appear in International Journal of Circuit Theory and Applications

arXiv:1011.0060 [pdf, ps, other]

Quantitative Measure of Hysteresis for Memristors Through Explicit Dynamics

Authors: Panayiotis S. Georgiou, Sophia N. Yaliraki, Emmanuel M. Drakakis, Mauricio Barahona

Abstract: We introduce a mathematical framework for the analysis of the input-output dynamics of externally driven memristors. We show that, under general assumptions, their dynamics comply with a Bernoulli differential equation and hence can be nonlinearly transformed into a formally solvable linear equation. The Bernoulli formalism, which applies to both charge- and flux-controlled memristors when either current- or voltage-driven, can, in some cases, lead to expressions of the output of the device as an explicit function of the input. We apply our framework to obtain analytical solutions of the i-v characteristics of the recently proposed model of the Hewlett-Packard memristor under three different drives without the need for numerical simulations. Our explicit solutions allow us to identify a dimensionless lumped parameter that combines device-specific parameters with properties of the input drive. This parameter governs the memristive behavior of the device and, consequently, the amount of hysteresis in the i-v. We proceed further by defining formally a quantitative measure for the hysteresis of the device for which we obtain explicit formulas in terms of the aforementioned parameter and we discuss the applicability of the analysis for the design and analysis of memristor devices.

Submitted 17 July, 2011; v1 submitted 30 October, 2010; originally announced November 2010.

Comments: 11 pages, 12 figures
