
Large Language Models for Dysfluency Detection in Stuttered Speech

Authors: Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

Abstract: Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we approach the task of multi-label dysfluency detection as a language modeling problem. We present hypotheses candidates generated with an automatic speech recognition system and acoustic representations extracted from an audio encoder model to an LLM, and finetune the system to predict dysfluency labels on three datasets containing English and German stuttered speech. The experimental results show that our system effectively combines acoustic and lexical information and achieves competitive results on the multi-label stuttering detection task.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2406.11022 [pdf, other]

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

Authors: Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

Abstract: This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weights and activation tensors, known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that these outliers are also present when transformer-based models are trained to perform automatic speech recognition, necessitating mitigation strategies for PTQ. We show that outliers can be reduced by a recently proposed gating mechanism in the attention blocks of the student model, enabling effective 8-bit quantization, and lower word error rates compared to student models without the gating mechanism in place.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2403.14438 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446224

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

Abstract: Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users must begin each command with a trigger phrase. We explore this task in three ways: First, we train classifiers using only acoustic information obtained from the audio waveform. Second, we take the decoder outputs of an automatic speech recognition (ASR) system, such as 1-best hypotheses, as input features to a large language model (LLM). Finally, we explore a multimodal system that combines acoustic and lexical features, as well as ASR decoder signals in an LLM. Using multimodal information yields relative equal-error-rate improvements over text-only and audio-only models of up to 39% and 61%. Increasing the size of the LLM and training with low-rank adaption leads to further relative EER reductions of up to 18% on our dataset.

Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.03632

arXiv:2312.03632 [pdf, other]

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

Abstract: Interactions with virtual assistants typically start with a trigger phrase followed by a command. In this work, we explore the possibility of making these interactions more natural by eliminating the need for a trigger phrase. Our goal is to determine whether a user addressed the virtual assistant based on signals obtained from the streaming audio recorded by the device microphone. We address this task by combining 1-best hypotheses and decoder signals from an automatic speech recognition system with acoustic representations from an audio encoder as input features to a large language model (LLM). In particular, we are interested in data and resource efficient systems that require only a small amount of training data and can operate in scenarios with only a single frozen LLM available on a device. For this reason, our model is trained on 80k or less examples of multimodal data using a combination of low-rank adaptation and prefix tuning. We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal-error-rates (EERs), while using only a fraction of the training data. We also show that low-dimensional specialized audio representations lead to lower EERs than high-dimensional general audio representations.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2306.06514 [pdf, other]

Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks

Authors: Dominik Wagner, Ilja Baumann, Tobias Bocklet

Abstract: Cycle-consistent generative adversarial networks have been widely used in non-parallel voice conversion (VC). Their ability to learn mappings between source and target features without relying on parallel training data eliminates the need for temporal alignments. However, most methods decouple the conversion of acoustic features from synthesizing the audio signal by using separate models for conversion and waveform synthesis. This work unifies conversion and synthesis into a single model, thereby eliminating the need for a separate vocoder. By leveraging cycle-consistent training and a self-supervised auxiliary training task, our model is able to efficiently generate converted high-quality raw audio waveforms. Subjective listening tests show that our method outperforms the baseline in whispered speech conversion (up to 6.7% relative improvement), and mean opinion score predictions yield competitive results in conventional VC (between 0.5% and 2.4% relative improvement).

Submitted 10 June, 2023; originally announced June 2023.

arXiv:2305.19255 [pdf, other]

A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

Abstract: Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-label problem using a modified wav2vec 2.0 system with an attention-based classification head and multi-task learning. We evaluate the method using combinations of three datasets containing English and German stuttered speech, one containing speech modified by fluency shaping. The experimental results and an error analysis show that multi-label stuttering detection systems trained on cross-corpus and multi-language data achieve competitive results but performance on samples with multiple labels stays below overall detection results.

Submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

arXiv:2212.01775 [pdf, other]

doi 10.1109/SLT54892.2023.10022796

Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech

Authors: Dominik Wagner, Sebastian P. Bayerl, Hector A. Cordourier Maruri, Tobias Bocklet

Abstract: This work adapts two recent architectures of generative models and evaluates their effectiveness for the conversion of whispered speech to normal speech. We incorporate the normal target speech into the training criterion of vector-quantized variational autoencoders (VQ-VAEs) and MelGANs, thereby conditioning the systems to recover voiced speech from whispered inputs. Objective and subjective quality measures indicate that both VQ-VAEs and MelGANs can be modified to perform the conversion task. We find that the proposed approaches significantly improve the Mel cepstral distortion (MCD) metric by at least 25% relative to a DiscoGAN baseline. Subjective listening tests suggest that the MelGAN-based system significantly improves naturalness, intelligibility, and voicing compared to the whispered input speech. A novel evaluation measure based on differences between latent speech representations also indicates that our MelGAN-based approach yields improvements relative to the baseline.

Submitted 30 January, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted at SLT 2022

arXiv:2211.08774 [pdf, other]

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

Authors: Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet

Abstract: We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the proven method of concatenating speaker vectors to the acoustic features and supplying them as auxiliary model inputs remains a viable option to increase the robustness of end-to-end architectures. The effect on transformer models is stronger, when more noise is added to the input speech. The most substantial benefits for systems based on wav2vec 2.0 are achieved under moderate or no noise conditions. Both x-vectors and ECAPA-TDNN embeddings outperform i-vectors as speaker representations. The optimal embedding size depends on the dataset and also varies with the noise condition.

Submitted 7 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted at ASRU 2023

arXiv:2210.15982 [pdf, other]

Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem

Authors: Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

Abstract: Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task; that does not capture the nature of stuttering, where one dysfluency seldom comes alone, i.e., co-occurs with others. This work explores an approach based on a modified wav2vec 2.0 system for end-to-end stuttering detection and classification as a multi-label problem. The method is evaluated on combinations of three datasets containing English and German stuttered speech, yielding state-of-the-art results for stuttering detection on the SEP-28k-Extended dataset. Experimental results provide evidence for the transferability of features and the generalizability of the method across datasets and languages.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2210.15941 [pdf, other]

Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

Authors: Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The results indicate that the distinction between CLP and healthy voices, especially with latent representations from the lower and middle encoder layers, reaches an accuracy of 100%. We test the classifier to find influencing factors for classification using unseen out-of-domain healthy and pathologic corpora with varying characteristics: age, spoken content, and acoustic conditions. Cross-pathology and cross-healthy tests reveal that the trained classifiers are unreliable if there is a mismatch between training and out-of-domain test data in, e.g., age, spoken content, or acoustic conditions.

Submitted 1 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: INTERSPEECH 2023

arXiv:2210.15336 [pdf, ps, other]

Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

Authors: Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

Abstract: The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying them to unseen speech corpora. Our approach achieves unweighted average F1-Scores between 74.1% and 97.0%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data of healthy speakers sampled from a variety of different sources.

Submitted 1 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: INTERSPEECH 2023

arXiv:2206.08058 [pdf, other]

Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

Authors: Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

Abstract: This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model specific language structures: Low-level features (FFT), speaker embeddings (ECAPA-TDNN), grapheme-motivated embeddings (wav2vec 2.0), and phonetic embeddings in the form of senones (ASR acoustic model). Each of the approaches provides input for VGG-like 5-layer CNN classifiers. We also examine the adaptation per nonword. The evaluation of the proposed systems was performed using recordings from different kindergartens of spoken nonwords. ECAPA-TDNN and low-level FFT features do not explicitly model phonetic information; wav2vec 2.0 is trained on grapheme labels, our ASR acoustic model features contain (sub-)phonetic information. We found that the more granular the phonetic modeling is, the higher the achieved recognition rates. The best system trained on ASR acoustic model features with VTLN achieved an accuracy of 89.4% and an area under the ROC (Receiver Operating Characteristic) curve (AUC) of 0.923. This corresponds to an improvement in accuracy of 20.2% and AUC of 0.309 relative to the FFT baseline.

Submitted 17 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at Interspeech 2022

arXiv:2206.03400 [pdf, ps, other]

doi 10.1007/978-3-031-16270-1_35

The Influence of Dataset Partitioning on Dysfluency Detection Systems

Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

Abstract: This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to detect dysfluencies. We train and evaluate the systems with different non-speaker-exclusive and speaker-exclusive splits of the Stuttering Events in Podcasts (SEP-28k) dataset to shed some light on the variability of results w.r.t. the partition method used. Furthermore, we show that the SEP-28k dataset is dominated by only a few speakers, making it difficult to evaluate. To remedy this problem, we created SEP-28k-Extended (SEP-28k-E), containing semi-automatically generated speaker and gender information for the SEP-28k corpus, and suggest different data splits, each useful for evaluating other aspects of methods for dysfluency detection.

Submitted 7 June, 2022; originally announced June 2022.

Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

arXiv:2204.03428 [pdf, other]

Detecting Vocal Fatigue with Neural Embeddings

Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

Abstract: Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the change in vocal characteristics of a speaker during prolonged voice usage. We show that vocal fatigue can be reliably predicted using all three kinds of neural embeddings after only 50 minutes of continuous speaking when temporal smoothing and normalization are applied to the extracted embeddings. We employ support vector machines for classification and achieve accuracy scores of 81% using x-vectors, 85% using ECAPA-TDNN embeddings, and 82% using wav2vec 2.0 embeddings as input features. We obtain an accuracy score of 76%, when the trained system is applied to a different speaker and recording environment without any adaptation.

Submitted 17 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted for Publication in the Journal of Voice

arXiv:2204.03417 [pdf, other]

doi 10.21437/Interspeech.2022-347

Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

Abstract: Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and - specific to therapy - speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains up to 27% w.r.t. F1-score.

Submitted 16 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted at Interspeech 2022

arXiv:2104.10752 [pdf, ps, other]

A Concise Guide on the Integration of Battery Electric Buses into Urban Bus Networks

Authors: Nicolas Dirks, Dennis Wagner, Maximilian Schiffer, Grit Walther

Abstract: With the increasing market penetration of battery-electric buses into urban bus networks, practitioners face many novel planning problems. As a result, the interest in optimization-based decision-making for these planning problems increases but practitioners' requirements on planning solutions and current academic approaches often diverge. Against this background, this survey aims to provide a concise guide on optimization-based planning approaches for integrating battery-electric buses into urban bus networks for both practitioners and academics. First, we derive practitioners' requirements for integrating battery-electric buses from state-of-the-art specifications, project reports, and expert knowledge. Second, we analyze whether existing optimization-based planning models fulfill these practitioners' requirements. Based on this analysis, we carve out the existing gap between practice and research and discuss how to address these in future research.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 24 pages, 8 figures, 11 tables

arXiv:2005.09313 [pdf, ps, other]

Measures and LMIs for Adaptive Control Validation

Authors: Daniel Wagner, Didier Henrion, Martin Hromčík

Abstract: Occupation measures and linear matrix inequality (LMI) relaxations (called the moment sums of squares or Lasserre hierarchy) have been used previously as a means for solving control law verification and validation (VV) problems. However, these methods have been restricted to relatively simple control laws and a limited number of states. In this document, we extend these methods to model reference adaptive control (MRAC) configurations typical of the aircraft industry. The main contribution is a validation scheme that exploits the specific nonlinearities and structure of MRAC. A nonlinear F-16 plant is used for illustration. LMI relaxations solved by off-the-shelf-software are compared to traditional Monte-Carlo simulations.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2003.11292 [pdf, ps, other]

Measures and LMIs for Lateral F-16 MRAC Validation

Authors: Daniel Wagner, Didier Henrion, Martin Hromčík

Abstract: Occupation measures and linear matrix inequality (LMI) relaxations (called the moment sums of squares or Lasserre hierarchy) are state-of-the-art methods for verification and validation (VV) in aerospace. In this document, we extend these results to a full F-16 closed-loop nonlinear Dutch roll polynomial model complete with model reference adaptive control (MRAC). This is done through a new technique of approximating the reference trajectory by exploiting sparse ordinary differential equations (ODEs) with parsimony. The VV problem is then solved directly using moment LMI relaxations and off-the-shelf-software. The main results are then compared to their numerical counterparts obtained using traditional Monte-Carlo simulations.

Submitted 25 March, 2020; originally announced March 2020.

arXiv:1505.05747 [pdf, other]

Operating Power Grids with Few Flow Control Buses

Authors: Thomas Leibfried, Tamara Mchedlidze, Nico Meyer-Hübner, Martin Nöllenburg, Ignaz Rutter, Peter Sanders, Dorothea Wagner, Franziska Wegner

Abstract: Future power grids will offer enhanced controllability due to the increased availability of power flow control units (FACTS). As the installation of control units in the grid is an expensive investment, we are interested in using few controllers to achieve high controllability. In particular, two questions arise: How many flow control buses are necessary to obtain globally optimal power flows? And if fewer flow control buses are available, what can we achieve with them? Using steady state IEEE benchmark data sets, we explore experimentally that already a small number of controllers placed at certain grid buses suffices to achieve globally optimal power flows. We present a graph-theoretic explanation for this behavior. To answer the second question we perform a set of experiments that explore the existence and costs of feasible power flow solutions at increased loads with respect to the number of flow control buses in the grid. We observe that adding a small number of flow control buses reduces the flow costs and extends the existence of feasible solutions at increased load.

Submitted 21 May, 2015; originally announced May 2015.

Comments: extended version of an ACM e-Energy 2015 poster/workshop paper

Showing 1–19 of 19 results for author: Wagner, D