-
Generalizable Temperature Nowcasting with Physics-Constrained RNNs for Predictive Maintenance of Wind Turbine Components
Authors:
Johannes Exenberger,
Matteo Di Salvo,
Thomas Hirsch,
Franz Wotawa,
Gerald Schweiger
Abstract:
Machine learning plays an important role in the operation of current wind energy production systems. One central application is predictive maintenance to increase efficiency and lower electricity costs by reducing downtimes. Integrating physics-based knowledge in neural networks to enforce their physical plausibility is a promising method to improve current approaches, but incomplete system information often impedes their application in real-world scenarios. We describe a simple and efficient approach to physics-constrained deep learning-based predictive maintenance for wind turbine gearbox bearings with partial system knowledge. The approach is based on temperature nowcasting constrained by physics, where unknown system coefficients are treated as learnable neural network parameters. Results show improved generalization performance to unseen environments compared to a baseline neural network, which is especially important in low-data scenarios often encountered in real-world applications.
Submitted 5 April, 2024;
originally announced April 2024.
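The idea of treating unknown physical coefficients as learnable parameters can be illustrated with a minimal sketch. It is not the authors' model: the first-order heat balance, the coefficient names `k` (cooling rate) and `c` (heating gain), the synthetic load signal, and all numeric values are assumptions made for illustration, and the coefficients are fitted by ordinary least squares rather than by backpropagation through an RNN.

```python
import math


def simulate(k, c, t_amb, power, t0, dt):
    """Roll out the assumed heat balance:
    T[i+1] = T[i] + dt * (-k * (T[i] - t_amb) + c * power[i])."""
    temps = [t0]
    for p in power:
        t = temps[-1]
        temps.append(t + dt * (-k * (t - t_amb) + c * p))
    return temps


def fit_coefficients(temps, power, t_amb, dt):
    """Estimate (k, c) by least squares on one-step temperature changes.

    The physics fixes the form of the update; only k and c are learned.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for i, p in enumerate(power):
        x1 = -(temps[i] - t_amb)             # regressor multiplying k
        x2 = p                               # regressor multiplying c
        y = (temps[i + 1] - temps[i]) / dt   # observed rate of change
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations with Cramer's rule.
    k_hat = (b1 * s22 - b2 * s12) / det
    c_hat = (s11 * b2 - s12 * b1) / det
    return k_hat, c_hat


# Synthetic, noise-free data generated with hypothetical "true" coefficients.
power = [0.5 + 0.5 * math.sin(0.1 * i) for i in range(200)]
temps = simulate(k=0.05, c=2.0, t_amb=20.0, power=power, t0=20.0, dt=1.0)

k_hat, c_hat = fit_coefficients(temps, power, t_amb=20.0, dt=1.0)
print(f"estimated k={k_hat:.4f}, c={c_hat:.4f}")
```

Because the update rule is fixed by the assumed physics, only two numbers have to be learned from data, which is one plausible reason such models need less data than an unconstrained network.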
-
Identifying non-natural language artifacts in bug reports
Authors:
Thomas Hirsch,
Birgit Hofer
Abstract:
Bug reports are a popular target for natural language processing (NLP). However, bug reports often contain artifacts such as code snippets, log outputs and stack traces. These artifacts not only inflate the bug reports with noise, but often constitute a real problem for the NLP approach at hand and have to be removed. In this paper, we present a machine-learning-based approach, implemented in Python, that classifies content into natural language and artifacts at line level. We show how data from GitHub issue trackers can be used for automated training set generation, and present a custom preprocessing approach for bug reports. Our model achieves 0.95 ROC-AUC and 0.93 F1 on our manually annotated validation set, and classifies 10k lines in 0.72 seconds. We cross-evaluated our model against a foreign dataset and a foreign R model for the same task. The Python implementation of our model and our datasets are made publicly available under an open-source license.
Submitted 4 October, 2021;
originally announced October 2021.
-
What we can learn from how programmers debug their code
Authors:
Thomas Hirsch,
Birgit Hofer
Abstract:
Researchers have developed numerous debugging approaches to help programmers in the debugging process, but these approaches are rarely used in practice. In this paper, we investigate how programmers debug their code and what researchers should consider when developing debugging approaches. We conducted an online questionnaire where 102 programmers provided information about recently fixed bugs. We found that the majority of bugs (69.6 %) are semantic bugs. Memory and concurrency bugs do not occur as frequently (6.9 % and 8.8 %), but they consume more debugging time. Locating a bug is more difficult than reproducing and fixing it. Programmers often use only IDE built-in tools for debugging. Furthermore, programmers frequently use a replication-observation-deduction pattern when debugging. These results suggest that debugging support is particularly valuable for memory and concurrency bugs. Furthermore, researchers should focus on the fault localization phase and integrate their tools into commonly used IDEs.
Submitted 23 March, 2021;
originally announced March 2021.
-
A Fault Localization and Debugging Support Framework driven by Bug Tracking Data
Authors:
Thomas Hirsch
Abstract:
Fault localization has been identified as a major resource factor in the software development life cycle. Academic fault localization techniques are mostly unknown and unused in professional environments. Although manual debugging approaches can vary significantly depending on bug type (e.g. memory bugs or semantic bugs), these differences are not reflected in most existing fault localization tools. Little research has gone into automated identification of bug types to optimize the fault localization process. Further, existing fault localization techniques leverage historical data only to augment suspiciousness rankings. This thesis aims to provide a fault localization framework that combines data from various sources to help developers in the fault localization process. To achieve this, a bug classification schema is introduced, benchmarks are created, and a novel fault localization method based on historical data is proposed.
Submitted 3 March, 2021;
originally announced March 2021.
-
Root cause prediction based on bug reports
Authors:
Thomas Hirsch,
Birgit Hofer
Abstract:
This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process - either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs' root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes of the given bug reports. Linear support vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.
Submitted 3 March, 2021;
originally announced March 2021.
-
Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
Authors:
Nikolaos Flemotomos,
Victor R. Martinez,
Zhuohao Chen,
Karan Singla,
Victor Ardulov,
Raghuveer Peri,
Derek D. Caperton,
James Gibson,
Michael J. Tanana,
Panayiotis Georgiou,
Jake Van Epps,
Sarah P. Lord,
Tad Hirsch,
Zac E. Imel,
David C. Atkins,
Shrikanth Narayanan
Abstract:
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called Motivational Interviewing, our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5,000 recordings drawn from its deployment in a real-world clinical setting, where it is used to assist the training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
Submitted 27 March, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.