Search | arXiv e-print repository

arXiv:2404.04126 [pdf, other]

Generalizable Temperature Nowcasting with Physics-Constrained RNNs for Predictive Maintenance of Wind Turbine Components

Authors: Johannes Exenberger, Matteo Di Salvo, Thomas Hirsch, Franz Wotawa, Gerald Schweiger

Abstract: Machine learning plays an important role in the operation of current wind energy production systems. One central application is predictive maintenance to increase efficiency and lower electricity costs by reducing downtimes. Integrating physics-based knowledge in neural networks to enforce their physical plausibilty is a promising method to improve current approaches, but incomplete system informa… ▽ More Machine learning plays an important role in the operation of current wind energy production systems. One central application is predictive maintenance to increase efficiency and lower electricity costs by reducing downtimes. Integrating physics-based knowledge in neural networks to enforce their physical plausibilty is a promising method to improve current approaches, but incomplete system information often impedes their application in real world scenarios. We describe a simple and efficient way for physics-constrained deep learning-based predictive maintenance for wind turbine gearbox bearings with partial system knowledge. The approach is based on temperature nowcasting constrained by physics, where unknown system coefficients are treated as learnable neural network parameters. Results show improved generalization performance to unseen environments compared to a baseline neural network, which is especially important in low data scenarios often encountered in real-world applications. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: Published at ICLR 2024 Tackling Climate Change with Machine Learning Workshop

arXiv:2110.01336 [pdf, other]

Identifying non-natural language artifacts in bug reports

Authors: Thomas Hirsch, Birgit Hofer

Abstract: Bug reports are a popular target for natural language processing (NLP). However, bug reports often contain artifacts such as code snippets, log outputs and stack traces. These artifacts not only inflate the bug reports with noise, but often constitute a real problem for the NLP approach at hand and have to be removed. In this paper, we present a machine learning based approach to classify content… ▽ More Bug reports are a popular target for natural language processing (NLP). However, bug reports often contain artifacts such as code snippets, log outputs and stack traces. These artifacts not only inflate the bug reports with noise, but often constitute a real problem for the NLP approach at hand and have to be removed. In this paper, we present a machine learning based approach to classify content into natural language and artifacts at line level implemented in Python. We show how data from GitHub issue trackers can be used for automated training set generation, and present a custom preprocessing approach for bug reports. Our model scores at 0.95 ROC-AUC and 0.93 F1 against our manually annotated validation set, and classifies 10k lines in 0.72 seconds. We cross evaluated our model against a foreign dataset and a foreign R model for the same task. The Python implementation of our model and our datasets are made publicly available under an open source license. △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: 7 pages, to be published in 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW'21), November 15-19, 2021

arXiv:2103.12447 [pdf, other]

What we can learn from how programmers debug their code

Authors: Thomas Hirsch, Birgit Hofer

Abstract: Researchers have developed numerous debugging approaches to help programmers in the debugging process, but these approaches are rarely used in practice. In this paper, we investigate how programmers debug their code and what researchers should consider when develo** debugging approaches. We conducted an online questionnaire where 102 programmers provided information about recently fixed bugs. We… ▽ More Researchers have developed numerous debugging approaches to help programmers in the debugging process, but these approaches are rarely used in practice. In this paper, we investigate how programmers debug their code and what researchers should consider when develo** debugging approaches. We conducted an online questionnaire where 102 programmers provided information about recently fixed bugs. We found that the majority of bugs (69.6 %) are semantic bugs. Memory and concurrency bugs do not occur as frequently (6.9 % and 8.8 %), but they consume more debugging time. Locating a bug is more difficult than reproducing and fixing it. Programmers often use only IDE build-in tools for debugging. Furthermore, programmers frequently use a replication-observation-deduction pattern when debugging. These results suggest that debugging support is particularly valuable for memory and concurrency bugs. Furthermore, researchers should focus on the fault localization phase and integrate their tools into commonly used IDEs. △ Less

Submitted 23 March, 2021; originally announced March 2021.

Comments: 4 pages, accepted and to be published in Proceedings of SER&IP 2021

arXiv:2103.02386 [pdf, other]

doi 10.1109/ISSREW51248.2020.00053

A Fault Localization and Debugging Support Framework driven by Bug Tracking Data

Authors: Thomas Hirsch

Abstract: Fault localization has been determined as a major resource factor in the software development life cycle. Academic fault localization techniques are mostly unknown and unused in professional environments. Although manual debugging approaches can vary significantly depending on bug type (e.g. memory bugs or semantic bugs), these differences are not reflected in most existing fault localization tool… ▽ More Fault localization has been determined as a major resource factor in the software development life cycle. Academic fault localization techniques are mostly unknown and unused in professional environments. Although manual debugging approaches can vary significantly depending on bug type (e.g. memory bugs or semantic bugs), these differences are not reflected in most existing fault localization tools. Little research has gone into automated identification of bug types to optimize the fault localization process. Further, existing fault localization techniques leverage on historical data only for augmentation of suspiciousness rankings. This thesis aims to provide a fault localization framework by combining data from various sources to help developers in the fault localization process. To achieve this, a bug classification schema is introduced, benchmarks are created, and a novel fault localization method based on historical data is proposed. △ Less

Submitted 3 March, 2021; originally announced March 2021.

Comments: 4 pages

Journal ref: Proceedings of the 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Coimbra, Portugal, 2020, pp. 139-142

arXiv:2103.02372 [pdf, other]

doi 10.1109/ISSREW51248.2020.00067

Root cause prediction based on bug reports

Authors: Thomas Hirsch, Birgit Hofer

Abstract: This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process - either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create… ▽ More This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process - either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs' root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available. △ Less

Submitted 3 March, 2021; originally announced March 2021.

Comments: 6 pages

Journal ref: Proceedings of the 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Coimbra, Portugal, 2020, pp. 171-176

arXiv:2102.11265 [pdf, other]

doi 10.3758/s13428-021-01623-4

Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

Authors: Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

Abstract: With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domai… ▽ More With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is however a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called Motivational Interviewing, our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5,000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes. △ Less

Submitted 27 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: new version has an updated title

Showing 1–6 of 6 results for author: Hirsch, T