-
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
Authors:
Xingfang Wu,
Heng Li,
Nobukazu Yoshioka,
Hironori Washizaki,
Foutse Khomh
Abstract:
One goal of technical online communities is to help developers find the right answer in one place. A single question can be asked in different ways with different wordings, leading to the existence of duplicate posts on technical forums. The question of how to discover and link duplicate posts has garnered the attention of both developer communities and researchers. For example, Stack Overflow adopts a voting-based mechanism to mark and close duplicate posts. However, addressing these constantly emerging duplicate posts in a timely manner continues to pose challenges. Therefore, various approaches have been proposed to detect duplicate posts on technical forums automatically. The existing methods suffer from limitations due either to their reliance on handcrafted similarity metrics, which cannot sufficiently capture the semantics of posts, or to their lack of supervision to improve the performance. Additionally, the efficiency of these methods is hindered by their dependence on pair-wise feature generation, which can be impractical for large amounts of data. In this work, we attempt to employ and refine the GPT-3 embeddings for the duplicate detection task. We assume that the GPT-3 embeddings can accurately represent the semantics of the posts. In addition, by training a Siamese-based network on the GPT-3 embeddings, we obtain a latent embedding that accurately captures the duplicate relation in technical forum posts. Our experiment on a benchmark dataset confirms the effectiveness of our approach and demonstrates superior performance compared to baseline methods. When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively. With a manual study, we confirm our approach's potential for finding unlabelled duplicates on technical forums.
Submitted 4 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Design by Contract Framework for Quantum Software
Authors:
Masaomi Yamaguchi,
Nobukazu Yoshioka
Abstract:
To realize reliable quantum software, techniques to automatically ensure the quantum software's correctness have recently been investigated. However, they primarily focus on fixed quantum circuits rather than the procedure of building quantum circuits. Although building circuits with different parameters following the same procedure is a common approach, the correctness of the resulting circuits is not guaranteed. To this end, we propose a design-by-contract framework for quantum software. Our framework provides a Python-embedded language to write assertions on the input and output states of all quantum circuits built by certain procedures. Additionally, it provides a method to write assertions about the statistical processing of measurement results to ensure the procedure's correctness for obtaining the final result. These assertions are automatically checked using a quantum computer simulator. For evaluation, we implemented our framework and wrote assertions for some widely used quantum algorithms. As a result, we found that our framework has sufficient expressive power to verify the whole procedure of quantum software.
Submitted 30 March, 2023;
originally announced March 2023.
-
Preliminary Systematic Literature Review of Machine Learning System Development Process
Authors:
Yasuhiro Watanabe,
Hironori Washizaki,
Kazunori Sakamoto,
Daisuke Saito,
Kiyoshi Honda,
Naohiko Tsuda,
Yoshiaki Fukazawa,
Nobukazu Yoshioka
Abstract:
Previous machine learning (ML) system development research suggests that emerging software quality attributes are a concern due to the probabilistic behavior of ML systems. We assume that detailed development processes depend on individual developers and are therefore not discussed in detail. To help developers standardize their ML system development processes, we conduct a preliminary systematic literature review on ML system development processes. A search query returning 2358 papers identified 7 relevant papers, and an ad hoc review identified two more. Our findings include emphasized phases in ML system development, frequently described practices, and tailored traditional software development practices.
Submitted 12 October, 2019;
originally announced October 2019.
-
Incidents Are Meant for Learning, Not Repeating: Sharing Knowledge About Security Incidents in Cyber-Physical Systems
Authors:
Faeq Alrimawi,
Liliana Pasquale,
Deepak Mehta,
Nobukazu Yoshioka,
Bashar Nuseibeh
Abstract:
Cyber-physical systems (CPSs) are part of most critical infrastructures such as industrial automation and transportation systems. Thus, security incidents targeting CPSs can have disruptive consequences for assets and people. As prior incidents tend to re-occur, sharing knowledge about these incidents can help organizations be more prepared to prevent, mitigate or investigate future incidents. This paper proposes a novel approach to enable representation and sharing of knowledge about CPS incidents across different organizations. To support sharing, we represent incident knowledge as incident patterns, which capture incident characteristics that can manifest again, such as incident activities or vulnerabilities exploited by offenders. Incident patterns are a more abstract representation of specific incident instances and, thus, are general enough to be applicable to systems other than the one in which the incident occurred. They can also avoid disclosing potentially sensitive information about an organization's assets and resources. We provide an automated technique to extract an incident pattern from a specific incident instance. To understand how an incident pattern can manifest again in other cyber-physical systems, we also provide an automated technique to instantiate incident patterns to specific systems. We demonstrate the feasibility of our approach in the application domain of smart buildings. We evaluate correctness, scalability, and performance using two substantive scenarios inspired by real-world systems and incidents.
Submitted 29 June, 2019;
originally announced July 2019.
-
Landscape of IoT Patterns
Authors:
Hironori Washizaki,
Nobukazu Yoshioka,
Atsuo Hazeyama,
Takehisa Kato,
Haruhiko Kaiya,
Shinpei Ogata,
Takao Okubo,
Eduardo B. Fernandez
Abstract:
Patterns are encapsulations of problems and solutions under specific contexts. As the industry is realizing many successes (and failures) in IoT systems development and operations, many IoT patterns have been published such as IoT design patterns and IoT architecture patterns. Because these patterns are not well classified, their adoption does not live up to their potential. To understand the reasons, this paper analyzes an extensive set of published IoT architecture and design patterns according to several dimensions and outlines directions for improvements in publishing and adopting IoT patterns.
Submitted 25 February, 2019;
originally announced February 2019.