-
Refining Similarity Matrices to Cluster Attributed Networks Accurately
Authors:
Yuta Yajima,
Akihiro Inokuchi
Abstract:
As a result of the recent popularity of social networks and the increase in the number of research papers published across all fields, attributed networks consisting of relationships between objects, such as humans and the papers, that have attributes are becoming increasingly large. Therefore, various studies for clustering attributed networks into sub-networks are being actively conducted. When…
▽ More
As a result of the recent popularity of social networks and the increase in the number of research papers published across all fields, attributed networks consisting of relationships between objects, such as humans and the papers, that have attributes are becoming increasingly large. Therefore, various studies for clustering attributed networks into sub-networks are being actively conducted. When clustering attributed networks using spectral clustering, the clustering accuracy is strongly affected by the quality of the similarity matrices, which are input into spectral clustering and represent the similarities between pairs of objects. In this paper, we aim to increase the accuracy by refining the matrices before applying spectral clustering to them. We verify the practicability of our proposed method by comparing the accuracy of spectral clustering with similarity matrices before and after refining them.
△ Less
Submitted 14 October, 2020;
originally announced October 2020.
-
From Academia to Software Development: Publication Citations in Source Code Comments
Authors:
Akira Inokuchi,
Yusuf Sulistyo Nugroho,
Supatsara Wattanakriengkrai,
Fumiaki Konishi,
Hideaki Hata,
Christoph Treude,
Akito Monden,
Kenichi Matsumoto
Abstract:
Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source so…
▽ More
Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve 0.90 in $F_1$ as detection accuracy. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithm is the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.
△ Less
Submitted 1 May, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
GTRACE-RS: Efficient Graph Sequence Mining using Reverse Search
Authors:
Akihiro Inokuchi,
Hiroaki Ikuta,
Takashi Washio
Abstract:
The mining of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences. A method, called GTRACE, has been proposed to mine frequent patterns from graph sequences under the assumption that changes in graphs are gradual. Although GTRACE mines the frequent patterns efficiently, it still…
▽ More
The mining of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences. A method, called GTRACE, has been proposed to mine frequent patterns from graph sequences under the assumption that changes in graphs are gradual. Although GTRACE mines the frequent patterns efficiently, it still needs substantial computation time to mine the patterns from graph sequences containing large graphs and long sequences. In this paper, we propose a new version of GTRACE that enables efficient mining of frequent patterns based on the principle of a reverse search. The underlying concept of the reverse search is a general scheme for designing efficient algorithms for hard enumeration problems. Our performance study shows that the proposed method is efficient and scalable for mining both long and large graph sequence patterns and is several orders of magnitude faster than the original GTRACE.
△ Less
Submitted 18 October, 2011;
originally announced October 2011.