-
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods
Authors:
Damiano Piovesan,
Davide Zago,
Parnal Joshi,
M. Clara De Paolis Kaluza,
Mahta Mehdiabadi,
Rashika Ramola,
Alexander Miguel Monzon,
Walter Reade,
Iddo Friedberg,
Predrag Radivojac,
Silvio C. E. Tosatto
Abstract:
We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software.
Submitted 12 March, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
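The abstract above mentions topological sorting over a directed acyclic graph. As a minimal sketch of that general idea (not CAFA-evaluator's actual code), the snippet below propagates prediction scores from each term to its ancestors so that a parent never scores lower than any of its descendants. The function name `propagate_scores`, the `parents` mapping, and the choice of `max` as the propagation rule are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_scores(parents, scores):
    """Give every term the max of its own score and its descendants' scores.

    parents: dict mapping a term to an iterable of its direct parents
             (hypothetical input format, assumed for this sketch).
    scores:  dict mapping a term to a predicted score; missing terms count as 0.0.
    """
    # TopologicalSorter expects {node: predecessors}, so passing the parents
    # mapping yields an order in which every parent precedes its children.
    order = list(TopologicalSorter(parents).static_order())
    out = {term: scores.get(term, 0.0) for term in order}

    # Walk children before parents so each term is final before it is pushed up.
    for term in reversed(order):
        for parent in parents.get(term, ()):
            out[parent] = max(out[parent], out[term])
    return out


# Toy ontology: a is the root, b and c are its children, d has parents b and c.
toy_parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
toy_scores = {"d": 0.7, "c": 0.2, "b": 0.9}
propagated = propagate_scores(toy_parents, toy_scores)
# propagated holds a=0.9, b=0.9, c=0.7, d=0.7
```

A dictionary-based loop keeps the sketch short; the abstract attributes the tool's efficiency to matrix computation, which this example does not attempt to reproduce.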
-
Diffusion-Aware Sampling and Estimation in Information Diffusion Networks
Authors:
Motahareh Eslami Mehdiabadi,
Hamid R. Rabiee,
Mostafa Salehi
Abstract:
Partially observed data collected by sampling methods is often studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of the diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework that utilizes the characteristics of the diffusion process. To this end, we propose a link-tracing based sampling design which uses the infection times as local information without any knowledge of the latent structure of the diffusion network. To correct the bias of the sampled data, we introduce three estimators for different categories: link-based, node-based, and cascade-based. To the best of our knowledge, this is the first attempt to introduce a complete measurement framework for diffusion networks. We also show that the estimator plays an important role in correcting the bias of sampling from diffusion networks. Our comprehensive empirical analysis over large synthetic and real datasets demonstrates that, on average, the proposed framework outperforms the common BFS and RW sampling methods in terms of link-based characteristics by about 37% and 35%, respectively.
Submitted 29 May, 2014;
originally announced May 2014.
-
Sampling from Diffusion Networks
Authors:
Motahareh Eslami Mehdiabadi,
Hamid R. Rabiee,
Mostafa Salehi
Abstract:
The diffusion phenomenon has a remarkable impact on Online Social Networks (OSNs). Gathering diffusion data over these large networks poses many challenges, which can be alleviated by adopting a suitable sampling approach. The contributions of this paper are twofold. First, we study the sampling approaches over diffusion networks and, for the first time, classify these approaches into two categories: (1) Structure-based Sampling (SBS), and (2) Diffusion-based Sampling (DBS). The dependency of the former approach on topological features of the network, and the unavailability of real diffusion paths in the latter, turn the problem of choosing an appropriate sampling approach into a trade-off. Second, we formally define the diffusion network sampling problem and propose a number of new diffusion-based characteristics to evaluate the introduced sampling approaches. Our experiments on large-scale synthetic and real datasets show that although DBS performs much better than SBS at higher sampling rates (by 16% to 29% on average), their performance differs by about 7% at lower sampling rates. Therefore, in real large-scale systems with low sampling rate requirements, SBS would be a better choice owing to its lower time complexity in gathering data compared to DBS. Moreover, we show that the introduced sampling approaches (SBS and DBS) play a more important role than graph exploration techniques such as Breadth-First Search (BFS) and Random Walk (RW) in the analysis of diffusion processes.
Submitted 28 May, 2014;
originally announced May 2014.
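The abstract above names Breadth-First Search (BFS) and Random Walk (RW) as graph exploration techniques used for sampling. The following is a minimal sketch of how each could collect a fixed budget of nodes from an adjacency list. It does not reproduce the paper's SBS or DBS designs; the function names, the `budget` and `max_steps` parameters, and the fixed seed are assumptions made for illustration, and `budget` is assumed to be at least 1.

```python
import random
from collections import deque


def bfs_sample(adj, start, budget):
    """Collect up to `budget` nodes in breadth-first discovery order."""
    sampled = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(sampled) < budget:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                sampled.append(neighbor)
                queue.append(neighbor)
                if len(sampled) == budget:
                    break
    return sampled


def random_walk_sample(adj, start, budget, max_steps=10_000, seed=0):
    """Collect up to `budget` distinct nodes visited by a simple random walk."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    visited = {start}
    node = start
    for _ in range(max_steps):
        if len(visited) >= budget:
            break
        neighbors = adj.get(node, ())
        if not neighbors:  # dead end: the walk cannot continue
            break
        node = rng.choice(neighbors)
        visited.add(node)
    return visited


# Toy undirected graph given as adjacency lists (hypothetical data).
toy_graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
bfs_nodes = bfs_sample(toy_graph, start=1, budget=3)   # [1, 2, 3]
rw_nodes = random_walk_sample(toy_graph, start=1, budget=3)
```

BFS covers the neighborhood of the start node exhaustively, while a random walk may revisit nodes before reaching the budget, which is why the walk is capped by `max_steps`.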