-
BIP! NDR (NoDoiRefs): A Dataset of Citations From Papers Without DOIs in Computer Science Conferences and Workshops
Authors:
Paris Koloveas,
Serafeim Chatzopoulos,
Christos Tryfonopoulos,
Thanasis Vergoulis
Abstract:
In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, rais…
▽ More
In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, raising limitations to citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data. BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus, and by performing text analysis, it extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Atrapos: Real-time Evaluation of Metapath Query Workloads
Authors:
Serafeim Chatzopoulos,
Thanasis Vergoulis,
Dimitrios Skoutas,
Theodore Dalamagas,
Christos Tryfonopoulos,
Panagiotis Karras
Abstract:
Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computa…
▽ More
Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
-- Note that this version of our work is more extended than the one presented in TheWebConf 2023 (doi: 10.1145/3543507.3583322)
△ Less
Submitted 25 May, 2023; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Benchmarking and scaling of deep learning models for land cover image classification
Authors:
Ioannis Papoutsis,
Nikolaos-Ioannis Bountos,
Angelos Zavras,
Dimitrios Michail,
Christos Tryfonopoulos
Abstract:
The availability of the sheer volume of Copernicus Sentinel-2 imagery has created new opportunities for exploiting deep learning (DL) methods for land use land cover (LULC) image classification. However, an extensive set of benchmark experiments is currently lacking, i.e. DL models tested on the same dataset, with a common and consistent set of metrics, and in the same hardware. In this work, we u…
▽ More
The availability of the sheer volume of Copernicus Sentinel-2 imagery has created new opportunities for exploiting deep learning (DL) methods for land use land cover (LULC) image classification. However, an extensive set of benchmark experiments is currently lacking, i.e. DL models tested on the same dataset, with a common and consistent set of metrics, and in the same hardware. In this work, we use the BigEarthNet Sentinel-2 dataset to benchmark for the first time different state-of-the-art DL models for the multi-label, multi-class LULC image classification problem, contributing with an exhaustive zoo of 60 trained models. Our benchmark includes standard CNNs, as well as non-convolutional methods. We put to the test EfficientNets and Wide Residual Networks (WRN) architectures, and leverage classification accuracy, training time and inference rate. Furthermore, we propose to use the EfficientNet framework for the compound scaling of a lightweight WRN. Enhanced with an Efficient Channel Attention mechanism, our scaled lightweight model emerged as the new state-of-the-art. It achieves 4.5% higher averaged F-Score classification accuracy for all 19 LULC classes compared to a standard ResNet50 baseline model, with an order of magnitude less trainable parameters. We provide access to all trained models, along with our code for distributed training on multiple GPU nodes. This model zoo of pre-trained encoders can be used for transfer learning and rapid prototy** in different remote sensing tasks that use Sentinel-2 data, instead of exploiting backbone models trained with data from a different domain, e.g., from ImageNet. We validate their suitability for transfer learning in different datasets of diverse volumes. Our top-performing WRN achieves state-of-the-art performance (71.1% F-Score) on the SEN12MS dataset while being exposed to only a small fraction of the training dataset.
△ Less
Submitted 14 September, 2022; v1 submitted 17 November, 2021;
originally announced November 2021.
-
A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence
Authors:
Paris Koloveas,
Thanasis Chantzios,
Christos Tryfonopoulos,
Spiros Skiadopoulos
Abstract:
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that -given the appropriate tools and methods-may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from secur…
▽ More
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that -given the appropriate tools and methods-may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.
△ Less
Submitted 14 September, 2021;
originally announced September 2021.
-
Social Media Monitoring for IoT Cyber-Threats
Authors:
Sofia Alevizopoulou,
Paris Koloveas,
Christos Tryfonopoulos,
Paraskevi Raftopoulou
Abstract:
The rapid development of IoT applications and their use in various fields of everyday life has resulted in an escalated number of different possible cyber-threats, and has consequently raised the need of securing IoT devices. Collecting Cyber-Threat Intelligence (e.g., zero-day vulnerabilities or trending exploits) from various online sources and utilizing it to proactively secure IoT systems or p…
▽ More
The rapid development of IoT applications and their use in various fields of everyday life has resulted in an escalated number of different possible cyber-threats, and has consequently raised the need of securing IoT devices. Collecting Cyber-Threat Intelligence (e.g., zero-day vulnerabilities or trending exploits) from various online sources and utilizing it to proactively secure IoT systems or prepare mitigation scenarios has proven to be a promising direction. In this work, we focus on social media monitoring and investigate real-time Cyber-Threat Intelligence detection from the Twitter stream. Initially, we compare and extensively evaluate six different machine-learning based classification alternatives trained with vulnerability descriptions and tested with real-world data from the Twitter stream to identify the best-fitting solution. Subsequently, based on our findings, we propose a novel social media monitoring system tailored to the IoT domain; the system allows users to identify recent/trending vulnerabilities and exploits on IoT devices. Finally, to aid research on the field and support the reproducibility of our results we publicly release all annotated datasets created during this process.
△ Less
Submitted 9 September, 2021;
originally announced September 2021.
-
Efficient Continuous Multi-Query Processing over Graph Streams
Authors:
Lefteris Zervakis,
Vinay Setty,
Christos Tryfonopoulos,
Katja Hose
Abstract:
Graphs are ubiquitous and ever-present data structures that have a wide range of applications involving social networks, knowledge bases and biological interactions. The evolution of a graph in such scenarios can yield important insights about the nature and activities of the underlying network, which can then be utilized for applications such as news dissemination, network monitoring, and content…
▽ More
Graphs are ubiquitous and ever-present data structures that have a wide range of applications involving social networks, knowledge bases and biological interactions. The evolution of a graph in such scenarios can yield important insights about the nature and activities of the underlying network, which can then be utilized for applications such as news dissemination, network monitoring, and content curation. Capturing the continuous evolution of a graph can be achieved by long-standing sub-graph queries. Although, for many applications this can only be achieved by a set of queries, state-of-the-art approaches focus on a single query scenario. In this paper, we therefore introduce the notion of continuous multi-query processing over graph streams and discuss its application to a number of use cases. To this end, we designed and developed a novel algorithmic solution for efficient multi-query evaluation against a stream of graph updates and experimentally demonstrated its applicability. Our results against two baseline approaches using real-world, as well as synthetic datasets, confirm a two orders of magnitude improvement of the proposed solution.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
OntoFM: A Personal Ontology-based File Manager for the Desktop
Authors:
Jenny Rompa,
Giorgos Lepouras,
Costas Vassilakis,
Christos Tryfonopoulos
Abstract:
Personal ontologies have been proposed as a means to support the semantic management of user information. Assuming that a personal ontology system is in use, new tools have to be developed at user interface level to exploit the enhanced capabilities offered by the system. In this work, we present an ontology-based file manager that allows semantic searching on the user's personal information space…
▽ More
Personal ontologies have been proposed as a means to support the semantic management of user information. Assuming that a personal ontology system is in use, new tools have to be developed at user interface level to exploit the enhanced capabilities offered by the system. In this work, we present an ontology-based file manager that allows semantic searching on the user's personal information space. The file manager exploits the ontology relations to present files associated with specific concepts, proposes new related concepts to users, and helps them explore the information space and locate the required file.
△ Less
Submitted 8 July, 2013;
originally announced July 2013.
-
Full-text Support for Publish/Subscribe Ontology Systems
Authors:
Lefteris Zervakis,
Christos Tryfonopoulos,
Antonios Papadakis-Pesaresi,
Manolis Koubarakis,
Spiros Skiadopoulos
Abstract:
We envision a publish/subscribe ontology system that is able to index millions of user subscriptions and filter them against ontology data that arrive in a streaming fashion. In this work, we propose a SPARQL extension appropriate for a publish/subscribe setting; our extension builds on the natural semantic graph matching of the language and supports the creation of full-text subscriptions. Subseq…
▽ More
We envision a publish/subscribe ontology system that is able to index millions of user subscriptions and filter them against ontology data that arrive in a streaming fashion. In this work, we propose a SPARQL extension appropriate for a publish/subscribe setting; our extension builds on the natural semantic graph matching of the language and supports the creation of full-text subscriptions. Subsequently, we propose a main-memory subscription indexing algorithm which performs both semantic and full-text matching at low complexity and minimal filtering time. Thus, when ontology data are published matching subscriptions are identified and notifications are forwarded to users.
△ Less
Submitted 8 July, 2013;
originally announced July 2013.