-
MultiCast: Zero-Shot Multivariate Time Series Forecasting Using LLMs
Authors:
Georgios Chatzigeorgakidis,
Konstantinos Lentzos,
Dimitrios Skoutas
Abstract:
Predicting future values in multivariate time series is vital across various domains. This work explores the use of large language models (LLMs) for this task. However, LLMs typically handle one-dimensional data. We introduce MultiCast, a zero-shot LLM-based approach for multivariate time series forecasting. It allows LLMs to receive multivariate time series as input, through three novel token multiplexing solutions that effectively reduce dimensionality while preserving key repetitive patterns. Additionally, a quantization scheme helps LLMs to better learn these patterns, while significantly reducing token use for practical applications. We showcase the performance of our approach in terms of RMSE and execution time against state-of-the-art approaches on three real-world datasets.
Submitted 23 May, 2024;
originally announced May 2024.
-
Accelerating Spatio-Textual Queries with Learned Indices
Authors:
Georgios Chatzigeorgakidis,
Kostas Patroumpas,
Dimitrios Skoutas,
Spiros Athanasiou
Abstract:
Efficiently computing spatio-textual queries has become increasingly important in various applications that need to quickly retrieve geolocated entities associated with textual information, such as in location-based services and social networks. To accelerate such queries, several works have proposed combining spatial and textual indices into hybrid index structures. Recently, the novel idea of replacing traditional indices with ML models has attracted a lot of attention. This includes works on learned spatial indices, where the main challenge is to address the lack of a total ordering among objects in a multidimensional space. In this work, we investigate how to extend this novel type of index design to the case of spatio-textual data. We study different design choices, based on either loose or tight coupling between the spatial and textual part, as well as a hybrid index that combines a traditional and a learned component. We also perform an experimental evaluation using several real-world datasets to assess the potential benefits of using a learned index for evaluating spatio-textual queries.
Submitted 15 December, 2023;
originally announced December 2023.
-
Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark]
Authors:
Alexandros Zeakis,
George Papadakis,
Dimitrios Skoutas,
Manolis Koubarakis
Abstract:
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving language models to improve effectiveness. This is applied to both main steps of ER, i.e., blocking and matching. Several pre-trained embeddings have been tested, with the most popular ones being fastText and variants of the BERT model. However, there is no detailed analysis of their pros and cons. To cover this gap, we perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets. First, we assess their vectorization overhead for converting all input entities into dense embedding vectors. Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method. Third, we conclude with their relative performance for both supervised and unsupervised matching. Our experimental results provide novel insights into the strengths and weaknesses of the main language models, helping researchers and practitioners select the most suitable ones in practice.
Submitted 24 April, 2023;
originally announced April 2023.
-
Atrapos: Real-time Evaluation of Metapath Query Workloads
Authors:
Serafeim Chatzopoulos,
Thanasis Vergoulis,
Dimitrios Skoutas,
Theodore Dalamagas,
Christos Tryfonopoulos,
Panagiotis Karras
Abstract:
Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
-- Note that this version of our work is more extended than the one presented in TheWebConf 2023 (doi: 10.1145/3543507.3583322)
Submitted 25 May, 2023; v1 submitted 11 January, 2022;
originally announced January 2022.
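The Atrapos abstract above mentions evaluating metapath queries through sparse matrix multiplication with cached intermediate results. A minimal Python sketch of that general idea follows. Everything in it is an illustrative assumption rather than the authors' implementation: the dict-of-dicts matrix layout, the names `spmm` and `evaluate_metapath`, and above all the caching rule, which simply stores every prefix of a metapath instead of using the Overlap Tree and caching policy described in the paper.

```python
def spmm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}."""
    out = {}
    for i, row in a.items():
        acc = {}
        for k, v in row.items():
            # Only rows of b that exist contribute to the product.
            for j, w in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + v * w
        if acc:
            out[i] = acc
    return out


def evaluate_metapath(path, adjacency, cache):
    """Evaluate a metapath given as a tuple of node types, e.g. ('A', 'P', 'A').

    adjacency maps a pair of node types to its sparse adjacency matrix.
    cache maps already evaluated (sub-)metapaths to their result matrices.
    Hypothetical policy: every evaluated prefix is cached (not the Overlap Tree).
    """
    if path in cache:
        return cache[path]
    if len(path) == 2:
        result = adjacency[path]
    else:
        prefix = evaluate_metapath(path[:-1], adjacency, cache)
        result = spmm(prefix, adjacency[path[-2:]])
    cache[path] = result
    return result
```

With author-paper (`('A', 'P')`) and paper-author (`('P', 'A')`) matrices, `evaluate_metapath(('A', 'P', 'A'), adjacency, {})` counts the papers connecting each pair of authors, and a later query sharing the `('A', 'P')` prefix can reuse the cached matrix.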
-
Local Similarity Search on Geolocated Time Series Using Hybrid Indexing
Authors:
Georgios Chatzigeorgakidis,
Dimitrios Skoutas,
Kostas Patroumpas,
Themis Palpanas,
Spiros Athanasiou,
Spiros Skiadopoulos
Abstract:
Geolocated time series, i.e., time series associated with certain locations, abound in many modern applications. In this paper, we consider hybrid queries for retrieving geolocated time series based on filters that combine spatial distance and time series similarity. For the latter, unlike existing work, we allow filtering based on local similarity, which is computed based on subsequences rather than the entire length of each series, thus allowing the discovery of more fine-grained trends and patterns. To efficiently support such queries, we first leverage the state-of-the-art BTSR-tree index, which utilizes bounds over both the locations and the shapes of time series to prune the search space. Moreover, we propose optimizations that check at specific timestamps to identify candidate time series that may exceed the required local similarity threshold. To further increase pruning power, we introduce the SBTSR-tree index, an extension to BTSR-tree, which additionally segments the time series temporally, allowing the construction of tighter bounds. Our experimental results on several real-world datasets demonstrate that SBTSR-tree can provide answers much faster for all examined query types. This paper has been published in the 27th International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2019).
Submitted 19 April, 2021;
originally announced April 2021.
-
Local Pair and Bundle Discovery over Co-Evolving Time Series
Authors:
Georgios Chatzigeorgakidis,
Dimitrios Skoutas,
Kostas Patroumpas,
Themis Palpanas,
Spiros Athanasiou,
Spiros Skiadopoulos
Abstract:
Time series exploration and mining has many applications across several industrial and scientific domains. In this paper, we consider the problem of detecting locally similar pairs and groups, called bundles, over co-evolving time series. These are pairs or groups of subsequences whose values do not differ by more than ε for at least δ consecutive timestamps, thus indicating common local patterns and trends. We first present a baseline algorithm that performs a sweep line scan across all timestamps to identify matches. Then, we propose a filter-verification technique that only examines candidate matches at judiciously chosen checkpoints across time. Specifically, we introduce two block scanning algorithms for discovering local pairs and bundles respectively, which leverage the potential of checkpoints to aggressively prune the search space. We experimentally evaluate our methods against real-world and synthetic datasets, demonstrating a speed-up in execution time by an order of magnitude over the baseline. This paper has been published in the 16th International Symposium on Spatial and Temporal Databases (SSTD19).
Submitted 19 April, 2021;
originally announced April 2021.
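The local pair definition in the entry above (values differing by at most ε for at least a minimum number of consecutive timestamps) can be illustrated with a naive scan over two aligned series. This is only a sketch in the spirit of the baseline sweep described in the abstract, not the paper's algorithm. The function name `local_pair_intervals` and the parameter names `eps` and `delta` are assumptions introduced here, and the checkpoint-based pruning is not reproduced.

```python
def local_pair_intervals(a, b, eps, delta):
    """Return maximal intervals (start, end), inclusive, in which the two
    aligned series a and b stay within eps of each other for at least
    delta consecutive timestamps."""
    intervals = []
    start = None  # start of the current run of close values, if any
    n = min(len(a), len(b))
    for t in range(n):
        if abs(a[t] - b[t]) <= eps:
            if start is None:
                start = t
        else:
            # The run ended at t - 1; keep it only if it is long enough.
            if start is not None and t - start >= delta:
                intervals.append((start, t - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and n - start >= delta:
        intervals.append((start, n - 1))
    return intervals
```

Extending the same scan to groups of series (bundles) would require tracking sets of series that stay mutually close, which this sketch leaves out.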
-
Twin Subsequence Search in Time Series
Authors:
Georgios Chatzigeorgakidis,
Dimitrios Skoutas,
Kostas Patroumpas,
Themis Palpanas,
Spiros Athanasiou,
Spiros Skiadopoulos
Abstract:
We address the problem of subsequence search in time series using Chebyshev distance, to which we refer as twin subsequence search. We first show how existing time series indices can be extended to perform twin subsequence search. Then, we introduce TS-Index, a novel index tailored to this problem. Our experimental evaluation compares these approaches against real time series datasets, and demonstrates that TS-Index can retrieve twin subsequences much faster under various query conditions. This paper has been published in the 24th International Conference on Extending Database Technology (EDBT 2021).
Submitted 14 April, 2021;
originally announced April 2021.
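For the twin subsequence search entry above, the Chebyshev distance between two equal-length sequences is the largest absolute difference at any position. The sketch below shows an index-free, brute-force scan under that distance. The threshold parameter `eps` and the names `chebyshev` and `twin_subsequences` are assumptions made for illustration; the TS-Index structure itself is not reproduced here.

```python
def chebyshev(a, b):
    """Chebyshev (L-infinity) distance between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def twin_subsequences(series, query, eps):
    """Brute-force scan: return the start offsets of all windows of the
    series whose Chebyshev distance to the query is at most eps."""
    m = len(query)
    return [
        i
        for i in range(len(series) - m + 1)
        if chebyshev(series[i:i + m], query) <= eps
    ]
```

An index for this problem would aim to skip most windows without computing the distance; the scan above compares the query against every window and is only practical for short series.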
-
INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]
Authors:
Sihem Amer-Yahia,
Georgia Koutrika,
Frederic Bastian,
Theofilos Belmpas,
Martin Braschler,
Ursin Brunner,
Diego Calvanese,
Maximilian Fabricius,
Orest Gkini,
Catherine Kosten,
Davide Lanti,
Antonis Litke,
Hendrik Lücke-Tieke,
Francesco Alessandro Massucci,
Tarcisio Mendes de Farias,
Alessandro Mosca,
Francesco Multari,
Nikolaos Papadakis,
Dimitris Papadopoulos,
Yogendra Patil,
Aurélien Personnaz,
Guillem Rull,
Ana Sima,
Ellery Smith,
Dimitrios Skoutas,
et al. (3 additional authors not shown)
Abstract:
A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data exploration system -- that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.
Submitted 9 April, 2021;
originally announced April 2021.
-
A Survey of Blocking and Filtering Techniques for Entity Resolution
Authors:
George Papadakis,
Dimitrios Skoutas,
Emmanouil Thanos,
Themis Palpanas
Abstract:
Efficiency techniques have been an integral part of Entity Resolution since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We also provided in-depth coverage of each category, further classifying the corresponding works into novel sub-categories. Lately, the efficiency techniques have received more attention, due to the rise of Big Data. This includes large volumes of semi-structured data, which pose challenges not only to the scalability of efficiency techniques, but also to their core assumptions: the requirement of Blocking for schema knowledge and of Filtering for high similarity thresholds. The former led to the introduction of schema-agnostic Blocking in conjunction with Block Processing techniques, while the latter led to more relaxed criteria of similarity. Our survey covers these new fields in detail, putting in context all relevant works.
Submitted 21 August, 2020; v1 submitted 15 May, 2019;
originally announced May 2019.
-
A Buffer-aided Successive Opportunistic Relay Selection Scheme with Power Adaptation and Inter-Relay Interference Cancellation for Cooperative Diversity Systems
Authors:
Nikolaos Nomikos,
Themistoklis Charalambous,
Ioannis Krikidis,
Dimitrios Skoutas,
Demosthenes Vouyioukas,
Mikael Johansson
Abstract:
In this paper we consider a simple cooperative network consisting of a source, a destination and a cluster of decode-and-forward half-duplex relays. At each time-slot, the source and (possibly) one of the relays transmit a packet to another relay and the destination, respectively, resulting in inter-relay interference (IRI). In this work, with the aid of buffers at the relays, we mitigate the detrimental effect of IRI through interference cancellation. More specifically, we propose the min-power scheme that minimizes the total energy expenditure per time slot under an IRI cancellation scheme. Apart from minimizing the energy expenditure, the min-power selection scheme also provides better throughput and lower outage probability than existing works in the literature. It is the first time that interference cancellation is combined with buffer-aided relays and power adaptation to mitigate the IRI and minimize the energy expenditure. The new relay selection policy is analyzed in terms of outage probability and diversity, by modeling the evolution of the relay buffers as a Markov Chain (MC). We construct the state transition matrix of the MC, and hence obtain the steady state with which we can characterize the outage probability. The proposed scheme outperforms relevant state-of-the-art relay selection schemes in terms of throughput, diversity and energy efficiency, as demonstrated via examples.
Submitted 17 July, 2014; v1 submitted 6 February, 2013;
originally announced February 2013.