-
DESERE: The 1st Workshop on Decentralised Search and Recommendation
Authors:
Mohamed Ragab,
Yury Savateev,
Wenjie Wang,
Reza Moosaei,
Thanassis Tiropanis,
Alexandra Poulovassilis,
Adriane Chapman,
Helen Oliver,
George Roussos
Abstract:
The DESERE Workshop, our First Workshop on Decentralised Search and Recommendation, offers a platform for researchers to explore and share innovative ideas on decentralised web services, mainly focusing on three major topics: (i) societal impact of decentralised systems: their effect on privacy, policy, and regulation; (ii) decentralising applications: algorithmic and performance challenges that arise from decentralisation; and (iii) infrastructure to support decentralised systems and services: peer-to-peer networks, routing, and performance evaluation tools.
Submitted 12 March, 2024;
originally announced March 2024.
-
Explanation Shift: Detecting distribution shifts on tabular data via the explanation space
Authors:
Carlos Mougan,
Klaus Broelemann,
Gjergji Kasneci,
Thanassis Tiropanis,
Steffen Staab
Abstract:
As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In the past, predictive performance was considered the key indicator to monitor. However, explanation aspects have come to attention in recent years. In this work, we investigate how model predictive performance and model explanation characteristics are affected under distribution shifts and how these key indicators are related to each other for tabular data. We find that the modeling of explanation shifts can be a better indicator for the detection of predictive performance changes than state-of-the-art techniques based on representations of distribution shifts. We provide a mathematical analysis of different types of distribution shifts as well as synthetic experimental examples.
Submitted 22 October, 2022;
originally announced October 2022.
-
Bias in Data-driven AI Systems -- An Introductory Survey
Authors:
Eirini Ntoutsi,
Pavlos Fafalios,
Ujwal Gadiraju,
Vasileios Iosifidis,
Wolfgang Nejdl,
Maria-Esther Vidal,
Salvatore Ruggieri,
Franco Turini,
Symeon Papadopoulos,
Emmanouil Krasanakis,
Ioannis Kompatsiaris,
Katharina Kinder-Kurlanda,
Claudia Wagner,
Fariba Karimi,
Miriam Fernandez,
Harith Alani,
Bettina Berendt,
Tina Kruegel,
Christian Heinze,
Klaus Broelemann,
Gjergji Kasneci,
Thanassis Tiropanis,
Steffen Staab
Abstract:
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multi-disciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms. Unless otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features like race, sex, etc.
Submitted 14 January, 2020;
originally announced January 2020.
-
FAIR and Open Computer Science Research Software
Authors:
Wilhelm Hasselbring,
Leslie Carr,
Simon Hettrick,
Heather Packer,
Thanassis Tiropanis
Abstract:
In computational science and in computer science, research software is a central asset for research. Computational science is the application of computer science and software engineering principles to solving scientific problems, whereas computer science is the study of computer hardware and software design.
The Open Science agenda holds that science advances faster when we can build on existing results. Therefore, research software has to be reusable for advancing science. Thus, we need proper research software engineering for obtaining reusable and sustainable research software. This way, software engineering methods may improve research in other disciplines. However, research in software engineering and computer science itself will also benefit from reuse when research software is involved.
For good scientific practice, the resulting research software should be open and adhere to the FAIR principles (findable, accessible, interoperable and reusable) to allow repeatability, reproducibility, and reuse. Compared to research data, research software should be both archived for reproducibility and actively maintained for reusability. The FAIR data principles do not require openness, but research software should be open source software. Established open source software licenses provide sufficient licensing options, such that it should be the rare exception to keep research software closed.
We review and analyze the current state in this area in order to give recommendations for making computer science research software FAIR and open. We observe that research software publishing practices in computer science and in computational science show significant differences.
Submitted 16 August, 2019;
originally announced August 2019.
-
Analytics for the Internet of Things: A Survey
Authors:
Eugene Siow,
Thanassis Tiropanis,
Wendy Hall
Abstract:
The Internet of Things (IoT) envisions a world-wide, interconnected network of smart physical entities. These physical entities generate a large amount of data in operation and as the IoT gains momentum in terms of deployment, the combined scale of those data seems destined to continue to grow. Increasingly, applications for the IoT involve analytics. Data analytics is the process of deriving knowledge from data, generating value like actionable insights from them. This article reviews work in the IoT and big data analytics from the perspective of their utility in creating efficient, effective and innovative applications and services for a wide spectrum of domains. We review the broad vision for the IoT as it is shaped in various communities, examine the application of data analytics across IoT domains, provide a categorisation of analytic approaches and propose a layered taxonomy from IoT data to analytics. This taxonomy provides us with insights on the appropriateness of analytical techniques, which in turn shapes a survey of enabling technology and infrastructure for IoT analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape future research.
Submitted 3 July, 2018;
originally announced July 2018.
-
TritanDB: Time-series Rapid Internet of Things Analytics
Authors:
Eugene Siow,
Thanassis Tiropanis,
Xin Wang,
Wendy Hall
Abstract:
The efficient management of data is an important prerequisite for realising the potential of the Internet of Things (IoT). Given the large volume of structured time-series IoT data, two issues arise: addressing the difficulties of data integration between heterogeneous Things, and improving ingestion and query performance across databases on both resource-constrained Things and in the cloud. In this paper, we examine the structure of public IoT data and discover that the majority exhibit unique flat, wide and numerical characteristics with a mix of evenly and unevenly-spaced time-series. We investigate the advances in time-series databases for telemetry data and combine these findings with microbenchmarks to determine the best compression techniques and storage data structures to inform the design of a novel solution optimised for IoT data. A query translation method with low overhead even on resource-constrained Things allows us to utilise rich data models like the Resource Description Framework (RDF) for interoperability and data integration on top of the optimised storage. Our solution, TritanDB, shows an order of magnitude performance improvement across both Things and cloud hardware on many state-of-the-art databases within IoT scenarios. Finally, we describe how TritanDB supports various analyses of IoT time-series data like forecasting.
Submitted 24 January, 2018;
originally announced January 2018.
-
PRESTO: Probabilistic Cardinality Estimation for RDF Queries Based on Subgraph Overlapping
Authors:
Xin Wang,
Eugene Siow,
Aastha Madaan,
Thanassis Tiropanis
Abstract:
In query optimisation accurate cardinality estimation is essential for finding optimal query plans. It is especially challenging for RDF due to the lack of explicit schema and the excessive occurrence of joins in RDF queries. Existing approaches typically collect statistics based on the counts of triples and estimate the cardinality of a query as the product of its join components, where errors can accumulate even when the estimation of each component is accurate. As opposed to existing methods, we propose PRESTO, a cardinality estimation method that is based on the counts of subgraphs instead of triples and uses a probabilistic method to estimate cardinalities of RDF queries as a whole. PRESTO avoids some major issues of existing approaches and is able to accurately estimate arbitrary queries under a bound memory constraint. We evaluate PRESTO with YAGO and show that PRESTO is more accurate for both simple and complex queries.
Submitted 19 January, 2018;
originally announced January 2018.
-
Signal Diffusion Mapping: Optimal Forecasting with Time Varying Lags
Authors:
Paul Gaskell,
Frank McGroarty,
Thanassis Tiropanis
Abstract:
We introduce a new methodology for forecasting which we call Signal Diffusion Mapping. Our approach accommodates features of real world financial data which have been ignored historically in existing forecasting methodologies. Our method builds upon well-established and accepted methods from other areas of statistical analysis. We develop and adapt those models for use in forecasting. We also present tests of our model on data in which we demonstrate the efficacy of our approach.
Submitted 23 September, 2014;
originally announced September 2014.