Showing 1–14 of 14 results for author: Triantafillou, P

  1. arXiv:2406.09073  [pdf, other]

    cs.LG

    Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

    Authors: Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon

    Abstract: We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.01257  [pdf, other]

    cs.LG

    What makes unlearning hard and what to do about it

    Authors: Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, Peter Triantafillou

    Abstract: Machine unlearning is the problem of removing the effect of a subset of training data (the "forget set") from a trained model without damaging the model's utility, e.g., to comply with users' requests to delete their data, or remove mislabeled, poisoned or otherwise problematic data. With unlearning research still being in its infancy, many fundamental open questions exist: Are there interpretable…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.03097  [pdf, other]

    cs.LG cs.AI cs.CL

    To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models

    Authors: George-Octavian Barbulescu, Peter Triantafillou

    Abstract: LLMs have been found to memorize training textual sequences and regurgitate verbatim said sequences during text generation time. This fact is known to be the cause of privacy and related (e.g., copyright) problems. Unlearning in LLMs then takes the form of devising new algorithms that will properly deal with these side-effects of memorized data, while not hurting the model's utility. We offer a fr…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICML 2024

  4. arXiv:2401.14375  [pdf, other]

    cs.SI

    The GraphTempo Framework for Exploring the Evolution of a Graph through Pattern Aggregation

    Authors: Evangelia Tsoukanara, Georgia Koloniari, Evaggelia Pitoura, Peter Triantafillou

    Abstract: When the focus is on the relationships or interactions between entities, graphs offer an intuitive model for many real-world data. Such graphs are usually large and change over time, thus requiring models and strategies that explore their evolution. We study the evolution of aggregated graphs and introduce the GraphTempo model that allows temporal and attribute aggregation not only on node level…

    Submitted 6 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  5. arXiv:2311.17276  [pdf, other]

    cs.DB

    Machine Unlearning in Learned Databases: An Experimental Analysis

    Authors: Meghdad Kurmanji, Eleni Triantafillou, Peter Triantafillou

    Abstract: Machine learning models based on neural networks (NNs) are enjoying ever-increasing attention in the DB community. However, an important issue has been largely overlooked, namely the challenge of dealing with the highly dynamic nature of DBs, where data updates are fundamental, highly frequent operations. Although some recent research has addressed the issues of maintaining updated NN models in th…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted as a conference paper at SIGMOD 2024

  6. arXiv:2302.09880  [pdf, other]

    cs.LG cs.CR

    Towards Unbounded Machine Unlearning

    Authors: Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou

    Abstract: Deep machine unlearning is the problem of 'removing' from a trained neural network a subset of its training set. This problem is very timely and has many applications, including the key tasks of removing biases (RB), resolving confusion (RC) (caused by mislabelled data in trained models), as well as allowing users to exercise their 'right to be forgotten' to protect User Privacy (UP). This paper i…

    Submitted 30 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

  7. arXiv:2210.05508  [pdf, other]

    cs.DB cs.LG

    Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data

    Authors: Meghdad Kurmanji, Peter Triantafillou

    Abstract: Machine Learning (ML) is changing DBs as many DB components are being replaced by ML models. One open problem in this setting is how to update such ML models in the presence of data updates. We start this investigation focusing on data insertions (dominating updates in analytical DBs). We study how to update neural network (NN) models when new data follows a different distribution (a.k.a. it is "o…

    Submitted 8 December, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted as a conference paper for SIGMOD 2023

  8. arXiv:2206.10435  [pdf]

    cs.DB cs.AI

    Graphical Join: A New Physical Join Algorithm for RDBMSs

    Authors: Ali Mohammadi Shanghooshabad, Peter Triantafillou

    Abstract: Join operations (especially n-way, many-to-many joins) are known to be time- and resource-consuming. At large scales, with respect to table and join-result sizes, current state-of-the-art approaches (including both binary-join plans which use Nested-loop/Hash/Sort-merge Join algorithms or, alternatively, worst-case optimal join algorithms (WOJAs)) may even fail to produce any answer given reasona…

    Submitted 22 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: 13 pages

  9. arXiv:2206.10434  [pdf]

    cs.DB cs.LG

    Model Joins: Enabling Analytics Over Joins of Absent Big Tables

    Authors: Ali Mohammadi Shanghooshabad, Peter Triantafillou

    Abstract: This work is motivated by two key facts. First, it is highly desirable to be able to learn and perform knowledge discovery and analytics (LKD) tasks without the need to access raw-data tables. This may be due to organizations finding it increasingly frustrating and costly to manage and maintain ever-growing tables, or for privacy reasons. Hence, compact models can be developed from the raw data an…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: 12 pages

  10. arXiv:2201.02670  [pdf, other]

    cs.DB cs.DS

    Weighted Random Sampling over Joins

    Authors: Michael Shekelyan, Graham Cormode, Peter Triantafillou, Ali Shanghooshabad, Qingzhi Ma

    Abstract: Joining records with all other records that meet a linkage condition can result in an astronomically large number of combinations due to many-to-many relationships. For such challenging (acyclic) joins, a random sample over the join result is a practical alternative to working with the oversized join result. Whereas prior works are limited to uniform join sampling where each join row is assigned t…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: 14 pages, 12 figures, 6 tables

  11. arXiv:2003.06613  [pdf, other]

    cs.DB

    ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning

    Authors: Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

    Abstract: As more and more organizations rely on data-driven decision making, large-scale analytics become increasingly important. However, an analyst is often stuck waiting for an exact result. As such, organizations turn to Cloud providers that have infrastructure for efficiently analyzing large quantities of data. But, with increasing costs, organizations have to optimize their usage. Having a cheap alte…

    Submitted 14 March, 2020; originally announced March 2020.

  12. Adaptive Learning of Aggregate Analytics under Dynamic Workloads

    Authors: Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

    Abstract: Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to the rescue. In this setting, analytics tasks become very costly in terms of query response time, resource consumption, and money in cloud deployments, especially when base data are stored across geographically distributed…

    Submitted 14 March, 2020; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: 12 pages, 9 figures

  13. Explaining Aggregates for Exploratory Analytics

    Authors: Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

    Abstract: Analysts wishing to explore multivariate data spaces typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest and then use aggregation functions, the results of which determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and as such, convey limited information abou…

    Submitted 12 March, 2020; v1 submitted 29 December, 2018; originally announced December 2018.

    Comments: 13 pages

  14. arXiv:1201.2766  [pdf, ps, other]

    cs.DB

    ART: Sub-Logarithmic Decentralized Range Query Processing with Probabilistic Guarantees

    Authors: Spyros Sioutas, Peter Triantafillou, George Papaloukopoulos, Evangelos Sakkopoulos, Kostas Tsichlas, Yannis Manolopoulos

    Abstract: We focus on range query processing on large-scale, typically distributed infrastructures, such as clouds of thousands of nodes of shared-datacenters, of p2p distributed overlays, etc. In such distributed environments, efficient range query processing is the key for managing the distributed data sets per se, and for monitoring the infrastructure's resources. We wish to develop an architecture that…

    Submitted 13 January, 2012; originally announced January 2012.

    Comments: Submitted to Distributed and Parallel Databases (DAPD) Journal, Springer