Showing 1–26 of 26 results for author: Meliou, A

  1. arXiv:2307.02860

    cs.DB

    Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization

    Authors: Anh L. Mai, Pengyu Wang, Azza Abouzied, Matteo Brucato, Peter J. Haas, Alexandra Meliou

    Abstract: A package query returns a package - a multiset of tuples - that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important fi…

    Submitted 14 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

  2. arXiv:2303.17566

    cs.LG cs.DB

    Non-Invasive Fairness in Learning through the Lens of Data Drift

    Authors: Ke Yang, Alexandra Meliou

    Abstract: Machine Learning (ML) models are widely employed to drive many modern data systems. While they are undeniably powerful tools, ML models often demonstrate imbalanced performance and unfair behaviors. The root of this problem often lies in the fact that different subpopulations commonly display divergent trends: as a learning algorithm tries to identify trends in the data, it naturally favors the tr…

    Submitted 9 August, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  3. arXiv:2201.06678

    cs.DS

    Improved Approximation and Scalability for Fair Max-Min Diversification

    Authors: Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, Zafeiria Moumoulidou

    Abstract: Given an $n$-point metric space $(\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i\in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: To appear in ICDT 2022

  4. arXiv:2105.06058

    cs.DB

    DataExposer: Exposing Disconnect between Data and Systems

    Authors: Sainyam Galhotra, Anna Fariha, Raoni Lourenço, Juliana Freire, Alexandra Meliou, Divesh Srivastava

    Abstract: As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, particular properties of the data. For example, a health-monitoring system that is designed under the assumption that weight is reported in imperial units (lbs) will malfunction when encountering weight reported in metric units (kilograms). Similar to software debuggi…

    Submitted 12 May, 2021; originally announced May 2021.

  5. Stochastic Package Queries in Probabilistic Databases

    Authors: Matteo Brucato, Nishant Yadav, Azza Abouzied, Peter J. Haas, Alexandra Meliou

    Abstract: We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a package (bag of tuples in a relational database) that jointly satisfy a set of constraints while minimizing some overall cost function; in most real-world problems, the data is uncertain. We provide methods for specifying -- via a SQL extension -- and processi…

    Submitted 11 March, 2021; originally announced March 2021.

    Journal ref: SIGMOD 2020

  6. arXiv:2101.07361

    cs.LG cs.CY cs.DB

    Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification

    Authors: Maliha Tashfia Islam, Anna Fariha, Alexandra Meliou, Babak Salimi

    Abstract: Classification, a heavily-studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. Consequently, fairness in classification has emerged as a high-priority research area.…

    Submitted 9 April, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Technical report of SIGMOD 2022 paper

  7. arXiv:2012.14800

    cs.HC cs.DB

    Example-Driven User Intent Discovery: Empowering Users to Cross the SQL Barrier Through Query by Example

    Authors: Anna Fariha, Lucy Cousins, Narges Mahyar, Alexandra Meliou

    Abstract: Traditional data systems require specialized technical skills where users need to understand the data organization and write precise queries to access data. Therefore, novice users who lack technical expertise face hurdles in perusing and analyzing data. Existing tools assist in formulating queries through keyword search, query recommendation, and query auto-completion, but still require some tech…

    Submitted 2 January, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

  8. arXiv:2010.09141

    cs.DS

    Diverse Data Selection under Fairness Constraints

    Authors: Zafeiria Moumoulidou, Andrew McGregor, Alexandra Meliou

    Abstract: Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe $U$ of $n$ elements that can…

    Submitted 18 October, 2020; originally announced October 2020.

  9. Causality-Guided Adaptive Interventional Debugging

    Authors: Anna Fariha, Suman Nath, Alexandra Meliou

    Abstract: Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and gr…

    Submitted 9 April, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: Technical report of AID (SIGMOD 2020)

  10. arXiv:2003.01289

    cs.DB

    Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems

    Authors: Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani, Alexandra Meliou

    Abstract: The reliability and proper function of data-driven applications hinge on the data's continued conformance to the applications' initial design. When data deviates from this initial profile, system behavior becomes unpredictable. Data profiling techniques such as functional dependencies and denial constraints encode patterns in the data that can be used to detect deviations. But traditional methods…

    Submitted 4 January, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: * Technical report for the conference paper to appear in SIGMOD 2021 * An earlier version of this paper had a different title: "Data Invariants: On Trust in Data-Driven Systems"

  11. arXiv:1907.01129

    cs.DB cs.CC

    New Results for the Complexity of Resilience for Binary Conjunctive Queries with Self-Joins

    Authors: Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou

    Abstract: The resilience of a Boolean query is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for…

    Submitted 15 June, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: 23 pages, 19 figures, included a new section

  12. arXiv:1906.10322

    cs.DB

    Example-Driven Query Intent Discovery: Abductive Reasoning using Semantic Similarity

    Authors: Anna Fariha, Alexandra Meliou

    Abstract: Traditional relational data interfaces require precise structured queries over potentially complex schemas. These rigid data retrieval mechanisms pose hurdles for non-expert users, who typically lack language expertise and are unfamiliar with the details of the schema. Query by Example (QBE) methods offer an alternative mechanism: users provide examples of their intended query output and the QBE s…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: SQuID Technical Report, 18 pages. [PVLDB 2019, Volume 12, No 10]

  13. arXiv:1903.09246

    cs.DB

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Authors: Xiaolan Wang, Alexandra Meliou

    Abstract: Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But th…

    Submitted 21 March, 2019; originally announced March 2019.

  14. Causal Testing: Finding Defects' Root Causes

    Authors: Brittany Johnson, Yuriy Brun, Alexandra Meliou

    Abstract: Understanding the root cause of a defect is critical to isolating and repairing buggy behavior. We present Causal Testing, a new method of root-cause analysis that relies on the theory of counterfactual causality to identify a set of executions that likely hold key causal information necessary to understand and repair buggy behavior. Using the Defects4J benchmark, we find that Causal Testing could…

    Submitted 18 February, 2020; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: in Proceedings of the 42nd International Conference on Software Engineering (ICSE), 2020

  15. arXiv:1709.03221

    cs.SE cs.AI cs.CY cs.DB cs.LG

    Fairness Testing: Testing Software for Discrimination

    Authors: Sainyam Galhotra, Yuriy Brun, Alexandra Meliou

    Abstract: This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotio…

    Submitted 10 September, 2017; originally announced September 2017.

    Comments: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. In Proceedings of 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Paderborn, Germany, September 4-8, 2017 (ESEC/FSE'17). https://doi.org/10.1145/3106237.3106277, ESEC/FSE, 2017

  16. arXiv:1609.02104

    cs.DB cs.DC

    A Consumer-Centric Market for Database Computation in the Cloud

    Authors: Yue Wang, Alexandra Meliou, Gerome Miklau

    Abstract: The availability of public computing resources in the cloud has revolutionized data analysis, but requesting cloud resources often involves complex decisions for consumers. Under the current pricing mechanisms, cloud service providers offer several service options and charge consumers based on the resources they use. Before they can decide which cloud resources to request, consumers have to estima…

    Submitted 16 June, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

  17. arXiv:1601.07539

    cs.DB

    QFix: Diagnosing errors through query histories

    Authors: Xiaolan Wang, Alexandra Meliou, Eugene Wu

    Abstract: Data-driven applications rely on the correctness of their data to function properly and effectively. Errors in data can be incredibly costly and disruptive, leading to loss of revenue, incorrect conclusions, and misguided policy decisions. While data cleaning tools can purge datasets of many errors before the data is used, applications and users interacting with the data can introduce new errors.…

    Submitted 11 February, 2016; v1 submitted 27 January, 2016; originally announced January 2016.

  18. arXiv:1512.03564

    cs.DB

    Scalable Package Queries in Relational Database Systems

    Authors: Matteo Brucato, Juan Felipe Beltran, Azza Abouzied, Alexandra Meliou

    Abstract: Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this…

    Submitted 15 December, 2015; v1 submitted 11 December, 2015; originally announced December 2015.

    Comments: Extended version of PVLDB 2016 submission

  19. arXiv:1507.00942

    cs.DB

    PackageBuilder: From Tuples to Packages

    Authors: Matteo Brucato, Rahul Ramakrishna, Azza Abouzied, Alexandra Meliou

    Abstract: In this demo, we present PackageBuilder, a system that extends database systems to support package queries. A package is a collection of tuples that individually satisfy base constraints and collectively satisfy global constraints. The need for package support arises in a variety of scenarios: For example, in the creation of meal plans, users are not only interested in the nutritional content of i…

    Submitted 3 July, 2015; originally announced July 2015.

    Journal ref: PVLDB, vol. 7, no. 13, 2014, pp. 1593-1596

  20. arXiv:1507.00819

    cs.DB

    Improving package recommendations through query relaxation

    Authors: Matteo Brucato, Azza Abouzied, Alexandra Meliou

    Abstract: Recommendation systems aim to identify items that are likely to be of interest to users. In many cases, users are interested in package recommendations as collections of items. For example, a dietitian may wish to derive a dietary plan as a collection of recipes that is nutritionally balanced, and a travel agent may want to produce a vacation package as a coordinated collection of travel and hotel…

    Submitted 3 July, 2015; originally announced July 2015.

    Journal ref: Matteo Brucato, Azza Abouzied, and Alexandra Meliou. Improving Package Recommendations Through Query Relaxation. In Proceedings of the 1st International DATA4U Workshop, in conjunction with VLDB, 2014

  21. arXiv:1507.00674

    cs.DB cs.CC

    A Characterization of the Complexity of Resilience and Responsibility for Self-join-free Conjunctive Queries

    Authors: Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou

    Abstract: Several research thrusts in the area of data management have focused on understanding how changes in the data affect the output of a view or standing query. Example applications are explaining query results, propagating updates through views, and anonymizing datasets. These applications usually rely on understanding how interventions in a database impact the output of a query. An important aspect…

    Submitted 2 July, 2015; originally announced July 2015.

    Comments: 36 pages, 13 figures

  22. arXiv:1503.00306

    cs.DB

    Fusing Data with Correlations

    Authors: Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava

    Abstract: Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate, or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data and erroneous data for creating a cleaner set of integrated data. Previous work has shown…

    Submitted 1 March, 2015; originally announced March 2015.

    Comments: Sigmod'2014

  23. arXiv:1105.4395

    cs.DB

    Default-all is dangerous!

    Authors: Wolfgang Gatterbauer, Alexandra Meliou, Dan Suciu

    Abstract: We show that the default-all propagation scheme for database annotations is dangerous. Dangerous here means that it can propagate annotations to the query output which are semantically irrelevant to the query the user asked. This is the result of considering all relationally equivalent queries and returning the union of their where-provenance in an attempt to define a propagation scheme that is in…

    Submitted 22 May, 2011; originally announced May 2011.

    Comments: 4 pages, 6 figures, preprint of paper appearing in Proceedings of TaPP '11 (3rd USENIX Workshop on the Theory and Practice of Provenance); for details see the project page: http://db.cs.washington.edu/causality/

  24. arXiv:1009.2021

    cs.DB cs.AI

    The Complexity of Causality and Responsibility for Query Answers and non-Answers

    Authors: Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, Dan Suciu

    Abstract: An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Choc…

    Submitted 29 September, 2011; v1 submitted 10 September, 2010; originally announced September 2010.

    Comments: 15 pages, 12 figures, PVLDB 2011

  25. arXiv:1007.3781

    cs.DB

    Multiresolution Cube Estimators for Sensor Network Aggregate Queries

    Authors: Alexandra Meliou, Carlos Guestrin, Joseph M. Hellerstein

    Abstract: In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted techniques for their handling. Our approach constructs and maintains multi-resolution cube hierarchies inside the network, which can be constructed in a distributed fashion. In case of failures, recovery can also be perfo…

    Submitted 21 July, 2010; originally announced July 2010.

    Comments: 14 pages, 8 figures, IV Alberto Mendelzon Workshop on Foundations of Data Management

    Journal ref: IV Alberto Mendelzon Workshop on Foundations of Data Management, 2010

  26. arXiv:0912.5340

    cs.DB cs.AI

    Why so? or Why no? Functional Causality for Explaining Query Answers

    Authors: Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, Dan Suciu

    Abstract: In this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed approaches of provenance and missing query result explanations. We develop our framework starting from the well-studied definition of actual causes by Halpern and Pearl. After identifying some undesirable characteristics of the origina…

    Submitted 29 December, 2009; originally announced December 2009.

    Comments: 18 pages, 15 figures

    Report number: University of Washington CSE Technical Report 09-12-01

    ACM Class: H.2.1