Reptile: Aggregation-level Explanations for Hierarchical Data

Huang, Zezhou; Wu, Eugene


arXiv:2103.07037 (cs)


[Submitted on 12 Mar 2021]


Abstract: Recent query explanation systems help users understand anomalies in aggregation results by proposing predicates that describe input records that, if deleted, would resolve the anomalies. However, it can be difficult for users to understand how a predicate was chosen, and these approaches are limited to errors that can be resolved through deletion. In contrast, data errors may be due to group-wise errors, such as missing records or systematic value errors. This paper presents Reptile, an explanation system for hierarchical data. Given an anomalous aggregate query result, Reptile recommends the next drill-down attribute and ranks the drill-down groups by the extent to which repairing each group's statistics to their expected values resolves the anomaly. Reptile efficiently trains a multi-level model that leverages the data's hierarchy to estimate the expected values, and uses a factorised representation of the feature matrix to remove redundancies due to the data's hierarchical structure. We further extend model training to support factorised data, and develop a suite of optimizations that leverage the data's hierarchical structure. Reptile reduces end-to-end runtimes by more than 6 times compared to a MATLAB-based implementation, correctly identifies 21/30 data errors in Johns Hopkins' COVID-19 data, and correctly resolves 20/22 complaints in a user study using data and researchers from Columbia University's Financial Instruments Sector Team.
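The ranking idea in the abstract (score each drill-down group by how much repairing its statistic to an expected value resolves the anomaly) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_groups_by_repair`, the use of a SUM aggregate, the user-supplied `target`, and the toy numbers are all assumptions made for illustration, and the expected values are passed in directly rather than estimated by the multi-level model the paper describes.

```python
def rank_groups_by_repair(observed, expected, target):
    """Rank drill-down groups by how much repairing each one helps.

    observed: dict mapping group name -> observed aggregate (assumed SUM)
    expected: dict mapping group name -> expected aggregate (assumed given;
              Reptile estimates these with a model, which is not shown here)
    target:   value the user believes the overall total should have
    """
    total = sum(observed.values())
    base_error = abs(total - target)

    scores = []
    for name, value in observed.items():
        # Swap this group's observed statistic for its expected one,
        # leaving every other group untouched.
        repaired_total = total - value + expected[name]
        repaired_error = abs(repaired_total - target)
        # Positive score: the repair moves the total closer to the target.
        scores.append((name, base_error - repaired_error))

    # Groups whose repair resolves most of the anomaly come first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


# Hypothetical toy data: one group is far below its expected value.
observed = {"NY": 120, "NJ": 100, "CT": 10}
expected = {"NY": 115, "NJ": 105, "CT": 95}
ranking = rank_groups_by_repair(observed, expected, target=315)
print(ranking)
```

In this toy input, repairing the `CT` group alone brings the total to the target, so it ranks first; a group whose repair would move the total further from the target receives a negative score.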

Subjects:	Databases (cs.DB)
Cite as:	arXiv:2103.07037 [cs.DB]
	(or arXiv:2103.07037v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2103.07037

Submission history

From: Zezhou Huang
[v1] Fri, 12 Mar 2021 01:53:45 UTC (12,194 KB)
