Faster Algorithms for Fair Max-Min Diversification in $\mathbb{R}^d$

Kurkure, Yash; Shamo, Miles; Wiseman, Joseph; Galhotra, Sainyam; Sintos, Stavros

doi:10.1145/3654940


arXiv:2404.04713 (cs)

[Submitted on 6 Apr 2024 (v1), last revised 14 May 2024 (this version, v2)]



Abstract: The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in various real-world applications that have far-reaching consequences. In this work, we study fairness-aware data subset selection, specifically the problem of selecting a diverse set of size $k$ from a large collection of $n$ data points (FairDiv). The FairDiv problem is well-studied in the data management and theory communities. We develop the first constant approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant approximation algorithms run in super-linear time (with respect to $n$ or $k$) and use super-linear space. Our approach achieves this efficiency by employing a novel combination of the Multiplicative Weight Update method and advanced geometric data structures to implicitly and approximately solve a linear program. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using our coreset, we also propose the first efficient streaming algorithm for the FairDiv problem whose efficiency does not depend on the distribution of data points. Empirical evaluation on million-sized datasets demonstrates that our algorithm achieves the best diversity within a minute. All prior techniques are either highly inefficient or do not generate a good solution.
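
To make the objective concrete, the following is a minimal sketch of max-min diversification under a group quota, not the algorithm of the paper. The abstract does not spell out the fairness constraint, so the sketch assumes each point carries a group label and that exactly `quotas[g]` points must be chosen from group `g`; the function names `diversity` and `brute_force_fair_div` are hypothetical. It enumerates every feasible subset, so it is exponential and only suitable for tiny inputs.

```python
from itertools import combinations, product
from math import dist, inf


def diversity(subset):
    """Minimum pairwise Euclidean distance in subset (inf if fewer than 2 points)."""
    return min((dist(p, q) for p, q in combinations(subset, 2)), default=inf)


def brute_force_fair_div(points, labels, quotas):
    """Exhaustive search for a subset with maximum diversity.

    Assumption (not from the abstract): exactly quotas[g] points are
    taken from each group g. Runs in exponential time.
    """
    # Partition the points by their group label.
    by_group = {g: [p for p, l in zip(points, labels) if l == g] for g in quotas}
    # All ways to pick the required number of points inside each group.
    per_group_choices = [combinations(by_group[g], quotas[g]) for g in quotas]

    best_subset, best_div = None, -inf
    # Every feasible subset is one choice per group, concatenated.
    for choice in product(*per_group_choices):
        subset = [p for part in choice for p in part]
        d = diversity(subset)
        if d > best_div:
            best_subset, best_div = subset, d
    return best_subset, best_div


points = [(0, 0), (1, 0), (5, 0), (10, 0), (11, 0)]
labels = ["red", "red", "red", "blue", "blue"]
subset, div = brute_force_fair_div(points, labels, {"red": 2, "blue": 1})
print(subset, div)
```

A brute-force baseline like this is useful only as a correctness reference on small instances; the near-linear running time claimed in the abstract comes from the techniques it names, which this sketch does not implement.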

Subjects:	Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2404.04713 [cs.DB]
	(or arXiv:2404.04713v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2404.04713
Journal reference:	SIGMOD 2024
Related DOI:	https://doi.org/10.1145/3654940

Submission history

From: Stavros Sintos
[v1] Sat, 6 Apr 2024 19:25:00 UTC (14,427 KB)
[v2] Tue, 14 May 2024 07:14:29 UTC (14,427 KB)
