-
Beyond $1/2$-Approximation for Submodular Maximization on Massive Data Streams
Authors:
Ashkan Norouzi-Fard,
Jakub Tarnawski,
Slobodan Mitrović,
Amir Zandieh,
Aida Mousavifar,
Ola Svensson
Abstract:
Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, and clustering, require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a $0.5$-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor $0.5$, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than $0.5$-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
Submitted 6 August, 2018;
originally announced August 2018.
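As a rough illustration of the threshold-based streaming idea that the abstract above takes as its baseline, the following is a minimal sketch and not SALSA itself, nor the exact method of Badanidiyuru et al. It assumes a single fixed threshold `tau` and a toy coverage objective; the names `coverage_function`, `threshold_stream`, and the example data are hypothetical. For a monotone submodular objective, the standard analysis shows that choosing `tau` equal to OPT/(2k) yields at least half of the optimal value; since OPT is unknown in practice, streaming methods typically keep several threshold guesses in parallel.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def coverage_function(sets: Dict[Hashable, Set[int]]) -> Callable[[List[Hashable]], int]:
    """Build a toy monotone submodular objective: number of items covered."""
    def f(selected: List[Hashable]) -> int:
        covered: Set[int] = set()
        for e in selected:
            covered |= sets[e]
        return len(covered)
    return f


def threshold_stream(
    stream: Iterable[Hashable],
    f: Callable[[List[Hashable]], int],
    k: int,
    tau: float,
) -> List[Hashable]:
    """Single pass: keep an element if its marginal gain is at least tau.

    Assumption: tau is supplied by the caller (e.g. a guess of OPT / (2k)).
    Memory use is at most k stored elements.
    """
    summary: List[Hashable] = []
    for e in stream:
        if len(summary) >= k:
            break  # cardinality constraint reached
        gain = f(summary + [e]) - f(summary)
        if gain >= tau:
            summary.append(e)
    return summary


# Hypothetical example data: each element covers a set of integers.
example_sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5, 6}}
f = coverage_function(example_sets)
# Here OPT = 6 for k = 2 (element "d" alone covers everything), so tau = 6 / 4.
summary = threshold_stream(["a", "b", "c", "d"], f, k=2, tau=1.5)
```

The sketch stores only the current summary, which is what makes the approach low-memory; the arrival order of the stream affects which elements pass the threshold.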
-
Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach
Authors:
Slobodan Mitrović,
Ilija Bogunovic,
Ashkan Norouzi-Fard,
Jakub Tarnawski,
Volkan Cevher
Abstract:
We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) $m$ items from the algorithm's memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary. We show that after the removal of any $m$ elements from the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that STAR-T matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.
Submitted 7 November, 2017;
originally announced November 2017.
-
Robust Submodular Maximization: A Non-Uniform Partitioning Approach
Authors:
Ilija Bogunovic,
Slobodan Mitrović,
Jonathan Scarlett,
Volkan Cevher
Abstract:
We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $τ$ from the returned set may be removed. We focus on the worst-case setting considered by Orlin et al. (2016), in which a constant-factor approximation guarantee was given for $τ= o(\sqrt{k})$. In this paper, we solve a key open problem raised therein, presenting a new Partitioned Robust (PRo) submodular maximization algorithm that achieves the same guarantee for more general $τ= o(k)$. Our algorithm constructs partitions consisting of buckets with exponentially increasing sizes, and applies standard submodular optimization subroutines on the buckets in order to construct the robust solution. We numerically evaluate PRo in data summarization and influence maximization, demonstrating gains over both the greedy algorithm and the algorithm of Orlin et al. (2016).
Submitted 15 June, 2017;
originally announced June 2017.
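To make the robust objective in the abstract above concrete, the following is a minimal sketch of the plain greedy baseline together with a brute-force evaluation of the worst-case removal of $τ$ items. It is not the PRo algorithm, and it does not reproduce its bucket structure. The toy coverage objective, the function names `make_coverage`, `greedy`, and `worst_case_value`, and the example data are all hypothetical; the brute-force check enumerates every removal set and is only practical for very small inputs.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Set


def make_coverage(sets: Dict[Hashable, Set[int]]) -> Callable[[Sequence[Hashable]], int]:
    """Toy monotone submodular objective: number of items covered."""
    def f(selected: Sequence[Hashable]) -> int:
        covered: Set[int] = set()
        for e in selected:
            covered |= sets[e]
        return len(covered)
    return f


def greedy(ground: Sequence[Hashable], f: Callable, k: int) -> List[Hashable]:
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    chosen: List[Hashable] = []
    for _ in range(k):
        candidates = [e for e in ground if e not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda e: f(chosen + [e]) - f(chosen))
        chosen.append(best)
    return chosen


def worst_case_value(chosen: Sequence[Hashable], f: Callable, tau: int) -> int:
    """Objective value after an adversary removes the most damaging tau items.

    Assumption: 0 <= tau <= len(chosen). Enumerates all removal sets.
    """
    return min(
        f([e for e in chosen if e not in removed])
        for removed in combinations(chosen, tau)
    )


# Hypothetical example data.
example_sets = {"a": {1, 2, 3, 4}, "b": {5, 6}, "c": {1, 2}, "d": {7}}
f = make_coverage(example_sets)
solution = greedy(list(example_sets), f, k=2)
robust_value = worst_case_value(solution, f, tau=1)
```

In this toy instance greedy concentrates most of the value in a single element, so removing that element costs most of the objective. That concentration is the weakness a robust method is meant to address.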