-
Too Much Information Kills Information: A Clustering Perspective
Authors:
Yicheng Xu,
Vincent Chau,
Chenchen Wu,
Yong Zhang,
Vassilis Zissimopoulos,
Yifei Zou
Abstract:
Clustering is one of the most fundamental tools in artificial intelligence, particularly in pattern recognition and learning theory. In this paper, we propose a simple but novel approach for variance-based k-clustering tasks, including the widely known k-means clustering. The proposed approach picks a sampling subset from the given dataset and makes decisions based only on the data information in that subset. Under certain assumptions, the resulting clustering is provably a good estimate of the optimum of the variance-based objective with high probability. Extensive experiments on synthetic and real-world datasets show that, to obtain results competitive with the k-means method (Lloyd 1982) and the k-means++ method (Arthur and Vassilvitskii 2007), we need only 7% of the information in the dataset. With up to 15% of the information in the dataset, our algorithm outperforms both the k-means and k-means++ methods in at least 80% of the clustering tasks, in terms of clustering quality. In addition, an extended algorithm based on the same idea guarantees a balanced k-clustering result.
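The sample-then-decide idea can be illustrated with a minimal sketch. It assumes uniform random sampling of a fixed `fraction` of the points, plain Lloyd iterations on the sample only, and a final nearest-center assignment of every point; the paper's actual sampling rule, assumptions, and guarantees are not reproduced here, and `sample_then_cluster`, `lloyd`, and `fraction` are hypothetical names.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, k, iters=50, seed=0):
    """Plain Lloyd (k-means) iterations on `points`; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                dim = len(cl[0])
                new_centers.append(tuple(sum(p[d] for p in cl) / len(cl) for d in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def sample_then_cluster(points, k, fraction=0.15, seed=0):
    """Hypothetical sketch: cluster a random sample, then label the full dataset."""
    rng = random.Random(seed)
    m = max(k, int(fraction * len(points)))
    subset = rng.sample(points, m)            # only this subset drives the decisions
    centers = lloyd(subset, k, seed=seed)
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    cost = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))  # k-means objective
    return centers, labels, cost
```

Only the sample is touched during the iterative phase; the full dataset is scanned once at the end to assign labels and evaluate the variance-based (k-means) cost.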
Submitted 15 September, 2020;
originally announced September 2020.
-
Improved Budgeted Connected Domination and Budgeted Edge-Vertex Domination
Authors:
Ioannis Lamprou,
Ioannis Sigalas,
Vassilis Zissimopoulos
Abstract:
We consider the \emph{Budgeted} version of the classical \emph{Connected Dominating Set} problem (BCDS). Given a graph $G$ and a budget $k$, we seek a connected subset of at most $k$ vertices maximizing the number of dominated vertices in $G$. We improve on the previous $(1-1/e)/13$ approximation of [Khuller, Purohit, and Sarpatwar,\ \emph{SODA 2014}] by introducing a new method for performing tree decompositions in the analysis of the last part of the algorithm. This new approach provides a $(1-1/e)/12$ approximation guarantee. By generalizing the analysis of the first part of the algorithm, we are able to modify it appropriately and obtain a further improvement to $(1-e^{-7/8})/11$. On the other hand, we prove a $(1-1/e+ε)$-inapproximability bound for any $ε > 0$.
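To make the BCDS objective concrete, here is an exhaustive-search sketch for tiny graphs. It is not the approximation algorithm discussed above; it simply enumerates connected vertex sets of size at most $k$ and counts dominated vertices, which is useful for checking small examples. The graph is assumed to be an adjacency dict mapping each vertex to a set of neighbours, and the function names are hypothetical.

```python
from itertools import combinations

def closed_neighborhood(adj, S):
    """N[S]: the vertices of S plus every vertex adjacent to some vertex of S."""
    dom = set(S)
    for v in S:
        dom |= adj[v]
    return dom

def is_connected(adj, S):
    """True iff S is non-empty and induces a connected subgraph (iterative DFS)."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v] & S:          # only move along edges inside S
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == S

def bcds_brute_force(adj, k):
    """Exponential-time exact BCDS: best connected S with |S| <= k."""
    best, best_set = 0, None
    for size in range(1, k + 1):
        for S in combinations(adj, size):
            if is_connected(adj, S):
                val = len(closed_neighborhood(adj, S))
                if val > best:
                    best, best_set = val, set(S)
    return best, best_set
```

On two stars joined by a short path, the disconnected pair of star centers would dominate every vertex, but the connectivity requirement forces a smaller value for $k = 2$, which is exactly what distinguishes BCDS from plain budgeted domination.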
We also examine the \emph{edge-vertex domination} variant, where an edge dominates its endpoints and all vertices neighboring them. In \emph{Budgeted Edge-Vertex Domination} (BEVD), we are given a graph $G$ and a budget $k$, and we seek a (not necessarily connected) subset of $k$ edges such that the number of dominated vertices in $G$ is maximized. We prove that there exists a $(1-1/e)$-approximation algorithm. Also, for any $ε > 0$, we present a $(1-1/e+ε)$-inapproximability result by a gap-preserving reduction from the \emph{maximum coverage} problem. Finally, we examine the "dual" \emph{Partial Edge-Vertex Domination} (PEVD) problem, where a graph $G$ and a quota $n'$ are given. The goal is to select a minimum-size set of edges that dominates at least $n'$ vertices in $G$. In this case, we present an $H(n')$-approximation algorithm by a reduction to the \emph{partial cover} problem.
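Since each edge $(u, v)$ dominates the set $N[u] \cup N[v]$, both BEVD and PEVD can be viewed as coverage problems over these sets. The sketch below assumes the standard greedy rule for maximum coverage and partial cover (repeatedly pick the edge that dominates the most not-yet-dominated vertices); the abstract does not spell out the authors' algorithms, so this is an illustration of the reductions, not their exact method. The adjacency-dict input and all function names are hypothetical.

```python
def edge_dominated(adj, u, v):
    """Vertices dominated by edge (u, v): both endpoints and all their neighbours."""
    return {u, v} | adj[u] | adj[v]

def edge_list(adj):
    """Each undirected edge once, in a deterministic (sorted) order."""
    return sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})

def gain(adj, edge, covered):
    """Number of newly dominated vertices if `edge` is added."""
    return len(edge_dominated(adj, *edge) - covered)

def greedy_bevd(adj, k):
    """Budget version: greedy maximum-coverage choice of at most k edges."""
    covered, chosen = set(), []
    candidates = edge_list(adj)
    for _ in range(k):
        best = max(candidates, key=lambda e: gain(adj, e, covered), default=None)
        if best is None or gain(adj, best, covered) == 0:
            break                      # nothing left to gain
        chosen.append(best)
        covered |= edge_dominated(adj, *best)
    return chosen, covered

def greedy_pevd(adj, quota):
    """Quota version: greedy partial cover until at least `quota` vertices are dominated."""
    covered, chosen = set(), []
    candidates = edge_list(adj)
    while len(covered) < quota:
        best = max(candidates, key=lambda e: gain(adj, e, covered), default=None)
        if best is None or gain(adj, best, covered) == 0:
            raise ValueError("quota exceeds the number of dominatable vertices")
        chosen.append(best)
        covered |= edge_dominated(adj, *best)
    return chosen, covered
```

On a path with vertices $0, \dots, 8$, the first greedy pick is edge $(1, 2)$, dominating $\{0, 1, 2, 3\}$; two edges dominate 8 of the 9 vertices, and the quota version needs a third edge to reach all 9.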
Submitted 25 March, 2020; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Maximum Rooted Connected Expansion
Authors:
Ioannis Lamprou,
Russell Martin,
Sven Schewe,
Ioannis Sigalas,
Vassilis Zissimopoulos
Abstract:
Prefetching constitutes a valuable tool toward efficient Web surfing. As a result, estimating the amount of resources that need to be preloaded during a surfer's browsing becomes an important task. In this regard, prefetching can be modeled as a two-player combinatorial game [Fomin et al., Theoretical Computer Science 2014], where a surfer and a marker alternately play on a given graph (representing the Web graph). During its turn, the marker chooses a set of $k$ nodes to mark (prefetch), whereas the surfer, represented as a token resting on graph nodes, moves to a neighboring node (Web resource). The surfer's objective is to reach an unmarked node before all nodes become marked, in which case the marker wins. Intuitively, since the surfer traverses a subset of nodes in the Web graph step by step, a satisfactory prefetching procedure would load into the cache all resources lying in the neighborhood of this growing subset.
Motivated by the above, we consider the following problem, which we refer to as the Maximum Rooted Connected Expansion (MRCE) problem. Given a graph $G$ and a root node $v_0$, we wish to find a subset of vertices $S$ such that $S$ is connected, $S$ contains $v_0$, and the ratio $|N[S]|/|S|$ is maximized, where $N[S]$ denotes the closed neighborhood of $S$; that is, $N[S]$ contains all nodes in $S$ and all nodes with at least one neighbor in $S$.
We prove that the problem is NP-hard even when the input graph $G$ is restricted to be a split graph. On the positive side, we demonstrate a polynomial time approximation scheme for split graphs. Furthermore, we present a $\frac{1}{6}(1-\frac{1}{e})$-approximation algorithm for general graphs based on techniques for the Budgeted Connected Domination problem [Khuller et al., SODA 2014]. Finally, we provide a polynomial-time algorithm for the special case of interval graphs.
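The MRCE objective $|N[S]|/|S|$ can be checked on small instances with an exhaustive sketch. It enumerates every connected vertex set containing the root $v_0$ and keeps the best ratio as an exact `Fraction`; it is exponential in the number of vertices and is not the PTAS or the $\frac{1}{6}(1-\frac{1}{e})$-approximation described above. The adjacency-dict input and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def closed_neighborhood(adj, S):
    """N[S]: the nodes of S plus every node with at least one neighbour in S."""
    dom = set(S)
    for v in S:
        dom |= adj[v]
    return dom

def is_connected(adj, S):
    """True iff S is non-empty and induces a connected subgraph."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v] & S:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == S

def mrce_brute_force(adj, root):
    """Exact MRCE on tiny graphs: maximize |N[S]| / |S| over connected S containing root."""
    others = [v for v in adj if v != root]
    best, best_set = Fraction(0), None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {root, *extra}
            if not is_connected(adj, S):
                continue
            ratio = Fraction(len(closed_neighborhood(adj, S)), len(S))
            if ratio > best:              # strict: keep the first optimum found
                best, best_set = ratio, S
    return best, best_set
```

For a root that is a leaf of a star with five leaves, $S = \{v_0\}$ gives ratio $2$, while adding the star center gives $|N[S]|/|S| = 6/2 = 3$; adding further leaves only lowers the ratio, so the optimum is the root together with the center.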
Submitted 25 June, 2018;
originally announced June 2018.
-
Calibrations Scheduling Problem with Arbitrary Lengths and Activation Length
Authors:
Eric Angel,
Evripidis Bampis,
Vincent Chau,
Vassilis Zissimopoulos
Abstract:
Bender et al. (SPAA 2013) proposed a theoretical framework for testing in contexts where safety mistakes must be avoided. Testing in such a context is performed by machines that must be calibrated frequently. Since calibration is costly, it is important to study policies that minimize the calibration cost while performing all the necessary tests. We focus on the single-machine setting and extend the model proposed by Bender et al. by considering jobs with arbitrary processing times and allowing preemption. For this case, we propose an optimal polynomial-time algorithm. Then, we study the case where there are several types of calibrations with different lengths and costs. We first prove that the problem becomes NP-hard for arbitrary processing times, even when preemption is allowed. Finally, we focus on the case of unit-time jobs and show that a more general problem, where recalibrating the machine is not instantaneous but takes time, can be solved in polynomial time.
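To illustrate the basic calibration model with unit-time jobs on one machine, here is a small sketch under simplifying assumptions: time is discrete, a calibration started at integer time $t$ keeps the machine calibrated for slots $t, \dots, t+T-1$, and a job with release $r$ and deadline $d$ may run in any calibrated slot $s$ with $r \le s < d$. Feasibility of a given set of calibrations is checked with earliest-deadline-first (EDF); the minimum number of calibrations is then found by brute force over start times. This is not the paper's polynomial-time algorithm, and all names are hypothetical.

```python
import heapq
from itertools import combinations

def calibrated_slots(starts, T):
    """Sorted integer slots covered by calibrations starting at `starts`, each lasting T."""
    return sorted({s + i for s in starts for i in range(T)})

def edf_feasible(jobs, slots):
    """jobs: list of (release, deadline); job may run in slot t iff release <= t < deadline.
    Runs earliest-deadline-first over the available slots only."""
    jobs = sorted(jobs)
    heap, i = [], 0
    for t in slots:
        while i < len(jobs) and jobs[i][0] <= t:   # release jobs available at t
            heapq.heappush(heap, jobs[i][1])
            i += 1
        if heap and heap[0] <= t:                  # a pending job already missed its deadline
            return False
        if heap:
            heapq.heappop(heap)                    # run the job with the earliest deadline
    return i == len(jobs) and not heap             # every job released and scheduled

def min_calibrations(jobs, T):
    """Exponential brute force: fewest integer-start calibrations making all jobs feasible."""
    if not jobs:
        return 0, ()
    horizon = max(d for _, d in jobs)
    for m in range(len(jobs) + 1):
        for starts in combinations(range(horizon), m):
            if edf_feasible(jobs, calibrated_slots(starts, T)):
                return m, starts
    return None                                    # infeasible even with many calibrations
```

EDF is used because, for unit-time jobs with integer release times and deadlines on a single machine, scheduling the available job with the earliest deadline in each free slot is optimal by a standard exchange argument, so the check is exact for a fixed calibration set.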
Submitted 4 February, 2020; v1 submitted 10 July, 2015;
originally announced July 2015.
-
Optimal Data Placement on Networks With Constant Number of Clients
Authors:
Eric Angel,
Evripidis Bampis,
Gerasimos G. Pollatos,
Vassilis Zissimopoulos
Abstract:
We introduce optimal algorithms for the problems of data placement (DP) and page placement (PP) in networks with a constant number of clients, each of which has limited storage and issues requests for data objects. The objective for both problems is to utilize each client's storage efficiently (deciding where to place replicas of objects) so that the total access and installation cost incurred over all clients is minimized. In the PP problem, an additional constraint on the maximum number of clients served by a single client must be satisfied. Our algorithms solve both problems optimally when all objects have uniform lengths. When object lengths are non-uniform, we also find the optimal solution, albeit with a small, asymptotically tight violation of each client's storage size by $ε l_{\max}$, where $l_{\max}$ is the maximum length of the objects and $ε$ is some arbitrarily small positive constant. We make no assumption on the underlying topology of the network (metric, ultrametric, etc.), thus obtaining the first non-trivial results for non-metric data placement problems.
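Why a constant number of clients helps can be seen in a dynamic-programming sketch for the uniform-length DP case. The formulation here is an assumption, not taken from the paper: each object must get at least one replica, placing object $j$ on client $k$ costs `inst[k][j]`, client $i$ pays `req[i][j]` times the (arbitrary, possibly non-metric) distance `dist[i][k]` to its nearest replica of $j$, every object occupies one storage unit, and the PP constraint is omitted. With $c$ clients there are only $2^c - 1$ replica sets per object, and the DP state is the vector of used capacities, so the number of states is polynomial when $c$ is constant. The function names are hypothetical and this is not claimed to be the authors' algorithm.

```python
from itertools import combinations
from math import inf

def replica_cost(j, R, req, dist, inst):
    """Installation cost of object j on clients R plus every client's access cost."""
    access = sum(req[i][j] * min(dist[i][k] for k in R) for i in range(len(req)))
    return sum(inst[k][j] for k in R) + access

def optimal_placement_uniform(cap, req, dist, inst):
    """Exact DP over objects; state = tuple of storage units used at each client.
    cap[k]: capacity of client k (in objects); req[i][j]: request rate of client i for j;
    dist[i][k]: access cost from i to k (dist[i][i] = 0); inst[k][j]: install cost."""
    c, n = len(cap), len(req[0])
    subsets = [R for r in range(1, c + 1) for R in combinations(range(c), r)]
    layer = {tuple([0] * c): (0, [])}          # used capacities -> (best cost, replica sets)
    for j in range(n):
        nxt = {}
        for used, (cost, plan) in layer.items():
            for R in subsets:
                if any(used[k] + 1 > cap[k] for k in R):
                    continue                    # would overflow some client's storage
                new_used = tuple(used[k] + (k in R) for k in range(c))
                new_cost = cost + replica_cost(j, R, req, dist, inst)
                if new_cost < nxt.get(new_used, (inf,))[0]:
                    nxt[new_used] = (new_cost, plan + [R])
        layer = nxt
    if not layer:
        return inf, None                        # no placement fits the capacities
    return min(layer.values(), key=lambda t: t[0])
```

Objects interact only through the shared capacities, which is why tracking the used-capacity vector is enough for optimality under these assumptions; no triangle inequality on `dist` is used anywhere.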
Submitted 26 April, 2010;
originally announced April 2010.