Showing 1–25 of 25 results for author: Jiang, H -

  1. Moderate Dimension Reduction for $k$-Center Clustering

    Authors: Shaofeng H. -C. Jiang, Robert Krauthgamer, Shay Sapir

    Abstract: The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $ε>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+ε$. Perhaps surpr…

    Submitted 16 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: 23 pages, appeared in SoCG 2024. Minor corrections in page 8 and in section 5

  2. arXiv:2311.18365  [pdf, ps, other]

    cs.CG

    Fully Dynamic Algorithms for Euclidean Steiner Tree

    Authors: T-H. Hubert Chan, Gramoz Goranci, Shaofeng H. -C. Jiang, Bo Wang, Quan Xue

    Abstract: The Euclidean Steiner tree problem asks to find a min-cost metric graph that connects a given set of \emph{terminal} points $X$ in $\mathbb{R}^d$, possibly using points not in $X$ which are called Steiner points. Even though near-linear time $(1 + ε)$-approximation was obtained in the offline setting in seminal works of Arora and Mitchell, efficient dynamic algorithms for Steiner tree is still ope…

    Submitted 30 November, 2023; originally announced November 2023.

  3. arXiv:2307.07848  [pdf, ps, other]

    cs.DS cs.DC

    Fully Scalable MPC Algorithms for Clustering in High Dimension

    Authors: Artur Czumaj, Guichen Gao, Shaofeng H. -C. Jiang, Robert Krauthgamer, Pavel Veselý

    Abstract: We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be $n^σ$ for arbitrarily small fixed $σ>0$. Importantly, the local memory may be substantially smaller than the number of clusters $k$, yet all our algorithms are fast…

    Submitted 6 July, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

  4. arXiv:2306.02826  [pdf, ps, other]

    quant-ph cs.AI cs.DS cs.LG stat.ML

    Near-Optimal Quantum Coreset Construction Algorithms for Clustering

    Authors: Yecheng Xue, Xiaoyu Chen, Tongyang Li, Shaofeng H. -C. Jiang

    Abstract: $k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains open to find sublinear-time quantum algorithms. We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 32 pages, 0 figures, 1 table. To appear in the Fortieth International Conference on Machine Learning (ICML 2023)

  5. arXiv:2304.01623  [pdf, other]

    cs.DS

    Algorithms for the Generalized Poset Sorting Problem

    Authors: Shaofeng H. -C. Jiang, Wenqian Wang, Yubo Zhang, Yuhao Zhang

    Abstract: We consider a generalized poset sorting problem (GPS), in which we are given a query graph $G = (V, E)$ and an unknown poset $\mathcal{P}(V, \prec)$ that is defined on the same vertex set $V$, and the goal is to make as few queries as possible to edges in $G$ in order to fully recover $\mathcal{P}$, where each query $(u, v)$ returns the relation between $u, v$, i.e., $u \prec v$, $v \prec u$ or…

    Submitted 15 July, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  6. arXiv:2302.11339  [pdf, other]

    cs.DS

    The Power of Uniform Sampling for $k$-Median

    Authors: Lingxiao Huang, Shaofeng H. -C. Jiang, Jianing Lou

    Abstract: We study the power of uniform sampling for $k$-Median in various metric spaces. We relate the query complexity for approximating $k$-Median, to a key parameter of the dataset, called the balancedness $β\in (0, 1]$ (with $1$ being perfectly balanced). We show that any algorithm must make $Ω(1 / β)$ queries to the point set in order to achieve $O(1)$-approximation for $k$-Median. This particularly i…

    Submitted 22 February, 2023; originally announced February 2023.

  7. arXiv:2211.05293  [pdf, ps, other]

    cs.DS

    Streaming Euclidean Max-Cut: Dimension vs Data Reduction

    Authors: Xiaoyu Chen, Shaofeng H. -C. Jiang, Robert Krauthgamer

    Abstract: Max-Cut is a fundamental problem that has been studied extensively in various settings. We design an algorithm for Euclidean Max-Cut, where the input is a set of points in $\mathbb{R}^d$, in the model of dynamic geometric streams, where the input $X\subseteq [Δ]^d$ is presented as a sequence of point insertions and deletions. Previously, Frahling and Sohler [STOC 2005] designed a $(1+ε)$-approxima…

    Submitted 29 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  8. arXiv:2210.10394  [pdf, other]

    cs.DS

    Near-optimal Coresets for Robust Clustering

    Authors: Lingxiao Huang, Shaofeng H. -C. Jiang, Jianing Lou, Xuan Wu

    Abstract: We consider robust clustering problems in $\mathbb{R}^d$, specifically $k$-clustering problems (e.g., $k$-Median and $k$-Means) with $m$ outliers, where the cost for a given center set $C \subset \mathbb{R}^d$ aggregates the distances from $C$ to all but the furthest $m$ data points, instead of all points as in classical clustering. We focus on the $ε$-coreset for robust clustering, a small proxy o…

    Submitted 19 October, 2022; originally announced October 2022.

  9. arXiv:2210.00244  [pdf, other]

    cs.LG cs.DS

    On The Relative Error of Random Fourier Features for Preserving Kernel Distance

    Authors: Kuan Cheng, Shaofeng H. -C. Jiang, Luojian Wei, Zhide Wei

    Abstract: The method of random Fourier features (RFF), proposed in a seminal paper by Rahimi and Recht (NIPS'07), is a powerful technique to find approximate low-dimensional representations of points in (high-dimensional) kernel space, for shift-invariant kernels. While RFF has been analyzed under various notions of error guarantee, the ability to preserve the kernel distance with \emph{relative} error is l…

    Submitted 13 April, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

  10. arXiv:2209.01901  [pdf, ps, other]

    cs.DS

    The Power of Uniform Sampling for Coresets

    Authors: Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H. -C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

    Abstract: Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive er…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

  11. arXiv:2204.02095  [pdf, other]

    cs.DS

    Streaming Facility Location in High Dimension via Geometric Hashing

    Authors: Artur Czumaj, Arnold Filtser, Shaofeng H. -C. Jiang, Robert Krauthgamer, Pavel Veselý, Mingwei Yang

    Abstract: In Euclidean Uniform Facility Location (UFL), the input is a set of clients in $\mathbb{R}^d$ and the goal is to place facilities to serve them, so as to minimize the total cost of opening facilities plus connecting the clients. We study the setting of dynamic geometric streams, where the clients are presented as a sequence of insertions and deletions of points in the grid $\{1,\ldots,Δ\}^d$, and…

    Submitted 28 January, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: The abstract is shortened to meet the length constraint of arXiv

  12. arXiv:2111.06422  [pdf, ps, other]

    cond-mat.supr-con cond-mat.str-el

    Enhanced charge density wave with mobile superconducting vortices in La$_{1.885}$Sr$_{0.115}$CuO$_4$

    Authors: J. -J. Wen, W. He, H. Jang, H. Nojiri, S. Matsuzawa, S. Song, M. Chollet, D. Zhu, Y. -J. Liu, M. Fujita, J. M. Jiang, C. R. Rotundu, C. -C. Kao, H. -C. Jiang, J. -S. Lee, Y. S. Lee

    Abstract: Superconductivity in the cuprates is found to be intertwined with charge and spin density waves. Determining the interactions between the different types of order is crucial for understanding these important materials. Here, we elucidate the role of the charge density wave (CDW) in the prototypical cuprate La$_{1.885}$Sr$_{0.115}$CuO$_4$, by studying the effects of large magnetic fields ($H$) up t…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: 7 pages, 3 figures

  13. arXiv:2110.08840  [pdf, other]

    cs.DS

    Online Facility Location with Predictions

    Authors: Shaofeng H. -C. Jiang, Erzhi Liu, You Lyu, Zhihao Gavin Tang, Yubo Zhang

    Abstract: We provide nearly optimal algorithms for online facility location (OFL) with predictions. In OFL, $n$ demand points arrive in order and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection costs from demand points to assigned facilities plus the facility opening cost. We further assume the algorithm is addit…

    Submitted 5 August, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: Updated the comparison to a previous work

  14. arXiv:2110.02898  [pdf, other]

    cs.DS

    Coresets for Kernel Clustering

    Authors: Shaofeng H. -C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

    Abstract: We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are non-linearly separable, but it also introduces significant computational challenges. We address this computational issue by constructing a coreset, which is a reduce…

    Submitted 6 April, 2024; v1 submitted 6 October, 2021; originally announced October 2021.

  15. arXiv:2106.16112  [pdf, other]

    cs.DS

    Coresets for Clustering with Missing Values

    Authors: Vladimir Braverman, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like $k$-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $ε$-coreset of a large datas…

    Submitted 11 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

  16. arXiv:2011.04324  [pdf, other]

    cs.DS

    Streaming Algorithms for Geometric Steiner Forest

    Authors: Artur Czumaj, Shaofeng H. -C. Jiang, Robert Krauthgamer, Pavel Veselý

    Abstract: We consider an important generalization of the Steiner tree problem, the \emph{Steiner forest problem}, in the Euclidean plane: the input is a multiset $X \subseteq \mathbb{R}^2$, partitioned into $k$ color classes $C_1, C_2, \ldots, C_k \subseteq X$. The goal is to find a minimum-cost Euclidean graph $G$ such that every color class $C_i$ is connected in $G$. We study this Steiner forest problem i…

    Submitted 10 May, 2024; v1 submitted 9 November, 2020; originally announced November 2020.

  17. arXiv:2004.07718  [pdf, ps, other]

    cs.DS

    Coresets for Clustering in Excluded-minor Graphs and Beyond

    Authors: Vladimir Braverman, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset of size that depends only on $k$, $ε$ and the excluded-minor…

    Submitted 15 July, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

  18. arXiv:1907.04733  [pdf, other]

    cs.DS

    Coresets for Clustering in Graphs of Bounded Treewidth

    Authors: Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems are essential to data analysis and used for example in road networks and data visualization. A coreset is a compact summary of the data that approximately preserves the clustering objective for every possible center set, and it offers significant effic…

    Submitted 12 December, 2022; v1 submitted 10 July, 2019; originally announced July 2019.

  19. arXiv:1906.08484  [pdf, other]

    cs.DS cs.CG cs.LG stat.ML

    Coresets for Clustering with Fairness Constraints

    Authors: Lingxiao Huang, Shaofeng H. -C. Jiang, Nisheeth K. Vishnoi

    Abstract: In a recent work, [19] studied the following "fair" variants of classical clustering problems such as $k$-means and $k$-median: given a set of $n$ data points in $\mathbb{R}^d$ and a binary type associated to each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focuse…

    Submitted 17 December, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

  20. arXiv:1903.04351  [pdf, other]

    cs.DS

    Coresets for Ordered Weighted Clustering

    Authors: Vladimir Braverman, Shaofeng H. -C. Jiang, Robert Krauthgamer, Xuan Wu

    Abstract: We design coresets for Ordered k-Median, a generalization of classical clustering problems such as k-Median and k-Center, that offers a more flexible data analysis, like easily combining multiple objectives (e.g., to increase fairness or for Pareto optimization). Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted acco…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: 23 pages, 3 figures, 2 tables

  21. arXiv:1804.02530  [pdf, other]

    cs.DS

    $\varepsilon$-Coresets for Clustering (with Outliers) in Doubling Metrics

    Authors: Lingxiao Huang, Shaofeng H. -C. Jiang, Jian Li, Xuan Wu

    Abstract: We study the problem of constructing $\varepsilon$-coresets for the $(k, z)$-clustering problem in a doubling metric $M(X, d)$. An $\varepsilon$-coreset is a weighted subset $S\subseteq X$ with weight function $w : S \rightarrow \mathbb{R}_{\geq 0}$, such that for any $k$-subset $C \in [X]^k$, it holds that…

    Submitted 18 August, 2018; v1 submitted 7 April, 2018; originally announced April 2018.

    Comments: Appeared in FOCS 2018, this is the full version

  22. arXiv:1710.07774  [pdf, other]

    cs.DS

    A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics

    Authors: T-H. Hubert Chan, Haotian Jiang, Shaofeng H. -C. Jiang

    Abstract: We present a unified polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due…

    Submitted 20 June, 2018; v1 submitted 21 October, 2017; originally announced October 2017.

    Comments: Appeared in ESA 2018. This is the full version

  23. arXiv:1706.06922  [pdf, other]

    cs.DM

    Online Submodular Maximization Problem with Vector Packing Constraint

    Authors: T-H. Hubert Chan, Shaofeng H. -C. Jiang, Zhihao Gavin Tang, Xiaowei Wu

    Abstract: We consider the online vector packing problem in which we have a $d$ dimensional knapsack and items $u$ with weight vectors $\mathbf{w}_u \in \mathbb{R}_+^d$ arrive online in an arbitrary order. Upon the arrival of an item, the algorithm must decide immediately whether to discard or accept the item into the knapsack. When item $u$ is accepted, $\mathbf{w}_u(i)$ units of capacity on dimension $i$ w…

    Submitted 21 June, 2017; originally announced June 2017.

    Comments: The conference version of this paper appears in ESA 2017

  24. arXiv:1610.07770  [pdf, ps, other]

    cs.DM cs.DS

    Online Submodular Maximization with Free Disposal: Randomization Beats 0.25 for Partition Matroids

    Authors: T-H. Hubert Chan, Zhiyi Huang, Shaofeng H. -C. Jiang, Ning Kang, Zhihao Gavin Tang

    Abstract: We study the online submodular maximization problem with free disposal under a matroid constraint. Elements from some ground set arrive one by one in rounds, and the algorithm maintains a feasible set that is independent in the underlying matroid. In each round when a new element arrives, the algorithm may accept the new element into its feasible set and possibly remove elements from it, provided…

    Submitted 25 October, 2016; originally announced October 2016.

  25. arXiv:1608.06325  [pdf, ps, other]

    cs.DS

    A PTAS for the Steiner Forest Problem in Doubling Metrics

    Authors: T-H. Hubert Chan, Shuguang Hu, Shaofeng H. -C. Jiang

    Abstract: We achieve a (randomized) polynomial-time approximation scheme (PTAS) for the Steiner Forest Problem in doubling metrics. Before our work, a PTAS is given only for the Euclidean plane in [FOCS 2008: Borradaile, Klein and Mathieu]. Our PTAS also shares similarities with the dynamic programming for sparse instances used in [STOC 2012: Bartal, Gottlieb and Krauthgamer] and [SODA 2016: Chan and Jiang]…

    Submitted 22 August, 2016; originally announced August 2016.