In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs

Authors: Grzegorz Kaszuba, Amirhossein D. Naghdi, Dario Massa, Stefanos Papanikolaou, Andrzej Jaszkiewicz, Piotr Sankowski

Abstract: Large language models manifest the ability of few-shot adaptation to a sequence of provided examples. This behavior, known as in-context learning, allows for performing nontrivial machine learning tasks during inference only. In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties? However, this would not be possible for structure property prediction tasks unless an effective method is found to pass atomic-level geometric features to the transformer model. To address this problem, we employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to adapt in-context information. To demonstrate our model's capabilities, we partition the QM9 dataset into sequences of molecules that share a common substructure and use them for in-context learning. This approach significantly improves the performance of the model on out-of-distribution examples, surpassing that of general graph neural network models.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures

arXiv:2404.16267 [pdf, ps, other]

Dynamic PageRank: Algorithms and Lower Bounds

Authors: Rajesh Jayaram, Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, Piotr Sankowski

Abstract: We consider the PageRank problem in the dynamic setting, where the goal is to explicitly maintain an approximate PageRank vector $π\in \mathbb{R}^n$ for a graph under a sequence of edge insertions and deletions. Our main result is a complete characterization of the complexity of dynamic PageRank maintenance for both multiplicative and additive ($L_1$) approximations. First, we establish matching lower and upper bounds for maintaining additive approximate PageRank in both incremental and decremental settings. In particular, we demonstrate that in the worst case $(1/α)^{Θ(\log \log n)}$ update time is necessary and sufficient for this problem, where $α$ is the desired additive approximation. On the other hand, we demonstrate that the commonly employed ForwardPush approach performs substantially worse than this optimal runtime. Specifically, we show that ForwardPush requires $Ω(n^{1-δ})$ time per update on average, for any $δ> 0$, even in the incremental setting. For multiplicative approximations, however, we demonstrate that the situation is significantly more challenging. Specifically, we prove that any algorithm that explicitly maintains a constant factor multiplicative approximation of the PageRank vector of a directed graph must have amortized update time $Ω(n^{1-δ})$, for any $δ> 0$, even in the incremental setting, thereby resolving a 13-year-old open question of Bahmani et al. (VLDB 2010). This sharply contrasts with the undirected setting, where we show that $\rm{poly}\ \log n$ update time is feasible, even in the fully dynamic setting under an oblivious adversary.

Submitted 16 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.09711 [pdf, other]

Online Multi-level Aggregation with Delays and Stochastic Arrivals

Authors: Mathieu Mari, Michał Pawłowski, Runtian Ren, Piotr Sankowski

Abstract: This paper presents a new research direction for online Multi-Level Aggregation (MLA) with delays. In this problem, we are given an edge-weighted rooted tree $T$, and we have to serve a sequence of requests arriving at its vertices in an online manner. Each request $r$ is characterized by two parameters: its arrival time $t(r)$ and location $l(r)$ (a vertex). Once a request $r$ arrives, we can either serve it immediately or postpone this action until any time $t > t(r)$. We can serve several pending requests at the same time, and the service cost of a service corresponds to the weight of the subtree that contains all the requests served and the root of $T$. Postponing the service of a request $r$ to time $t > t(r)$ generates an additional delay cost of $t - t(r)$. The goal is to serve all requests in an online manner such that the total cost (i.e., the total sum of service and delay costs) is minimized. The current best algorithm for this problem achieves a competitive ratio of $O(d^2)$ (Azar and Touitou, FOCS'19), where $d$ denotes the depth of the tree. Here, we consider a stochastic version of MLA where the requests follow a Poisson arrival process. We present a deterministic online algorithm which achieves a constant ratio of expectations, meaning that the ratio between the expected costs of the solution generated by our algorithm and the optimal offline solution is bounded by a constant. Our algorithm is obtained by carefully combining two strategies. In the first one, we plan periodic oblivious visits to the subset of frequent vertices, whereas in the second one, we greedily serve the pending requests in the remaining vertices. This problem is complex enough to demonstrate a very rare phenomenon that "single-minded" or "sample-average" strategies are not enough in stochastic optimization.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 37 pages, 3 figures
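
The cost model described in the abstract above (a service cost equal to the weight of the subtree spanning the root and the served requests, plus a delay cost of $t - t(r)$ per request) can be sketched as follows. This only illustrates the objective being minimized, not the authors' algorithm; the `parent`/`weight` dictionary encoding of the tree and the function names are assumptions made for this example.

```python
def service_cost(parent, weight, served_vertices):
    """Weight of the smallest subtree containing the root and all served vertices.

    parent[v] is the parent of non-root vertex v (the root has no entry);
    weight[v] is the weight of the edge between v and parent[v].
    Both encodings are assumptions of this sketch.
    """
    used = set()  # vertices whose edge to the parent belongs to the subtree
    for v in served_vertices:
        # Walk towards the root, stopping once we reach an already counted path.
        while v in parent and v not in used:
            used.add(v)
            v = parent[v]
    return sum(weight[v] for v in used)


def total_cost(parent, weight, services):
    """Total service plus delay cost of a schedule.

    services is a list of (service_time, [(arrival_time, vertex), ...]) pairs,
    one pair per service; each request contributes service_time - arrival_time.
    """
    cost = 0.0
    for service_time, requests in services:
        for arrival_time, _vertex in requests:
            if service_time < arrival_time:
                raise ValueError("a request cannot be served before it arrives")
            cost += service_time - arrival_time
        cost += service_cost(parent, weight, [v for _, v in requests])
    return cost
```

For instance, serving two requests located in different branches in one service pays for both root paths once, while their delay costs accumulate separately.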

arXiv:2404.03426 [pdf, other]

Accurate estimation of feature importance faithfulness for tree models

Authors: Mateusz Gajewski, Adam Karczmarz, Mateusz Rapicki, Piotr Sankowski

Abstract: In this paper, we consider a perturbation-based metric of predictive faithfulness of feature rankings (or attributions) that we call PGI squared. When applied to decision tree-based regression models, the metric can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. In particular, the computation does not involve Monte Carlo sampling that has been typically used for computing similar metrics and which is inherently prone to inaccuracies. Moreover, we propose a method of ranking features by their importance for the tree model's predictions based on PGI squared. Our experiments indicate that in some respects, the method may identify the globally important features better than the state-of-the-art SHAP explainer.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2402.07871 [pdf, other]

Scaling Laws for Fine-Grained Mixture of Experts

Authors: Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur

Abstract: Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce a new hyperparameter, granularity, whose adjustment enables precise control over the size of the experts. Building on this, we establish scaling laws for fine-grained MoE, taking into account the number of training tokens, model size, and granularity. Leveraging these laws, we derive the optimal training configuration for a given computational budget. Our findings not only show that MoE models consistently outperform dense Transformers but also highlight that the efficiency gap between dense and MoE models widens as we scale up the model size and training budget. Furthermore, we demonstrate that the common practice of setting the size of experts in MoE to mirror the feed-forward layer is not optimal at almost any computational budget.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.09301 [pdf, other]

Material Informatics through Neural Networks on Ab-Initio Electron Charge Densities: the Role of Transfer Learning

Authors: Dario Massa, Stefanos Papanikolaou, Piotr Sankowski

Abstract: In this work, the dynamic realms of Materials Science and Computer Science advancements meet the critical challenge of identifying efficient descriptors capable of capturing the essential features of physical systems. Such a task has remained formidable, with solutions often involving ad-hoc scalar and vectorial sets of materials properties, making optimization and transferability challenging. We extract representations directly from ab-initio differential electron charge density profiles using Neural Networks, highlighting the pivotal role of transfer learning in such a task. Firstly, we demonstrate significant improvements in regression of a specific defected-materials property with respect to training a deep network from scratch, both in terms of predictions and their reproducibility, by considering various pre-trained models and selecting the optimal one after fine-tuning. The remarkable performance obtained confirms the transferability of existing pre-trained Convolutional Neural Networks (CNNs) to physics domain data, which is very different from the original training data. Secondly, we demonstrate a saturation in the regression capabilities of computer vision models towards properties of an extensive variety of undefected systems, and how it can be overcome with the help of large language model (LLM) transformers, with as little text information as composition names. Finally, we prove the insufficiency of open-models, like GPT-4, in achieving the analogous tasks and performances as the proposed domain-specific ones. The work offers a promising avenue for enhancing the effectiveness of descriptor identification in complex physical systems, shedding light on the power of transfer learning to easily adapt and combine available models, with different modalities, to the physics domain, at the same time opening space for a benchmark of LLM capabilities in this domain.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.05834 [pdf, ps, other]

Modeling Online Paging in Multi-Core Systems

Authors: Mathieu Mari, Anish Mukherjee, Runtian Ren, Piotr Sankowski

Abstract: Web requests have been growing exponentially since the 90s due to the rapid development of the Internet. This process was further accelerated by the introduction of cloud services. It has been observed statistically that memory or web requests generally follow a power-law distribution (Breslau et al., INFOCOM'99). That is, the $i^{\text{th}}$ most popular web page is requested with a probability proportional to $1 / i^α$ ($α> 0$ is a constant). Furthermore, this study, which was performed more than 20 years ago, indicated Zipf-like behavior, i.e., that $α\le 1$. Surprisingly, the memory access traces coming from petabyte-size modern cloud systems not only show that $α$ can be bigger than one but also illustrate a shifted power-law distribution -- called Pareto type II or Lomax. This previously unreported phenomenon calls for a statistical explanation. Our first contribution is a new statistical multi-core power-law model indicating that the double power law can be attributed to the presence of multiple cores running many virtual machines in parallel on such systems. We verify experimentally the applicability of this model using the Kolmogorov-Smirnov test (K-S test). The second contribution of this paper is a theoretical analysis indicating why LRU and LFU-based algorithms perform well in practice on data satisfying power-law or multi-core assumptions. We provide an explanation by studying the online paging problem in the stochastic input model, i.e., the input is a random sequence with each request independently drawn from a page set according to a distribution $π$. We derive formulas (as a function of the page probabilities in $π$) to upper bound their ratio-of-expectations, which help in establishing an $O(1)$ performance ratio when the random sequence follows power-law and multi-core power-law distributions.

Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.
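
As a small illustration of the request model in the abstract above, the following sketch builds a page distribution in which the $i$-th most popular page has probability proportional to $1 / i^α$, and draws an independent random request sequence from it. The optional `shift` parameter, which uses weights $1 / (i + \text{shift})^α$, is only an assumed discrete stand-in for the shifted (Lomax-like) behavior mentioned there; it is not the multi-core model of the paper, and all function names are hypothetical.

```python
import random


def power_law_probs(n, alpha, shift=0.0):
    """Probabilities p_i proportional to 1 / (i + shift)**alpha for i = 1..n.

    shift=0 gives the plain power law; shift > 0 is an assumed, simplified
    shifted variant used only for illustration.
    """
    if n <= 0 or alpha <= 0 or shift < 0:
        raise ValueError("need n > 0, alpha > 0 and shift >= 0")
    weights = [1.0 / (i + shift) ** alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(probs, length, seed=0):
    """Draw `length` independent page requests (1-based page ranks)."""
    rng = random.Random(seed)
    pages = list(range(1, len(probs) + 1))
    return rng.choices(pages, weights=probs, k=length)
```

With `alpha <= 1` the distribution is Zipf-like in the sense used in the abstract; larger `alpha` concentrates more of the probability on the most popular pages.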

arXiv:2312.16073 [pdf, other]

Compositional Search of Stable Crystalline Structures in Multi-Component Alloys Using Generative Diffusion Models

Authors: Grzegorz Kaszuba, Amirhossein Naghdi Dorabati, Stefanos Papanikolaou, Andrzej Jaszkiewicz, Piotr Sankowski

Abstract: Exploring the vast composition space of multi-component alloys presents a challenging task for both ab initio (first principles) and experimental methods due to the time-consuming procedures involved. This ultimately impedes the discovery of novel, stable materials that may display exceptional properties. Here, the Crystal Diffusion Variational Autoencoder (CDVAE) model is adapted to characterize the stable compositions of a well studied multi-component alloy, NiFeCr, with two distinct crystalline phases known to be stable across its compositional space. To this end, novel extensions to CDVAE were proposed, enhancing the model's ability to reconstruct configurations from their latent space within the test set by approximately 30%, which increases the model's probability of discovering new materials when dealing with various crystalline structures. Afterwards, the new model is applied for materials generation, demonstrating excellent agreement in identifying stable configurations within the ternary phase space when compared to first principles data. Finally, a computationally efficient framework for inverse design is proposed, employing Molecular Dynamics (MD) simulations of multi-component alloys with reliable interatomic potentials, enabling the optimization of materials properties across the phase space.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.07599 [pdf, other]

Contrastive News and Social Media Linking using BERT for Articles and Tweets across Dual Platforms

Authors: Jan Piotrowski, Marek Wachnicki, Mateusz Perlik, Jakub Podolak, Grzegorz Rucki, Michał Brzozowski, Paweł Olejnik, Julian Kozłowski, Tomasz Nocoń, Jakub Kozieł, Stanisław Giziński, Piotr Sankowski

Abstract: X (formerly Twitter) has evolved into a contemporary agora, offering a platform for individuals to express opinions and viewpoints on current events. The majority of the topics discussed on Twitter are directly related to ongoing events, making it an important source for monitoring public discourse. However, linking tweets to specific news presents a significant challenge due to their concise and informal nature. Previous approaches, including topic models, graph-based models, and supervised classifiers, have fallen short in effectively capturing the unique characteristics of tweets and articles. Inspired by the success of the CLIP model in computer vision, which employs contrastive learning to model similarities between images and captions, this paper introduces a contrastive learning approach for training a representation space where linked articles and tweets exhibit proximity. We present our contrastive learning approach, CATBERT (Contrastive Articles Tweets BERT), leveraging pre-trained BERT models. The model is trained and tested on a dataset containing manually labeled English and Polish tweets and articles related to the Russian-Ukrainian war. We evaluate CATBERT's performance against traditional approaches like LDA, and the novel method based on OpenAI embeddings, which has not been previously applied to this task. Our findings indicate that CATBERT demonstrates superior performance in associating tweets with relevant news articles. Furthermore, we demonstrate the performance of the models when applied to finding the main topic -- represented by an article -- of the whole cascade of tweets. In this new task, we report the performance of the different models as a function of the cascade size.

Submitted 11 December, 2023; originally announced December 2023.

ACM Class: I.2.7

arXiv:2311.16905 [pdf, other]

Analyzing the Influence of Language Model-Generated Responses in Mitigating Hate Speech on Social Media Directed at Ukrainian Refugees in Poland

Authors: Jakub Podolak, Szymon Łukasik, Paweł Balawender, Jan Ossowski, Katarzyna Bąkowicz, Piotr Sankowski

Abstract: In the context of escalating hate speech and polarization on social media, this study investigates the potential of employing responses generated by Large Language Models (LLM), complemented with pertinent verified knowledge links, to counteract such trends. Through extensive A/B testing involving the posting of 753 automatically generated responses, the goal was to minimize the propagation of hate speech directed at Ukrainian refugees in Poland. The results indicate that deploying LLM-generated responses as replies to harmful tweets effectively diminishes user engagement, as measured by likes/impressions. When we respond to an original tweet, i.e., one that is not a reply, we reduce the engagement of users by over 20% without increasing the number of impressions. On the other hand, our responses increase the ratio of the number of replies to a harmful tweet to impressions, especially if the harmful tweet is not original. Additionally, the study examines how generated responses influence the overall sentiment of tweets in the discussion, revealing that our intervention does not significantly alter the mean sentiment. This paper suggests the implementation of an automatic moderation system to combat hate speech on social media and provides an in-depth analysis of the A/B experiment, covering methodology, data collection, and statistical outcomes. Ethical considerations and challenges are also discussed, offering guidance for the development of discourse moderation systems leveraging the capabilities of generative AI.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2308.08870 [pdf, ps, other]

Sensitivity and Dynamic Distance Oracles via Generic Matrices and Frobenius Form

Authors: Adam Karczmarz, Piotr Sankowski

Abstract: Algebraic techniques have had an important impact on graph algorithms so far. Porting them, e.g., the matrix inverse, into the dynamic regime improved best-known bounds for various dynamic graph problems. In this paper, we develop new algorithms for another cornerstone algebraic primitive, the Frobenius normal form (FNF). We apply our developments to dynamic and fault-tolerant exact distance oracle problems on directed graphs. For generic matrices $A$ over a finite field accompanied by an FNF, we show (1) an efficient data structure for querying submatrices of the first $k\geq 1$ powers of $A$, and (2) a near-optimal algorithm updating the FNF explicitly under rank-1 updates. By representing an unweighted digraph using a generic matrix over a sufficiently large field (obtained by random sampling) and leveraging the developed FNF toolbox, we obtain: (a) a conditionally optimal distance sensitivity oracle (DSO) in the case of single-edge or single-vertex failures, providing a partial answer to the open question of Gu and Ren [ICALP'21], (b) a multiple-failures DSO improving upon the state of the art (vd. Brand and Saranurak [FOCS'19]) wrt. both preprocessing and query time, (c) improved dynamic distance oracles in the case of single-edge updates, and (d) a dynamic distance oracle supporting vertex updates, i.e., changing all edges incident to a single vertex, in $\tilde{O}(n^2)$ worst-case time and distance queries in $\tilde{O}(n)$ time.

Submitted 17 August, 2023; originally announced August 2023.

Comments: To appear at FOCS 2023

arXiv:2210.07018 [pdf, ps, other]

Online matching with delays and stochastic arrival times

Authors: Mathieu Mari, Michał Pawłowski, Runtian Ren, Piotr Sankowski

Abstract: This paper presents a new research direction for the Min-cost Perfect Matching with Delays (MPMD) - a problem introduced by Emek et al. (STOC'16). In the original version of this problem, we are given an $n$-point metric space, where requests arrive in an online fashion. The goal is to minimise the matching cost for an even number of requests. However, contrary to traditional online matching problems, a request does not have to be paired immediately at the time of its arrival. Instead, the decision of whether to match a request can be postponed for time $t$ at a delay cost of $t$. For this reason, the goal of the MPMD is to minimise the overall sum of distance and delay costs. Interestingly, for adversarially generated requests, no online algorithm can achieve a competitive ratio better than $O(\log n/\log \log n)$ (Ashlagi et al., APPROX/RANDOM'17). Here, we consider a stochastic version of the MPMD problem where the input requests follow a Poisson arrival process. For such a problem, we show that the above lower bound can be improved by presenting two deterministic online algorithms, which, in expectation, are constant-competitive. The first one is a simple greedy algorithm that matches any two requests once the sum of their delay costs exceeds their connection cost, i.e., the distance between them. The second algorithm builds on the tools used to analyse the first one in order to obtain even better performance guarantees. This result is rather surprising as the greedy approach for the adversarial model achieves a competitive ratio of $Ω(m^{\log \frac{3}{2}+\varepsilon})$, where $m$ denotes the number of requests served (Azar et al., TOCS'20). Finally, we prove that it is possible to obtain similar results for the general case when the delay cost follows an arbitrary positive and non-decreasing function, as well as for the MPMD variant with penalties to clear pending requests.

Submitted 16 January, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: 34 pages, 7 figures, accepted at AAMAS'23
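
The greedy rule quoted in the abstract above (match two requests once the sum of their delay costs exceeds the distance between them) can be made concrete for a single pair of requests. The sketch below assumes the linear delay cost from the abstract, under which a request that arrived at time $t_i$ has accumulated delay $t - t_i$ at time $t$; it computes only the earliest time at which the rule allows one given pair to be matched, and is not the full algorithm or its analysis. The function name is hypothetical.

```python
def greedy_match_time(t1, t2, distance):
    """Earliest time at which two requests may be matched by the greedy rule.

    Both requests must have arrived, so the time is at least max(t1, t2).
    With linear delays, (t - t1) + (t - t2) reaches `distance` at
    t = (distance + t1 + t2) / 2.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    both_present = max(t1, t2)
    threshold = (distance + t1 + t2) / 2.0
    return max(both_present, threshold)
```

For nearby requests the rule fires as soon as the second request arrives; for distant ones the pair waits until the accumulated delay pays for the connection.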

arXiv:2206.02630 [pdf, other]

Improving Ads-Profitability Using Traffic-Fingerprints

Authors: Adam Gabriel Dobrakowski, Andrzej Pacuk, Piotr Sankowski, Marcin Mucha, Paweł Brach

Abstract: This paper introduces the concept of traffic-fingerprints, i.e., normalized 24-dimensional vectors representing a distribution of daily traffic on a web page. Using k-means clustering we show that similarity of traffic-fingerprints is related to the similarity of profitability time patterns for ads shown on these pages. In other words, these fingerprints are correlated with the conversion rates, thus allowing us to reason about conversion rates on pages with negligible traffic. By blocking or unblocking whole clusters of pages we were able to increase the revenue of online campaigns by more than 50%.

Submitted 31 May, 2022; originally announced June 2022.
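
A traffic-fingerprint as described in the abstract above can be sketched as a 24-entry vector of hourly visit counts scaled to sum to one. The choice of sum-to-one normalization, the rejection of pages with zero traffic, and the function names are assumptions made for this example; the abstract only states that the vectors are normalized and clustered with k-means.

```python
def traffic_fingerprint(hourly_counts):
    """Turn 24 hourly visit counts into a vector that sums to 1.

    Sum-to-one normalization is an assumption of this sketch.
    """
    if len(hourly_counts) != 24:
        raise ValueError("expected exactly 24 hourly counts")
    if any(c < 0 for c in hourly_counts):
        raise ValueError("counts must be non-negative")
    total = sum(hourly_counts)
    if total == 0:
        raise ValueError("cannot build a fingerprint for a page with no traffic")
    return [c / total for c in hourly_counts]


def squared_distance(fp_a, fp_b):
    """Squared Euclidean distance, the quantity k-means minimizes within clusters."""
    return sum((a - b) ** 2 for a, b in zip(fp_a, fp_b))
```

Two pages with the same daily shape but very different traffic volumes get identical fingerprints, which is what lets low-traffic pages be grouped with high-traffic ones.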

arXiv:2203.16992 [pdf, ps, other]

Subquadratic Dynamic Path Reporting in Directed Graphs Against an Adaptive Adversary

Authors: Adam Karczmarz, Anish Mukherjee, Piotr Sankowski

Abstract: We study reachability and shortest paths problems in dynamic directed graphs. Whereas algebraic dynamic data structures supporting edge updates and reachability/distance queries have been known for quite a long time, they do not, in general, allow reporting the underlying paths within the same time bounds, especially against an adaptive adversary. In this paper we develop the first known fully dynamic reachability data structures working against an adaptive adversary and supporting edge updates and path queries for two natural variants: (1) point-to-point path reporting, and (2) single-source reachability tree reporting. For point-to-point queries in DAGs, we achieve $O(n^{1.529})$ worst-case update and query bounds, whereas for tree reporting in DAGs, the worst-case bounds are $O(n^{1.765})$. More importantly, we show how to lift these algorithms to work on general graphs at the cost of increasing the bounds to $n^{1+5/6+o(1)}$ and making the update times amortized. On the way to accomplishing that, we obtain two interesting subresults. We give subquadratic fully dynamic algorithms for topological order (in a DAG), and strongly connected components. To the best of our knowledge, such algorithms have not been described before. Additionally, we provide deterministic incremental data structures for reachability and shortest paths that handle edge insertions and report the respective paths within subquadratic worst-case time bounds. For reachability and $(1+ε)$-approximate shortest paths in weighted digraphs, these bounds match the best known dynamic matrix inverse-based randomized bounds for fully dynamic reachability [v.d.Brand, Nanongkai and Saranurak, FOCS'19]. For exact shortest paths in unweighted graphs, the obtained bounds in the incremental setting polynomially improve upon the respective best known randomized update/distance query bounds in the fully dynamic setting.

Submitted 31 March, 2022; originally announced March 2022.

Comments: To appear at STOC'22

arXiv:2108.04126 [pdf, other]

Improved Feature Importance Computations for Tree Models: Shapley vs. Banzhaf

Authors: Adam Karczmarz, Anish Mukherjee, Piotr Sankowski, Piotr Wygocki

Abstract: Shapley values are one of the main tools used to explain predictions of tree ensemble models. The main alternative to Shapley values are Banzhaf values that have not been understood equally well. In this paper we make a step towards filling this gap, providing both experimental and theoretical comparison of these model explanation methods. Surprisingly, we show that Banzhaf values offer several advantages over Shapley values while providing essentially the same explanations. We verify that Banzhaf values: (1) have a more intuitive interpretation, (2) allow for more efficient algorithms, and (3) are much more numerically robust. We provide an experimental evaluation of these theses. In particular, we show that on real world instances. Additionally, from a theoretical perspective we provide a new and improved algorithm computing the same Shapley value based explanations as the algorithm of Lundberg et al. [Nat. Mach. Intell. 2020]. Our algorithm runs in $O(TLD+n)$ time, whereas the previous algorithm had an $O(TLD^2+n)$ running time bound. Here, $T$ is the number of trees, $L$ is the maximum number of leaves in a tree, and $D$ denotes the maximum depth of a tree in the ensemble. Using the computational techniques developed for Shapley values we deliver an optimal $O(TL+n)$ time algorithm for computing Banzhaf values based explanations. In our experiments these algorithms give running times smaller even by an order of magnitude.

Submitted 9 August, 2021; originally announced August 2021.
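
To make the two notions in the preceding abstract concrete, here is a minimal brute-force sketch of the textbook Shapley and Banzhaf definitions for a small cooperative game. It is not the tree-specific algorithm from the paper: the function name `shapley_and_banzhaf` and the toy majority game are assumptions chosen for illustration, and the enumeration is exponential in the number of players.

```python
from itertools import combinations
from math import factorial


def shapley_and_banzhaf(n, value):
    """Brute-force Shapley and Banzhaf values for an n-player game.

    `value` maps a frozenset of player indices to a number.
    Hypothetical helper for illustration; exponential in n.
    """
    shapley = [0.0] * n
    banzhaf = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(n):
            # Shapley weights a coalition of size k by k!(n-k-1)!/n!.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                marginal = value(s | {i}) - value(s)
                shapley[i] += w * marginal
                # Banzhaf weights every coalition equally: 1 / 2^(n-1).
                banzhaf[i] += marginal / 2 ** (n - 1)
    return shapley, banzhaf


# Toy 3-player majority game: a coalition wins if it has at least 2 members.
def majority(s):
    return 1.0 if len(s) >= 2 else 0.0


shap, banz = shapley_and_banzhaf(3, majority)
print(shap, banz)
```

In this toy game the Shapley values sum to the value of the full coalition, while the Banzhaf values do not; the only difference between the two is the weight assigned to each coalition.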

arXiv:2103.09684 [pdf, ps, other]

Sublinear Average-Case Shortest Paths in Weighted Unit-Disk Graphs

Authors: Adam Karczmarz, Jakub Pawlewicz, Piotr Sankowski

Abstract: We consider the problem of computing shortest paths in weighted unit-disk graphs in constant dimension $d$. Although the single-source and all-pairs variants of this problem are well-studied in the planar case, no non-trivial exact distance oracles for unit-disk graphs have been known to date, even for $d=2$. The classical result of Sedgewick and Vitter [Algorithmica '86] shows that for weighted unit-disk graphs in the plane the $A^*$ search has average-case performance superior to that of a standard shortest path algorithm, e.g., Dijkstra's algorithm. Specifically, if the $n$ corresponding points of a weighted unit-disk graph $G$ are picked from a unit square uniformly at random, and the connectivity radius is $r\in (0,1)$, $A^*$ finds a shortest path in $G$ in $O(n)$ expected time when $r=Ω(\sqrt{\log n/n})$, even though $G$ has $Θ((nr)^2)$ edges in expectation. In other words, the work done by the algorithm is in expectation proportional to the number of vertices and not the number of edges. In this paper, we break this natural barrier and show even stronger sublinear time results. We propose a new heuristic approach to computing point-to-point exact shortest paths in unit-disk graphs. We analyze the average-case behavior of our heuristic using the same random graph model as used by Sedgewick and Vitter and prove it superior to $A^*$.
Specifically, we show that, if we are able to report the set of all $k$ points of $G$ from an arbitrary rectangular region of the plane in $O(k + t(n))$ time, then a shortest path between two arbitrary points of such a random graph on the plane can be found in $O(1/r^2 + t(n))$ expected time. In particular, the state-of-the-art range reporting data structures imply a sublinear expected bound for all $r=Ω(\sqrt{\log n/n})$ and an $O(\sqrt{n})$ expected bound for $r=Ω(n^{-1/4})$ after only near-linear preprocessing of the point set.

Submitted 17 March, 2021; originally announced March 2021.

Comments: Full version of a SoCG'21 paper. Abstract truncated to meet arxiv requirements

arXiv:2103.03868 [pdf, ps, other]

Decomposable Submodular Function Minimization via Maximum Flow

Authors: Kyriakos Axiotis, Adam Karczmarz, Anish Mukherjee, Piotr Sankowski, Adrian Vladu

Abstract: This paper bridges discrete and continuous optimization approaches for decomposable submodular function minimization, in both the standard and parametric settings. We provide improved running times for this problem by reducing it to a number of calls to a maximum flow oracle. When each function in the decomposition acts on $O(1)$ elements of the ground set $V$ and is polynomially bounded, our running time is up to polylogarithmic factors equal to that of solving maximum flow in a sparse graph with $O(\vert V \vert)$ vertices and polynomial integral capacities. We achieve this by providing a simple iterative method which can optimize to high precision any convex function defined on the submodular base polytope, provided we can efficiently minimize it on the base polytope corresponding to the cut function of a certain graph that we construct. We solve this minimization problem by lifting the solutions of a parametric cut problem, which we obtain via a new efficient combinatorial reduction to maximum flow. This reduction is of independent interest and implies some previously unknown bounds for the parametric minimum $s,t$-cut problem in multiple settings.

Submitted 5 March, 2021; originally announced March 2021.

arXiv:2101.02311 [pdf, ps, other]

A Deterministic Parallel APSP Algorithm and its Applications

Authors: Adam Karczmarz, Piotr Sankowski

Abstract: In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has $\tilde{O}(nm+(n/d)^3)$ work and $\tilde{O}(d)$ depth for any depth parameter $d\in [1,n]$. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has $\tilde{O}(nm+n^3/d^2)$ work and $\tilde{O}(d)$ depth [Ullman and Yannakakis, SIAM J. Comput. '91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. One notable ingredient of our parallel APSP algorithm is a simple deterministic $\tilde{O}(nm)$-work $\tilde{O}(d)$-depth procedure for computing $\tilde{O}(n/d)$-size hitting sets of shortest $d$-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called $d$-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an $\tilde{O}(nm)$-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has been so far only achieved using a randomized algorithm [Orlin et al., Discret. Appl. Math. '18].

Submitted 6 January, 2021; originally announced January 2021.

Comments: A SODA'21 paper. Slightly extended preliminaries. Abstract shortened to meet arXiv requirements

arXiv:1907.05391 [pdf, ps, other]

Walking Randomly, Massively, and Efficiently

Authors: Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, Piotr Sankowski

Abstract: We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massive Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the $Θ(\log n)$ MPC round complexity barrier. Our techniques enable breaking this barrier for PageRank (one of the most important applications of random walks), even in more challenging directed graphs, and for approximate bipartiteness and expansion testing. In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result we can generate multiple random walks of length $l$ in $Θ(\log l)$ rounds on MPC. Moreover, we show that under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight. For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process.
For PageRank, we achieve the following round complexities for damping factor equal to $1 - ε$:
* $O(\log \log n + \log 1 / ε)$ rounds for undirected graphs (with $\tilde O(m / ε^2)$ total space),
* $\tilde O(\log^2 \log n + \log^2 1/ε)$ rounds for directed graphs (with $\tilde O((m+n^{1+o(1)}) / poly\, ε)$ total space).

Submitted 5 November, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

arXiv:1907.02274 [pdf, ps, other]

Min-Cost Flow in Unit-Capacity Planar Graphs

Authors: Adam Karczmarz, Piotr Sankowski

Abstract: In this paper we give an $\widetilde{O}((nm)^{2/3}\log C)$ time algorithm for computing min-cost flow (or min-cost circulation) in unit capacity planar multigraphs where edge costs are integers bounded by $C$. For planar multigraphs, this improves upon the best known algorithms for general graphs: the $\widetilde{O}(m^{10/7}\log C)$ time algorithm of Cohen et al. [SODA 2017], the $O(m^{3/2}\log(nC))$ time algorithm of Gabow and Tarjan [SIAM J. Comput. 1989] and the $\widetilde{O}(\sqrt{n}m \log C)$ time algorithm of Lee and Sidford [FOCS 2014]. In particular, our result constitutes the first known fully combinatorial algorithm that breaks the $\widetilde{O}(m^{3/2})$ time barrier for the min-cost flow problem in planar graphs. To obtain our result we first give a very simple successive shortest paths based scaling algorithm for the unit-capacity min-cost flow problem that does not explicitly operate on dual variables. This algorithm also runs in $\widetilde{O}(m^{3/2}\log{C})$ time for general graphs, and, to the best of our knowledge, it has not been described before. We subsequently show how to implement this algorithm faster on planar graphs using well-established tools: $r$-divisions and efficient algorithms for computing (shortest) paths in so-called dense distance graphs.

Submitted 4 July, 2019; originally announced July 2019.

arXiv:1807.03839 [pdf, ps, other]

Online Facility Location with Deletions

Authors: Marek Cygan, Artur Czumaj, Marcin Mucha, Piotr Sankowski

Abstract: In this paper we study three previously unstudied variants of the online Facility Location problem, considering an intrinsic scenario in which the clients and facilities are not only allowed to arrive in the system, but can also depart at any moment. We begin with the study of a natural fully-dynamic online uncapacitated model where clients can be both added and removed. When a client arrives, it has to be assigned either to an existing facility or to a new facility opened at the client's location. However, when a client who has also been one of the open facilities is to be removed, our model has to allow reconnecting all clients that have been connected to that removed facility. In this model, we present an optimal O(log n_act / log log n_act)-competitive algorithm, where n_act is the number of active clients at the end of the input sequence. Next, we turn our attention to the capacitated Facility Location problem. We first note that if no deletions are allowed, then one can achieve an optimal competitive ratio of O(log n / log log n), where n is the length of the sequence. However, when deletions are allowed, the capacitated version of the problem is significantly more challenging than the uncapacitated one. We show that still, using a more sophisticated algorithmic approach, one can obtain an online O(log m + log c log n)-competitive algorithm for the capacitated Facility Location problem in the fully dynamic model, where m is the number of points in the input metric and c is the capacity of any open facility.

Submitted 10 July, 2018; originally announced July 2018.

Comments: full version of ESA'18 submission

arXiv:1709.07869 [pdf, other]

NC Algorithms for Weighted Planar Perfect Matching and Related Problems

Authors: Piotr Sankowski

Abstract: Consider a planar graph $G=(V,E)$ with polynomially bounded edge weight function $w:E\to [0, poly(n)]$. The main results of this paper are NC algorithms for the following problems:
- minimum weight perfect matching in $G$,
- maximum cardinality and maximum weight matching in $G$ when $G$ is bipartite,
- maximum multiple-source multiple-sink flow in $G$ where $c:E\to [1, poly(n)]$ is a polynomially bounded edge capacity function,
- minimum weight $f$-factor in $G$ where $f:V\to [1, poly(n)]$,
- min-cost flow in $G$ where $c:E\to [1, poly(n)]$ is a polynomially bounded edge capacity function and $b:V\to [1, poly(n)]$ is a polynomially bounded vertex demand function.
No NC algorithms were previously known for any of these problems (before this paper and an independent paper by Anari and Vazirani). In order to solve these problems we develop a new, relatively simple but versatile framework that is combinatorial in spirit. It handles the combinatorial structure of matchings directly and needs to know only the weights of appropriately defined matchings from algebraic subroutines.

Submitted 18 April, 2018; v1 submitted 22 September, 2017; originally announced September 2017.

arXiv:1708.06395 [pdf, other]

Approximate nearest neighbors search without false negatives for $l_2$ for $c>\sqrt{\log\log{n}}$

Authors: Piotr Sankowski, Piotr Wygocki

Abstract: In this paper, we report progress on answering the open problem presented by Pagh [14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest neighbors problem without false negatives for Euclidean high dimensional space $\mathbb{R}^d$. These data structures work for any $c = ω(\sqrt{\log{\log{n}}})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $Ω(\sqrt{d})$. This improvement is obtained by applying a sequence of reductions, which are interesting on their own. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in $\big(\mathbb{R}^k\big)^L$ space equipped with metric $m(x,y) = \max_{1 \le i \le L}(\lVert x_i - y_i\rVert_2)$.

Submitted 13 September, 2017; v1 submitted 21 August, 2017; originally announced August 2017.
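
The metric $m(x,y) = \max_{1 \le i \le L}(\lVert x_i - y_i\rVert_2)$ from the preceding abstract can be evaluated directly. The following is a minimal sketch of that formula only, not of the paper's data structures; the function name `block_max_l2` and the representation of a point as a list of $L$ coordinate tuples are assumptions made for illustration.

```python
import math


def block_max_l2(x, y):
    """Largest Euclidean distance between corresponding blocks of x and y.

    x and y are sequences of L blocks, each block a point in R^k.
    Hypothetical helper name; it only evaluates the formula for m(x, y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of blocks")
    return max(math.dist(xi, yi) for xi, yi in zip(x, y))


# Two points in (R^2)^2: the first blocks differ by a 3-4-5 triangle,
# the second blocks coincide.
x = [(0.0, 0.0), (1.0, 1.0)]
y = [(3.0, 4.0), (1.0, 1.0)]
print(block_max_l2(x, y))
```

Each block contributes an ordinary $l_2$ distance, and the blocks are combined by a maximum, so two points are close under $m$ only if they are close in every block.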

arXiv:1707.03478 [pdf, ps, other]

Round Compression for Parallel Matching Algorithms

Authors: Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, Piotr Sankowski

Abstract: For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, though, is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(\log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+Ω(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly $O(\log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility.
That is, we break the above $O(\log n)$ round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a $(2+ε)$-approximation to maximum matching, for any fixed constant $ε>0$, in $O((\log \log n)^2)$ rounds.

Submitted 1 February, 2018; v1 submitted 11 July, 2017; originally announced July 2017.

arXiv:1706.10228 [pdf, other]

Contracting a Planar Graph Efficiently

Authors: Jacob Holm, Giuseppe F. Italiano, Adam Karczmarz, Jakub Łącki, Eva Rotenberg, Piotr Sankowski

Abstract: We present a data structure that can maintain a simple planar graph under edge contractions in linear total time. The data structure supports adjacency queries and provides access to neighbor lists in $O(1)$ time. Moreover, it can report all the arising self-loops and parallel edges. By applying the data structure, we can achieve optimal running times for decremental bridge detection, 2-edge connectivity, maximal 3-edge connected components, and the problem of finding a unique perfect matching for a static planar graph. Furthermore, we improve the running times of algorithms for several planar graph problems, including decremental 2-vertex and 3-edge connectivity, and we show that using our data structure in a black-box manner, one obtains conceptually simple optimal algorithms for computing MST and 5-coloring in planar graphs.

Submitted 30 June, 2017; originally announced June 2017.

arXiv:1705.11163 [pdf, other]

Decremental Single-Source Reachability in Planar Digraphs

Authors: Giuseppe F. Italiano, Adam Karczmarz, Jakub Łącki, Piotr Sankowski

Abstract: In this paper we show a new algorithm for the decremental single-source reachability problem in directed planar graphs. It processes any sequence of edge deletions in $O(n\log^2{n}\log\log{n})$ total time and explicitly maintains the set of vertices reachable from a fixed source vertex. Hence, if all edges are eventually deleted, the amortized time of processing each edge deletion is only $O(\log^2 n \log \log n)$, which improves upon a previously known $O(\sqrt{n})$ solution. We also show an algorithm for decremental maintenance of strongly connected components in directed planar graphs with the same total update time. These results constitute the first almost optimal (up to polylogarithmic factors) algorithms for both problems. To the best of our knowledge, these are the first dynamic algorithms with polylogarithmic update times on general directed planar graphs for non-trivial reachability-type problems, for which only polynomial bounds are known in general graphs.

Submitted 31 May, 2017; originally announced May 2017.

arXiv:1704.02093 [pdf, other]

A Tight Bound for Shortest Augmenting Paths on Trees

Authors: Bartłomiej Bosek, Dariusz Leniowski, Piotr Sankowski, Anna Zych-Pawlewicz

Abstract: The shortest augmenting path technique is one of the fundamental ideas used in maximum matching and maximum flow algorithms. Since being introduced by Edmonds and Karp in 1972, it has been widely applied in many different settings. Surprisingly, despite this extensive usage, it is still not well understood even in the simplest case: the online bipartite matching problem on trees. In this problem a bipartite tree $T=(W \uplus B, E)$ is being revealed online, i.e., in each round one vertex from $B$ with its incident edges arrives. It was conjectured by Chaudhuri et al. [K. Chaudhuri, C. Daskalakis, R. D. Kleinberg, and H. Lin. Online bipartite perfect matching with augmentations. In INFOCOM 2009] that the total length of all shortest augmenting paths found is $O(n \log n)$. In this paper, we prove a tight $O(n \log n)$ upper bound for the total length of shortest augmenting paths for trees, improving over the $O(n \log^2 n)$ bound [B. Bosek, D. Leniowski, P. Sankowski, and A. Zych. Shortest augmenting paths for online matchings on trees. In WAOA 2015].

Submitted 20 December, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

Comments: 22 pages, 10 figures

MSC Class: 05C70; 05C85; 05C05 ACM Class: F.2.2; G.2.2

arXiv:1702.05913 [pdf, other]

doi 10.1145/3038912.3052565

Why Do Cascade Sizes Follow a Power-Law?

Authors: Karol Węgrzycki, Piotr Sankowski, Andrzej Pacuk, Piotr Wygocki

Abstract: We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now only empirical studies of this model had been done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow the power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation.

Submitted 20 February, 2017; originally announced February 2017.

Comments: 8 pages, 7 figures, accepted to WWW 2017

ACM Class: J.4

arXiv:1612.03150 [pdf, ps, other]

Budget Feasible Mechanisms on Matroids

Authors: Stefano Leonardi, Gianpiero Monaco, Piotr Sankowski, Qiang Zhang

Abstract: Motivated by many practical applications, in this paper we study budget feasible mechanisms where the goal is to procure independent sets from matroids. More specifically, we are given a matroid $\mathcal{M}=(E,\mathcal{I})$ where each ground (indivisible) element is a selfish agent. The cost of each element (i.e., for selling the item or performing a service) is only known to the element itself. There is a buyer with a budget having additive valuations over the set of elements $E$. The goal is to design an incentive compatible (truthful) budget feasible mechanism which procures an independent set of the matroid under the given budget that yields the largest value possible to the buyer. Our result is a deterministic, polynomial-time, individually rational, truthful and budget feasible mechanism with $4$-approximation to the optimal independent set. Then, we extend our mechanism to the setting of matroid intersections in which the goal is to procure common independent sets from multiple matroids. We show that, given a polynomial time deterministic blackbox that returns $α$-approximation solutions to the matroid intersection problem, there exists a deterministic, polynomial time, individually rational, truthful and budget feasible mechanism with $(3α+1)$-approximation to the optimal common independent set.

Submitted 8 March, 2021; v1 submitted 9 December, 2016; originally announced December 2016.

arXiv:1612.00959 [pdf, ps, other]

doi 10.1145/2987538.2987544

RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting

Authors: Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Adam Witkowski, Piotr Wygocki

Abstract: We present the Mim-Solution's approach to the RecSys Challenge 2016, which ranked 2nd. The goal of the competition was to prepare job recommendations for the users of the website Xing.com. Our two-phase algorithm consists of candidate selection followed by candidate ranking. We ranked the candidates by the predicted probability that the user will positively interact with the job offer. We have used Gradient Boosting Decision Trees as the regression tool.

Submitted 3 December, 2016; originally announced December 2016.

Comments: 6 pages, 1 figure, 2 tables, Description of 2nd place winning solution of RecSys 2016 Challenge. To be published in RecSys'16 Challenge Proceedings

ACM Class: H.3.3; I.2.6; D.2.8

Journal ref: Proceedings of the Recommender Systems Challenge, RecSys Challenge '16, Boston, Massachusetts - September 15 - 15, 2016, pages 10:1--10:4

arXiv:1611.09387 [pdf, other]

doi 10.1145/2914586.2914623

There is Something Beyond the Twitter Network

Authors: Andrzej Pacuk, Piotr Sankowski, Karol Wegrzycki, Piotr Wygocki

Abstract: How does information spread through a social network? Can we assume that the information is spread only through a given social network graph? What is the correct way to compare the models of information flow? These are the basic questions we address in this work. We focus on a meticulous comparison of various well-known models of rumor propagation in the social network. We introduce a model incorporating mass media and the effects of absent nodes. In this model the information appears spontaneously in the graph. Using the most conservative metric, we show that the distribution of cascade sizes generated by this model fits the real data much better than the previously considered models.

Submitted 28 November, 2016; originally announced November 2016.

Comments: 8 pages, 2 figures, Hypertext 2016

ACM Class: I.6; H.3.4

Journal ref: Proceedings of the 27th ACM Conference on Hypertext and Social Media, HT 2016, Halifax, NS, Canada, July 10-13, 2016, pages 279--284

arXiv:1611.09317 [pdf, ps, other]

doi 10.1007/978-3-319-42634-1_9

Locality-Sensitive Hashing without False Negatives for l_p

Authors: Andrzej Pacuk, Piotr Sankowski, Karol Wegrzycki, Piotr Wygocki

Abstract: In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., which ensure collision for every pair of points within a given radius $R$ in $d$ dimensional space equipped with $l_p$ norm when $p \in [1,\infty]$. Furthermore, we show how to use these hash functions to solve the $c$-approximate nearest neighbor search problem without false negatives. Namely, if there is a point at distance $R$, we will certainly report it and points at distance greater than $cR$ will not be reported for $c=Ω(\sqrt{d},d^{1-\frac{1}{p}})$. The constructed algorithms work:
- with preprocessing time $\mathcal{O}(n \log(n))$ and sublinear expected query time,
- with preprocessing time $\mathcal{O}(\mathrm{poly}(n))$ and expected query time $\mathcal{O}(\log(n))$.
Our paper reports progress on answering the open problem presented by Pagh [8] who considered the nearest neighbor search without false negatives for the Hamming distance.

Submitted 28 November, 2016; originally announced November 2016.

Comments: 11 pages, 2 figures, COCOON 2016

ACM Class: F.2.2; G.3

Journal ref: Computing and Combinatorics - 22nd International Conference, COCOON 2016, Ho Chi Minh City, Vietnam, August 2-4, 2016, Proceedings, pages 105-118

arXiv:1611.03789

doi 10.1007/s00224-018-9894-x

Improved Distance Queries and Cycle Counting by Frobenius Normal Form

Authors: Piotr Sankowski, Karol Węgrzycki

Abstract: Consider an unweighted, directed graph $G$ with the diameter $D$. In this paper, we introduce the framework for counting cycles and walks of given length in matrix multiplication time $\widetilde{O}(n^ω)$. The framework is based on the fast decomposition into Frobenius normal form and the Hankel matrix-vector multiplication. It allows us to solve the All-Nodes Shortest Cycles, All-Pairs All Walks problems efficiently and also give some improvement upon distance queries in unweighted graphs.

Submitted 13 March, 2023; v1 submitted 11 November, 2016; originally announced November 2016.

Comments: Lemma 2 in the previous version of this paper is incorrect. We thank Adam Karczmarz for pointing out the mistake

ACM Class: F.2.2

Journal ref: Theory of Computing Systems 2018

arXiv:1605.01717 [pdf, other]

Negative-Weight Shortest Paths and Unit Capacity Minimum Cost Flow in $\tilde{O}(m^{10/7} \log W)$ Time

Authors: Michael B. Cohen, Aleksander Madry, Piotr Sankowski, Adrian Vladu

Abstract: In this paper, we study a set of combinatorial optimization problems on weighted graphs: the shortest path problem with negative weights, the weighted perfect bipartite matching problem, the unit-capacity minimum-cost maximum flow problem and the weighted perfect bipartite $b$-matching problem under the assumption that $\Vert b\Vert_1=O(m)$. We show that each one of these four problems can be solved in $\tilde{O}(m^{10/7}\log W)$ time, where $W$ is the absolute maximum weight of an edge in the graph, which gives the first polynomial improvement in their sparse-graph time complexity in over 25 years. At a high level, our algorithms build on the interior-point method-based framework developed by Madry (FOCS 2013) for solving the unit-capacity maximum flow problem. We develop a refined way to analyze this framework, as well as provide new variants of the underlying preconditioning and perturbation techniques. Consequently, we are able to extend the whole interior-point method-based approach to make it applicable in the weighted graph regime.

Submitted 13 July, 2016; v1 submitted 5 May, 2016; originally announced May 2016.

arXiv:1511.02612 [pdf, other]

Optimal Dynamic Strings

Authors: Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, Piotr Sankowski

Abstract: In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the lexicographical order (less, equal, greater) between two strings, LCP - calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(\log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $Ω(\log n)$ time; hence our implementation is optimal. Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(\log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(\log t \log \log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| \log t \log \log t)$ time. Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.

Submitted 8 April, 2016; v1 submitted 9 November, 2015; originally announced November 2015.

MSC Class: 68P05; 68W32 ACM Class: F.2.2

arXiv:1507.02426 [pdf, other]

Algorithmic Complexity of Power Law Networks

Authors: Paweł Brach, Marek Cygan, Jakub Łącki, Piotr Sankowski

Abstract: It was experimentally observed that the majority of real-world networks follow a power law degree distribution. The aim of this paper is to study the algorithmic complexity of such "typical" networks. The contribution of this work is twofold. First, we define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range $[1,2]$ such networks exhibit the double power law phenomenon that was observed for several real-world networks. Our observation indicates that this phenomenon could be explained by purely graph-theoretical properties. The second aim of our work is to give a novel theoretical explanation of why many algorithms run faster on real-world data than what is predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classical P-time problems including transitive closure, maximum matching, determinant, PageRank and matrix inverse. Moreover, we deal with the problems of counting triangles and finding maximum clique. Previously, it has only been shown that these problems can be solved very efficiently on power law graphs when these graphs are random, e.g., drawn at random from some distribution. However, it is unclear how to relate such a theoretical analysis to real-world graphs, which are fixed. Instead, we show that the randomness assumption can be replaced with a simple condition on the degrees of adjacent vertices, which can be used to obtain similar results. As a result, in some range of power law exponents, we are able to solve the maximum clique problem in polynomial time, although in general power law networks the problem is NP-complete.

Submitted 9 July, 2015; originally announced July 2015.

ACM Class: F.2.2; G.2.2

arXiv:1410.7534 [pdf, other]

Approximation Algorithms for Steiner Tree Problems Based on Universal Solution Frameworks

Authors: Krzysztof Ciebiera, Piotr Godlewski, Piotr Sankowski, Piotr Wygocki

Abstract: This paper summarizes the work on implementing a few solutions for the Steiner Tree problem which we undertook in the PAAL project. The main focus of the project is the development of generic implementations of approximation algorithms together with universal solution frameworks. In particular, we have implemented the Zelikovsky 11/6-approximation using a local search framework, and the 1.39-approximation by Byrka et al. using an iterative rounding framework. These two algorithms are experimentally compared with the greedy 2-approximation, with the exact but exponential time Dreyfus-Wagner algorithm, as well as with results given by state-of-the-art local search techniques by Uchoa and Werneck. The results of this paper are twofold. On one hand, we demonstrate that high level algorithmic concepts can be designed and efficiently used in C++. On the other hand, we show that the above algorithms with good theoretical guarantees give decent results in practice, but are inferior to state-of-the-art heuristic approaches.

Submitted 28 October, 2014; originally announced October 2014.

arXiv:1409.7240 [pdf, other]

Optimal decremental connectivity in planar graphs

Authors: Jakub Łącki, Piotr Sankowski

Abstract: We show an algorithm for dynamic maintenance of connectivity information in an undirected planar graph subject to edge deletions. Our algorithm may answer connectivity queries of the form `Are vertices $u$ and $v$ connected with a path?' in constant time. The queries can be intermixed with any sequence of edge deletions, and the algorithm handles all updates in $O(n)$ time. This result improves over the previously known $O(n \log n)$ time algorithm.

Submitted 25 September, 2014; originally announced September 2014.

ACM Class: F.2.2; G.2.2

arXiv:1407.3957 [pdf, ps, other]

Efficiency of Truthful and Symmetric Mechanisms in One-sided Matching

Authors: Marek Adamczyk, Piotr Sankowski, Qiang Zhang

Abstract: We study the efficiency (in terms of social welfare) of truthful and symmetric mechanisms in one-sided matching problems with {\em dichotomous preferences} and {\em normalized von Neumann-Morgenstern preferences}. We are particularly interested in the well-known {\em Random Serial Dictatorship} mechanism. For dichotomous preferences, we first show that truthful, symmetric and optimal mechanisms exist if intractable mechanisms are allowed. We then provide a connection to online bipartite matching. Using this connection, it is possible to design truthful, symmetric and tractable mechanisms that extract 0.69 of the maximum social welfare, which works under assumption that agents are not adversarial. Without this assumption, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least a third of the maximum social welfare. For normalized von Neumann-Morgenstern preferences, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least $\frac{1}{e}\frac{ν(\opt)^2}{n}$, where $ν(\opt)$ is the maximum social welfare and $n$ is the number of both agents and items. On the hardness side, we show that no truthful mechanism can achieve a social welfare better than $\frac{ν(\opt)^2}{n}$.

Submitted 17 September, 2014; v1 submitted 15 July, 2014; originally announced July 2014.

Comments: 13 pages, 1 figure

arXiv:1308.3336 [pdf, other]

The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree

Authors: Jakub Łącki, Jakub Oćwieja, Marcin Pilipczuk, Piotr Sankowski, Anna Zych

Abstract: In this paper we study the Steiner tree problem over a dynamic set of terminals. We consider the model where we are given an $n$-vertex graph $G=(V,E,w)$ with positive real edge weights, and our goal is to maintain a tree which is a good approximation of the minimum Steiner tree spanning a terminal set $S \subseteq V$, which changes over time. The changes applied to the terminal set are either terminal additions (incremental scenario), terminal removals (decremental scenario), or both (fully dynamic scenario). Our task here is twofold. We want to support updates in sublinear $o(n)$ time, and keep the approximation factor of the algorithm as small as possible. We show that we can maintain a $(6+\varepsilon)$-approximate Steiner tree of a general graph in $\tilde{O}(\sqrt{n} \log D)$ time per terminal addition or removal. Here, $D$ denotes the stretch of the metric induced by $G$. For planar graphs we achieve the same running time and the approximation ratio of $(2+\varepsilon)$. Moreover, we show faster algorithms for incremental and decremental scenarios. Finally, we show that if we allow higher approximation ratio, even more efficient algorithms are possible. In particular we show a polylogarithmic time $(4+\varepsilon)$-approximate algorithm for planar graphs. One of the main building blocks of our algorithms are dynamic distance oracles for vertex-labeled graphs, which are of independent interest. We also improve and use the online algorithms for the Steiner tree problem.

Submitted 24 June, 2016; v1 submitted 15 August, 2013; originally announced August 2013.

Comments: Full version of the paper accepted to STOC'15

arXiv:1306.6593 [pdf, other]

Network Sparsification for Steiner Problems on Planar and Bounded-Genus Graphs

Authors: Marcin Pilipczuk, Michał Pilipczuk, Piotr Sankowski, Erik Jan van Leeuwen

Abstract: We propose polynomial-time algorithms that sparsify planar and bounded-genus graphs while preserving optimal or near-optimal solutions to Steiner problems. Our main contribution is a polynomial-time algorithm that, given an unweighted graph $G$ embedded on a surface of genus $g$ and a designated face $f$ bounded by a simple cycle of length $k$, uncovers a set $F \subseteq E(G)$ of size polynomial in $g$ and $k$ that contains an optimal Steiner tree for any set of terminals that is a subset of the vertices of $f$. We apply this general theorem to prove that: * given an unweighted graph $G$ embedded on a surface of genus $g$ and a terminal set $S \subseteq V(G)$, one can in polynomial time find a set $F \subseteq E(G)$ that contains an optimal Steiner tree $T$ for $S$ and that has size polynomial in $g$ and $|E(T)|$; * an analogous result holds for an optimal Steiner forest for a set $S$ of terminal pairs; * given an unweighted planar graph $G$ and a terminal set $S \subseteq V(G)$, one can in polynomial time find a set $F \subseteq E(G)$ that contains an optimal (edge) multiway cut $C$ separating $S$ and that has size polynomial in $|C|$. In the language of parameterized complexity, these results imply the first polynomial kernels for Steiner Tree and Steiner Forest on planar and bounded-genus graphs (parameterized by the size of the tree and forest, respectively) and for (Edge) Multiway Cut on planar graphs (parameterized by the size of the cutset). Additionally, we obtain a weighted variant of our main contribution.

Submitted 11 July, 2017; v1 submitted 27 June, 2013; originally announced June 2013.

arXiv:1304.6740 [pdf, ps, other]

Algebraic Algorithms for b-Matching, Shortest Undirected Paths, and f-Factors

Authors: Harold N. Gabow, Piotr Sankowski

Abstract: Let G=(V,E) be a graph with f:V\to Z_+ a function assigning degree bounds to vertices. We present the first efficient algebraic algorithm to find an f-factor. The time is \tilde{O}(f(V)^ω). More generally for graphs with integral edge weights of maximum absolute value W we find a maximum weight f-factor in time \tilde{O}(Wf(V)^ω). (The algorithms are randomized, correct with high probability and Las Vegas; the time bound is worst-case.) We also present three specializations of these algorithms: For maximum weight perfect f-matching the algorithm is considerably simpler (and almost identical to its special case of ordinary weighted matching). For the single-source shortest-path problem in undirected graphs with conservative edge weights, we present a generalization of the shortest-path tree, and we compute it in \tilde{O}(Wn^ω) time. For bipartite graphs, we improve the known complexity bounds for vertex capacitated max-flow and min-cost max-flow on a subclass of graphs.

Submitted 24 April, 2013; originally announced April 2013.

arXiv:1210.4811 [pdf, ps, other]

Single Source - All Sinks Max Flows in Planar Digraphs

Authors: Jakub Łącki, Yahav Nussbaum, Piotr Sankowski, Christian Wulff-Nilsen

Abstract: Let G = (V,E) be a planar n-vertex digraph. Consider the problem of computing max st-flow values in G from a fixed source s to all sinks t in V\{s}. We show how to solve this problem in near-linear O(n log^3 n) time. Previously, no better solution was known than running a single-source single-sink max flow algorithm n-1 times, giving a total time bound of O(n^2 log n) with the algorithm of Borradaile and Klein. An important implication is that all-pairs max st-flow values in G can be computed in near-quadratic time. This is close to optimal as the output size is Theta(n^2). We give a quadratic lower bound on the number of distinct max flow values and an Omega(n^3) lower bound for the total size of all min cut-sets. This distinguishes the problem from the undirected case where the number of distinct max flow values is O(n). Previous to our result, no algorithm which could solve the all-pairs max flow values problem faster than the time of Theta(n^2) max-flow computations for every planar digraph was known. This result is accompanied with a data structure that reports min cut-sets. For fixed s and all t, after O(n^{3/2} log^{3/2} n) preprocessing time, it can report the set of arcs C crossing a min st-cut in time roughly proportional to the size of C.

Submitted 17 October, 2012; originally announced October 2012.

Comments: 25 pages, 4 figures; extended abstract appeared in FOCS 2012

ACM Class: G.2.2; F.2.2

arXiv:1204.1616 [pdf, other]

Algorithmic Applications of Baur-Strassen's Theorem: Shortest Cycles, Diameter and Matchings

Authors: Marek Cygan, Harold N. Gabow, Piotr Sankowski

Abstract: Consider a directed or an undirected graph with integral edge weights from the set [-W, W], that does not contain negative weight cycles. In this paper, we introduce a general framework for solving problems on such graphs using matrix multiplication. The framework is based on the usage of Baur-Strassen's theorem and of Storjohann's determinant algorithm. It allows us to give new and simple solutions to the following problems: * Finding Shortest Cycles -- We give a simple \tilde{O}(Wn^ω) time algorithm for finding shortest cycles in undirected and directed graphs. For directed graphs (and undirected graphs with non-negative weights) this matches the time bounds obtained in 2011 by Roditty and Vassilevska-Williams. On the other hand, no algorithm working in \tilde{O}(Wn^ω) time was previously known for undirected graphs with negative weights. Furthermore our algorithm for a given directed or undirected graph detects whether it contains a negative weight cycle within the same running time. * Computing Diameter and Radius -- We give a simple \tilde{O}(Wn^ω) time algorithm for computing the diameter and radius of undirected or directed graphs. To the best of our knowledge no algorithm with this running time was known for undirected graphs with negative weights. * Finding Minimum Weight Perfect Matchings -- We present an \tilde{O}(Wn^ω) time algorithm for finding minimum weight perfect matchings in undirected graphs. This resolves an open problem posed by Sankowski in 2006, who presented such an algorithm but only in the case of bipartite graphs. In order to solve the minimum weight perfect matching problem we develop a novel combinatorial interpretation of the dual solution which sheds new light on this problem. Such a combinatorial interpretation was not known previously, and is of independent interest.

Submitted 17 August, 2012; v1 submitted 7 April, 2012; originally announced April 2012.

Comments: To appear in FOCS 2012

ACM Class: F.2.2

arXiv:1203.4763 [pdf, ps, other]

doi 10.1103/PhysRevB.86.085205

Structural and electronic properties of Pb1-xCdxTe and Pb1-xMnxTe ternary alloys

Authors: Malgorzata Bukala, Piotr Sankowski, Ryszard Buczko, Perla Kacman

Abstract: A systematic theoretical study of two PbTe-based ternary alloys, Pb1-xCdxTe and Pb1-xMnxTe, is reported. First, using ab initio methods we study the stability of the crystal structure of CdTe - PbTe solid solutions, to predict the composition for which rock-salt structure of PbTe changes into zinc-blende structure of CdTe. The dependence of the lattice parameter on Cd (Mn) content x in the mixed crystals is studied by the same methods. The obtained decrease of the lattice constant with x agrees with what is observed in both alloys. The band structures of PbTe-based ternary compounds are calculated within a tight-binding approach. To describe correctly the constituent materials new tight-binding parameterizations for PbTe and MnTe bulk crystals as well as a tight-binding description of rock-salt CdTe are proposed. For both studied ternary alloys, the calculated band gap in the L point increases with x, in qualitative agreement with photoluminescence measurements in the infrared. The results show also that in p-type Pb1-xCdxTe and Pb1-xMnxTe mixed crystals an enhancement of thermoelectrical power can be expected.

Submitted 21 March, 2012; originally announced March 2012.

Comments: 10 pages, 13 figures, submitted to Physical Review B

arXiv:1104.4890 [pdf, ps, other]

Min-cuts and Shortest Cycles in Planar Graphs in O(n log log n) Time

Authors: Jakub Łącki, Piotr Sankowski

Abstract: We present a deterministic O(n log log n) time algorithm for finding shortest cycles and minimum cuts in planar graphs. The algorithm improves the previously known fastest algorithm by Italiano et al. in STOC'11 by a factor of log n. This speedup is obtained through the use of dense distance graphs combined with a divide-and-conquer approach.

Submitted 26 April, 2011; originally announced April 2011.

MSC Class: 05C85; 05C10 ACM Class: G.2.2

arXiv:1102.5105 [pdf, ps, other]

Approximation Algorithms for Union and Intersection Covering Problems

Authors: Marek Cygan, Fabrizio Grandoni, Stefano Leonardi, Marcin Mucha, Marcin Pilipczuk, Piotr Sankowski

Abstract: In a classical covering problem, we are given a set of requests that we need to satisfy (fully or partially), by buying a subset of items at minimum cost. For example, in the k-MST problem we want to find the cheapest tree spanning at least k nodes of an edge-weighted graph. Here nodes and edges represent requests and items, respectively. In this paper, we initiate the study of a new family of multi-layer covering problems. Each such problem consists of a collection of h distinct instances of a standard covering problem (layers), with the constraint that all layers share the same set of requests. We identify two main subfamilies of these problems: - in a union multi-layer problem, a request is satisfied if it is satisfied in at least one layer; - in an intersection multi-layer problem, a request is satisfied if it is satisfied in all layers. To see some natural applications, consider both generalizations of k-MST. Union k-MST can model a problem where we are asked to connect a set of users to at least one of two communication networks, e.g., a wireless and a wired network. On the other hand, intersection k-MST can formalize the problem of connecting a subset of users to both electricity and water. We present a number of hardness and approximation results for union and intersection versions of several standard optimization problems: MST, Steiner tree, set cover, facility location, TSP, and their partial covering variants.

Submitted 24 February, 2011; originally announced February 2011.

arXiv:1011.2843 [pdf, ps, other]

Improved Minimum Cuts and Maximum Flows in Undirected Planar Graphs

Authors: Giuseppe F. Italiano, Piotr Sankowski

Abstract: In this paper we study minimum cut and maximum flow problems on planar graphs, both in static and in dynamic settings. First, we present an algorithm that given an undirected planar graph computes the minimum cut between any two given vertices in O(n log log n) time. Second, we show how to achieve the same O(n log log n) bound for the problem of computing maximum flows in undirected planar graphs. To the best of our knowledge, these are the first algorithms for those two problems that break the O(n log n) barrier, which has been standing for more than 25 years. Third, we present a fully dynamic algorithm that is able to maintain information about minimum cuts and maximum flows in a plane graph (i.e., a planar graph with a fixed embedding): our algorithm is able to insert edges, delete edges and answer min-cut and max-flow queries between any pair of vertices in O(n^(2/3) log^3 n) time per operation. This result is based on a new dynamic shortest path algorithm for planar graphs which may be of independent interest. We remark that this is the first known non-trivial algorithm for min-cut and max-flow problems in a dynamic setting.

Submitted 22 November, 2010; v1 submitted 12 November, 2010; originally announced November 2010.

Comments: This paper is being merged with the paper by Christian Wulff-Nilsen "Min st-Cut of a Planar Graph in O(n loglog n) Time" http://arxiv.longhoe.net/abs/1007.3609

ACM Class: G.2.2

arXiv:1003.1320 [pdf, other]

Min st-Cut Oracle for Planar Graphs with Near-Linear Preprocessing Time

Authors: Glencora Borradaile, Piotr Sankowski, Christian Wulff-Nilsen

Abstract: For an undirected $n$-vertex planar graph $G$ with non-negative edge-weights, we consider the following type of query: given two vertices $s$ and $t$ in $G$, what is the weight of a min $st$-cut in $G$? We show how to answer such queries in constant time with $O(n\log^4n)$ preprocessing time and $O(n\log n)$ space. We use a Gomory-Hu tree to represent all the pairwise min cuts implicitly. Previously, no subquadratic time algorithm was known for this problem. Since all-pairs min cut and the minimum cycle basis are dual problems in planar graphs, we also obtain an implicit representation of a minimum cycle basis in $O(n\log^4n)$ time and $O(n\log n)$ space. Additionally, an explicit representation can be obtained in $O(C)$ time and space where $C$ is the size of the basis. These results require that shortest paths are unique. This can be guaranteed either by using randomization without overhead, or deterministically with an additional $\log^2 n$ factor in the preprocessing times.

Submitted 9 October, 2013; v1 submitted 5 March, 2010; originally announced March 2010.

Comments: This is the final version submitted for journal publication and has improved the running time of an earlier version by a log n factor. This version includes the bibliography

ACM Class: G.2.2

arXiv:1001.1686 [pdf, ps, other]

Combinatorial Auctions with Budgets

Authors: Amos Fiat, Stefano Leonardi, Jared Saia, Piotr Sankowski

Abstract: We consider budget constrained combinatorial auctions where bidder $i$ has a private value $v_i$, a budget $b_i$, and is interested in all the items in $S_i$. The value to agent $i$ of a set of items $R$ is $|R \cap S_i| \cdot v_i$. Such auctions capture adword auctions, where advertisers offer a bid for ads in response to an advertiser-dependent set of adwords, and advertisers have budgets. It is known that even if all items are identical and all budgets are public it is not possible to be truthful and efficient. Our main result is a novel auction that runs in polynomial time, is incentive compatible, and ensures Pareto-optimality for such auctions when the valuations are private and the budgets are public knowledge. This extends the result of Dobzinski et al. (FOCS 2008) for auctions of multiple {\sl identical} items and public budgets to single-valued {\sl combinatorial} auctions with public budgets.

Submitted 20 April, 2010; v1 submitted 11 January, 2010; originally announced January 2010.

ACM Class: F.2.2; G.2.2

Showing 1–50 of 55 results for author: Sankowski, P