Search | arXiv e-print repository

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

Authors: Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski

Abstract: We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).

Submitted 12 July, 2023; originally announced July 2023.
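The abstract mentions an incremental dataflow engine. As a rough illustration of the general idea only (this is not Pathway's Table API; the function `apply_updates` and the `(key, value, diff)` update format are hypothetical), a grouped count and sum can be updated from a batch of insertions and retractions without recomputing the aggregate from scratch:

```python
def apply_updates(state, updates):
    """Incrementally maintain per-key (count, sum) from (key, value, diff) updates.

    Hypothetical sketch: diff is +1 for an insertion and -1 for a retraction.
    Only keys touched by the batch are changed; returns the set of those keys.
    """
    changed = set()
    for key, value, diff in updates:
        count, total = state.get(key, (0, 0))
        count += diff
        total += diff * value
        if count == 0:
            state.pop(key, None)  # the group became empty, so drop it
        else:
            state[key] = (count, total)
        changed.add(key)
    return changed


# Usage: two readings for sensor-a, one for sensor-b, then a retraction.
state = {}
apply_updates(state, [("sensor-a", 10, 1), ("sensor-a", 14, 1), ("sensor-b", 3, 1)])
apply_updates(state, [("sensor-a", 10, -1)])  # only sensor-a is recomputed
```

The work per batch is proportional to the number of updates rather than to the size of the whole table, which is the property an incremental engine relies on when handling unbounded streams.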

arXiv:2305.10935 [pdf, other]

Submodularity Gaps for Selected Network Design and Matching Problems

Authors: Martin Böhm, Jarosław Byrka, Mateusz Lewandowski, Jan Marcinkowski

Abstract: Submodularity in combinatorial optimization has been a topic of many studies, and various algorithmic techniques exploiting the submodularity of a studied problem have been proposed. It is therefore natural to ask, in cases where the cost function of the studied problem is not submodular, whether it is possible to approximate this cost function with a proxy submodular function. We answer this question in the negative for two major problems in metric optimization, namely Steiner Tree and Uncapacitated Facility Location. We do so by proving super-constant lower bounds on the submodularity gap for these problems, which contrast with the constant-factor cost-sharing schemes known for them. Technically, our lower bounds build on strong lower bounds for the online variants of these two problems. Nevertheless, online lower bounds do not always imply submodularity lower bounds. We show that the Maximum Bipartite Matching problem does not exhibit any submodularity gap, despite its online variant being only (1 - 1/e)-competitive in the randomized setting.

Submitted 18 May, 2023; originally announced May 2023.
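As background for the notion of a submodular proxy, the defining inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) can be checked by brute force on tiny ground sets. This is a minimal sketch with hypothetical helper names; it does not compute the submodularity gap studied in the paper, and it is exponential in the size of the ground set.

```python
from itertools import chain, combinations


def subsets(ground):
    """All subsets of a small ground set, as frozensets."""
    items = list(ground)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    ]


def is_submodular(f, ground, tol=1e-9):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all pairs A, B."""
    all_sets = subsets(ground)
    for a in all_sets:
        for b in all_sets:
            if f(a) + f(b) < f(a | b) + f(a & b) - tol:
                return False
    return True


# Usage: a coverage function is submodular; f(S) = |S|^2 is not.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
coverage = lambda s: len(set().union(*(covers[x] for x in s)))
print(is_submodular(coverage, covers.keys()), is_submodular(lambda s: len(s) ** 2, {1, 2, 3}))
```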

arXiv:2205.09533 [pdf, other]

Estimating the ultrasound attenuation coefficient using convolutional neural networks -- a feasibility study

Authors: Piotr Jarosik, Michal Byra, Marcin Lewandowski, Ziemowit Klimonda

Abstract: The attenuation coefficient (AC) is a fundamental measure of the acoustic properties of tissue and can be used in medical diagnostics. In this work, we investigate the feasibility of using convolutional neural networks (CNNs) to estimate the AC directly from radio-frequency (RF) ultrasound signals. To develop the CNNs, we used RF signals collected from tissue-mimicking numerical phantoms with AC values ranging from 0.1 to 1.5 dB/(MHz·cm). The models were trained on 1-D patches of RF data. We obtained mean absolute AC estimation errors of 0.08, 0.12, 0.20 and 0.25 for patch lengths of 10 mm, 5 mm, 2 mm and 1 mm, respectively. We explain the performance of the model by visualizing the frequency content associated with the convolutional filters. Our study shows that the AC can be calculated using deep learning, and that the weights of the CNNs can have a physical interpretation.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 4 figures
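The unit dB/(MHz·cm) reflects the common modelling assumption that attenuation grows linearly with both frequency and path length. As a hedged illustration (the helper names and the pulse-echo round-trip convention are assumptions, not taken from the paper), the attenuation in dB and the remaining amplitude fraction can be computed as:

```python
def attenuation_db(ac_db_per_mhz_cm, freq_mhz, depth_cm, round_trip=True):
    """Attenuation in dB under a linear-in-frequency model: A = AC * f * path.

    Assumption: in pulse-echo imaging the wave travels down and back, so the
    path length is 2 * depth when round_trip is True.
    """
    path_cm = 2 * depth_cm if round_trip else depth_cm
    return ac_db_per_mhz_cm * freq_mhz * path_cm


def amplitude_ratio(att_db):
    """Remaining amplitude fraction for an attenuation in dB (20*log10 convention)."""
    return 10 ** (-att_db / 20)


# Usage: AC = 0.5 dB/(MHz·cm), 5 MHz, echo from 4 cm depth.
att = attenuation_db(0.5, 5, 4)  # 0.5 * 5 * 8 = 20 dB
print(att, amplitude_ratio(att))
```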

arXiv:2007.02650 [pdf, other]

On Data Augmentation and Adversarial Risk: An Empirical Analysis

Authors: Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer

Abstract: Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models. These techniques rely on different ideas such as invariance-preserving transformations (e.g., expert-defined augmentation), statistical heuristics (e.g., Mixup), and learning the data distribution (e.g., GANs). However, in the adversarial setting it remains unclear under what conditions such data augmentation methods reduce or even worsen the misclassification risk. In this paper, we therefore analyse the effect of different data augmentation techniques on the adversarial risk using three measures: (a) the well-known risk under adversarial attacks, (b) a new measure of prediction-change stress based on the Laplacian operator, and (c) the influence of training examples on prediction. The results of our empirical analysis disprove the hypothesis that an improvement in the classification performance induced by data augmentation is always accompanied by an improvement in the risk under adversarial attack. Further, our results reveal that the augmented data has more influence on the resulting models than the non-augmented data. Taken together, our results suggest that general-purpose data augmentations that do not take into account the characteristics of the data and the task must be applied with care.

Submitted 6 July, 2020; originally announced July 2020.

Comments: 21 pages, 15 figures, 3 tables
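The abstract cites Mixup as an example of a statistical-heuristic augmentation. A minimal sketch of the standard Mixup rule, which blends two inputs and their one-hot labels with the same weight λ drawn from Beta(α, α), is shown below; the function name and the default α = 0.2 are illustrative choices, not settings from the paper.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Mixup: convex combination of two examples and their one-hot labels.

    lam ~ Beta(alpha, alpha); the same lam is applied to inputs and labels.
    alpha=0.2 is an illustrative default, not a value from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage with a seeded generator for reproducibility.
x, y, lam = mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], rng=random.Random(0))
```

Because the label is mixed with the same weight as the input, the augmented examples carry soft labels, which is one reason their influence on the trained model can differ from that of the original data.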

arXiv:2005.09903 [pdf, other]

ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy

Authors: Natalia Shepeleva, Werner Zellinger, Michal Lewandowski, Bernhard Moser

Abstract: We propose a new metric space of ReLU activation codes equipped with a truncated Hamming distance which establishes an isometry between its elements and polyhedral bodies in the input space which have recently been shown to be strongly related to safety, robustness, and confidence. This isometry allows the efficient computation of adjacency relations between the polyhedral bodies. Experiments on MNIST and CIFAR-10 indicate that information besides accuracy might be stored in the code space.

Submitted 20 May, 2020; originally announced May 2020.

Comments: in ICLR 2020 Workshop on Neural Architecture Search (NAS 2020)
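Each input to a ReLU network falls into a polyhedral region on which the network is affine, and that region is identified by the on/off pattern of its neurons. The sketch below computes this binary code for a single layer and the plain Hamming distance between two codes; the weights are made up, the helper names are hypothetical, and the paper's truncated variant of the distance is not reproduced here.

```python
def relu_code(weights, biases, x):
    """Binary activation code of one ReLU layer.

    Bit i is 1 iff neuron i is active, i.e. its pre-activation w_i . x + b_i > 0.
    """
    return tuple(
        1 if sum(w * xi for w, xi in zip(row, x)) + b > 0 else 0
        for row, b in zip(weights, biases)
    )


def hamming(code_a, code_b):
    """Number of neurons whose on/off state differs between two codes."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Usage: a made-up layer with three neurons on 2-D inputs.
W = [[1, 0], [0, 1], [1, 1]]
b = [0, 0, -1]
c1 = relu_code(W, b, (0.25, 0.5))  # (1, 1, 0)
c2 = relu_code(W, b, (2, 2))       # (1, 1, 1)
print(hamming(c1, c2))             # regions differing in one neuron
```

For a deeper network, the codes of all layers would be concatenated into one longer binary string.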

arXiv:2004.14102 [pdf, other]

Dense Steiner problems: Approximation algorithms and inapproximability

Authors: Marek Karpinski, Mateusz Lewandowski, Syed Mohammad Meesum, Matthias Mnich

Abstract: The Steiner Tree problem is a classical problem in combinatorial optimization: the goal is to connect a set $T$ of terminals in a graph $G$ by a tree of minimum size. Karpinski and Zelikovsky (1996) studied the $\delta$-dense version of Steiner Tree, where each terminal has at least $\delta|V(G)\setminus T|$ neighbours outside $T$, for a fixed $\delta > 0$. They gave a PTAS for this problem. We study a generalization, pairwise $\delta$-dense Steiner Forest, which asks for a minimum-size forest in $G$ in which the nodes in each terminal set $T_1,\dots,T_k$ are connected, and every terminal in $T_i$ has at least $\delta|T_j|$ neighbours in $T_j$ and at least $\delta|S|$ neighbours in $S = V(G)\setminus (T_1\cup\dots\cup T_k)$, for each $i, j$ in $\{1,\dots, k\}$ with $i\neq j$. Our first result is a polynomial-time approximation scheme for all $\delta > 1/2$. Then, we show a $(\frac{13}{12}+\varepsilon)$-approximation algorithm for $\delta = 1/2$ and any $\varepsilon > 0$. We also consider the $\delta$-dense Group Steiner Tree problem as defined by Hauptmann and show that the problem is $\mathsf{APX}$-hard.

Submitted 29 April, 2020; originally announced April 2020.
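To make the density condition of Karpinski and Zelikovsky concrete, the following minimal sketch checks whether a given graph and terminal set satisfy it (every terminal has at least δ|V(G) ∖ T| neighbours outside T). It only tests the condition and does not solve Steiner Tree; the function name and the adjacency-set representation are assumptions for illustration.

```python
def is_delta_dense(adj, terminals, delta):
    """Check the delta-density condition for Steiner Tree.

    adj maps each node to the set of its neighbours. Every terminal must have
    at least delta * (number of non-terminals) neighbours among the non-terminals.
    """
    non_terminals = set(adj) - set(terminals)
    threshold = delta * len(non_terminals)
    return all(len(adj[t] & non_terminals) >= threshold for t in terminals)


# Usage: terminals t1, t2 and Steiner (non-terminal) nodes s1, s2.
adj = {
    "t1": {"s1", "s2"},
    "t2": {"s1"},
    "s1": {"t1", "t2"},
    "s2": {"t1"},
}
print(is_delta_dense(adj, {"t1", "t2"}, 0.5), is_delta_dense(adj, {"t1", "t2"}, 0.75))
```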

arXiv:1912.00770 [pdf, other]

Concave connection cost Facility Location and the Star Inventory Routing problem

Authors: Jarosław Byrka, Mateusz Lewandowski

Abstract: We study a variant of the uncapacitated facility location problem (UFL), where the connection costs of clients are defined by (client-specific) concave nondecreasing functions of the connection distance in the underlying metric. A special case capturing the complexity of this variant is the setting called facility location with penalties, where clients may either connect to a facility or pay a (client-specific) penalty. We show that the best known approximation algorithms for UFL may be adapted to the concave connection cost setting. The key technical contribution is an argument that the JMS algorithm for UFL may be adapted to provide the same approximation guarantee for the more general concave connection cost variant. We also study the star inventory routing with facility location (SIRPFL) problem that was recently introduced by Jiao and Ravi, which asks to jointly optimize the clustering of demand points and the subsequent serving of requests within the created clusters. We show that the problem may be reduced to concave connection cost facility location and substantially improve the approximation ratio for all three variants of SIRPFL.

Submitted 10 October, 2020; v1 submitted 2 December, 2019; originally announced December 2019.
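The objective in the concave connection cost variant can be made concrete by evaluating a fixed solution: the opening costs of the chosen facilities plus, for each client j, g_j applied to its distance to the nearest open facility. Since each g_j is nondecreasing, the nearest open facility is always the best choice. The sketch below (hypothetical function and data) only evaluates a solution and does not implement the JMS algorithm; the penalty setting is recovered with g_j(d) = min(d, p_j).

```python
import math


def concave_ufl_cost(open_facilities, opening_cost, dist, clients, g):
    """Cost of a solution to UFL with concave connection costs.

    dist[i][j] is the distance from facility i to client j; g[j] is assumed
    concave and nondecreasing. With no facility open, each client is charged
    g[j](inf), which is finite in the penalty setting.
    """
    total = sum(opening_cost[i] for i in open_facilities)
    for j in clients:
        d = min((dist[i][j] for i in open_facilities), default=math.inf)
        total += g[j](d)
    return total


# Usage: facility location with penalties, g_j(d) = min(d, p_j).
opening_cost = {"A": 3, "B": 5}
dist = {"A": {"c1": 1, "c2": 10}, "B": {"c1": 4, "c2": 2}}
penalties = {"c1": 6, "c2": 4}
g = {j: (lambda d, p=p: min(d, p)) for j, p in penalties.items()}
print(concave_ufl_cost({"A"}, opening_cost, dist, penalties, g))  # 3 + 1 + 4
```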

arXiv:1912.00717 [pdf, other]

PTAS for Steiner Tree on Map Graphs

Authors: Jarosław Byrka, Mateusz Lewandowski, Syed Mohammad Meesum, Joachim Spoerhase, Sumedha Uniyal

Abstract: We study the Steiner tree problem on map graphs, which substantially generalize planar graphs as they allow arbitrarily large cliques. We obtain a PTAS for Steiner tree on map graphs, which builds on the result for planar edge-weighted instances of Borradaile et al. The Steiner tree problem on map graphs can be cast as a special case of the planar node-weighted Steiner tree problem, for which only a 2.4-approximation is known. We prove and use a contraction decomposition theorem for planar node-weighted instances. This readily reduces the problem of finding a PTAS for planar node-weighted Steiner tree to finding a spanner, i.e., a constant-factor approximation containing a nearly optimum solution. Finally, we pinpoint places where known techniques for constructing such a spanner fail on node-weighted instances and where further progress requires new ideas.

Submitted 2 December, 2019; originally announced December 2019.
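For contrast with the PTAS above, the textbook baseline for edge-weighted Steiner tree is a 2-approximation: build the metric closure on the terminals (shortest-path distances) and take its minimum spanning tree. Expanding each MST edge into its shortest path yields a connected subgraph spanning the terminals whose weight is at most the MST weight, and that weight is at most twice the optimum. This is a minimal sketch of that classical technique, not the paper's algorithm; it assumes a connected, undirected graph given as `{u: {v: weight}}`, and the function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a graph given as {u: {v: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def steiner_mst_bound(graph, terminals):
    """Weight of an MST of the terminal metric closure, built with Prim's algorithm.

    This is an upper bound on a Steiner tree obtained by expanding MST edges into
    shortest paths, and it is at most twice the optimal Steiner tree weight.
    """
    terminals = list(terminals)
    closure = {t: dijkstra(graph, t) for t in terminals}
    in_tree = {terminals[0]}
    total = 0
    while len(in_tree) < len(terminals):
        w, v = min(
            (closure[u][v], v) for u in in_tree for v in terminals if v not in in_tree
        )
        in_tree.add(v)
        total += w
    return total


# Usage: a Steiner node s joins three terminals at cost 3; direct links cost 2 each.
graph = {
    "s": {"t1": 1, "t2": 1, "t3": 1},
    "t1": {"s": 1, "t2": 2, "t3": 2},
    "t2": {"s": 1, "t1": 2, "t3": 2},
    "t3": {"s": 1, "t1": 2, "t2": 2},
}
print(steiner_mst_bound(graph, ["t1", "t2", "t3"]))  # 4, versus optimum 3
```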

arXiv:1908.00766 [pdf, other]

doi 10.33682/1syg-dy60

Sound source detection, localization and classification using consecutive ensemble of CRNN models

Authors: Sławomir Kapka, Mateusz Lewandowski

Abstract: In this paper, we describe our method for DCASE2019 Task 3: Sound Event Localization and Detection (SELD). We use four SELDnet-like single-output CRNN models which run consecutively to recover all possible information about the occurring events. We decompose the SELD task into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source when the direction of the first one is known, and a multi-label classification task. We use a custom consecutive ensemble to predict the events' onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic dataset and compared with other participants' submissions.

Submitted 5 September, 2019; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: 5 pages, 3 figures, conference

Journal ref: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York University, NY, USA, October 2019

arXiv:1811.08506 [pdf, ps, other]

Tight Approximation Ratio for Minimum Maximal Matching

Authors: Szymon Dudycz, Mateusz Lewandowski, Jan Marcinkowski

Abstract: We study a combinatorial problem called Minimum Maximal Matching, where we are asked to find, in a general graph, the smallest matching that cannot be extended. We show that this problem is hard to approximate with a constant smaller than 2, assuming the Unique Games Conjecture. As a corollary, we show that Minimum Maximal Matching in bipartite graphs is hard to approximate with a constant smaller than $\frac{4}{3}$ under the same assumption. With a stronger variant of the Unique Games Conjecture, namely the Small Set Expansion Hypothesis, we are able to improve the hardness result up to the factor of $\frac{3}{2}$.

Submitted 20 November, 2018; originally announced November 2018.
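The factor 2 in this hardness result matches a simple upper bound: any maximal matching M has at most twice as many edges as a minimum maximal matching M*, because every edge of M shares an endpoint with some edge of M* (otherwise M* could be extended), and the edges of M* have only 2|M*| endpoints, each used by at most one edge of M. A one-pass greedy algorithm therefore gives a 2-approximation. A minimal sketch with hypothetical helper names:

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping an edge whenever both endpoints are still free.

    The result is maximal: every skipped edge shares an endpoint with a kept edge.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def is_maximal(matching, edges):
    """True iff every edge touches a vertex covered by the matching."""
    covered = {x for e in matching for x in e}
    return all(u in covered or v in covered for u, v in edges)


# Usage: on the path a-b-c-d the edge order decides between sizes 1 and 2.
print(greedy_maximal_matching([("b", "c"), ("a", "b"), ("c", "d")]))  # [('b', 'c')]
print(greedy_maximal_matching([("a", "b"), ("b", "c"), ("c", "d")]))  # two edges
```

The second call shows that greedy can indeed return twice the minimum, so the factor 2 is tight for this algorithm.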

arXiv:1801.00313 [pdf, other]

Approximating Node-Weighted k-MST on Planar Graphs

Authors: Jarosław Byrka, Mateusz Lewandowski, Joachim Spoerhase

Abstract: We study the problem of finding a minimum-weight connected subgraph spanning at least $k$ vertices on planar, node-weighted graphs. We give a $(4+\varepsilon)$-approximation algorithm for this problem. We achieve this by utilizing the recent LMP primal-dual $3$-approximation for the node-weighted prize-collecting Steiner tree problem by Byrka et al. (SWAT'16) and adopting an approach by Chudak et al. (Math. Prog. '04) regarding Lagrangian relaxation for the edge-weighted variant. In particular, we improve the procedure of picking additional vertices (tree merging procedure) given by Sadeghian (2013) by taking a constant number of recursive steps and utilizing the limited guessing procedure of Arora and Karakostas (Math. Prog. '06). More generally, our approach readily gives a $(\frac{4}{3}\cdot r+\varepsilon)$-approximation on any graph class where the algorithm of Byrka et al. for the prize-collecting version gives an $r$-approximation. We argue that this can be interpreted as a generalization of an analogous result by Könemann et al. (Algorithmica '11) for partial cover problems. Together with a lower bound construction by Mestre (STACS'08) for partial cover, this implies that our bound is essentially best possible among algorithms that utilize an LMP algorithm for the Lagrangian relaxation as a black box.
In addition to that, we argue by a more involved lower bound construction that even using the LMP algorithm by Byrka et al. in a non-black-box fashion could not beat the factor $\frac{4}{3}\cdot r$ when the tree merging step relies only on the solutions output by the LMP algorithm.

Submitted 8 May, 2018; v1 submitted 31 December, 2017; originally announced January 2018.

arXiv:1601.02481 [pdf, ps, other]

Approximation algorithms for node-weighted prize-collecting Steiner tree problems on planar graphs

Authors: Jarosław Byrka, Mateusz Lewandowski, Carsten Moldenhauer

Abstract: We study the prize-collecting version of the Node-weighted Steiner Tree problem (NWPCST) restricted to planar graphs. We give a new primal-dual Lagrangian-multiplier-preserving (LMP) 3-approximation algorithm for planar NWPCST. We then show a $(2.88 + \varepsilon)$-approximation which establishes a new best approximation guarantee for planar NWPCST. This is done by combining our LMP algorithm with a threshold rounding technique and utilizing the 2.4-approximation of Berman and Yaroslavtsev for the version without penalties. We also give a primal-dual 4-approximation algorithm for the more general forest version using techniques introduced by Hajiaghay and Jain.

Submitted 11 January, 2016; originally announced January 2016.
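The Lagrangian-multiplier-preserving (LMP) property used in this and the k-MST entry can be stated precisely. The block below gives one common formulation of node-weighted prize-collecting Steiner tree and the LMP guarantee; the notation c(X) = Σ_{v∈X} c(v) for node costs and π(X) = Σ_{v∈X} π(v) for penalties is introduced here for illustration.

```latex
% Node-weighted prize-collecting Steiner tree (one common formulation):
% node costs c : V -> R_{\ge 0}, penalties \pi : V -> R_{\ge 0}
\mathrm{OPT} = \min_{T \text{ connected subgraph of } G}
  \Big( c\big(V(T)\big) + \pi\big(V \setminus V(T)\big) \Big)

% An algorithm is an LMP \alpha-approximation if the tree T it returns satisfies
c\big(V(T)\big) + \alpha \cdot \pi\big(V \setminus V(T)\big) \le \alpha \cdot \mathrm{OPT}
```

Since α ≥ 1, the left-hand side is at least c(V(T)) + π(V ∖ V(T)), so an LMP α-approximation is also an ordinary α-approximation; with α = 3 this is the guarantee claimed for planar NWPCST. The stronger form, in which the penalty term is multiplied by α, is what makes such algorithms usable inside a Lagrangian relaxation for cardinality-constrained problems such as k-MST.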

Showing 1–12 of 12 results for author: Lewandowski, M