-
Relaxed Clique Percolation and Disinformation-Resilient Domains for Social Commerce Networks
Authors:
Himangshu Paul,
Alexander Nikolaev
Abstract:
Must we trace and block all fake content in a social commerce network so that genuine users may enjoy fake-free information? Such efforts largely fail, because, as we get better at spam detection, spammers use the same advances for anti-detection. As a fundamentally new approach, we show that an online platform can aggregate and route user-generated content in a smart personalized way, which fosters and relies on "collective social responsibility". We introduce the notion of information aggregation domain, or simply, domain: composed for a given "central" node (user account), a domain is a connected set of nodes whose user-generated content is eligible to be used to meet the central node's information needs. Admitting malicious information sources - "bad citizen" nodes - into "good citizen" nodes' domains puts the good citizens at risk for disinformation attacks. We show how a platform can limit this risk by exploiting the social link structure between its nodes without the need to know which nodes are good or bad citizens. We introduce Relaxed Clique Percolation (RCP), a class of policies to compose personalized disinformation-resilient domains. Then, we define "RCP cores" and show how they can be used to efficiently compose resilient domains for all network nodes at once. Finally, we analyze the properties of RCP domains found in real-world social networks including Slashdot, Facebook, Flickr, and Yelp, to affirm that in practice, RCP domains turn out to be large and spatially diverse.
Submitted 21 March, 2024;
originally announced March 2024.
-
FishNet: Deep Neural Networks for Low-Cost Fish Stock Estimation
Authors:
Moseli Mots'oehli,
Anton Nikolaev,
Wawan B. IGede,
John Lynham,
Peter J. Mous,
Peter Sadowski
Abstract:
Fish stock assessment often involves manual fish counting by taxonomy specialists, which is both time-consuming and costly. We propose FishNet, an automated computer vision system for both taxonomic classification and fish size estimation from images captured with a low-cost digital camera. The system first performs object detection and segmentation using a Mask R-CNN to identify individual fish from images containing multiple fish, possibly consisting of different species. Then each fish species is classified and the length is predicted using separate machine learning models. To develop the model, we use a dataset of 300,000 hand-labeled images containing 1.2M fish of 163 different species and ranging in length from 10cm to 250cm, with additional annotations and quality control methods used to curate high-quality training data. On held-out test data sets, our system achieves a 92% intersection over union on the fish segmentation task, an 89% top-1 classification accuracy on single fish species classification, and a 2.3cm mean absolute error on the fish length estimation task.
Submitted 27 June, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
CoRSAI: A System for Robust Interpretation of CT Scans of COVID-19 Patients Using Deep Learning
Authors:
Manvel Avetisian,
Ilya Burenko,
Konstantin Egorov,
Vladimir Kokh,
Aleksandr Nesterov,
Aleksandr Nikolaev,
Alexander Ponomarchuk,
Elena Sokolova,
Alex Tuzhilin,
Dmitry Umerenkov
Abstract:
Analysis of chest CT scans can be used in detecting parts of lungs that are affected by infectious diseases such as COVID-19. Determining the volume of lungs affected by lesions is essential for formulating treatment recommendations and prioritizing patients by severity of the disease. In this paper we adopted an approach based on using an ensemble of deep convolutional neural networks for segmentation of slices of lung CT scans. Using our models we are able to segment the lesions, evaluate patient dynamics, estimate relative volume of lungs affected by lesions and evaluate the lung damage stage. Our models were trained on data from different medical centers. We compared predictions of our models with those of six experienced radiologists and our segmentation model outperformed most of them. On the task of classification of disease severity, our model outperformed all the radiologists.
Submitted 25 May, 2021;
originally announced May 2021.
-
SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting
Authors:
Langshi Chen,
Jiayu Li,
Ariful Azad,
Cenk Sahinalp,
Madhav Marathe,
Anil Vullikanti,
Andrey Nikolaev,
Egor Smirnov,
Ruslan Israfilov,
Judy Qiu
Abstract:
Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) Subgraph2Vec improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) Subgraph2Vec in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.
Submitted 4 October, 2020; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Backtracking algorithms for constructing the Hamiltonian decomposition of a 4-regular multigraph
Authors:
Alexander V. Korostil,
Andrei V. Nikolaev
Abstract:
We consider a Hamiltonian decomposition problem of partitioning a regular graph into edge-disjoint Hamiltonian cycles. It is known that verifying vertex non-adjacency in the 1-skeleton of the symmetric and asymmetric traveling salesperson polytopes is NP-complete. On the other hand, a sufficient condition for two vertices to be non-adjacent can be formulated as a combinatorial problem of finding a second Hamiltonian decomposition of a 4-regular multigraph. We present two backtracking algorithms for constructing a second Hamiltonian decomposition and verifying vertex non-adjacency: an algorithm based on a simple path extension and an algorithm based on the chain edge fixing procedure.
Based on the results of computational experiments for undirected multigraphs, both backtracking algorithms were outperformed by the known general variable neighborhood search heuristics. However, for directed multigraphs, the algorithm based on chain fixing of edges showed results comparable to the heuristics on instances with an existing solution and better results on infeasible instances where the Hamiltonian decomposition does not exist.
Submitted 26 May, 2022; v1 submitted 10 September, 2020;
originally announced September 2020.
-
On subset sum problem in branch groups
Authors:
Andrey Nikolaev,
Alexander Ushakov
Abstract:
We consider a group-theoretic analogue of the classic subset sum problem. In this brief note, we show that the subset sum problem is NP-complete in the first Grigorchuk group. More generally, we show NP-hardness of that problem in weakly regular branch groups, which implies NP-completeness if the group is, in addition, contracting.
Submitted 22 June, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Adversarial Balancing-based Representation Learning for Causal Effect Inference with Observational Data
Authors:
Xin Du,
Lei Sun,
Wouter Duivesteijn,
Alexander Nikolaev,
Mykola Pechenizkiy
Abstract:
Learning causal effects from observational data greatly benefits a variety of domains such as health care, education and sociology. For instance, one could estimate the impact of a new drug on specific individuals to assist clinical planning and improve the survival rate. In this paper, we focus on studying the problem of estimating Conditional Average Treatment Effect (CATE) from observational data. The challenges for this problem are two-fold: on the one hand, we have to derive a causal estimator to estimate the causal quantity from observational data, where there exists confounding bias; on the other hand, we have to deal with the identification of CATE when the distributions of covariates in treatment and control groups are imbalanced. To overcome these challenges, we propose a neural network framework called Adversarial Balancing-based representation learning for Causal Effect Inference (ABCEI), based on the recent advances in representation learning. To ensure the identification of CATE, ABCEI uses adversarial learning to balance the distributions of covariates in treatment and control groups in the latent representation space, without any assumption on the form of the treatment selection/assignment function. In addition, during the representation learning and balancing process, highly predictive information from the original covariate space might be lost. ABCEI can tackle this information loss problem by preserving useful information for predicting causal effects under the regularization of a mutual information estimator. The experimental results show that ABCEI is robust against treatment selection bias, and matches/outperforms the state-of-the-art approaches. Our experiments show promising results on several datasets, representing different health care domains among others.
Submitted 16 October, 2020; v1 submitted 30 April, 2019;
originally announced April 2019.
-
A GraphBLAS Approach for Subgraph Counting
Authors:
Langshi Chen,
Jiayu Li,
Ariful Azad,
Lei Jiang,
Madhav Marathe,
Anil Vullikanti,
Andrey Nikolaev,
Egor Smirnov,
Ruslan Israfilov,
Judy Qiu
Abstract:
Subgraph counting aims to count the occurrences of a subgraph template T in a given network G. The basic problem of computing structural properties such as counting triangles and other subgraphs has found applications in diverse domains. Recent biological, social, cybersecurity and sensor network applications have motivated solving such problems on massive networks with billions of vertices. The larger subgraph problem is known to be memory bounded and computationally challenging to scale; the complexity grows both as a function of T and G. In this paper, we study the non-induced tree subgraph counting problem, propose a novel layered software-hardware co-design approach, and implement a shared-memory multi-threaded algorithm: 1) reducing the complexity of the parallel color-coding algorithm by identifying and pruning redundant graph traversal; 2) achieving a fully-vectorized implementation upon linear algebra kernels inspired by GraphBLAS, which significantly improves cache usage and maximizes memory bandwidth utilization. Experiments show that our implementation improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x for subgraph templates with size over 12 on a dual-socket Intel(R) Xeon(R) Platinum 8160 server. We believe our approach using GraphBLAS with optimized sparse linear algebra can be applied to other massive subgraph counting problems and emerging high-memory bandwidth hardware architectures.
Submitted 11 March, 2019;
originally announced March 2019.
-
SD-WAN Internet Census
Authors:
Sergey Gordeychik,
Denis Kolegov,
Antony Nikolaev
Abstract:
The concept of software defined wide area network (SD-WAN or SDWAN) is central to modern computer networking, particularly in enterprise networks. By definition, these systems form the network perimeter and connect the Internet, WAN, extranet, and branches, which makes them crucial from a cybersecurity point of view. The goal of this paper is to provide the results of passive and active fingerprinting for SD-WAN systems using a common threat intelligence approach. We explore Internet-based and cloud-based publicly available SD-WAN systems using the well-known Shodan and Censys search engines and custom-developed automation tools, and show that most of the SD-WAN systems have known vulnerabilities related to outdated software and insecure configuration.
Submitted 29 October, 2018; v1 submitted 27 August, 2018;
originally announced August 2018.
-
1-skeletons of the spanning tree problems with additional constraints
Authors:
Vladimir Bondarenko,
Andrei Nikolaev,
Dzhambolet Shovgenov
Abstract:
We consider the polyhedral properties of two spanning tree problems with additional constraints. In the first problem, it is required to find a tree with a minimum sum of edge weights among all spanning trees with the number of leaves less than or equal to a given value. In the second problem, an additional constraint is the assumption that the degree of all vertices of the spanning tree does not exceed a given value. The decision versions of both problems are NP-complete.
We consider the polytopes of these problems and their 1-skeletons. We prove that in both cases it is an NP-complete problem to determine whether the vertices of the 1-skeleton are adjacent. Nevertheless, it is possible to obtain superpolynomial lower bounds on the clique numbers of these graphs. These values characterize the time complexity in a broad class of algorithms based on linear comparisons. The results indicate a fundamental difference in combinatorial and geometric properties between the considered problems and the classical minimum spanning tree problem.
Submitted 26 October, 2017;
originally announced October 2017.
-
Subset sum problem in polycyclic groups
Authors:
Andrey Nikolaev,
Alexander Ushakov
Abstract:
We consider a group-theoretic analogue of the classic subset sum problem. It is known that every virtually nilpotent group has polynomial time decidable subset sum problem. In this paper we use subgroup distortion to show that every polycyclic non-virtually-nilpotent group has NP-complete subset sum problem.
Submitted 21 March, 2017;
originally announced March 2017.
-
Non-commutative lattice problems
Authors:
Alexei Myasnikov,
Andrey Nikolaev,
Alexander Ushakov
Abstract:
We consider several subgroup-related algorithmic questions in groups, modeled after the classic computational lattice problems, and study their computational complexity. We find polynomial time solutions to problems like finding a subgroup element closest to a given group element, or finding a shortest non-trivial element of a subgroup in the case of nilpotent groups, and a large class of surface groups and Coxeter groups. We also provide a polynomial time algorithm to compute geodesics in given generators of a subgroup of a free group.
Submitted 10 August, 2015;
originally announced August 2015.
-
Logspace and compressed-word computations in nilpotent groups
Authors:
Jeremy Macdonald,
Alexei Myasnikov,
Andrey Nikolaev,
Svetla Vassileva
Abstract:
For finitely generated nilpotent groups, we employ Mal'cev coordinates to solve several classical algorithmic problems efficiently. Computation of normal forms, the membership problem, the conjugacy problem, and computation of presentations for subgroups are solved using only logarithmic space and quasilinear time. Logarithmic space presentation-uniform versions of these algorithms are provided. Compressed-word versions of the same problems, in which each input word is provided as a straight-line program, are solved in polynomial time.
Submitted 19 December, 2021; v1 submitted 12 March, 2015;
originally announced March 2015.
-
Knapsack problems in products of groups
Authors:
Elizaveta Frenkel,
Andrey Nikolaev,
Alexander Ushakov
Abstract:
The classic knapsack and related problems have natural generalizations to arbitrary (non-commutative) groups, collectively called knapsack-type problems in groups. We study the effect of free and direct products on their time complexity. We show that free products in a certain sense preserve the time complexity of knapsack-type problems, while direct products may amplify it. Our methods allow us to obtain complexity results for the rational subset membership problem in amalgamated free products over finite subgroups.
Submitted 10 August, 2015; v1 submitted 27 August, 2014;
originally announced August 2014.
-
The Post correspondence problem in groups
Authors:
Alexei Myasnikov,
Andrey Nikolaev,
Alexander Ushakov
Abstract:
We generalize the classical Post correspondence problem ($\mathbf{PCP}_n$) and its non-homogeneous variation ($\mathbf{GPCP}_n$) to non-commutative groups and study the computational complexity of these new problems. We observe that $\mathbf{PCP}_n$ is closely related to the equalizer problem in groups, while $\mathbf{GPCP}_n$ is connected to the double twisted conjugacy problem for endomorphisms. Furthermore, it is shown that one of the strongest forms of the word problem in a group $G$ (we call it the {\em hereditary word problem}) can be reduced to $\mathbf{GPCP}_n$ in $G$ in polynomial time.
The main results are that $\mathbf{PCP}_n$ is decidable in a finitely generated nilpotent group in polynomial time, while $\mathbf{GPCP}_n$ is undecidable in any group containing a free non-abelian subgroup (though the argument is very different from the classical case of free semigroups). We show that the double endomorphism twisted conjugacy problem is undecidable in free groups of sufficiently large finite rank. We also consider the bounded $\mathbf{PCP}$ and observe that it is in $\mathbf{NP}$ for any group with $\mathbf{P}$-time decidable word problem, while it is $\mathbf{NP}$-hard in any group containing a free non-abelian subgroup. In particular, the bounded $\mathbf{PCP}$ is $\mathbf{NP}$-complete in non-elementary hyperbolic groups and non-abelian right-angled Artin groups.
Submitted 17 November, 2013; v1 submitted 19 October, 2013;
originally announced October 2013.
-
Knapsack Problems in Groups
Authors:
Alexei Myasnikov,
Andrey Nikolaev,
Alexander Ushakov
Abstract:
We generalize the classical knapsack and subset sum problems to arbitrary groups and study the computational complexity of these new problems. We show that these problems, as well as the bounded submonoid membership problem, are P-time decidable in hyperbolic groups and give various examples of finitely presented groups where the subset sum problem is NP-complete.
Submitted 22 February, 2013;
originally announced February 2013.
-
Exploring mutexes, the Oracle RDBMS retrial spinlocks
Authors:
Andrey Nikolaev
Abstract:
Spinlocks are widely used in database engines for process synchronization. KGX mutexes are new retrial spinlocks that appeared in contemporary Oracle versions for submicrosecond synchronization. Mutex contention is frequently observed in highly concurrent OLTP environments.
This work explores how Oracle mutexes operate, spin, and sleep. It develops a predictive mathematical model and discusses parameters and statistics related to mutex performance tuning, as well as results of contention experiments.
Submitted 29 December, 2012;
originally announced December 2012.
-
Exploring Oracle RDBMS latches using Solaris DTrace
Authors:
Andrey Nikolaev
Abstract:
The rise of hundred-core technologies again brings to the fore the problem of interprocess synchronization in database engines. Spinlocks are widely used in contemporary DBMS to synchronize processes at a microsecond timescale. Latches are Oracle RDBMS-specific spinlocks. Latch contention is commonly observed in contemporary high concurrency OLTP environments.
In contrast to system spinlocks used in operating system kernels, latches work in user context. Such user level spinlocks are influenced by context preemption and multitasking. Until recently there were no direct methods to measure the effectiveness of user spinlocks. This became possible with the emergence of the Solaris 10 Dynamic Tracing framework. DTrace allows tracing and profiling both OS and user applications.
This work investigates the possibilities to diagnose and tune Oracle latches. It explores the contemporary latch realization and spinning-blocking strategies, and analyzes the corresponding statistics counters.
A mathematical model is developed to estimate analytically the effect of tuning the _SPIN_COUNT value.
Submitted 2 November, 2011;
originally announced November 2011.