Search | arXiv e-print repository

doi 10.1109/ICEET53442.2021.9659578

Development of a Vision System to Enhance the Reliability of the Pick-and-Place Robot for Autonomous Testing of Camera Module used in Smartphones

Authors: Hoang-Anh Phan, Duy Nam Bui, Tuan Nguyen Dinh, Bao-Anh Hoang, An Nguyen Ngoc, Dong Tran Huu Quoc, Ha Tran Thi Thuy, Tung Thanh Bui, Van Nguyen Thi Thanh

Abstract: Pick-and-place robots are commonly used in modern industrial manufacturing. For complex devices/parts like camera modules used in smartphones, which contain optical parts, electrical components and interfacing connectors, the placement operation may not be absolutely accurate, which may cause damage to the device under test during the mechanical movement to make good contact for electrical function inspection. In this paper, we propose an effective vision system, including hardware and algorithm, to enhance the reliability of the pick-and-place robot for autonomous testing memory of camera modules. With limited hardware based on a camera and a Raspberry Pi, and using a simplified image processing algorithm based on histogram information, the vision system can confirm the presence of the camera modules in the feeding tray and the placement accuracy of the camera module in the test socket. Through that, the system can work with more flexibility and avoid damaging the device under test. The system was experimentally quantified through testing approximately 2000 camera modules in a stable light condition. Experimental results demonstrate that the system achieves accuracy of more than 99.92%. With its simplicity and effectiveness, the proposed vision system can be considered a useful solution for use in pick-and-place systems in industry.

Submitted 8 May, 2023; originally announced May 2023.

Comments: Published to 2021 International Conference on Engineering and Emerging Technologies (ICEET 2021). 6 pages
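The abstract above only says the presence check is "based on histogram information", so the following is a minimal sketch of one way such a check could look, not the authors' algorithm. The function names, the 16-bin grayscale histogram, the histogram-intersection score, and the 0.5 threshold are all assumptions made for illustration.

```python
def gray_histogram(pixels, bins=16):
    """Normalized histogram of 8-bit grayscale values (pixels must be non-empty)."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    n = len(pixels)
    return [c / n for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def module_present(roi_pixels, empty_reference, threshold=0.5):
    """Hypothetical rule: a tray slot holds a module when its histogram
    differs enough from a reference image of the empty slot."""
    similarity = histogram_intersection(
        gray_histogram(roi_pixels), gray_histogram(empty_reference)
    )
    return similarity < threshold


# Toy data: a bright empty slot versus a slot mostly covered by a dark module.
empty_slot = [200] * 100
slot_with_module = [30] * 80 + [200] * 20
```

With these toy values, `module_present(slot_with_module, empty_slot)` is true and `module_present(empty_slot, empty_slot)` is false. A real deployment would need a threshold calibrated for the lighting, which the abstract describes as stable.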

arXiv:2211.11001 [pdf]

F2SD: A dataset for end-to-end group detection algorithms

Authors: Giang Hoang, Tuan Nguyen Dinh, Tung Cao Hoang, Son Le Duy, Keisuke Hihara, Yumeka Utada, Akihiko Torii, Naoki Izumi, Long Tran Quoc

Abstract: The lack of large-scale datasets has been impeding the advance of deep learning approaches to the problem of F-formation detection. Moreover, most research works on this problem rely on input sensor signals of object location and orientation rather than image signals. To address this, we develop a new, large-scale dataset of simulated images for F-formation detection, called F-formation Simulation Dataset (F2SD). F2SD contains nearly 60,000 images simulated from GTA-5, with bounding boxes and orientation information on images, making it useful for a wide variety of modelling approaches. It is also closer to practical scenarios, where three-dimensional location and orientation information are costly to record. It is challenging to construct such a large-scale simulated dataset while keeping it realistic. Furthermore, the available research utilizes conventional methods to detect groups. They do not detect groups directly from the image. In this work, we propose (1) a large-scale simulation dataset F2SD and a pipeline for F-formation simulation, (2) a first-ever end-to-end baseline model for the task, and experiments on our simulation dataset.

Submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at ICMV 2022

arXiv:2208.04613 [pdf]

Res-Dense Net for 3D Covid Chest CT-scan classification

Authors: Quoc-Huy Trinh, Minh-Van Nguyen, Thien-Phuc Nguyen Dinh

Abstract: One of the most contentious areas of research in Medical Image Preprocessing is 3D CT-scan. With the rapid spread of COVID-19, the function of CT-scan in properly and swiftly diagnosing the disease has become critical. It has a positive impact on infection prevention. There are many tasks to diagnose illnesses, including COVID-19, through CT-scan images. In this paper, we propose a method that uses a Stacking Deep Neural Network to detect COVID-19 through series of 3D CT-scan images. In our method, we experiment with two backbones, DenseNet 121 and ResNet 101. This method achieves a competitive performance on some evaluation metrics.

Submitted 9 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.07524 by other authors

arXiv:2205.05611 [pdf, other]

Blockchain-based Secure Client Selection in Federated Learning

Authors: Truc Nguyen, Phuc Thai, Tre' R. Jeter, Thang N. Dinh, My T. Thai

Abstract: Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the client selection process, the server can circumvent the secure aggregation to learn the local models of a victim client, indicating that secure aggregation alone is inadequate for privacy protection. To tackle this issue, we leverage blockchain technology to propose a verifiable client selection protocol. Owing to the immutability and transparency of blockchain, our proposed protocol enforces a random selection of clients, making the server unable to control the selection process at its discretion. We present security proofs showing that our protocol is secure against this attack. Additionally, we conduct several experiments on an Ethereum-like blockchain to demonstrate the feasibility and practicality of our solution.

Submitted 11 May, 2022; originally announced May 2022.

Comments: IEEE ICBC 2022

arXiv:2205.05004 [pdf, other]

FastHare: Fast Hamiltonian Reduction for Large-scale Quantum Annealing

Authors: Phuc Thai, My T. Thai, Tam Vu, Thang N. Dinh

Abstract: Quantum annealing (QA) that encodes optimization problems into Hamiltonians remains the only near-term quantum computing paradigm that provides sufficiently many qubits for real-world applications. To fit larger optimization instances on existing quantum annealers, reducing Hamiltonians into smaller equivalent Hamiltonians provides a promising approach. Unfortunately, existing reduction techniques are either computationally expensive or ineffective in practice. To this end, we introduce a novel notion of non-separable group, defined as a subset of qubits in a Hamiltonian that obtains the same value in optimal solutions. We develop a theoretical framework on non-separability accordingly and propose FastHare, a highly efficient reduction method. FastHare iteratively detects and merges non-separable groups into single qubits. It does so within a provable worst-case time complexity of only $O(αn^2)$, for some user-defined parameter $α$. Our extensive benchmarks for the feasibility of the reduction are done on both synthetic Hamiltonians and 3000+ instances from the MQLIB library. The results show FastHare outperforms the roof duality, the implemented reduction method in D-Wave's SDK library, with a 3.6x higher average reduction ratio. It demonstrates a high level of effectiveness with an average of 62% qubit savings and 0.3s processing time, advocating for Hamiltonian reduction as an inexpensive necessity for QA.

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2203.01746 [pdf, other]

SaPHyRa: A Learning Theory Approach to Ranking Nodes in Large Networks

Authors: Phuc Thai, My T. Thai, Tam Vu, Thang N. Dinh

Abstract: Ranking nodes based on their centrality stands as a fundamental yet challenging problem in large-scale networks. Approximate methods can quickly estimate nodes' centrality and identify the most central nodes, but the ranking for the majority of remaining nodes may be meaningless. For example, ranking for less-known websites in search queries is known to be noisy and unstable. To this end, we investigate a new node ranking problem with two important distinctions: a) ranking quality, rather than the centrality estimation quality, as the primary objective; and b) ranking only nodes of interest, e.g., websites that matched search criteria. We propose Sample space Partitioning Hypothesis Ranking, or SaPHyRa, that transforms node ranking into a hypothesis ranking in machine learning. This transformation maps nodes' centrality to the expected risks of hypotheses, opening doors for theoretical machine learning (ML) tools. The key of SaPHyRa is to partition the sample space into exact and approximate subspaces. The exact subspace contains samples related to the nodes of interest, increasing both estimation and ranking qualities. The approximate space can be efficiently sampled with ML-based techniques to provide theoretical guarantees on the estimation error. Lastly, we present SaPHyRa_bc, an illustration of SaPHyRa on ranking nodes' betweenness centrality (BC). By combining a novel bi-component sampling, a 2-hop sample partitioning, and improved bounds on the Vapnik-Chervonenkis dimension, SaPHyRa_bc can effectively rank any node subset in BC. Its performance is up to 200x faster than state-of-the-art methods in approximating BC, while its rank correlation to the ground truth is improved multifold.

Submitted 3 March, 2022; originally announced March 2022.

Comments: To appear in IEEE ICDE'22

arXiv:2111.11604 [pdf, other]

Simultaneous face detection and 360 degree headpose estimation

Authors: Hoang Nguyen Viet, Linh Nguyen Viet, Tuan Nguyen Dinh, Duc Tran Minh, Long Tran Quoc

Abstract: With many practical applications in human life, including manufacturing surveillance cameras and analyzing and processing customer behavior, many researchers are taking note of face detection and head pose estimation on digital images. A large number of proposed deep learning models have state-of-the-art accuracy, such as YOLO, SSD, and MTCNN for face detection, or HopeNet, FSA-Net, and RankPose for head pose estimation. According to many state-of-the-art methods, the pipeline of this task consists of two parts, from face detection to head pose estimation. These two steps are completely independent and do not share information. This makes the model clear in setup but does not leverage most of the featured resources extracted in each model. In this paper, we propose the Multitask-Net model with the motivation to leverage the features extracted from the face detection model, sharing them with the head pose estimation branch to improve accuracy. Also, because the data is varied and the Euler angle domain representing the face is large, our model can predict results in the 360 Euler angle domain. Applying the multitask learning method, the Multitask-Net model can simultaneously predict the position and direction of the human head. To increase the model's ability to predict the head direction, we change the representation of the human face from the Euler angle to vectors of the Rotation matrix.

Submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted at The 13th International Conference on Knowledge and Systems Engineering (KSE 2021), 7 pages, 2 figures, 3 tables

arXiv:2111.07039 [pdf, other]

UET-Headpose: A sensor-based top-view head pose dataset

Authors: Linh Nguyen Viet, Tuan Nguyen Dinh, Hoang Nguyen Viet, Duc Tran Minh, Long Tran Quoc

Abstract: Head pose estimation is a challenging task that aims to solve problems related to predicting a three-dimensional vector, which serves many applications in human-robot interaction or customer behavior. Previous research has proposed some precise methods for collecting head pose data. But those methods require either expensive devices like depth cameras or a complex laboratory environment setup. In this research, we introduce a new low-cost, easy-to-set-up approach to collecting head pose images, namely the UET-Headpose dataset, with top-view head pose data. This method uses an absolute orientation sensor instead of depth cameras, so it can be set up quickly and at small cost while still ensuring good results. Through experiments, our dataset is shown to differ in distribution from available datasets like the CMU Panoptic Dataset \cite{CMU}. Besides using the UET-Headpose dataset and other head pose datasets, we also introduce the full-range model called FSANet-Wide, which significantly outperforms head pose estimation results by the UET-Headpose dataset, especially on top-view images. Also, this model is very lightweight and takes small size images.

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2007.08596 [pdf, other]

doi 10.1109/ICDCS.2019.00059

OptChain: Optimal Transactions Placement for Scalable Blockchain Sharding

Authors: Lan N. Nguyen, Truc Nguyen, Thang N. Dinh, My T. Thai

Abstract: A major challenge in blockchain sharding protocols is that more than 95% of transactions are cross-shard. Not only do those cross-shard transactions degrade the system throughput, but they also double the confirmation time and exhaust an already scarce network bandwidth. Are cross-shard transactions imminent for sharding schemes? In this paper, we propose a new sharding paradigm, called OptChain, in which cross-shard transactions are minimized, resulting in almost twice faster confirmation time and throughput. By treating transactions as a stream of nodes in an online graph, OptChain utilizes a lightweight and on-the-fly transaction placement method to group both related and soon-related transactions into the same shards. At the same time, OptChain maintains a temporal balance among shards to guarantee high parallelism. Our comprehensive and large-scale simulation using the Oversim P2P library confirms a significant boost in performance, with up to a 10-fold reduction in cross-shard transactions, more than a twofold reduction in confirmation time, and a 50% increase in throughput. When combined with the Omniledger sharding protocol, OptChain delivers a throughput of 6000 transactions per second with 10.5s confirmation time.

Submitted 18 October, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

arXiv:1907.06247 [pdf]

State Estimation in Visual Inertial Autonomous Helicopter Landing Using Optimisation on Manifold

Authors: Thinh Hoang Dinh, Hieu Le Thi Hong, Tri Ngo Dinh

Abstract: Autonomous helicopter landing is a challenging task that requires precise information about the aircraft states regarding the helicopter's position and attitude, as well as the position of the helipad. To this end, we propose a solution that fuses data from an Inertial Measurement Unit (IMU) and a monocular camera which is capable of detecting the helipad's position in the image plane. The algorithm utilises manifold-based nonlinear optimisation over preintegrated IMU measurements and reprojection error in temporally uniformly distributed keyframes, exhibiting good performance in terms of accuracy and being computationally feasible. The contributions of this paper are the formal treatment of the landmark Jacobian expressions and the adaptation of the equality-constrained Gauss-Newton method to this specific problem. Numerical simulations on MATLAB/Simulink confirm the validity of the given claims.

Submitted 14 July, 2019; originally announced July 2019.

arXiv:1709.03565 [pdf, other]

Importance Sketching of Influence Dynamics in Billion-scale Networks

Authors: Hung T. Nguyen, Tri P. Nguyen, NhatHai Phan, Thang N. Dinh

Abstract: The blooming availability of traces for social, biological, and communication networks opens up unprecedented opportunities in analyzing diffusion processes in networks. However, the sheer sizes of today's networks raise serious challenges in computational efficiency and scalability. In this paper, we propose a new hyper-graph sketching framework for influence dynamics in networks. The core of our sketching framework, called SKIS, is an efficient importance sampling algorithm that returns only non-singular reverse cascades in the network. Compared to previously developed sketches like RIS and SKIM, our sketch significantly enhances estimation quality while substantially reducing processing time and memory footprint. Further, we present general strategies for using SKIS to enhance existing algorithms for influence estimation and influence maximization, which are motivated by practical applications like viral marketing. Using SKIS, we design a high-quality influence oracle for seed sets with average estimation error up to 10x smaller than those using RIS and 6x smaller than SKIM. In addition, our influence maximization using SKIS substantially improves the quality of solutions for greedy algorithms. It achieves up to 10x speed-up and 4x memory reduction for the fastest RIS-based DSSA algorithm, while maintaining the same theoretical guarantees.

Submitted 11 September, 2017; originally announced September 2017.

Comments: 12 pages, to appear in ICDM 2017 as a regular paper

arXiv:1704.04794 [pdf, other]

Outward Influence and Cascade Size Estimation in Billion-scale Networks

Authors: Hung T. Nguyen, Tri P. Nguyen, Tam Vu, Thang N. Dinh

Abstract: Estimating cascade size and nodes' influence is a fundamental task in social, technological, and biological networks. Yet this task is extremely challenging due to the sheer size and the structural heterogeneity of networks. We investigate a new influence measure, termed outward influence (OI), defined as the (expected) number of nodes that a subset of nodes $S$ will activate, excluding the nodes in S. Thus, OI equals the de facto standard measure, influence spread of S, minus |S|. OI is not only more informative for nodes with small influence, but also critical in designing new effective sampling and statistical estimation methods. Based on OI, we propose SIEA/SOIEA, novel methods to estimate influence spread/outward influence at scale and with rigorous theoretical guarantees. The proposed methods are built on two novel components: 1) IICP, an important sampling method for outward influence, and 2) RSA, a robust mean estimation method that minimizes the number of samples through analyzing variance and range of random variables. Compared to the state of the art for influence estimation, SIEA is $Ω(\log^4 n)$ times faster in theory and up to several orders of magnitude faster in practice. For the first time, influence of nodes in networks of billions of edges can be estimated with high accuracy within a few minutes. Our comprehensive experiments on real-world networks also give evidence against the popular practice of using a fixed number, e.g. 10K or 20K, of samples to compute the "ground truth" for influence spread.

Submitted 16 April, 2017; originally announced April 2017.

Comments: 16 pages, SIGMETRICS 2017
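The definition in the abstract above, outward influence = influence spread of S minus |S|, can be illustrated with a naive Monte Carlo estimate. This is only a sketch of the definition, not SIEA/SOIEA, IICP, or RSA. The independent cascade diffusion model, the adjacency format (node mapped to a list of `(neighbor, probability)` pairs), and all function names are assumptions.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """One run of an (assumed) independent cascade; returns the activated set."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each edge gets a single activation attempt with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_influence(graph, seeds, runs=1000, seed=0):
    """Average cascade size over `runs` simulations (seed nodes included)."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs


def estimate_outward_influence(graph, seeds, runs=1000, seed=0):
    """OI(S) = influence spread of S minus |S|."""
    return estimate_influence(graph, seeds, runs, seed) - len(set(seeds))


# Toy graph in which every edge fires with certainty.
toy_graph = {"a": [("b", 1.0), ("c", 1.0)], "b": [("d", 1.0)]}
```

On `toy_graph`, seeding `{"a"}` activates all four nodes in every run, so the influence spread is 4 and the outward influence is 3. Plain sampling with a fixed number of runs is exactly the practice the abstract argues against for low-influence nodes.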

arXiv:1702.05854 [pdf, other]

Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach

Authors: Hung T. Nguyen, Alberto Cano, Tam Vu, Thang N. Dinh

Abstract: Cyber-epidemics, the wide spread of fake news or propaganda through social media, can cause devastating economic and political consequences. A common countermeasure against cyber-epidemics is to disable a small subset of suspected social connections or accounts to effectively contain the epidemics. An example is the recent shutdown of 125,000 ISIS-related Twitter accounts. Despite many proposed methods to identify such a subset, none are scalable enough to provide high-quality solutions in today's billion-size networks. To this end, we investigate the Spread Interdiction problems that seek the most effective links (or nodes) for removal under the well-known Linear Threshold model. We propose novel CPU-GPU methods that scale to networks with billions of edges, yet possess rigorous theoretical guarantees on the solution quality. At the core of our methods is an $O(1)$-space out-of-core algorithm to generate a new type of random walks, called Hitting Self-avoiding Walks (HSAWs). Such a low memory requirement enables handling of big networks and, more importantly, hiding latency via scheduling of millions of threads on GPUs. Comprehensive experiments on real-world networks show that our algorithms provide much higher quality solutions and are several orders of magnitude faster than the state of the art. Compared to the (single-core) CPU counterpart, our GPU implementations achieve significant speedup factors of up to 177x on a single GPU and 338x on a GPU pair.

Submitted 21 February, 2017; v1 submitted 19 February, 2017; originally announced February 2017.

arXiv:1702.01452 [pdf, other]

Towards Optimal Strategy for Adaptive Probing in Incomplete Networks

Authors: Tri P. Nguyen, Hung T. Nguyen, Thang N. Dinh

Abstract: We investigate a graph probing problem in which an agent has only an incomplete view $G' \subsetneq G$ of the network and wishes to explore the network with least effort. In each step, the agent selects a node $u$ in $G'$ to probe. After probing $u$, the agent gains the information about $u$ and its neighbors. All the neighbors of $u$ become \emph{observed} and are \emph{probable} in the subsequent steps (if they have not been probed). What is the best probing strategy to maximize the number of nodes explored in $k$ probes? This problem serves as a fundamental component for other decision-making problems in incomplete networks such as information harvesting in social networks, network crawling, network security, and viral marketing with incomplete information. While there are a few methods proposed for the problem, none can perform consistently well across different network types. In this paper, we establish a strong (in)approximability for the problem, proving that no algorithm can guarantee a finite approximation ratio unless P=NP. On the bright side, we design learning frameworks to capture the best probing strategies for individual networks. Our extensive experiments suggest that our framework can learn efficient probing strategies that \emph{consistently} outperform previous heuristics and metric-based approaches.

Submitted 5 February, 2017; originally announced February 2017.

arXiv:1702.01451 [pdf, other]

Transitivity Demolition and the Falls of Social Networks

Authors: Hung T. Nguyen, Nam P. Nguyen, Tam Vu, Huan X. Hoang, Thang N. Dinh

Abstract: In this paper, we study crucial elements of a complex network, namely its nodes and connections, which play a key role in maintaining the network's structure and function under unexpected structural perturbations of node and edge removal. Specifically, we want to identify vital nodes and edges whose failure (either random or intentional) will break the largest number of connected triples (or triangles) in the network. This problem is extremely important because connected triples form the foundation of strong connections in many real-world systems, such as mutual relationships in social networks, reliable data transmission in communication networks, and stable routing strategies in mobile networks. Disconnected triples, analogous to broken mutual connections, can greatly affect the network's structure and disrupt its normal function, which can further lead to the corruption of the entire system. The analysis of such crucial elements will shed light on key factors behind the resilience and robustness of many complex systems in practice. We formulate the analysis under multiple optimization problems and show their intractability. We next propose efficient approximation algorithms, namely DAK-n and DAK-e, which guarantee a $(1-1/e)$ approximation ratio (compared to the overall optimal solutions) while having the same time complexity as the best triangle counting and listing algorithm on power-law networks. This advantage makes our algorithms scale extremely well even for very large networks. From an application perspective, we perform comprehensive experiments on real social traces with millions of nodes and billions of edges. These empirical experiments indicate that our approaches achieve comparably better results while being up to 100x faster than current state-of-the-art methods.

Submitted 5 February, 2017; originally announced February 2017.

arXiv:1701.08787 [pdf, ps, other]

Vulnerability of Clustering under Node Failure in Complex Networks

Authors: Alan Kuhnle, Nam P. Nguyen, Thang N. Dinh, My T. Thai

Abstract: Robustness in response to unexpected events is always desirable for real-world networks. To improve the robustness of any networked system, it is important to analyze vulnerability to external perturbation such as random failures or adversarial attacks occurring to elements of the network. In this paper, we study an emerging problem in assessing the robustness of complex networks: the vulnerability of the clustering of the network to the failure of network elements. Specifically, we identify vertices whose failures will critically damage the network by degrading its clustering, evaluated through the average clustering coefficient. This problem is important because any significant change made to the clustering, resulting from element-wise failures, could degrade network performance such as the ability for information to propagate in a social network. We formulate this vulnerability analysis as an optimization problem, prove its NP-completeness and non-monotonicity, and we offer two algorithms to identify the vertices most important to clustering. Finally, we conduct comprehensive experiments in synthesized social networks generated by various well-known models as well as traces of real social networks. The empirical results over other competitive strategies show the efficacy of our proposed algorithms.

Submitted 30 January, 2017; originally announced January 2017.

arXiv:1701.08462 [pdf, other]

TipTop: (Almost) Exact Solutions for Influence Maximization in Billion-scale Networks

Authors: Xiang Li, J. David Smith, Thang N. Dinh, My T. Thai

Abstract: In this paper, we study the Cost-aware Target Viral Marketing (CTVM) problem, a generalization of Influence Maximization (IM). CTVM asks for the most cost-effective users to influence the most relevant users. In contrast to the vast literature, we attempt to offer exact solutions. As the problem is NP-hard and exact solutions are therefore intractable, we propose TipTop, a $(1-ε)$-optimal solution for arbitrary $ε>0$ that scales to very large networks such as Twitter. At the heart of TipTop lies an innovative technique that reduces the number of samples as much as possible. This allows us to exactly solve CTVM on a much smaller space of generated samples using Integer Programming. Furthermore, TipTop gives researchers a tool to benchmark their solutions against the optimal one in large-scale networks, which is currently not available.

Submitted 7 February, 2019; v1 submitted 29 January, 2017; originally announced January 2017.

Comments: extended version, v2

ACM Class: G.2.2; G.1.6

arXiv:1608.06492 [pdf, other]

Multiple Infection Sources Identification with Provable Guarantees

Authors: Hung T. Nguyen, Preetam Ghosh, Michael L. Mayo, Thang N. Dinh

Abstract: Given an aftermath of a cascade in the network, i.e. a set $V_I$ of "infected" nodes after an epidemic outbreak or a propagation of rumors/worms/viruses, how can we infer the sources of the cascade? Answering this challenging question is critical for computer forensics, vulnerability analysis, and risk management. Despite recent interest in this problem, most existing works focus only on single source detection or simple network topologies, e.g. trees or grids. In this paper, we propose a new approach to identify infection sources by searching for a seed set $S$ that minimizes the \emph{symmetric difference} between the cascade from $S$ and $V_I$, the given set of infected nodes. Our major result is an approximation algorithm, called SISI, to identify infection sources \emph{without prior knowledge of the number of source nodes}. SISI, to the best of our knowledge, is the first algorithm with a \emph{provable guarantee} for the problem in general graphs. It returns a $\frac{2}{(1-ε)^2}Δ$-approximate solution with high probability, where $Δ$ denotes the maximum number of nodes in $V_I$ that may infect a single node in the network. Our experiments on real-world networks show the superiority of our approach and SISI in detecting true source(s), boosting the F1-measure from a few percent, for the state-of-the-art NETSLEUTH, to approximately 50\%.
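The symmetric-difference objective can be illustrated with a small Python sketch. This is not SISI: it is a brute-force search, and it replaces the probabilistic cascade of the paper with a deterministic "spread for a fixed number of hops" model. The names `toy_cascade` and `best_seed_set`, the `hops` and `max_size` parameters, and the restriction of candidate seeds to observed infected nodes are all hypothetical choices made for illustration.

```python
from itertools import combinations


def toy_cascade(adj, seeds, hops):
    """Deterministic stand-in for a cascade: infect everything within `hops` steps."""
    infected = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {w for u in frontier for w in adj.get(u, ())} - infected
        infected |= frontier
    return infected


def best_seed_set(adj, observed, hops, max_size):
    """Exhaustively find the seed set whose toy cascade is closest to `observed`.

    Closeness is the size of the symmetric difference between the
    simulated cascade and the observed infected set.
    """
    best, best_cost = None, None
    for k in range(1, max_size + 1):
        for seeds in combinations(sorted(observed), k):
            cost = len(toy_cascade(adj, seeds, hops) ^ observed)
            if best_cost is None or cost < best_cost:
                best, best_cost = set(seeds), cost
    return best, best_cost


# Undirected path 0-1-2-3-4-5-6; node 3 was never infected.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
observed = {0, 1, 2, 4, 5, 6}
sources, cost = best_seed_set(path, observed, hops=1, max_size=2)
```

Because the cost penalizes both missed infected nodes and wrongly infected ones, the search does not need to be told how many sources there are; here it settles on two seeds, one in each infected segment.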

Submitted 23 August, 2016; originally announced August 2016.

Comments: in The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)

arXiv:1605.07990 [pdf, other]

Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Authors: Hung T. Nguyen, My T. Thai, Thang N. Dinh

Abstract: Influence Maximization (IM), which seeks a small set of key users who spread the influence widely into the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite the huge amount of effort, IM in billion-scale networks such as Facebook, Twitter, and the World Wide Web has not been satisfactorily solved. Even the state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 1200 times faster than the SIGMOD'15 best method, IMM, while providing the same $(1-1/e-ε)$ approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which they stop at exponential check points to verify (stare) if there is adequate statistical evidence on the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use (asymptotically) minimum numbers of samples, meeting strict theoretical thresholds characterized for IM. The absolute superiority of SSA and D-SSA is confirmed through extensive experiments on real network data for IM and another topic-aware viral marketing problem, named TVM. The source code is available at https://github.com/hungnt55/Stop-and-Stare

Submitted 22 February, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

Comments: Correct the errors in the proofs for SSA/D-SSA. Update D-SSA to estimate ε(s) instead of δ(s)

arXiv:1602.01016 [pdf, other]

doi 10.1109/ICDM.2015.139

Network Clustering via Maximizing Modularity: Approximation Algorithms and Theoretical Limits

Authors: Thang N. Dinh, Xiang Li, My T. Thai

Abstract: Many social networks and complex systems are found to be naturally divided into clusters of densely connected nodes, known as community structure (CS). Finding CS is one of the fundamental yet challenging topics in network science. One of the most popular classes of methods for this problem is to maximize Newman's modularity. However, little is understood about how well we can approximate the maximum modularity as well as the implications of finding community structure with provable guarantees. In this paper, we settle definitively the approximability of modularity clustering, proving that approximating the problem within any (multiplicative) positive factor is intractable, unless P = NP. Yet we propose the first additive approximation algorithm for modularity clustering with a constant factor. Moreover, we provide a rigorous proof that a CS with modularity arbitrarily close to maximum modularity QOPT might bear no similarity to the optimal CS of maximum modularity. Thus even when CS with near-optimal modularity are found, other verification methods are needed to confirm the significance of the structure.
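For readers unfamiliar with the objective being maximized, the following Python sketch evaluates Newman's modularity of a given partition of an undirected, unweighted graph, using the standard form $Q = \sum_c \left[ m_c/m - (d_c/2m)^2 \right]$, where $m_c$ is the number of edges inside community $c$ and $d_c$ is the total degree of its nodes. It only scores a partition; it is not the approximation algorithm from the paper, and the function name and input layout are assumptions.

```python
def modularity(edges, community):
    """Newman's modularity of a partition.

    edges:     list of undirected edges (u, v), each listed once
    community: dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # edges with both endpoints in the same community
    degree = {}    # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
```

Splitting the graph at the bridge scores 5/14, while lumping every node into one community scores 0, which matches the intuition that modularity rewards dense groups with few links between them.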

Submitted 2 February, 2016; originally announced February 2016.

Comments: Appeared in IEEE ICDM 2015

arXiv:1108.4034 [pdf, other]

Finding Community Structure with Performance Guarantees in Complex Networks

Authors: Thang N. Dinh, My T. Thai

Abstract: Many networks including social networks, computer networks, and biological networks are found to divide naturally into communities of densely connected individuals. Finding community structure is one of the fundamental problems in network science. Since Newman's suggestion of using \emph{modularity} as a measure to qualify the goodness of community structures, many efficient methods to maximize modularity have been proposed but without a guarantee of optimality. In this paper, we propose two polynomial-time algorithms for the modularity maximization problem with theoretical performance guarantees. The first algorithm comes with a \emph{priori guarantee} that the modularity of the found community structure is within a constant factor of the optimal modularity when the network has a power-law degree distribution. Despite being mainly of theoretical interest, to the best of our knowledge, this is the first approximation algorithm for finding community structure in networks. In our second algorithm, we propose a \emph{sparse metric}, a substantially faster linear programming method for maximizing modularity and apply a rounding technique based on this sparse metric with a \emph{posteriori approximation guarantee}. Our experiments show that the rounding algorithm returns optimal solutions in most cases and is very scalable: it can run on a network of a few thousand nodes, whereas the LP solution in the literature only ran on a network of at most 235 nodes.

Submitted 19 August, 2011; originally announced August 2011.
