Search | arXiv e-print repository

TopoGCL: Topological Graph Contrastive Learning

Authors: Yuzhou Chen, Jose Frias, Yulia R. Gel

Abstract: Graph contrastive learning (GCL) has recently emerged as a new concept which allows for capitalizing on the strengths of graph neural networks (GNNs) to learn rich representations in a wide variety of applications which involve abundant unlabeled information. However, existing GCL approaches largely tend to overlook the important latent information on higher-order graph substructures. We address t… ▽ More Graph contrastive learning (GCL) has recently emerged as a new concept which allows for capitalizing on the strengths of graph neural networks (GNNs) to learn rich representations in a wide variety of applications which involve abundant unlabeled information. However, existing GCL approaches largely tend to overlook the important latent information on higher-order graph substructures. We address this limitation by introducing the concepts of topological invariance and extended persistence on graphs to GCL. In particular, we propose a new contrastive mode which targets topological representations of the two augmented views from the same graph, yielded by extracting latent shape properties of the graph at multiple resolutions. Along with the extended topological layer, we introduce a new extended persistence summary, namely, extended persistence landscapes (EPL) and derive its theoretical stability guarantees. Our extensive numerical results on biological, chemical, and social interaction graphs show that the new Topological Graph Contrastive Learning (TopoGCL) model delivers significant performance gains in unsupervised graph classification for 11 out of 12 considered datasets and also exhibits robustness under noisy scenarios. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2401.13713 [pdf, other]

EMP: Effective Multidimensional Persistence for Graph Representation Learning

Authors: Ignacio Segovia-Dominguez, Yuzhou Chen, Cuneyt G. Akcora, Zhiwei Zhen, Murat Kantarcioglu, Yulia R. Gel, Baris Coskunuzer

Abstract: Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing da… ▽ More Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data's multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP's utility in graph classification tasks, showing its effectiveness. Results reveal that EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.13157

Journal ref: LoG 2023

arXiv:2401.13157 [pdf, other]

Time-Aware Knowledge Representations of Dynamic Objects with Multidimensional Persistence

Authors: Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen, Yulia R. Gel

Abstract: Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures, which allow for capturing implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, lack of ti… ▽ More Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures, which allow for capturing implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, lack of time dimension in knowledge encoding mechanisms for time-dependent data leads to frequent model updates, poor learning performance, and, as a result, subpar decision-making. Here we propose a new approach to a time-aware knowledge representation mechanism that notably focuses on implicit time-dependent topological information along multiple geometric dimensions. In particular, we propose a new approach, named \textit{Temporal MultiPersistence} (TMP), which produces multidimensional topological fingerprints of the data by using the existing single parameter topological summaries. The main idea behind TMP is to merge the two newest directions in topological representation learning, that is, multi-persistence which simultaneously describes data shape evolution along multiple key parameters, and zigzag persistence to enable us to extract the most salient data shape information over time. We derive theoretical guarantees of TMP vectorizations and show its utility, in application to forecasting on benchmark traffic flow, Ethereum blockchain, and electrocardiogram datasets, demonstrating the competitive performance, especially, in scenarios of limited data records. In addition, our TMP method improves the computational efficiency of the state-of-the-art multipersistence summaries up to 59.5 times. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Journal ref: AAAI 2024

arXiv:2303.14543 [pdf, other]

Topological Pooling on Graphs

Authors: Yuzhou Chen, Yulia R. Gel

Abstract: Graph neural networks (GNNs) have demonstrated a significant success in various graph learning tasks, from graph classification to anomaly detection. There recently has emerged a number of approaches adopting a graph pooling operation within GNNs, with a goal to preserve graph attributive and structural features during the graph representation learning. However, most existing graph pooling operati… ▽ More Graph neural networks (GNNs) have demonstrated a significant success in various graph learning tasks, from graph classification to anomaly detection. There recently has emerged a number of approaches adopting a graph pooling operation within GNNs, with a goal to preserve graph attributive and structural features during the graph representation learning. However, most existing graph pooling operations suffer from the limitations of relying on node-wise neighbor weighting and embedding, which leads to insufficient encoding of rich topological structures and node attributes exhibited by real-world networks. By invoking the machinery of persistent homology and the concept of landmarks, we propose a novel topological pooling layer and witness complex-based topological embedding mechanism that allow us to systematically integrate hidden topological information at both local and global levels. Specifically, we design new learnable local and global topological representations Wit-TopoPool which allow us to simultaneously extract rich discriminative topological information from graphs. Experiments on 11 diverse benchmark datasets against 18 baseline models in conjunction with graph classification tasks indicate that Wit-TopoPool significantly outperforms all competitors across all datasets. △ Less

Submitted 25 March, 2023; originally announced March 2023.

Comments: AAAI 2023

arXiv:2303.11464 [pdf, other]

Seven open problems in applied combinatorics

Authors: Sinan G. Aksoy, Ryan Bennink, Yuzhou Chen, José Frías, Yulia R. Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V. Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, Stephen J. Young

Abstract: We present and discuss seven different open problems in applied combinatorics. The application areas relevant to this compilation include quantum computing, algorithmic differentiation, topological data analysis, iterative methods, hypergraph cut algorithms, and power systems. We present and discuss seven different open problems in applied combinatorics. The application areas relevant to this compilation include quantum computing, algorithmic differentiation, topological data analysis, iterative methods, hypergraph cut algorithms, and power systems. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 43 pages, 5 figures

MSC Class: 05C90; 65Y04; 65D25; 05C65; 81P68; 62R40; 55N31; 65F10

arXiv:2303.08933 [pdf, other]

Efficient Planning of Multi-Robot Collective Transport using Graph Reinforcement Learning with Higher Order Topological Abstraction

Authors: Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury

Abstract: Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT -- here tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload… ▽ More Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT -- here tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000's of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training. △ Less

Submitted 17 August, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: This paper has been accepted to be presented at the IEEE International Conference on Robotics and Automation, 2023

arXiv:2211.13708 [pdf, other]

Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT

Authors: Cuneyt Gurcan Akcora, Murat Kantarcioglu, Yulia R. Gel, Baris Coskunuzer

Abstract: Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blo… ▽ More Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains up to 95%. The developed framework provides the first bridge between the graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit △ Less

Submitted 24 November, 2022; originally announced November 2022.

Comments: Spotlight paper at NeurIPS 2022

MSC Class: 68T09; 55N31; 62R40 ACM Class: F.2.2

arXiv:2211.09967 [pdf, other]

Learning on Health Fairness and Environmental Justice via Interactive Visualization

Authors: Abdullah-Al-Raihan Nayeem, Ignacio Segovia-Dominguez, Huikyo Lee, Dongyun Han, Yuzhou Chen, Zhiwei Zhen, Yulia Gel, Isaac Cho

Abstract: This paper introduces an interactive visualization interface with a machine learning consensus analysis that enables the researchers to explore the impact of atmospheric and socioeconomic factors on COVID-19 clinical severity by employing multiple Recurrent Graph Neural Networks. We designed and implemented a visualization interface that leverages coordinated multi-views to support exploratory and… ▽ More This paper introduces an interactive visualization interface with a machine learning consensus analysis that enables the researchers to explore the impact of atmospheric and socioeconomic factors on COVID-19 clinical severity by employing multiple Recurrent Graph Neural Networks. We designed and implemented a visualization interface that leverages coordinated multi-views to support exploratory and predictive analysis of hospitalizations and other socio-geographic variables at multiple dimensions, simultaneously. By harnessing the strength of geometric deep learning, we build a consensus machine learning model to include knowledge from county-level records and investigate the complex interrelationships between global infectious disease, environment, and social justice. Additionally, we make use of unique NASA satellite-based observations which are not broadly used in the context of climate justice applications. Our current interactive interface focus on three US states (California, Pennsylvania, and Texas) to demonstrate its scientific value and presented three case studies to make qualitative evaluations. △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.07645 [pdf, other]

Evaluating Distribution System Reliability with Hyperstructures Graph Convolutional Nets

Authors: Yuzhou Chen, Tian Jiang, Miguel Heleno, Alexandre Moreira, Yulia R. Gel

Abstract: Nowadays, it is broadly recognized in the power system community that to meet the ever expanding energy sector's needs, it is no longer possible to rely solely on physics-based models and that reliable, timely and sustainable operation of energy systems is impossible without systematic integration of artificial intelligence (AI) tools. Nevertheless, the adoption of AI in power systems is still lim… ▽ More Nowadays, it is broadly recognized in the power system community that to meet the ever expanding energy sector's needs, it is no longer possible to rely solely on physics-based models and that reliable, timely and sustainable operation of energy systems is impossible without systematic integration of artificial intelligence (AI) tools. Nevertheless, the adoption of AI in power systems is still limited, while integration of AI particularly into distribution grid investment planning is still an uncharted territory. We make the first step forward to bridge this gap by showing how graph convolutional networks coupled with the hyperstructures representation learning framework can be employed for accurate, reliable, and computationally efficient distribution grid planning with resilience objectives. We further propose a Hyperstructures Graph Convolutional Neural Networks (Hyper-GCNNs) to capture hidden higher order representations of distribution networks with attention mechanism. Our numerical experiments show that the proposed Hyper-GCNNs approach yields substantial gains in computational efficiency compared to the prevailing methodology in distribution grid planning and also noticeably outperforms seven state-of-the-art models from deep learning (DL) community. △ Less

Submitted 13 November, 2022; originally announced November 2022.

Comments: IEEE BigData 2022

arXiv:2211.03808 [pdf, other]

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Authors: Andac Demir, Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen, Yulia Gel, Bulent Kiziltan

Abstract: In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively m… ▽ More In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset). △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: NeurIPS, 2022 (36th Conference on Neural Information Processing Systems)

arXiv:2112.06826 [pdf, other]

BScNets: Block Simplicial Complex Neural Networks

Authors: Yuzhou Chen, Yulia R. Gel, H. Vincent Poor

Abstract: Simplicial neural networks (SNN) have recently emerged as the newest direction in graph learning which expands the idea of convolutional architectures from node space to simplicial complexes on graphs. Instead of pre-dominantly assessing pairwise relations among nodes as in the current practice, simplicial complexes allow us to describe higher-order interactions and multi-node graph structures. By… ▽ More Simplicial neural networks (SNN) have recently emerged as the newest direction in graph learning which expands the idea of convolutional architectures from node space to simplicial complexes on graphs. Instead of pre-dominantly assessing pairwise relations among nodes as in the current practice, simplicial complexes allow us to describe higher-order interactions and multi-node graph structures. By building upon connection between the convolution operation and the new block Hodge-Laplacian, we propose the first SNN for link prediction. Our new Block Simplicial Complex Neural Networks (BScNets) model generalizes the existing graph convolutional network (GCN) frameworks by systematically incorporating salient interactions among multiple higher-order graph structures of different dimensions. We discuss theoretical foundations behind BScNets and illustrate its utility for link prediction on eight real-world and synthetic datasets. Our experiments indicate that BScNets outperforms the state-of-the-art models by a significant margin while maintaining low computation costs. Finally, we show utility of BScNets as the new promising alternative for tracking spread of infectious diseases such as COVID-19 and measuring the effectiveness of the healthcare risk mitigation strategies. △ Less

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2110.15529 [pdf, other]

Topological Relational Learning on Graphs

Authors: Yuzhou Chen, Baris Coskunuzer, Yulia R. Gel

Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework of topological relational inference (TRI) which allows for integrating higher-order graph information to GN… ▽ More Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework of topological relational inference (TRI) which allows for integrating higher-order graph information to GNNs and for systematically learning a local graph structure. The key idea is to rewire the original graph by using the persistent homology of the small neighborhoods of nodes and then to incorporate the extracted topological summaries as the side information into the local algorithm. As a result, the new framework enables us to harness both the conventional information on the graph structure and information on the graph higher order topological properties. We derive theoretical stability guarantees for the new local topological representation and discuss their implications on the graph algebraic connectivity. The experimental results on node classification tasks demonstrate that the new TRI-GNN outperforms all 14 state-of-the-art baselines on 6 out 7 graphs and exhibit higher robustness to perturbations, yielding up to 10\% better performance under noisy scenarios. △ Less

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.10849 [pdf, other]

Using NASA Satellite Data Sources and Geometric Deep Learning to Uncover Hidden Patterns in COVID-19 Clinical Severity

Authors: Ignacio Segovia-Dominguez, Huikyo Lee, Zhiwei Zhen, Yuzhou Chen, Michael Garay, Daniel Crichton, Rishabh Wagh, Yulia R. Gel

Abstract: As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning -- from water and food security to energy supply to healthcare -- more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely, due to the lack of reliable and easy acc… ▽ More As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning -- from water and food security to energy supply to healthcare -- more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely, due to the lack of reliable and easy access to use data. Here we present a unique not yet broadly available NASA's satellite dataset on aerosol optical depth (AOD), temperature and relative humidity and discuss the utility of these new data for COVID-19 biosurveillance. In particular, using the geometric deep learning models for semi-supervised classification on a county-level basis over the contiguous United States, we investigate the pressing societal question whether atmospheric variables have considerable impact on COVID-19 clinical severity. △ Less

Submitted 20 October, 2021; originally announced October 2021.

Comments: Main Paper and Appendix

arXiv:2106.01806 [pdf, other]

Topological Anomaly Detection in Dynamic Multilayer Blockchain Networks

Authors: Dorcas Ofori-Boateng, Ignacio Segovia Dominguez, Murat Kantarcioglu, Cuneyt G. Akcora, Yulia R. Gel

Abstract: Motivated by the recent surge of criminal activities with cross-cryptocurrency trades, we introduce a new topological perspective to structural anomaly detection in dynamic multilayer networks. We postulate that anomalies in the underlying blockchain transaction graph that are composed of multiple layers are likely to also be manifested in anomalous patterns of the network shape properties. As suc… ▽ More Motivated by the recent surge of criminal activities with cross-cryptocurrency trades, we introduce a new topological perspective to structural anomaly detection in dynamic multilayer networks. We postulate that anomalies in the underlying blockchain transaction graph that are composed of multiple layers are likely to also be manifested in anomalous patterns of the network shape properties. As such, we invoke the machinery of clique persistent homology on graphs to systematically and efficiently track evolution of the network shape and, as a result, to detect changes in the underlying network topology and geometry. We develop a new persistence summary for multilayer networks, called stacked persistence diagram, and prove its stability under input data perturbations. We validate our new topological anomaly detection framework in application to dynamic multilayer networks from the Ethereum Blockchain and the Ripple Credit Network, and demonstrate that our stacked PD approach substantially outperforms state-of-art techniques. △ Less

Submitted 6 July, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: 26 pages, 6 figures, 7 tables

arXiv:2105.04100 [pdf, other]

Z-GCNETs: Time Zigzags at Graph Convolutional Networks for Time Series Forecasting

Authors: Yuzhou Chen, Ignacio Segovia-Dominguez, Yulia R. Gel

Abstract: There recently has been a surge of interest in develo** a new class of deep learning (DL) architectures that integrate an explicit time dimension as a fundamental building block of learning and representation mechanisms. In turn, many recent results show that topological descriptors of the observed data, encoding information on the shape of the dataset in a topological space at different scales,… ▽ More There recently has been a surge of interest in develo** a new class of deep learning (DL) architectures that integrate an explicit time dimension as a fundamental building block of learning and representation mechanisms. In turn, many recent results show that topological descriptors of the observed data, encoding information on the shape of the dataset in a topological space at different scales, that is, persistent homology of the data, may contain important complementary information, improving both performance and robustness of DL. As convergence of these two emerging ideas, we propose to enhance DL architectures with the most salient time-conditioned topological information of the data and introduce the concept of zigzag persistence into time-aware graph convolutional networks (GCNs). Zigzag persistence provides a systematic and mathematically rigorous framework to track the most important topological features of the observed data that tend to manifest themselves over time. To integrate the extracted time-conditioned topological descriptors into DL, we develop a new topological summary, zigzag persistence image, and derive its theoretical stability guarantees. We validate the new GCNs with a time-aware zigzag topological layer (Z-GCNETs), in application to traffic forecasting and Ethereum blockchain price prediction. Our results indicate that Z-GCNET outperforms 13 state-of-the-art methods on 4 time series datasets. △ Less

Submitted 10 May, 2021; originally announced May 2021.

Comments: Accepted at the International Conference on Machine Learning (ICML) 2021

arXiv:2104.04787 [pdf, other]

Smart Vectorizations for Single and Multiparameter Persistence

Authors: Baris Coskunuzer, CUneyt Gurcan Akcora, Ignacio Segovia Dominguez, Zhiwei Zhen, Murat Kantarcioglu, Yulia R. Gel

Abstract: The machinery of topological data analysis becomes increasingly popular in a broad range of machine learning tasks, ranging from anomaly detection and manifold learning to graph classification. Persistent homology is one of the key approaches here, allowing us to systematically assess the evolution of various hidden patterns in the data as we vary a scale parameter. The extracted patterns, or homo… ▽ More The machinery of topological data analysis becomes increasingly popular in a broad range of machine learning tasks, ranging from anomaly detection and manifold learning to graph classification. Persistent homology is one of the key approaches here, allowing us to systematically assess the evolution of various hidden patterns in the data as we vary a scale parameter. The extracted patterns, or homological features, along with information on how long such features persist throughout the considered filtration of a scale parameter, convey a critical insight into salient data characteristics and data organization. In this work, we introduce two new and easily interpretable topological summaries for single and multi-parameter persistence, namely, saw functions and multi-persistence grid functions, respectively. Compared to the existing topological summaries which tend to assess the numbers of topological features and/or their lifespans at a given filtration step, our proposed saw and multi-persistence grid functions allow us to explicitly account for essential complementary information such as the numbers of births and deaths at each filtration step. These new topological summaries can be regarded as the complexity measures of the evolving subspaces determined by the filtration and are of particular utility for applications of persistent homology on graphs. We derive theoretical guarantees on the stability of the new saw and multi-persistence grid functions and illustrate their applicability for graph classification tasks. △ Less

Submitted 10 April, 2021; originally announced April 2021.

Comments: 27 pages, 7 figures 5 tables

arXiv:2103.08712 [pdf, other]

Blockchain Networks: Data Structures of Bitcoin, Monero, Zcash, Ethereum, Ripple and Iota

Authors: Cuneyt Gurcan Akcora, Murat Kantarcioglu, Yulia R. Gel

Abstract: Blockchain is an emerging technology that has enabled many applications, from cryptocurrencies to digital asset management and supply chains. Due to this surge of popularity, analyzing the data stored on blockchains poses a new critical challenge in data science. To assist data scientists in various analytic tasks on a blockchain, in this tutorial, we provide a systematic and comprehensive overv… ▽ More Blockchain is an emerging technology that has enabled many applications, from cryptocurrencies to digital asset management and supply chains. Due to this surge of popularity, analyzing the data stored on blockchains poses a new critical challenge in data science. To assist data scientists in various analytic tasks on a blockchain, in this tutorial, we provide a systematic and comprehensive overview of the fundamental elements of blockchain network models. We discuss how we can abstract blockchain data as various types of networks and further use such associated network abstractions to reap important insights on blockchains' structure, organization, and functionality. △ Less

Submitted 29 September, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: 27 figures, 8 tables, 42 pages

arXiv:2010.15082 [pdf, other]

How to Not Get Caught When You Launder Money on Blockchain?

Authors: Cuneyt G. Akcora, Sudhanva Purusotham, Yulia R. Gel, Mitchell Krawiec-Thayer, Murat Kantarcioglu

Abstract: The number of blockchain users has tremendously grown in recent years. As an unintended consequence, e-crime transactions on blockchains has been on the rise. Consequently, public blockchains have become a hotbed of research for develo** AI tools to detect and trace users and transactions that are related to e-crime. We argue that following a few select strategies can make money laundering on… ▽ More The number of blockchain users has tremendously grown in recent years. As an unintended consequence, e-crime transactions on blockchains has been on the rise. Consequently, public blockchains have become a hotbed of research for develo** AI tools to detect and trace users and transactions that are related to e-crime. We argue that following a few select strategies can make money laundering on blockchain virtually undetectable with most of the existing tools and algorithms. As a result, the effective combating of e-crime activities involving cryptocurrencies requires the development of novel analytic methodology in AI. △ Less

Submitted 21 September, 2020; originally announced October 2020.

arXiv:2009.02365 [pdf, other]

LFGCN: Levitating over Graphs with Levy Flights

Authors: Yuzhou Chen, Yulia R. Gel, Konstantin Avrachenkov

Abstract: Due to high utility in many applications, from social networks to blockchain to power grids, deep learning on non-Euclidean objects such as graphs and manifolds, coined Geometric Deep Learning (GDL), continues to gain an ever increasing interest. We propose a new Lévy Flights Graph Convolutional Networks (LFGCN) method for semi-supervised learning, which casts the Lévy Flights into random walks on… ▽ More Due to high utility in many applications, from social networks to blockchain to power grids, deep learning on non-Euclidean objects such as graphs and manifolds, coined Geometric Deep Learning (GDL), continues to gain an ever increasing interest. We propose a new Lévy Flights Graph Convolutional Networks (LFGCN) method for semi-supervised learning, which casts the Lévy Flights into random walks on graphs and, as a result, allows both to accurately account for the intrinsic graph topology and to substantially improve classification performance, especially for heterogeneous graphs. Furthermore, we propose a new preferential P-DropEdge method based on the Girvan-Newman argument. That is, in contrast to uniform removing of edges as in DropEdge, following the Girvan-Newman algorithm, we detect network periphery structures using information on edge betweenness and then remove edges according to their betweenness centrality. Our experimental results on semi-supervised node classification tasks demonstrate that the LFGCN coupled with P-DropEdge accelerates the training task, increases stability and further improves predictive accuracy of learned graph topology structure. Finally, in our case studies we bring the machinery of LFGCN and other deep networks tools to analysis of power grid networks - the area where the utility of GDL remains untapped. △ Less

Submitted 4 September, 2020; originally announced September 2020.

Comments: To Appear in the 2020 IEEE International Conference on Data Mining (ICDM)

arXiv:2007.03767 [pdf, other]

Defending against Backdoors in Federated Learning with Robust Learning Rate

Authors: Mustafa Safa Ozdayi, Murat Kantarcioglu, Yulia R. Gel

Abstract: Federated learning (FL) allows a set of agents to collaboratively train a model without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is the backdoor attacks. In a backdoor attack, an adversary tries to e… ▽ More Federated learning (FL) allows a set of agents to collaboratively train a model without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is the backdoor attacks. In a backdoor attack, an adversary tries to embed a backdoor functionality to the model during training that can later be activated to cause a desired misclassification. To prevent backdoor attacks, we propose a lightweight defense that requires minimal change to the FL protocol. At a high level, our defense is based on carefully adjusting the aggregation server's learning rate, per dimension and per round, based on the sign information of agents' updates. We first conjecture the necessary steps to carry a successful backdoor attack in FL setting, and then, explicitly formulate the defense based on our conjecture. Through experiments, we provide empirical evidence that supports our conjecture, and we test our defense against backdoor attacks under different settings. We observe that either backdoor is completely eliminated, or its accuracy is significantly reduced. Overall, our experiments suggest that our defense significantly outperforms some of the recently proposed defenses in the literature. We achieve this by having minimal influence over the accuracy of the trained models. In addition, we also provide convergence rate analysis for our proposed scheme. △ Less

Submitted 29 July, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Published at AAAI 2021

arXiv:1912.10105 [pdf, other]

Dissecting Ethereum Blockchain Analytics: What We Learn from Topology and Geometry of Ethereum Graph

Authors: Yitao Li, Umar Islambekov, Cuneyt Akcora, Ekaterina Smirnova, Yulia R. Gel, Murat Kantarcioglu

Abstract: Blockchain technology and, in particular, blockchain-based cryptocurrencies offer us information that has never been seen before in the financial world. In contrast to fiat currencies, all transactions of crypto-currencies and crypto-tokens are permanently recorded on distributed ledgers and are publicly available. As a result, this allows us to construct a transaction graph and to assess not only… ▽ More Blockchain technology and, in particular, blockchain-based cryptocurrencies offer us information that has never been seen before in the financial world. In contrast to fiat currencies, all transactions of crypto-currencies and crypto-tokens are permanently recorded on distributed ledgers and are publicly available. As a result, this allows us to construct a transaction graph and to assess not only its organization but to glean relationships between transaction graph properties and crypto price dynamics. The ultimate goal of this paper is to facilitate our understanding on horizons and limitations of what can be learned on crypto-tokens from local topology and geometry of the Ethereum transaction network whose even global network properties remain scarcely explored. By introducing novel tools based on topological data analysis and functional data depth into Blockchain Data Analytics, we show that Ethereum network (one of the most popular blockchains for creating new crypto-tokens) can provide critical insights on price strikes of crypto-tokens that are otherwise largely inaccessible with conventional data sources and traditional analytic methods. △ Less

Submitted 20 December, 2019; originally announced December 2019.

Comments: Will appear in SIAM International Conference on Data Mining (SDM20). May 7 - 9, 2020. Cincinnati, Ohio, U.S

arXiv:1910.12939 [pdf, other]

Harnessing the power of Topological Data Analysis to detect change points in time series

Authors: Umar Islambekov, Monisha Yuvaraj, Yulia R. Gel

Abstract: We introduce a novel geometry-oriented methodology, based on the emerging tools of topological data analysis, into the change point detection framework. The key rationale is that change points are likely to be associated with changes in geometry behind the data generating process. While the applications of topological data analysis to change point detection are potentially very broad, in this pape… ▽ More We introduce a novel geometry-oriented methodology, based on the emerging tools of topological data analysis, into the change point detection framework. The key rationale is that change points are likely to be associated with changes in geometry behind the data generating process. While the applications of topological data analysis to change point detection are potentially very broad, in this paper we primarily focus on integrating topological concepts with the existing nonparametric methods for change point detection. In particular, the proposed new geometry-oriented approach aims to enhance detection accuracy of distributional regime shift locations. Our simulation studies suggest that integration of topological data analysis with some existing algorithms for change point detection leads to consistently more accurate detection results. We illustrate our new methodology in application to the two closely related environmental time series datasets -ice phenology of the Lake Baikal and the North Atlantic Oscillation indices, in a research query for a possible association between their estimated regime shift locations. △ Less

Submitted 28 October, 2019; originally announced October 2019.

Comments: 11 pages, 3 Figures, 4 tables

arXiv:1910.11525 [pdf, other]

Unsupervised Space-Time Clustering using Persistent Homology

Authors: Umar Islambekov, Yulia Gel

Abstract: This paper presents a new clustering algorithm for space-time data based on the concepts of topological data analysis and in particular, persistent homology. Employing persistent homology - a flexible mathematical tool from algebraic topology used to extract topological information from data - in unsupervised learning is an uncommon and a novel approach. A notable aspect of this methodology consis… ▽ More This paper presents a new clustering algorithm for space-time data based on the concepts of topological data analysis and in particular, persistent homology. Employing persistent homology - a flexible mathematical tool from algebraic topology used to extract topological information from data - in unsupervised learning is an uncommon and a novel approach. A notable aspect of this methodology consists in analyzing data at multiple resolutions which allows to distinguish true features from noise based on the extent of their persistence. We evaluate the performance of our algorithm on synthetic data and compare it to other well-known clustering algorithms such as K-means, hierarchical clustering and DBSCAN. We illustrate its application in the context of a case study of water quality in the Chesapeake Bay. △ Less

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1908.06971 [pdf, other]

ChainNet: Learning on Blockchain Graphs with Topological Features

Authors: Nazmiye Ceren Abay, Cuneyt Gurcan Akcora, Yulia R. Gel, Umar D. Islambekov, Murat Kantarcioglu, Yahui Tian, Bhavani Thuraisingham

Abstract: With emergence of blockchain technologies and the associated cryptocurrencies, such as Bitcoin, understanding network dynamics behind Blockchain graphs has become a rapidly evolving research direction. Unlike other financial networks, such as stock and currency trading, blockchain based cryptocurrencies have the entire transaction graph accessible to the public (i.e., all transactions can be downl… ▽ More With emergence of blockchain technologies and the associated cryptocurrencies, such as Bitcoin, understanding network dynamics behind Blockchain graphs has become a rapidly evolving research direction. Unlike other financial networks, such as stock and currency trading, blockchain based cryptocurrencies have the entire transaction graph accessible to the public (i.e., all transactions can be downloaded and analyzed). A natural question is then to ask whether the dynamics of the transaction graph impacts the price of the underlying cryptocurrency. We show that standard graph features such as degree distribution of the transaction graph may not be sufficient to capture network dynamics and its potential impact on fluctuations of Bitcoin price. In contrast, the new graph associated topological features computed using the tools of persistent homology, are found to exhibit a high utility for predicting Bitcoin price dynamics. %explain higher order interactions among the nodes in Blockchain graphs and can be used to build much more accurate price prediction models. Using the proposed persistent homology-based techniques, we offer a new elegant, easily extendable and computationally light approach for graph representation learning on Blockchain. △ Less

Submitted 18 August, 2019; originally announced August 2019.

Comments: To Appear in the 2019 IEEE International Conference on Data Mining (ICDM)

arXiv:1906.07852 [pdf, other]

BitcoinHeist: Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain

Authors: Cuneyt Gurcan Akcora, Yitao Li, Yulia R. Gel, Murat Kantarcioglu

Abstract: Proliferation of cryptocurrencies (e.g., Bitcoin) that allow pseudo-anonymous transactions, has made it easier for ransomware developers to demand ransom by encrypting sensitive user data. The recently revealed strikes of ransomware attacks have already resulted in significant economic losses and societal harm across different sectors, ranging from local governments to health care. Most modern r… ▽ More Proliferation of cryptocurrencies (e.g., Bitcoin) that allow pseudo-anonymous transactions, has made it easier for ransomware developers to demand ransom by encrypting sensitive user data. The recently revealed strikes of ransomware attacks have already resulted in significant economic losses and societal harm across different sectors, ranging from local governments to health care. Most modern ransomware use Bitcoin for payments. However, although Bitcoin transactions are permanently recorded and publicly available, current approaches for detecting ransomware depend only on a couple of heuristics and/or tedious information gathering steps (e.g., running ransomware to collect ransomware related Bitcoin addresses). To our knowledge, none of the previous approaches have employed advanced data analytics techniques to automatically detect ransomware related transactions and malicious Bitcoin addresses. By capitalizing on the recent advances in topological data analysis, we propose an efficient and tractable data analytics framework to automatically detect new malicious addresses in a ransomware family, given only a limited records of previous transactions. Furthermore, our proposed techniques exhibit high utility to detect the emergence of new ransomware families, that is, ransomware with no previous records of transactions. Using the existing known ransomware data sets, we show that our proposed methodology provides significant improvements in precision and recall for ransomware transaction detection, compared to existing heuristic based approaches, and can be utilized to automate ransomware detection. △ Less

Submitted 18 June, 2019; originally announced June 2019.

Comments: 15 pages, 11 tables, 12 figures

arXiv:1902.09029 [pdf, other]

doi 10.32614/RJ-2018-056

Snowboot: Bootstrap Methods for Network Inference

Authors: Yuzhou Chen, Yulia R. Gel, Vyacheslav Lyubchich, Kusha Nezafati

Abstract: Complex networks are used to describe a broad range of disparate social systems and natural phenomena, from power grids to customer segmentation to human brain connectome. Challenges of parametric model specification and validation inspire a search for more data-driven and flexible nonparametric approaches for inference of complex networks. In this paper we discuss methodology and R implementation… ▽ More Complex networks are used to describe a broad range of disparate social systems and natural phenomena, from power grids to customer segmentation to human brain connectome. Challenges of parametric model specification and validation inspire a search for more data-driven and flexible nonparametric approaches for inference of complex networks. In this paper we discuss methodology and R implementation of two bootstrap procedures on random networks, that is, patchwork bootstrap of Thompson et al. (2016) and Gel et al. (2017) and vertex bootstrap of Snijders and Borgatti (1999). To our knowledge, the new R package snowboot is the first implementation of the vertex and patchwork bootstrap inference on networks in R. Our new package is accompanied with a detailed user's manual, and is compatible with the popular R package on network studies igraph. We evaluate the patchwork bootstrap and vertex bootstrap with extensive simulation studies and illustrate their utility in application to analysis of real world networks. △ Less

Submitted 24 February, 2019; originally announced February 2019.

Journal ref: The R Journal (2018) 10:2, pages 95-113

arXiv:1708.08749 [pdf, other]

Blockchain: A Graph Primer

Authors: Cuneyt Gurcan Akcora, Yulia R. Gel, Murat Kantarcioglu

Abstract: Bitcoin and its underlying technology, blockchain, have gained significant popularity in recent years. Satoshi Nakamoto designed Bitcoin to enable a secure, distributed platform without the need for central authorities, and blockchain has been hailed as a paradigm that will be as impactful as Big Data, Cloud Computing, and Machine Learning. Blockchain incorporates innovative ideas from various f… ▽ More Bitcoin and its underlying technology, blockchain, have gained significant popularity in recent years. Satoshi Nakamoto designed Bitcoin to enable a secure, distributed platform without the need for central authorities, and blockchain has been hailed as a paradigm that will be as impactful as Big Data, Cloud Computing, and Machine Learning. Blockchain incorporates innovative ideas from various fields, such as public-key encryption and distributed systems. As a result, readers often encounter resources that explain Blockchain technology from a single perspective, leaving them with more questions than answers. In this primer, we aim to provide a comprehensive view of blockchain. We will begin with a brief history and introduce the building blocks of the blockchain. As graph mining is a major area of blockchain analysis, we will delve into the graph-theoretical aspects of Blockchain technology. We will also discuss the future of blockchain and explain how extensions such as smart contracts and decentralized autonomous organizations will function. Our goal is to provide a concise but complete description of blockchain technology that is accessible to readers with no prior expertise in the field. △ Less

Submitted 11 December, 2022; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: 19 pages, 5 figures

Showing 1–27 of 27 results for author: Gel, Y