-
Explaining neural network predictions of material strength
Authors:
Ian A. Palmer,
T. Nathan Mundhenk,
Brian Gallagher,
Yong Han
Abstract:
We recently developed a deep learning method that can determine the critical peak stress of a material by looking at scanning electron microscope (SEM) images of the material's crystals. However, it has been somewhat unclear what kind of image features the network is keying off of when it makes its prediction. It is common in computer vision to employ an explainable AI saliency map to tell one what parts of an image are important to the network's decision. One can usually deduce the important features by looking at these salient locations. However, SEM images of crystals are more abstract to the human observer than natural image photographs. As a result, it is not easy to tell what features are important at the locations which are most salient. To solve this, we developed a method that helps us map features from important locations in SEM images to non-abstract textures that are easier to interpret.
Submitted 5 November, 2021;
originally announced November 2021.
-
Network Structure Inference, A Survey: Motivations, Methods, and Applications
Authors:
Ivan Brugere,
Brian Gallagher,
Tanya Y. Berger-Wolf
Abstract:
Networks represent relationships between entities in many complex systems, spanning from online social interactions to biological cell development and brain connectivity. In many cases, relationships between entities are unambiguously known: are two users 'friends' in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These are directly observable in the system in question. In most cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals who physically co-locate have a social bond? Who infected whom in a disease outbreak in a population?
Existing approaches for inferring networks from data are found across many application domains and use specialized knowledge to infer and measure the quality of the inferred network for a specific task or hypothesis. However, current research lacks a rigorous methodology which employs standard statistical validation on inferred models. In this survey, we examine (1) how network representations are constructed from underlying data, (2) the variety of questions and tasks on these representations over several domains, and (3) validation strategies for measuring the inferred network's capability of answering questions on the system of interest.
Submitted 19 January, 2018; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Size-Consistent Statistics for Anomaly Detection in Dynamic Networks
Authors:
Timothy La Fond,
Jennifer Neville,
Brian Gallagher
Abstract:
An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that are dependent on confounding factors like the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that face anomaly detection in dynamic network streams regarding confounding factors. We also provide two solutions to avoiding error due to confounding factors: the first is a randomization testing method that controls for confounding factors, and the second is a set of size-consistent network statistics which avoid confounding due to the most common factors, edge count and node count.
Submitted 2 August, 2016;
originally announced August 2016.
-
MaxOutProbe: An Algorithm for Increasing the Size of Partially Observed Networks
Authors:
Sucheta Soundarajan,
Tina Eliassi-Rad,
Brian Gallagher,
Ali Pinar
Abstract:
Networked representations of real-world phenomena are often partially observed, which leads to incomplete networks. Analysis of such incomplete networks can lead to skewed results. We examine the following problem: given an incomplete network, which $b$ nodes should be probed to bring the largest number of new nodes into the observed network? Many graph-mining tasks require having observed a considerable amount of the network. Examples include community discovery, belief propagation, influence maximization, etc. For instance, consider someone who has observed a portion (say 1%) of the Twitter retweet network via random tweet sampling. She wants to estimate the size of the largest connected component of the fully observed retweet network. To improve her estimate, how should she use her limited budget to reduce the incompleteness of the network? In this work, we propose a novel algorithm, called MaxOutProbe, which uses a budget $b$ (on nodes probed) to increase the size of the observed network in terms of the number of nodes. Our experiments, across a range of datasets and conditions, demonstrate the advantages of MaxOutProbe over existing methods.
Submitted 19 November, 2015;
originally announced November 2015.
-
Anomaly Detection in Dynamic Networks of Varying Size
Authors:
Timothy La Fond,
Jennifer Neville,
Brian Gallagher
Abstract:
Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as e-mail networks, social networks, or internet traffic networks are best represented by a dynamic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly detection. Here the task is to identify points in time where the network exhibits behavior radically different from a typical time, either due to some event (like the failure of machines in a computer network) or a shift in the network properties. This problem is made more difficult by the fluid nature of what is considered "normal" network behavior. The volume of traffic on a network, for example, can change over the course of a month or even vary based on the time of the day without being considered unusual. Anomaly detection tests using traditional network statistics have difficulty in these scenarios due to their Density Dependence: as the volume of edges changes, the value of the statistics changes as well, making it difficult to determine if the change in signal is due to the traffic volume or due to some fundamental shift in the behavior of the network. To more accurately detect anomalies in dynamic networks, we introduce the concept of Density-Consistent network statistics. On synthetically generated graphs, anomaly detectors using these statistics show a 20-400% improvement in recall when distinguishing graphs drawn from different distributions. When applied to several real datasets, Density-Consistent statistics recover multiple network events which standard statistics failed to find.
Submitted 13 November, 2014;
originally announced November 2014.
-
Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective
Authors:
Neil Shah,
Alex Beutel,
Brian Gallagher,
Christos Faloutsos
Abstract:
How can we detect suspicious users in large online networks? Online popularity of a user or product (via follows, page-likes, etc.) can be monetized on the premise of higher ad click-through rates or increased sales. Web services and social networks which incentivize popularity thus suffer from a major problem of fake connections from link fraudsters looking to make a quick buck. Typical methods of catching this suspicious behavior use spectral techniques to spot large groups of often blatantly fraudulent (but sometimes honest) users. However, small-scale, stealthy attacks may go unnoticed due to the nature of low-rank eigenanalysis used in practice.
In this work, we take an adversarial approach to find and prove claims about the weaknesses of modern, state-of-the-art spectral methods and propose fBox, an algorithm designed to catch small-scale, stealth attacks that slip below the radar. Our algorithm has the following desirable properties: (a) it has theoretical underpinnings, (b) it is shown to be highly effective on real data, and (c) it is scalable (linear in the input size). We evaluate fBox on a large, public 41.7 million node, 1.5 billion edge who-follows-whom social graph from Twitter in 2010 and with high precision identify many suspicious accounts which have persisted without suspension even to this day.
Submitted 14 October, 2014;
originally announced October 2014.
-
Dynamic Behavioral Mixed-Membership Model for Large Evolving Networks
Authors:
Ryan Rossi,
Brian Gallagher,
Jennifer Neville,
Keith Henderson
Abstract:
The majority of real-world networks are dynamic and extremely large (e.g., Internet Traffic, Twitter, Facebook, ...). To understand the structural behavior of nodes in these large dynamic networks, it may be necessary to model the dynamics of behavioral roles representing the main connectivity patterns over time. In this paper, we propose a dynamic behavioral mixed-membership model (DBMM) that captures the roles of nodes in the graph and how they evolve over time. Unlike other node-centric models, our model is scalable for analyzing large dynamic networks. In addition, DBMM is flexible, parameter-free, has no functional form or parameterization, and is interpretable (identifies explainable patterns). The performance results indicate our approach can be applied to very large networks while the experimental results show that our model uncovers interesting patterns underlying the dynamics of these networks.
Submitted 9 May, 2012;
originally announced May 2012.
-
Role-Dynamics: Fast Mining of Large Dynamic Networks
Authors:
Ryan Rossi,
Brian Gallagher,
Jennifer Neville,
Keith Henderson
Abstract:
To understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral patterns such as the center of a star, peripheral nodes, or bridge nodes that connect different communities. Our novel approach learns the appropriate structural role dynamics for any arbitrary network and tracks the changes over time. In particular, we uncover the specific global network dynamics and the local node dynamics of a technological, communication, and social network. We identify interesting node and network patterns such as stationary and non-stationary roles, spikes/steps in role-memberships (perhaps indicating anomalies), increasing/decreasing role trends, among many others. Our results indicate that the nodes in each of these networks have distinct connectivity patterns that are non-stationary and evolve considerably over time. Overall, the experiments demonstrate the effectiveness of our approach for fast mining and tracking of the dynamics in large networks. Furthermore, the dynamic structural representation provides a basis for building more sophisticated models and tools that are fast for exploring large dynamic networks.
Submitted 9 March, 2012;
originally announced March 2012.