Search | arXiv e-print repository

The AI Security Pyramid of Pain

Authors: Chris M. Ward, Josh Harguess, Julia Tao, Daniel Christman, Paul Spicer, Mike Tan

Abstract: We introduce the AI Security Pyramid of Pain, a framework that adapts the cybersecurity Pyramid of Pain to categorize and prioritize AI-specific threats. This framework provides a structured approach to understanding and addressing various levels of AI threats. Starting at the base, the pyramid emphasizes Data Integrity, which is essential for the accuracy and reliability of datasets and AI models, including their weights and parameters. Ensuring data integrity is crucial, as it underpins the effectiveness of all AI-driven decisions and operations. The next level, AI System Performance, focuses on MLOps-driven metrics such as model drift, accuracy, and false positive rates. These metrics are crucial for detecting potential security breaches, allowing for early intervention and maintenance of AI system integrity. Advancing further, the pyramid addresses the threat posed by Adversarial Tools, identifying and neutralizing tools used by adversaries to target AI systems. This layer is key to staying ahead of evolving attack methodologies. At the Adversarial Input layer, the framework addresses the detection and mitigation of inputs designed to deceive or exploit AI models. This includes techniques like adversarial patterns and prompt injection attacks, which are increasingly used in sophisticated attacks on AI systems. Data Provenance is the next critical layer, ensuring the authenticity and lineage of data and models. This layer is pivotal in preventing the use of compromised or biased data in AI systems. At the apex is the tactics, techniques, and procedures (TTPs) layer, dealing with the most complex and challenging aspects of AI security. This involves a deep understanding and strategic approach to counter advanced AI-targeted attacks, requiring comprehensive knowledge and planning.

Submitted 16 February, 2024; originally announced February 2024.

Comments: SPIE DCS 2024

arXiv:2308.07467 [pdf, ps, other]

Sequences with identical autocorrelation spectra

Authors: Daniel J. Katz, Adeebur Rahman, Michael J Ward

Abstract: Aperiodic autocorrelation measures the similarity between a finite-length sequence of complex numbers and translates of itself. Autocorrelation is important in communications, remote sensing, and scientific instrumentation. The autocorrelation function reports the aperiodic autocorrelation at every possible translation. Knowing the autocorrelation function of a sequence is equivalent to knowing the magnitude of its Fourier transform. Resolving the lack of phase information is called the phase problem. We say that two sequences are isospectral to mean that they have the same aperiodic autocorrelation function. Sequences used in technological applications often have restrictions on their terms: they are not arbitrary complex numbers, but come from an alphabet that may reside in a proper subring of the complex field or may come from a finite set of values. For example, binary sequences involve terms equal to only $+1$ and $-1$. In this paper, we investigate the necessary and sufficient conditions for two sequences to be isospectral, where we take their alphabet into consideration. There are trivial forms of isospectrality arising from modifications that predictably preserve the autocorrelation, for example, negating sequences or both conjugating their terms and writing them in reverse order. By an exhaustive search of binary sequences up to length $34$, we find that nontrivial isospectrality among binary sequences does occur, but is rare. We say that a positive integer $n$ is barren to mean that there are no nontrivially isospectral binary sequences of length $n$. For integers $n \leq 34$, we found that the barren ones are $1$--$8$, $10$, $11$, $13$, $14$, $19$, $22$, $23$, $26$, and $29$. We prove that any multiple of a non-barren number is also not barren, and pose an open question as to whether there are finitely or infinitely many barren numbers.
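
The two trivial forms of isospectrality named in this abstract can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it assumes the common convention C(s) = sum over j of a[j+s] * conj(a[j]) for shifts s >= 0 (negative shifts are the complex conjugates of these, so they carry no extra information), and the function names are hypothetical.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation of seq at shifts 0 .. len(seq)-1.

    Assumed convention: C(s) = sum_j seq[j+s] * conj(seq[j]).
    """
    n = len(seq)
    return [
        sum(seq[j + s] * complex(seq[j]).conjugate() for j in range(n - s))
        for s in range(n)
    ]


def are_isospectral(a, b):
    """True if a and b have the same aperiodic autocorrelation function.

    Uses exact equality, so it is only suitable for exact alphabets
    such as binary (+1/-1) sequences.
    """
    return len(a) == len(b) and (
        aperiodic_autocorrelation(a) == aperiodic_autocorrelation(b)
    )


def negate(seq):
    """Trivial modification 1: negate every term."""
    return [-x for x in seq]


def reverse_conjugate(seq):
    """Trivial modification 2: conjugate every term and reverse the order."""
    return [complex(x).conjugate() for x in reversed(seq)]
```

For a binary sequence `a`, both `are_isospectral(a, negate(a))` and `are_isospectral(a, reverse_conjugate(a))` hold, which is why the paper sets these cases aside and searches only for nontrivial pairs.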

Submitted 14 August, 2023; originally announced August 2023.

Comments: 12 pages

MSC Class: 94A12 42A05 42A38 42A85

arXiv:2212.04326 [pdf, other]

Scalable Edge Blocking Algorithms for Defending Active Directory Style Attack Graphs

Authors: Mingyu Guo, Max Ward, Aneta Neumann, Frank Neumann, Hung Nguyen

Abstract: Active Directory (AD) is the default security management system for Windows domain networks. An AD environment naturally describes an attack graph where nodes represent computers/accounts/security groups, and edges represent existing accesses/known exploits that allow the attacker to gain access from one node to another. Motivated by practical AD use cases, we study a Stackelberg game between one attacker and one defender. There are multiple entry nodes for the attacker to choose from and there is a single target (Domain Admin). Every edge has a failure rate. The attacker chooses the attack path with the maximum success rate. The defender can block a limited number of edges (i.e., revoke accesses) from a set of blockable edges, limited by budget. The defender's aim is to minimize the attacker's success rate. We exploit the tree-likeness of practical AD graphs to design scalable algorithms. We propose two novel methods that combine theoretical fixed parameter analysis and practical optimisation techniques. For graphs with small tree widths, we propose a tree decomposition based dynamic program. We then propose a general method for converting tree decomposition based dynamic programs to reinforcement learning environments, which leads to an anytime algorithm that scales better, but loses the optimality guarantee. For graphs with small numbers of non-splitting paths (a parameter we invent specifically for AD graphs), we propose a kernelization technique that significantly downsizes the model, which is then solved via mixed-integer programming. Experimentally, our algorithms scale to handle synthetic AD graphs with tens of thousands of nodes.
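
The game described in this abstract can be illustrated on a toy graph. The sketch below is a naive baseline, not the paper's tree-decomposition or kernelization method: it assumes a path's success rate is the product of (1 - failure rate) over its edges, finds the attacker's best path with a max-product variant of Dijkstra's algorithm, and brute-forces the defender's choice. All names and the graph encoding are hypothetical.

```python
import heapq
from itertools import combinations


def best_attack_success(edges, entries, target, blocked=frozenset()):
    """Maximum success rate of any path from an entry node to the target.

    edges maps (u, v) -> failure rate in [0, 1]; blocked edges are skipped.
    """
    graph = {}
    for (u, v), failure in edges.items():
        if (u, v) not in blocked:
            graph.setdefault(u, []).append((v, 1.0 - failure))

    best = {node: 1.0 for node in entries}
    heap = [(-1.0, node) for node in entries]  # max-heap via negated probability
    heapq.heapify(heap)
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(u, 0.0):
            continue  # stale heap entry
        if u == target:
            return p
        for v, success in graph.get(u, []):
            q = p * success
            if q > best.get(v, 0.0):
                best[v] = q
                heapq.heappush(heap, (-q, v))
    return 0.0  # target unreachable


def best_blocking(edges, entries, target, blockable, budget):
    """Brute force: try every set of `budget` blockable edges (exponential)."""
    return min(
        (frozenset(c) for c in combinations(blockable, budget)),
        key=lambda blocked: best_attack_success(edges, entries, target, blocked),
    )
```

The brute force enumerates every blocking set of the given size, so it is only usable on very small graphs; that cost is the motivation for the scalable algorithms the abstract proposes.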

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2204.03397 [pdf, other]

doi 10.1145/3512290.3528729

Defending Active Directory by Combining Neural Network based Dynamic Program and Evolutionary Diversity Optimisation

Authors: Diksha Goel, Max Ward, Aneta Neumann, Frank Neumann, Hung Nguyen, Mingyu Guo

Abstract: Active Directory (AD) is the default security management system for Windows domain networks. We study a Stackelberg game model between one attacker and one defender on an AD attack graph. The attacker initially has access to a set of entry nodes. The attacker can expand this set by strategically exploring edges. Every edge has a detection rate and a failure rate. The attacker aims to maximize their chance of successfully reaching the destination before getting detected. The defender's task is to block a constant number of edges to decrease the attacker's chance of success. We show that the problem is #P-hard and, therefore, intractable to solve exactly. We convert the attacker's problem to an exponential sized Dynamic Program that is approximated by a Neural Network (NN). Once trained, the NN provides an efficient fitness function for the defender's Evolutionary Diversity Optimisation (EDO). The diversity emphasis on the defender's solution provides a diverse set of training samples, which improves the training accuracy of our NN for modelling the attacker. We go back and forth between NN training and EDO. Experimental results show that for the R500 graph, our proposed EDO based defense is less than 1% away from the optimal defense.

Submitted 4 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Added reference [12] on page 3 and 4. Corrected spelling EVC to VEC on page 10

Journal ref: Proceedings of the Genetic and Evolutionary Computation Conference, 2022, Pages 1191 to 1199

arXiv:2202.13402 [pdf, other]

Concept Graph Neural Networks for Surgical Video Understanding

Authors: Yutong Ban, Jennifer A. Eckhoff, Thomas M. Ward, Daniel A. Hashimoto, Ozanan R. Meireles, Daniela Rus, Guy Rosman

Abstract: We constantly integrate our knowledge and understanding of the world to enhance our interpretation of what we see. This ability is crucial in application domains which entail reasoning about multiple entities and concepts, such as AI-augmented surgery. In this paper, we propose a novel way of integrating conceptual knowledge into temporal analysis tasks via temporal concept graph networks. In the proposed networks, a global knowledge graph is incorporated into the temporal analysis of surgical instances, learning the meaning of concepts and relations as they apply to the data. We demonstrate our results in surgical video data for tasks such as verification of critical view of safety, as well as estimation of Parkland grading scale. The results show that our method improves the recognition and detection of complex benchmarks as well as enables other analytic applications of interest.

Submitted 25 April, 2023; v1 submitted 27 February, 2022; originally announced February 2022.

arXiv:2201.04799 [pdf, other]

Finding $(s,d)$-Hypernetworks in F-Hypergraphs is NP-Hard

Authors: Reynaldo Gil-Pons, Max Ward, Loïc Miller

Abstract: We consider the problem of computing an $(s,d)$-hypernetwork in an acyclic F-hypergraph. This is a fundamental computational problem arising in directed hypergraphs, and is a foundational step in tackling problems of reachability and redundancy. This problem was previously explored in the context of general directed hypergraphs (containing cycles), where it is NP-hard, and acyclic B-hypergraphs, where a linear time algorithm can be achieved. In a surprising contrast, we find that for acyclic F-hypergraphs the problem is NP-hard, which also implies the problem is hard in BF-hypergraphs. This is a striking complexity boundary given that F-hypergraphs and B-hypergraphs would at first seem to be symmetrical to one another. We provide the proof of complexity and explain why there is a fundamental asymmetry between the two classes of directed hypergraphs.

Submitted 14 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

arXiv:2110.04933 [pdf, other]

A Faster Algorithm for Maximum Independent Set on Interval Filament Graphs

Authors: Darcy Best, Max Ward

Abstract: We provide an algorithm requiring only $O(N^2)$ time to compute the maximum weight independent set of interval filament graphs. This also implies an $O(N^4)$ algorithm to compute the maximum weight induced matching of interval filament graphs. Both algorithms significantly improve upon the previous best complexities for these problems. Previously, the maximum weight independent set and maximum weight induced matching problems required $O(N^3)$ and $O(N^6)$ time respectively.

Submitted 17 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

arXiv:2109.12894 [pdf, other]

Training Spiking Neural Networks Using Lessons From Deep Learning

Authors: Jason K. Eshraghian, Max Ward, Emre Neftci, Xinxin Wang, Gregor Lenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, Wei D. Lu

Abstract: The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This paper serves as a tutorial and perspective showing how to apply the lessons learnt from several decades of research in deep learning, gradient descent, backpropagation and neuroscience to biologically plausible spiking neural networks. We also explore the delicate interplay between encoding data as spikes and the learning process; the challenges and solutions of applying gradient-based learning to spiking neural networks (SNNs); the subtle link between temporal backpropagation and spike timing dependent plasticity, and how deep learning might move towards biologically plausible online learning. Some ideas are well accepted and commonly used amongst the neuromorphic engineering community, while others are presented or justified for the first time here. The fields of deep learning and spiking neural networks evolve very rapidly. We endeavour to treat this document as a 'dynamic' manuscript that will continue to be updated as the common practices in training SNNs also change. A series of companion interactive tutorials complementary to this paper using our Python package, snnTorch, are also made available. See https://snntorch.readthedocs.io/en/latest/tutorials/index.html .

Submitted 13 August, 2023; v1 submitted 27 September, 2021; originally announced September 2021.

arXiv:2108.06891

Efficient Network Analysis Under Single Link Deletion

Authors: Max Ward, Amitava Datta, Hung X. Nguyen, Jason Eshraghian

Abstract: The problem of worst case edge deletion from a network is considered. Suppose that you have a communication network and you can delete a single edge. Which edge deletion causes the largest disruption? More formally, given a graph, which edge after deletion disconnects the maximum number of pairs of vertices, where ties for number of pairs disconnected are broken by finding an edge that increases the average shortest path length the maximum amount. This problem is interesting both practically and theoretically. We call it the \emph{single edge deletion problem}. Our contributions include formally defining the single edge deletion problem and providing motivations from network analysis. Also, we give an algorithm that solves the problem much faster than a naive solution. The algorithm incorporates sophisticated and novel techniques, and generalises to the problem of computing the all-pairs shortest paths table after deleting each edge individually. This means the algorithm has deep theoretical interest as well as the potential for even wider applications than those we present here.

Submitted 15 February, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: Found previous published results with similar findings

arXiv:2105.04642 [pdf, other]

SUPR-GAN: SUrgical PRediction GAN for Event Anticipation in Laparoscopic and Robotic Surgery

Authors: Yutong Ban, Guy Rosman, Jennifer A. Eckhoff, Thomas M. Ward, Daniel A. Hashimoto, Taisei Kondo, Hidekazu Iwaki, Ozanan R. Meireles, Daniela Rus

Abstract: Comprehension of surgical workflow is the foundation upon which artificial intelligence (AI) and machine learning (ML) hold the potential to assist intraoperative decision-making and risk mitigation. In this work, we move beyond mere identification of past surgical phases, into the prediction of future surgical steps and specification of the transitions between them. We use a novel Generative Adversarial Network (GAN) formulation to sample future surgical phase trajectories conditioned on past video frames from laparoscopic cholecystectomy (LC) videos and compare it to state-of-the-art approaches for surgical video analysis and alternative prediction methods. We demonstrate the GAN formulation's effectiveness through inferring and predicting the progress of LC videos. We quantify the horizon-accuracy trade-off and explore average performance, as well as the performance on the more challenging and clinically relevant transitions between phases. Furthermore, we conduct a survey, asking 16 surgeons of different specialties and educational levels to qualitatively evaluate predicted surgery phases.

Submitted 9 March, 2022; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: RA-L ICRA 2022

arXiv:2008.12363 [pdf, other]

Analyzing Worldwide Social Distancing through Large-Scale Computer Vision

Authors: Isha Ghodgaonkar, Subhankar Chakraborty, Vishnu Banna, Shane Allcroft, Mohammed Metwaly, Fischer Bordwell, Kohsuke Kimura, Xinxin Zhao, Abhinav Goel, Caleb Tung, Akhil Chinnakotla, Minghao Xue, Yung-Hsiang Lu, Mark Daniel Ward, Wei Zakharov, David S. Ebert, David M. Barbarash, George K. Thiruvathukal

Abstract: In order to contain the COVID-19 pandemic, countries around the world have introduced social distancing guidelines as public health interventions to reduce the spread of the disease. However, monitoring the efficacy of these guidelines at a large scale (nationwide or worldwide) is difficult. To make matters worse, traditional observational methods such as in-person reporting are dangerous because observers may risk infection. A better solution is to observe activities through network cameras; this approach is scalable and observers can stay in safe locations. This research team has created methods that can discover thousands of network cameras worldwide, retrieve data from the cameras, analyze the data, and report the sizes of crowds as different countries issued and lifted restrictions (also called "lockdown"). We discover 11,140 network cameras that provide real-time data and we present the results across 15 countries. We collect data from these cameras beginning April 2020 at approximately 0.5TB per week. After analyzing 10,424,459 images from still image cameras and frames extracted periodically from video, the data reveals that the residents in some countries exhibited more activity (judged by numbers of people and vehicles) after the restrictions were lifted. In other countries, the amounts of activities showed no obvious changes during the restrictions and after the restrictions were lifted. The data further reveals whether people stay "social distancing", at least 6 feet apart. This study discerns whether social distancing is being followed in several types of locations and geographical locations worldwide and serves as an early indicator of whether another wave of infections is likely to occur soon.

Submitted 27 August, 2020; originally announced August 2020.

Comments: 10 pages, 15 figures

arXiv:2005.09091 [pdf, other]

Observing Responses to the COVID-19 Pandemic using Worldwide Network Cameras

Authors: Isha Ghodgaonkar, Abhinav Goel, Fischer Bordwell, Caleb Tung, Sara Aghajanzadeh, Noah Curran, Ryan Chen, Kaiwen Yu, Sneha Mahapatra, Vishnu Banna, Gore Kao, Kate Lee, Xiao Hu, Nick Eliopolous, Akhil Chinnakotla, Damini Rijhwani, Ashley Kim, Aditya Chakraborty, Mark Daniel Ward, Yung-Hsiang Lu, George K. Thiruvathukal

Abstract: COVID-19 has resulted in a worldwide pandemic, leading to "lockdown" policies and social distancing. The pandemic has profoundly changed the world. Traditional methods for observing these historical events are difficult because sending reporters to areas with many infected people can put the reporters' lives in danger. New technologies are needed for safely observing responses to these policies. This paper reports using thousands of network cameras deployed worldwide for the purpose of witnessing activities in response to the policies. The network cameras can continuously provide real-time visual data (image and video) without human efforts. Thus, network cameras can be utilized to observe activities without risking the lives of reporters. This paper describes a project that uses network cameras to observe responses to governments' policies during the COVID-19 pandemic (March to April in 2020). The project discovers over 30,000 network cameras deployed in 110 countries. A set of computer tools are created to collect visual data from network cameras continuously during the pandemic. This paper describes the methods to discover network cameras on the Internet, the methods to collect and manage data, and preliminary results of data analysis. This project can be the foundation for observing the possible "second wave" in fall 2020. The data may be used for post-pandemic analysis by sociologists, public health experts, and meteorologists.

Submitted 18 May, 2020; originally announced May 2020.

Comments: 7 pages, 20 figures

arXiv:1905.05373 [pdf, other]

doi 10.1117/12.2275157

Image quality assessment for determining efficacy and limitations of Super-Resolution Convolutional Neural Network (SRCNN)

Authors: Chris M. Ward, Josh Harguess, Brendan Crabb, Shibin Parameswaran

Abstract: Traditional metrics for evaluating the efficacy of image processing techniques do not lend themselves to understanding the capabilities and limitations of modern image processing methods - particularly those enabled by deep learning. When applying image processing in engineering solutions, a scientist or engineer has a need to justify their design decisions with clear metrics. By applying blind/referenceless image spatial quality (BRISQUE), Structural SIMilarity (SSIM) index scores, and Peak signal-to-noise ratio (PSNR) to images before and after image processing, we can quantify quality improvements in a meaningful way and determine the lowest recoverable image quality for a given method.
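
Of the three metrics named in this abstract, PSNR is the simplest to write down. The sketch below uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), and is not the authors' evaluation code: the function name is hypothetical, and images are assumed to be equal-sized nested lists of grayscale pixel values with a peak value of 255.

```python
import math


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images.

    Images are nested lists of pixel values (rows of numbers).
    Returns math.inf when the images are identical.
    """
    flat_ref = [pixel for row in reference for pixel in row]
    flat_out = [pixel for row in processed for pixel in row]
    if not flat_ref or len(flat_ref) != len(flat_out):
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Comparing `psnr(original, degraded)` with `psnr(original, restored)` gives one number for the kind of before-and-after comparison the abstract describes; higher values mean the image is closer to the reference.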

Submitted 13 May, 2019; originally announced May 2019.

Journal ref: Proceedings Volume 10396, Applications of Digital Image Processing XL; 1039605 (2017)

arXiv:1905.04828 [pdf, other]

doi 10.1117/12.2306113

Leveraging synthetic imagery for collision-at-sea avoidance

Authors: Chris M. Ward, Josh Harguess, Alexander G. Corelli

Abstract: Maritime collisions involving multiple ships are considered rare, but in 2017 several United States Navy vessels were involved in fatal at-sea collisions that resulted in the death of seventeen American Servicemembers. The experimentation introduced in this paper is a direct response to these incidents. We propose a shipboard Collision-At-Sea avoidance system, based on video image processing, that will help ensure the safe stationing and navigation of maritime vessels. Our system leverages a convolutional neural network trained on synthetic maritime imagery in order to detect nearby vessels within a scene, perform heading analysis of detected vessels, and provide an alert in the presence of an inbound vessel. Additionally, we present the Navigational Hazards - Synthetic (NAVHAZ-Synthetic) dataset. This dataset is comprised of one million annotated images of ten vessel classes observed from virtual vessel-mounted cameras, as well as a human "Topside Lookout" perspective. NAVHAZ-Synthetic includes imagery displaying varying sea-states, lighting conditions, and optical degradations such as fog, sea-spray, and salt-accumulation. We present our results on the use of synthetic imagery in a computer vision based collision-at-sea warning system with promising performance.

Submitted 12 May, 2019; originally announced May 2019.

Journal ref: Proc. SPIE 10645, Geospatial Informatics, Motion Imagery, and Network Analytics VIII, 1064507 (4 May 2018)

arXiv:1905.03894 [pdf, other]

Ship classification from overhead imagery using synthetic data and domain adaptation

Authors: Chris M. Ward, Josh Harguess, Cameron Hilton

Abstract: In this paper, we revisit the problem of classifying ships (maritime vessels) detected from overhead imagery. Despite the last decade of research on this very important and pertinent problem, it remains largely unsolved. One of the major issues with the detection and classification of ships and other objects in the maritime domain is the lack of substantial ground truth data needed to train state-of-the-art machine learning algorithms. We address this issue by building a large (200k) synthetic image dataset using the Unity gaming engine and 3D ship models. We demonstrate that with the use of synthetic data, classification performance increases dramatically, particularly when there are very few annotated images used in training.

Submitted 9 May, 2019; originally announced May 2019.

Comments: OCEANS 2018 MTS/IEEE Charleston

arXiv:1611.04837 [pdf, other]

doi 10.1017/psrm.2018.23

Lost in Space: Geolocation in Event Data

Authors: Sophie J. Lee, Howard Liu, Michael D. Ward

Abstract: Extracting the "correct" location information from text data, i.e., determining the place of event, has long been a goal for automated text processing. To approximate human-like coding schema, we introduce a supervised machine learning algorithm that classifies each location word to be either correct or incorrect. We use news articles collected from around the world (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data Alliance [OEDA] data) to test our algorithm that consists of two stages. In the feature selection stage, we extract contextual information from texts, namely, the N-gram patterns for location words, the frequency of mention, and the context of the sentences containing location words. In the classification stage, we use three classifiers to estimate the model parameters in the training set and then to predict whether a location word in the test set news articles is the place of the event. The validation results show that our algorithm improves the accuracy rate of the current geolocation methods of dictionary approach by as much as 25%.

Submitted 14 November, 2016; originally announced November 2016.

Journal ref: PSRM 7 (2019) 871-888

arXiv:1605.03390 [pdf, ps, other]

Variance of the Internal Profile in Suffix Trees

Authors: Jeffrey Gaither, Mark Daniel Ward

Abstract: The precise analysis of the variance of the profile of a suffix tree has been a longstanding open problem. We analyze three regimes of the asymptotic growth of the variance of the profile of a suffix tree built from a randomly generated binary string, in the nonuniform case. We utilize combinatorics on words, singularity analysis, and the Mellin transform.

Submitted 12 May, 2016; v1 submitted 11 May, 2016; originally announced May 2016.

Comments: 19 pages, 0 figures

arXiv:1504.08218 [pdf, other]

Relax, Tensors Are Here: Dependencies in International Processes

Authors: Shahryar Minhas, Peter D. Hoff, Michael D. Ward

Abstract: Previous models of international conflict have suffered two shortfalls. They tended not to embody dynamic changes, focusing rather on static slices of behavior over time. These models have also been empirically evaluated in ways that assumed the independence of each country, when in reality they are searching for the interdependence among all countries. We illustrate a solution to these two hurdles and evaluate this new, dynamic, network based approach to the dependencies among the ebb and flow of daily international interactions using a newly developed, and openly available, database of events among nations.

Submitted 30 April, 2015; originally announced April 2015.

arXiv:1203.2670 [pdf, ps, other]

Partitions with Distinct Multiplicities of Parts: On An "Unsolved Problem" Posed By Herbert Wilf

Authors: James Allen Fill, Svante Janson, Mark Daniel Ward

Abstract: Wilf's Sixth Unsolved Problem asks for any interesting properties of the set of partitions of integers for which the (nonzero) multiplicities of the parts are all different. We refer to these as \emph{Wilf partitions}. Using $f(n)$ to denote the number of Wilf partitions, we establish lead-order asymptotics for $\ln{f(n)}$.

Submitted 12 March, 2012; originally announced March 2012.

Comments: 6 pages, 1 figure

MSC Class: 05A16; 05A17; 68W40

arXiv:1110.6650 [pdf, other]

Summarization and Matching of Density-Based Clusters in Streaming Environments

Authors: Di Yang, Elke A. Rundensteiner, Matthew O. Ward

Abstract: Density-based cluster mining is known to serve a broad range of applications ranging from stock trade analysis to moving object monitoring. Although methods for efficient extraction of density-based clusters have been studied in the literature, the problem of summarizing and matching of such clusters with arbitrary shapes and complex cluster structures remains unsolved. Therefore, the goal of our work is to extend the state-of-art of density-based cluster mining in streams from cluster extraction only to now also support analysis and management of the extracted clusters. Our work solves three major technical challenges. First, we propose a novel multi-resolution cluster summarization method, called Skeletal Grid Summarization (SGS), which captures the key features of density-based clusters, covering both their external shape and internal cluster structures. Second, in order to summarize the extracted clusters in real-time, we present an integrated computation strategy C-SGS, which piggybacks the generation of cluster summarizations within the online clustering process. Lastly, we design a mechanism to efficiently execute cluster matching queries, which identify similar clusters for given cluster of analyst's interest from clusters extracted earlier in the stream history. Our experimental study using real streaming data shows the clear superiority of our proposed methods in both efficiency and effectiveness for cluster summarization and cluster matching queries to other potential alternatives.

Submitted 30 October, 2011; originally announced October 2011.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 2, pp. 121-132 (2011)

arXiv:cs/0109053 [pdf]

Price Increases from Online Privacy

Authors: Michael R. Ward, Yu-Ching Chen

Abstract: Consumers value keeping some information about them private from potential marketers. E-commerce dramatically increases the potential for marketers to accumulate otherwise private information about potential customers. Online marketers claim that this information enables them to better market their products. Policy makers are currently drafting rules to regulate the way in which these marketers can collect, store, and share this information. However, there is little evidence yet either of consumers' valuation of their privacy or of the benefits they might reap through better target marketing. We provide a framework for measuring a portion of the benefits from allowing marketers to make better use of consumer information. Target marketing is likely to reduce consumer search costs, improve consumer product selection decisions, and lower the marketing costs of goods sold. Our model allows us to estimate the value to consumers of only the latter, price reductions from more efficient marketing.

Submitted 23 September, 2001; originally announced September 2001.

Comments: 29th TPRC Conference, 2001

Report number: TPRC-2001-014

ACM Class: K.4.m Miscellaneous

arXiv:cs/0105006 [pdf, ps, other]

doi 10.1109/WCRE.2000.891448

Reverse Engineering from Assembler to Formal Specifications via Program Transformations

Authors: M. P. Ward

Abstract: The FermaT transformation system, based on research carried out over the last sixteen years at Durham University, De Montfort University and Software Migrations Ltd., is an industrial-strength formal transformation engine with many applications in program comprehension and language migration. This paper is a case study which uses automated plus manually-directed transformations and abstractions to convert an IBM 370 Assembler code program into a very high-level abstract specification.

Submitted 4 May, 2001; originally announced May 2001.

Comments: 10 pages

ACM Class: D.2.7; D.3.2

Journal ref: 7th Working Conference on Reverse Engineering 2000, 23--25 Nov 2000, Brisbane, Queensland, Australia. IEEE Computer Society

arXiv:cs/9810019 [pdf, ps]

Gryphon: An Information Flow Based Approach to Message Brokering

Authors: Robert Strom, Guruduth Banavar, Tushar Chandra, Marc Kaplan, Kevan Miller, Bodhi Mukherjee, Daniel Sturman, Michael Ward

Abstract: Gryphon is a distributed computing paradigm for message brokering, which is the transferring of information in the form of streams of events from information providers to information consumers. This extended abstract outlines the major problems in message brokering and Gryphon's approach to solving them.

Submitted 21 October, 1998; originally announced October 1998.

Comments: Two page extended abstract

ACM Class: C.2.4

Showing 1–23 of 23 results for author: Ward, M