Search | arXiv e-print repository

AI Age Discrepancy: A Novel Parameter for Frailty Assessment in Kidney Tumor Patients

Authors: Rikhil Seshadri, Jayant Siva, Angelica Bartholomew, Clara Goebel, Gabriel Wallerstein-King, Beatriz López Morato, Nicholas Heller, Jason Scovell, Rebecca Campbell, Andrew Wood, Michal Ozery-Flato, Vesna Barros, Maria Gabrani, Michal Rosen-Zvi, Resha Tejpaul, Vidhyalakshmi Ramesh, Nikolaos Papanikolopoulos, Subodh Regmi, Ryan Ward, Robert Abouassaly, Steven C. Campbell, Erick Remer, Christopher Weight

Abstract: Kidney cancer is a global health concern, and accurate assessment of patient frailty is crucial for optimizing surgical outcomes. This paper introduces AI Age Discrepancy, a novel metric derived from machine learning analysis of preoperative abdominal CT scans, as a potential indicator of frailty and postoperative risk in kidney cancer patients. This retrospective study of 599 patients from the 2023 Kidney Tumor Segmentation (KiTS) challenge dataset found that a higher AI Age Discrepancy is significantly associated with longer hospital stays and lower overall survival rates, independent of established factors. This suggests that AI Age Discrepancy may provide valuable insights into patient frailty and could thus inform clinical decision-making in kidney cancer treatment.

Submitted 2 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: 10 pages, 3 figures, 2 tables

arXiv:2406.12513 [pdf, other]

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Authors: Ahmad Mohsin, Helge Janicke, Adrian Wood, Iqbal H. Sarker, Leandros Maglaras, Naeem Janjua

Abstract: Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly utilized for software development, concerns have arisen regarding the security and quality of the generated code. These concerns stem from LLMs being primarily trained on publicly available code repositories and internet-based textual data, which may contain insecure code. This presents a significant risk of perpetuating vulnerabilities in the generated code, creating potential attack vectors for exploitation by malicious actors. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs through In-Context Learning (ICL) patterns during the code generation process, followed by rigorous security evaluations. To achieve this, we have selected four diverse LLMs for experimentation. We have evaluated these coding LLMs across three programming languages and identified security vulnerabilities and code smells. The code is generated through ICL with curated problem sets and undergoes rigorous security testing to evaluate the overall quality and trustworthiness of the generated code. Our research indicates that ICL-driven one-shot and few-shot learning patterns can enhance code security, reducing vulnerabilities in various programming scenarios. Developers and researchers should know that LLMs have a limited understanding of security principles. This may lead to security breaches when the generated code is deployed in production systems.
Our research highlights that LLMs are a potential source of new vulnerabilities to the software supply chain. It is important to consider this when using LLMs for code generation. This research article offers insights into improving LLM security and encourages proactive use of LLMs for code generation to ensure software system safety.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 27 pages, Standard Journal Paper submitted to Q1 Elsevier

arXiv:2405.05658 [pdf]

Artificial intelligence for abnormality detection in high volume neuroimaging: a systematic review and meta-analysis

Authors: Siddharth Agarwal, David A. Wood, Mariusz Grzeda, Chandhini Suresh, Munaib Din, James Cole, Marc Modat, Thomas C Booth

Abstract: Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or are insufficiently well-validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy and summarise the evidence supporting the use of AI models performing first-line, high-volume neuroimaging tasks. Methods: Medline, Embase, Cochrane library and Web of Science were searched until September 2021 for studies that temporally or externally validated AI capable of detecting abnormalities in first-line CT or MR neuroimaging. A bivariate random-effects model was used for meta-analysis where appropriate. PROSPERO: CRD42021269563. Results: Only 16 studies were eligible for inclusion. Included studies were not compromised by unrepresentative datasets or inadequate validation methodology. Direct comparison with radiologists was available in 4/16 studies. 15/16 had a high risk of bias. Meta-analysis was only suitable for intracranial haemorrhage detection in CT imaging (10/16 studies), where AI systems had a pooled sensitivity and specificity 0.90 (95% CI 0.85 - 0.94) and 0.90 (95% CI 0.83 - 0.95) respectively. Other AI studies using CT and MRI detected target conditions other than haemorrhage (2/16), or multiple target conditions (4/16). Only 3/16 studies implemented AI in clinical pathways, either for pre-read triage or as post-read discrepancy identifiers.
Conclusion: The paucity of eligible studies reflects that most abnormality detection AI studies were not adequately validated in representative clinical cohorts. The few studies describing how abnormality detection AI could impact patients and clinicians did not explore the full ramifications of clinical implementation.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02782 [pdf]

A self-supervised text-vision framework for automated brain abnormality detection

Authors: David A. Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J. Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus' etc.), enabling a range of classification-based applications including automated triage.
Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

Submitted 11 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Under Review

arXiv:2404.15198 [pdf, other]

Lossless and Near-Lossless Compression for Foundation Models

Authors: Moshik Hershcovitch, Leshem Choshen, Andrew Wood, Ilias Enmouri, Peter Chin, Swaminathan Sundararaman, Danny Harnik

Abstract: With the growth of model sizes and scale of their deployment, their sheer size burdens the infrastructure requiring more network and more storage to accommodate these. While there is a vast literature about reducing model sizes, we investigate a more traditional type of compression -- one that compresses the model to a smaller form and is coupled with a decompression algorithm that returns it to its original size -- namely lossless compression. Somewhat surprisingly, we show that such lossless compression can gain significant network and storage reduction on popular models, at times reducing over $50\%$ of the model size. We investigate the source of model compressibility, introduce compression variants tailored for models and categorize models to compressibility groups. We also introduce a tunable lossy compression technique that can further reduce size even on the less compressible models with little to no effect on the model accuracy. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like HuggingFace.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.07154 [pdf, other]

Weights with Maximal Symmetry and Failures of the MacWilliams Identities

Authors: Jay A. Wood

Abstract: This paper examines the $w$-weight enumerators of weights $w$ with maximal symmetry over finite chain rings and matrix rings over finite fields. In many cases, including the homogeneous weight, the MacWilliams identities for $w$-weight enumerators fail because there exist two linear codes with the same $w$-weight enumerator whose dual codes have different $w$-weight enumerators.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 84 pages, 1 figure

MSC Class: 94B05

arXiv:2401.14252 [pdf, other]

doi 10.1109/BigData59044.2023.10386248

On mission Twitter Profiles: A Study of Selective Toxic Behavior

Authors: Hina Qayyum, Muhammad Ikram, Benjamin Zi Hao Zhao, Ian D. Wood, Nicolas Kourtellis, Mohamed Ali Kaafar

Abstract: The argument for persistent social media influence campaigns, often funded by malicious entities, is gaining traction. These entities utilize instrumented profiles to disseminate divisive content and disinformation, shaping public perception. Despite ample evidence of these instrumented profiles, few identification methods exist to locate them in the wild. To evade detection and appear genuine, small clusters of instrumented profiles engage in unrelated discussions, diverting attention from their true goals. This strategic thematic diversity conceals their selective polarity towards certain topics and fosters public trust. This study aims to characterize profiles potentially used for influence operations, termed 'on-mission profiles,' relying solely on thematic content diversity within unlabeled data. Distinguishing this work is its focus on content volume and toxicity towards specific themes. Longitudinal data from 138K Twitter (or X) profiles and 293M tweets enables profiling based on theme diversity. High thematic diversity groups predominantly produce toxic content concerning specific themes, like politics, health, and news, classifying them as 'on-mission' profiles. Using the identified 'on-mission' profiles, we design a classifier for unseen, unlabeled data. Employing a linear SVM model, we train and test it on an 80/20% split of the most diverse profiles. The classifier achieves a flawless 100% accuracy, facilitating the discovery of previously unknown 'on-mission' profiles in the wild.

Submitted 25 January, 2024; originally announced January 2024.

Journal ref: 2023 IEEE International Conference on Big Data (BigData)

arXiv:2401.11737 [pdf, other]

doi 10.1002/adts.202301227

Sphractal: Estimating the Fractal Dimension of Surfaces Computed from Precise Atomic Coordinates via Box-Counting Algorithm

Authors: Jonathan Yik Chang Ting, Andrew Thomas Agars Wood, Amanda Susan Barnard

Abstract: The fractal dimension of a surface allows its degree of roughness to be characterized quantitatively. However, limited effort has been made to calculate the fractal dimension of surfaces computed from precisely known atomic coordinates from computational biomolecular and nanomaterial studies. This work proposes methods to estimate the fractal dimension of the surface of any 3D object composed of spheres, by representing the surface as either a voxelized point cloud or a mathematically exact surface, and computing its box-counting dimension. Sphractal is published as a Python package that provides these functionalities, and its utility is demonstrated on a set of simulated palladium nanoparticle data.

Submitted 10 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 18 pages, 13 figures

ACM Class: J.2

Journal ref: Adv. Theory Simul. 2024, 2301227
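
The box-counting dimension mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not Sphractal's API: the function name `box_counting_dimension`, the choice of box sizes, and the test surface are all assumptions made for illustration. The idea is to count the boxes of edge length s that contain at least one surface point, then estimate the dimension as the slope of log N(s) against log(1/s).

```python
import math


def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting dimension of a 3D point cloud.

    points: list of (x, y, z) tuples sampled from the surface (illustrative input).
    box_sizes: list of box edge lengths to test (hypothetical choice).
    """
    # Shift the cloud so that box indices start at zero along each axis.
    mins = [min(p[k] for p in points) for k in range(3)]
    log_inv_size, log_count = [], []
    for size in box_sizes:
        # Each point falls into exactly one box; a set keeps the occupied boxes.
        occupied = {
            tuple(math.floor((p[k] - mins[k]) / size) for k in range(3))
            for p in points
        }
        log_inv_size.append(math.log(1.0 / size))
        log_count.append(math.log(len(occupied)))
    # Least-squares slope of log N(s) versus log(1/s).
    n = len(box_sizes)
    mean_x = sum(log_inv_size) / n
    mean_y = sum(log_count) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_size, log_count))
    den = sum((x - mean_x) ** 2 for x in log_inv_size)
    return num / den


# A densely sampled flat square should come out close to dimension 2.
plane = [(i / 64, j / 64, 0.0) for i in range(64) for j in range(64)]
estimate = box_counting_dimension(plane, [1 / 4, 1 / 8, 1 / 16, 1 / 32])
```

A rough surface would be expected to give a value between 2 and 3; the estimate depends on the range of box sizes chosen, so that range should be reported alongside the result.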

arXiv:2311.02992 [pdf]

NEURO HAND: A weakly supervised Hierarchical Attention Network for interpretable neuroimaging abnormality Detection

Authors: David A. Wood

Abstract: Clinical neuroimaging data is naturally hierarchical. Different magnetic resonance imaging (MRI) sequences within a series, different slices covering the head, and different regions within each slice all confer different information. In this work we present a hierarchical attention network for abnormality detection using MRI scans obtained in a clinical hospital setting. The proposed network is suitable for non-volumetric data (i.e. stacks of high-resolution MRI slices), and can be trained from binary examination-level labels. We show that this hierarchical approach leads to improved classification, while providing interpretability through either coarse inter- and intra-slice abnormality localisation, or giving importance scores for different slices and sequences, making our model suitable for use as an automated triaging system in radiology departments.

Submitted 16 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2308.13127 [pdf, other]

JISA: A Polymorphic Test-and-Measurement Automation Library

Authors: William Alexander Wood, Thomas Marsh, Henning Sirringhaus

Abstract: JISA is a software library, written in Java, aimed at providing an easy, flexible and standardised means of creating experimental control software for physical sciences researchers. Specifically, it emphasises enabling measurement code to be written in an instrument-agnostic way, allowing such routines to be reused across multiple different setups without requiring modification. Additionally, it provides a simple means of recording and handling data, as well as pre-built graphical user interface (GUI) "blocks" to enable the relatively easy creation of graphical control systems. Together these allow users to quickly piece together test-and-measurement programs with coherent user interfaces, without requiring much experience of such things.

Submitted 24 August, 2023; originally announced August 2023.

Comments: Submitted to SoftwareX, under review

arXiv:2212.12724 [pdf, other]

doi 10.1109/LRA.2023.3286815

Certification of Bottleneck Task Assignment with Shortest Path Criteria

Authors: Tony A. Wood, Maryam Kamgarpour

Abstract: Minimising the longest travel distance for a group of mobile robots with interchangeable goals requires knowledge of the shortest length paths between all robots and goal destinations. Determining the exact length of the shortest paths in an environment with obstacles is NP-hard, however. In this paper, we investigate when polynomial-time approximations of the shortest path search are sufficient to determine the optimal assignment of robots to goals. In particular, we propose an algorithm in which the accuracy of the path planning is iteratively increased. The approach provides a certificate when the uncertainties on estimates of the shortest paths become small enough to guarantee the optimality of the goal assignment. To this end, we apply results from assignment sensitivity assuming upper and lower bounds on the length of the shortest paths. We then provide polynomial-time methods to find such bounds by applying sampling-based path planning. The upper bounds are given by feasible paths, the lower bounds are obtained by expanding the sample set and leveraging the knowledge of the sample dispersion. We demonstrate the application of the proposed method with a multi-robot path-planning case study.

Submitted 8 June, 2023; v1 submitted 24 December, 2022; originally announced December 2022.
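
The bottleneck objective described in this abstract (minimise the longest travel distance over all robot-to-goal pairings) can be made concrete with a small brute-force sketch. This is not the paper's algorithm and includes no certification step: the function name `bottleneck_assignment` and the cost matrix are hypothetical, and exhaustive search is only practical for a handful of robots.

```python
from itertools import permutations


def bottleneck_assignment(cost):
    """Return (assignment, bottleneck) for a square cost matrix.

    cost[i][j] is the path length from robot i to goal j (illustrative values).
    assignment[i] is the goal given to robot i; bottleneck is the largest
    cost among the chosen pairs, which is the quantity being minimised.
    """
    n = len(cost)
    best_perm, best_value = None, float("inf")
    # Exhaustive search over all n! one-to-one pairings.
    for perm in permutations(range(n)):
        value = max(cost[i][perm[i]] for i in range(n))
        if value < best_value:
            best_perm, best_value = perm, value
    return list(best_perm), best_value


# Hypothetical path lengths for three robots and three goals.
costs = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, longest = bottleneck_assignment(costs)
```

For these assumed costs the search returns the pairing 0→1, 1→0, 2→2 with a longest path of 2. When only upper and lower bounds on the path lengths are known, as in the abstract, the open question is whether that pairing stays optimal for every cost matrix consistent with the bounds.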

arXiv:2202.12802 [pdf, other]

Probabilistic Data Association for Semantic SLAM at Scale

Authors: Elad Michael, Tyler Summers, Tony A. Wood, Chris Manzie, Iman Shames

Abstract: With advances in image processing and machine learning, it is now feasible to incorporate semantic information into the problem of simultaneous localisation and mapping (SLAM). Previously, SLAM was carried out using lower level geometric features (points, lines, and planes) which are often view-point dependent and error prone in visually repetitive environments. Semantic information can improve the ability to recognise previously visited locations, as well as maintain sparser maps for long term SLAM applications. However, SLAM in repetitive environments has the critical problem of assigning measurements to the landmarks which generated them. In this paper, we use k-best assignment enumeration to compute marginal assignment probabilities for each measurement landmark pair, in real time. We present numerical studies on the KITTI dataset to demonstrate the effectiveness and speed of the proposed framework.

Submitted 25 February, 2022; originally announced February 2022.

Comments: 6 Pages, 3 figures, submitted to Robotics and Automation Letters and the IROS 2020 conference

MSC Class: 4104 (Primary); 05-08 (Secondary)

arXiv:2202.11518 [pdf, other]

Non-Volatile Memory Accelerated Geometric Multi-Scale Resolution Analysis

Authors: Andrew Wood, Moshik Hershcovitch, Daniel Waddington, Sarel Cohen, Meredith Wolf, Hongjun Suh, Weiyu Zong, Peter Chin

Abstract: Dimensionality reduction algorithms are standard tools in a researcher's toolbox. Dimensionality reduction algorithms are frequently used to augment downstream tasks such as machine learning, data science, and also are exploratory methods for understanding complex phenomena. For instance, dimensionality reduction is commonly used in Biology as well as Neuroscience to understand data collected from biological subjects. However, dimensionality reduction techniques are limited by the von-Neumann architectures that they execute on. Specifically, data intensive algorithms such as dimensionality reduction techniques often require fast, high capacity, persistent memory which historically hardware has been unable to provide at the same time. In this paper, we present a re-implementation of an existing dimensionality reduction technique called Geometric Multi-Scale Resolution Analysis (GMRA) which has been accelerated via novel persistent memory technology called Memory Centric Active Storage (MCAS). Our implementation uses a specialized version of MCAS called PyMM that provides native support for Python datatypes including NumPy arrays and PyTorch tensors. We compare our PyMM implementation against a DRAM implementation, and show that when data fits in DRAM, PyMM offers competitive runtimes. When data does not fit in DRAM, our PyMM implementation is still able to process the data.

Submitted 21 February, 2022; originally announced February 2022.

Comments: 2021 IEEE High Performance Extreme Computing Conference (HPEC)

arXiv:2202.10522 [pdf, other]

Non-Volatile Memory Accelerated Posterior Estimation

Authors: Andrew Wood, Moshik Hershcovitch, Daniel Waddington, Sarel Cohen, Peter Chin

Abstract: Bayesian inference allows machine learning models to express uncertainty. Current machine learning models use only a single learnable parameter combination when making predictions, and as a result are highly overconfident when their predictions are wrong. To use more learnable parameter combinations efficiently, these samples must be drawn from the posterior distribution. Unfortunately computing the posterior directly is infeasible, so often researchers approximate it with a well known distribution such as a Gaussian. In this paper, we show that through the use of high-capacity persistent storage, models whose posterior distribution was too big to approximate are now feasible, leading to improved predictions in downstream tasks.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2201.11742 [pdf, other]

A neural net architecture based on principles of neural plasticity and development evolves to effectively catch prey in a simulated environment

Authors: Addison Wood, Jory Schossau, Nick Sabaj, Richard Liu, Mark Reimers

Abstract: A profound challenge for A-Life is to construct agents whose behavior is 'life-like' in a deep way. We propose an architecture and approach to constructing networks driving artificial agents, using processes analogous to the processes that construct and sculpt the brains of animals. Furthermore the instantiation of action is dynamic: the whole network responds in real-time to sensory inputs to activate effectors, rather than computing a representation of the optimal behavior and sending off an encoded representation to effector controllers. There are many parameters and we use an evolutionary algorithm to select them, in the context of a specific prey-capture task. We think this architecture may be useful for controlling small autonomous robots or drones, because it allows for a rapid response to changes in sensor inputs.

Submitted 30 January, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: 7 pages, 4 figures, 1 appendix page

arXiv:2110.09978 [pdf, other]

What is Learned in Knowledge Graph Embeddings?

Authors: Michael R. Douglas, Michael Simkin, Omri Ben-Eliezer, Tianqi Wu, Peter Chin, Trung V. Dang, Andrew Wood

Abstract: A knowledge graph (KG) is a data structure which represents entities and relations as the vertices and edges of a directed graph with edge types. KGs are an important primitive in modern machine learning and artificial intelligence. Embedding-based models, such as the seminal TransE [Bordes et al., 2013] and the recent PairRE [Chao et al., 2020] are among the most popular and successful approaches for representing KGs and inferring missing edges (link completion). Their relative success is often credited in the literature to their ability to learn logical rules between the relations. In this work, we investigate whether learning rules between relations is indeed what drives the performance of embedding-based methods. We define motif learning and two alternative mechanisms, network learning (based only on the connectivity of the KG, ignoring the relation types), and unstructured statistical learning (ignoring the connectivity of the graph). Using experiments on synthetic KGs, we show that KG models can learn motifs and how this ability is degraded by non-motif (noise) edges. We propose tests to distinguish the contributions of the three mechanisms to performance, and apply them to popular KG benchmarks. We also discuss an issue with the standard performance testing protocol and suggest an improvement. To appear in the proceedings of Complex Networks 2021.

Submitted 19 October, 2021; originally announced October 2021.

Comments: 16 pages

ACM Class: I.2.4

arXiv:2106.08176 [pdf, other]

Automated triaging of head MRI examinations using convolutional neural networks

Authors: David A. Wood, Sina Kafiabadi, Ayisha Al Busaidi, Emily Guilhem, Antanas Montvila, Siddharth Agarwal, Jeremy Lynch, Matthew Townend, Gareth Barker, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: The growing demand for head magnetic resonance imaging (MRI) examinations, along with a global shortage of radiologists, has led to an increase in the time taken to report head MRI scans around the world. For many neurological conditions, this delay can result in increased morbidity and mortality. An automated triaging tool could reduce reporting times for abnormal examinations by identifying abnormalities at the time of imaging and prioritizing the reporting of these scans. In this work, we present a convolutional neural network for detecting clinically-relevant abnormalities in $\text{T}_2$-weighted head MRI scans. Using a validated neuroradiology report classifier, we generated a labelled dataset of 43,754 scans from two large UK hospitals for model training, and demonstrate accurate classification (area under the receiver operating curve (AUC) = 0.943) on a test set of 800 scans labelled by a team of neuroradiologists. Importantly, when trained on scans from only a single hospital the model generalized to scans from the other hospital ($Δ$AUC $\leq$ 0.02). A simulation study demonstrated that our model would reduce the mean reporting time for abnormal examinations from 28 days to 14 days and from 9 days to 5 days at the two hospitals, demonstrating feasibility for use in a clinical triage environment.

Submitted 28 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: Accepted as an oral presentation at Medical Imaging with Deep Learning (MIDL) 2021

arXiv:2010.01730 [pdf, other]

Facebook Political Ads And Accountability: Outside Groups Are Most Negative, Especially When Hiding Donors

Authors: Shomik Jain, Abby K. Wood

Abstract: The emergence of online political advertising has come with little regulation, allowing political advertisers on social media to avoid accountability. We analyze how transparency and accountability deficits caused by dark money and disappearing groups relate to the sentiment of political ads on Facebook. We obtained 430,044 ads with FEC-registered advertisers from Facebook's ad library that ran between August-November 2018. We compare ads run by candidates, parties, and outside groups, which we classify by (1) their donor transparency (dark money or disclosed) and (2) the group's permanence (only FEC-registered in 2018 or persistent across cycles). The most negative advertising came from dark money and disappearing outside groups, which were mostly corporations or 501(c) organizations. However, only dark money was associated with a significant decrease in ad sentiment. These results suggest that accountability for political speech matters for advertising tone, especially in the context of affective polarization on social media.

Submitted 26 January, 2024; v1 submitted 4 October, 2020; originally announced October 2020.

ACM Class: K.4.0

arXiv:2009.11992 [pdf, other]

doi 10.1016/j.cma.2020.113500

A physics-informed operator regression framework for extracting data-driven continuum models

Authors: Ravi G. Patel, Nathaniel A. Trask, Mitchell A. Wood, Eric C. Cyr

Abstract: The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 37 pages, 15 figures

arXiv:2007.04226 [pdf, other]

Labelling imaging datasets on the basis of neuroradiology reports: a validation study

Authors: David A. Wood, Sina Kafiabadi, Aisha Al Busaidi, Emily Guilhem, Jeremy Lynch, Matthew Townend, Antanas Montvila, Juveria Siddiqui, Naveen Gadapa, Matthew Benger, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth

Abstract: Natural language processing (NLP) shows promise as a means to automate the labelling of hospital-scale neuroradiology magnetic resonance imaging (MRI) datasets for computer vision applications. To date, however, there has been no thorough investigation into the validity of this approach, including determining the accuracy of report labels compared to image labels as well as examining the performance of non-specialist labellers. In this work, we draw on the experience of a team of neuroradiologists who labelled over 5000 MRI neuroradiology reports as part of a project to build a dedicated deep learning-based neuroradiology report classifier. We show that, in our experience, assigning binary labels (i.e. normal vs abnormal) to images from reports alone is highly accurate. In contrast to the binary labels, however, the accuracy of more granular labelling is dependent on the category, and we highlight reasons for this discrepancy. We also show that downstream model performance is reduced when labelling of training reports is performed by a non-specialist. To allow other researchers to accelerate their research, we make our refined abnormality definitions and labelling rules available, as well as our easy-to-use radiology report labelling app which helps streamline this process.

Submitted 8 March, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

arXiv:2007.03152 [pdf, other]

The gem5 Simulator: Version 20.0+

Authors: Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Carlos Escuin, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, et al. (53 additional authors not shown)

Abstract: The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.

Submitted 29 September, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Source, comments, and feedback: https://github.com/darchr/gem5-20-paper

arXiv:2004.12481 [pdf, other]

GymFG: A Framework with a Gym Interface for FlightGear

Authors: Andrew Wood, Ali Sydney, Peter Chin, Bishal Thapa, Ryan Ross

Abstract: Over the past decades, progress in deployable autonomous flight systems has slowly stagnated. This is reflected in today's production aircraft, where pilots only enable simple physics-based systems such as autopilot for takeoff, landing, navigation, and terrain/traffic avoidance. Evidently, autonomy has not gained the trust of the community where higher problem complexity and cognitive workload are required. To address trust, we must revisit the process for developing autonomous capabilities: modeling and simulation. Given the prohibitive costs for live tests, we need to prototype and evaluate autonomous aerial agents in a high fidelity flight simulator with autonomous learning capabilities applicable to flight systems: such an open-source development platform is not available. As a result, we have developed GymFG: GymFG couples and extends a high fidelity, open-source flight simulator and a robust agent learning framework to facilitate learning of more complex tasks. Furthermore, we have demonstrated the use of GymFG to train an autonomous aerial agent using Imitation Learning. With GymFG, we can now deploy innovative ideas to address complex problems and build the trust necessary to move prototypes to the real world.

Submitted 26 April, 2020; originally announced April 2020.

ACM Class: I.2.1; I.6.5

arXiv:2002.06588 [pdf, other]

Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM)

Authors: David A. Wood, Jeremy Lynch, Sina Kafiabadi, Emily Guilhem, Aisha Al Busaidi, Antanas Montvila, Thomas Varsavsky, Juveria Siddiqui, Naveen Gadapa, Matthew Townend, Martin Kiik, Keena Patel, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth

Abstract: Labelling large datasets for training high-capacity neural networks is a major obstacle to the development of deep learning-based medical imaging applications. Here we present a transformer-based network for magnetic resonance imaging (MRI) radiology report classification which automates this task by assigning image labels on the basis of free-text expert radiology reports. Our model's performance is comparable to that of an expert radiologist, and better than that of an expert physician, demonstrating the feasibility of this approach. We make code available online for researchers to label their own MRI datasets for medical imaging applications.

Submitted 16 February, 2020; originally announced February 2020.

arXiv:1910.10096 [pdf, other]

Formalizing Privacy Laws for License Generation and Data Repository Decision Automation

Authors: Micah Altman, Stephen Chong, Alexandra Wood

Abstract: In this paper, we summarize work-in-progress on expert system support to automate some data deposit and release decisions within a data repository, and to generate custom license agreements for those data transfers. Our approach formalizes via a logic programming language the privacy-relevant aspects of laws, regulations, and best practices, supported by legal analysis documented in legal memoranda. This formalization enables automated reasoning about the conditions under which a repository can transfer data, through interrogation of users, and the application of formal rules to the facts obtained from users. The proposed system takes the specific conditions for a given data release and produces a custom data use agreement that accurately captures the relevant restrictions on data use. This enables appropriate decisions and accurate licenses, while removing the bottleneck of lawyer effort per data transfer. The operation of the system aims to be transparent, in the sense that administrators, lawyers, institutional review boards, and other interested parties can evaluate the legal reasoning and interpretation embodied in the formalization, and the specific rationale for a decision to accept or release a particular dataset.

Submitted 22 October, 2019; originally announced October 2019.

arXiv:1907.05164 [pdf]

Disease classification of macular Optical Coherence Tomography scans using deep learning software: validation on independent, multi-centre data

Authors: Kanwal K. Bhatia, Mark S. Graham, Louise Terry, Ashley Wood, Paris Tranos, Sameer Trikha, Nicolas Jaccard

Abstract: Purpose: To evaluate Pegasus-OCT, a clinical decision support software for the identification of features of retinal disease from macula OCT scans, across heterogenous populations involving varying patient demographics, device manufacturers, acquisition sites and operators. Methods: 5,588 normal and anomalous macular OCT volumes (162,721 B-scans), acquired at independent centres in five countries, were processed using the software. Results were evaluated against ground truth provided by the dataset owners. Results: Pegasus-OCT performed with AUROCs of at least 98% for all datasets in the detection of general macular anomalies. For scans of sufficient quality, the AUROCs for general AMD and DME detection were found to be at least 99% and 98%, respectively. Conclusions: The ability of a clinical decision support system to cater for different populations is key to its adoption. Pegasus-OCT was shown to be able to detect AMD, DME and general anomalies in OCT volumes acquired across multiple independent sites with high performance. Its use thus offers substantial promise, with the potential to alleviate the burden of growing demand in eye care services caused by retinal disease.

Submitted 11 July, 2019; originally announced July 2019.

arXiv:1806.05130 [pdf, other]

Detecting Speech Act Types in Developer Question/Answer Conversations During Bug Repair

Authors: Andrew Wood, Paige Rodeghero, Ameer Armaly, Collin McMillan

Abstract: This paper targets the problem of speech act detection in conversations about bug repair. We conduct a "Wizard of Oz" experiment with 30 professional programmers, in which the programmers fix bugs for two hours, and use a simulated virtual assistant for help. Then, we use an open coding manual annotation procedure to identify the speech act types in the conversations. Finally, we train and evaluate a supervised learning algorithm to automatically detect the speech act types in the conversations. In 30 two-hour conversations, we made 2459 annotations and uncovered 26 speech act types. Our automated detection achieved 69% precision and 50% recall. The key application of this work is to advance the state of the art for virtual assistants in software engineering. Virtual assistant technology is growing rapidly, though applications in software engineering are behind those in other areas, largely due to a lack of relevant data and experiments. This paper targets this problem in the area of developer Q/A conversations about bug repair.

Submitted 3 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: 12 pages (10 for content, two for references), accepted into FSE (Foundations of Software Engineering) 2018

arXiv:1711.09939 [pdf, ps, other]

doi 10.1090/conm/727/14629

The Extension Theorem for Bi-invariant Weights over Frobenius Rings and Frobenius Bimodules

Authors: Oliver W. Gnilke, Marcus Greferath, Thomas Honold, Jay A. Wood, Jens Zumbrägel

Abstract: We give a sufficient condition for a bi-invariant weight on a Frobenius bimodule to satisfy the extension property. This condition applies to bi-invariant weights on a finite Frobenius ring as a special case. The complex-valued functions on a Frobenius bimodule are viewed as a module over the semigroup ring of the multiplicative semigroup of the coefficient ring.

Submitted 27 November, 2017; originally announced November 2017.

Comments: 15 pages

MSC Class: 94B05

Journal ref: Rings, Modules and Codes, 117-129, Contemp. Math., 727, AMS

arXiv:1711.01134 [pdf]

Accountability of AI Under the Law: The Role of Explanation

Authors: Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, Adrian Weller, Alexandra Wood

Abstract: The ubiquity of systems using artificial intelligence or "AI" has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before---applications range from clinical decision support to autonomous driving and predictive policing. That said, there exist legitimate concerns about the intentional and unintentional negative consequences of AI systems. There are many ways to hold AI systems accountable. In this work, we focus on one: explanation. Questions about a legal right to explanation from AI systems were recently debated in the EU General Data Protection Regulation, and thus thinking carefully about when and how explanation from AI systems might improve accountability is timely. In this work, we review contexts in which explanation is currently required under the law, and then list the technical considerations that must be considered if we desired AI systems that could provide kinds of explanations that are currently required of humans.

Submitted 20 December, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

arXiv:1706.07165 [pdf, other]

Content-Centric Networking - Architectural Overview and Protocol Description

Authors: Marc Mosko, Ignacio Solis, Christopher A. Wood

Abstract: This document describes the core concepts of the CCNx architecture and presents a minimum network protocol based on two messages: Interests and Content Objects. It specifies the set of mandatory and optional fields within those messages and describes their behavior and interpretation. This architecture and protocol specification is independent of a specific wire encoding.

Submitted 22 June, 2017; originally announced June 2017.

arXiv:1608.07485 [pdf, ps, other]

When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads

Authors: Jason Lowe-Power, Mark D. Hill, David A. Wood

Abstract: Response time requirements for big data processing systems are shrinking. To meet this strict response time requirement, many big data systems store all or most of their data in main memory to reduce the access latency. Main memory capacities have grown, and systems with 2 TB of main memory capacity are available today. However, the rate at which processors can access this data--the memory bandwidth--has not grown at the same rate. In fact, some of these big-memory systems can access less than 10% of their main memory capacity in one second (billions of processor cycles). 3D die-stacking is one promising solution to this bandwidth problem, and industry is investing significantly in 3D die-stacking. We use a simple back-of-the-envelope-style model to characterize if and when the 3D die-stacked architecture is more cost-effective than current architectures for in-memory big data workloads. We find that die-stacking has much higher performance than current systems (up to 256x lower response times), and it does not require expensive memory overprovisioning to meet real-time (10 ms) response time service-level agreements. However, the power requirements of the die-stacked systems are significantly higher (up to 50x) than current systems, and its memory capacity is lower in many cases. Even in this limited case study, we find 3D die-stacking is not a panacea. Today, die-stacking is the most cost-effective solution for strict SLAs, and by reducing the power of the compute chip and increasing memory densities, die-stacking can be cost-effective under other constraints in the future.

Submitted 26 August, 2016; originally announced August 2016.

Comments: Originally presented at The Seventh workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE-7). http://www.bafst.com/events/asplos16/bpoe7/

arXiv:1512.07755 [pdf, other]

Living in a PIT-less World: A Case Against Stateful Forwarding in Content-Centric Networking

Authors: Cesar Ghali, Gene Tsudik, Ersin Uzun, Christopher A. Wood

Abstract: Information-Centric Networking (ICN) is a recent paradigm that claims to mitigate some limitations of the current IP-based Internet architecture. The centerpiece of ICN is named and addressable content, rather than hosts or interfaces. Content-Centric Networking (CCN) is a prominent ICN instance that shares the fundamental architectural design with its equally popular academic sibling Named-Data Networking (NDN). CCN eschews source addresses and creates one-time virtual circuits for every content request (called an interest). As an interest is forwarded it creates state in intervening routers and the requested content is delivered back over the reverse path using that state. Although a stateful forwarding plane might be beneficial in terms of efficiency, and resilience to certain types of attacks, this has not been decisively proven via realistic experiments. Since keeping per-interest state complicates router operations and makes the infrastructure susceptible to router state exhaustion attacks (e.g., there is currently no effective defense against interest flooding attacks), the value of the stateful forwarding plane in CCN should be re-examined. In this paper, we explore supposed benefits and various problems of the stateful forwarding plane. We then argue that its benefits are uncertain at best and it should not be a mandatory CCN feature. To this end, we propose a new stateless architecture for CCN that provides nearly all functionality of the stateful design without its headaches. We analyze performance and resource requirements of the proposed architecture, via experiments.

Submitted 24 December, 2015; originally announced December 2015.

Comments: 10 pages, 6 figures

arXiv:1512.07311 [pdf, other]

BEAD: Best Effort Autonomous Deletion in Content-Centric Networking

Authors: Cesar Ghali, Gene Tsudik, Christopher A. Wood

Abstract: A core feature of Content-Centric Networking (CCN) is opportunistic content caching in routers. It enables routers to satisfy content requests with in-network cached copies, thereby reducing bandwidth utilization, decreasing congestion, and improving overall content retrieval latency. One major drawback of in-network caching is that content producers have no knowledge about where their content is stored. This is problematic if a producer wishes to delete its content. In this paper, we show how to address this problem with a protocol called BEAD (Best-Effort Autonomous Deletion). BEAD achieves content deletion via small and secure packets that resemble current CCN messages. We discuss several methods of routing BEAD messages from producers to caching routers with varying levels of network overhead and efficacy. We assess BEAD performance via simulations and provide a detailed analysis of its properties.

Submitted 22 December, 2015; originally announced December 2015.

Comments: 9 pages, 4 figures

arXiv:1510.01852 [pdf, other]

Practical Accounting in Content-Centric Networking (extended version)

Authors: Cesar Ghali, Gene Tsudik, Christopher A. Wood, Edmund Yeh

Abstract: Content-Centric Networking (CCN) is a new class of network architectures designed to address some key limitations of the current IP-based Internet. One of its main features is in-network content caching, which allows requests for content to be served by routers. Despite improved bandwidth utilization and lower latency for popular content retrieval, in-network content caching offers producers no means of collecting information about content that is requested and later served from network caches. Such information is often needed for accounting purposes. In this paper, we design some secure accounting schemes that vary in the degree of consumer, router, and producer involvement. Next, we identify and analyze performance and security tradeoffs, and show that specific per-consumer accounting is impossible in the presence of router caches and without application-specific support. We then recommend accounting strategies that entail a few simple requirements for CCN architectures. Finally, our experimental results show that forms of native and secure CCN accounting are both more viable and practical than application-specific approaches with little modification to the existing architecture and protocol.

Submitted 7 October, 2015; originally announced October 2015.

Comments: 13 pages, 6 figures

arXiv:1505.06258 [pdf, other]

doi 10.1145/2810156.2810174

Interest-Based Access Control for Content Centric Networks (extended version)

Authors: Cesar Ghali, Marc A. Schlosberg, Gene Tsudik, Christopher A. Wood

Abstract: Content-Centric Networking (CCN) is an emerging network architecture designed to overcome limitations of the current IP-based Internet. One of the fundamental tenets of CCN is that data, or content, is a named and addressable entity in the network. Consumers request content by issuing interest messages with the desired content name. These interests are forwarded by routers to producers, and the resulting content object is returned and optionally cached at each router along the path. In-network caching makes it difficult to enforce access control policies on sensitive content outside of the producer since routers only use interest information for forwarding decisions. To that end, we propose an Interest-Based Access Control (IBAC) scheme that enables access control enforcement using only information contained in interest messages, i.e., by making sensitive content names unpredictable to unauthorized parties. Our IBAC scheme supports both hash- and encryption-based name obfuscation. We address the problem of interest replay attacks by formulating a mutual trust framework between producers and consumers that enables routers to perform authorization checks when satisfying interests from their cache. We assess the computational, storage, and bandwidth overhead of each IBAC variant. Our design is flexible and allows producers to arbitrarily specify and enforce any type of access control on content, without having to deal with the problems of content encryption and key distribution. This is the first comprehensive design for CCN access control using only information contained in interest messages.

Submitted 22 May, 2015; originally announced May 2015.

Comments: 11 pages, 2 figures

arXiv:1405.2861 [pdf, other]

Secure Fragmentation for Content-Centric Networks (extended version)

Authors: Cesar Ghali, Ashok Narayanan, David Oran, Gene Tsudik, Christopher A. Wood

Abstract: Content-Centric Networking (CCN) is a communication paradigm that emphasizes content distribution. Named-Data Networking (NDN) is an instantiation of CCN, a candidate Future Internet Architecture. NDN supports human-readable content naming and router-based content caching which lends itself to efficient, secure, and scalable content distribution. Because of NDN's fundamental requirement that each content object must be signed by its producer, fragmentation has been considered incompatible with NDN since it precludes authentication of individual content fragments by routers. The alternative is to perform hop-by-hop reassembly, which incurs prohibitive delays. In this paper, we show that secure and efficient content fragmentation is both possible and even advantageous in NDN and similar content-centric network architectures that involve signed content. We design a concrete technique that facilitates efficient and secure content fragmentation in NDN, discuss its security guarantees and assess performance. We also describe a prototype implementation and compare performance of cut-through with hop-by-hop fragmentation and reassembly.

Submitted 19 August, 2015; v1 submitted 12 May, 2014; originally announced May 2014.

Comments: 13 pages, 6 figures

arXiv:1309.3292 [pdf, ps, other]

MacWilliams' Extension Theorem for Bi-Invariant Weights over Finite Principal Ideal Rings

Authors: Marcus Greferath, Thomas Honold, Cathy Mc Fadden, Jay A. Wood, Jens Zumbrägel

Abstract: A finite ring R and a weight w on R satisfy the Extension Property if every R-linear w-isometry between two R-linear codes in R^n extends to a monomial transformation of R^n that preserves w. MacWilliams proved that finite fields with the Hamming weight satisfy the Extension Property. It is known that finite Frobenius rings with either the Hamming weight or the homogeneous weight satisfy the Extension Property. Conversely, if a finite ring with the Hamming or homogeneous weight satisfies the Extension Property, then the ring is Frobenius. This paper addresses the question of a characterization of all bi-invariant weights on a finite ring that satisfy the Extension Property. Having solved this question in previous papers for all direct products of finite chain rings and for matrix rings, we have now arrived at a characterization of these weights for finite principal ideal rings, which form a large subclass of the finite Frobenius rings. We do not assume commutativity of the rings in question.

Submitted 12 September, 2013; originally announced September 2013.

Comments: 12 pages

arXiv:0904.3912 [pdf, ps, other]

Refutation of Aslam's Proof that NP = P

Authors: Frank Ferraro, Garrett Hall, Andrew Wood

Abstract: Aslam presents an algorithm he claims will count the number of perfect matchings in any incomplete bipartite graph with an algorithm in the function-computing version of NC, which is itself a subset of FP. Counting perfect matchings is known to be #P-complete; therefore if Aslam's algorithm is correct, then NP=P. However, we show that Aslam's algorithm does not correctly count the number of perfect matchings and offer an incomplete bipartite graph as a concrete counter-example.

Submitted 14 May, 2009; v1 submitted 24 April, 2009; originally announced April 2009.

Comments: 13 pages, 2 figures, a response to Aslam's paper (arXiv:0812.1385v11) and the underlying arguments (arXiv:0812.1385v9). Very minor content changes and typos fixed

ACM Class: F.2.0; F.2.2

arXiv:cs/0509050 [pdf]

Effect of door delay on aircraft evacuation time

Authors: Martyn Amos, Andrew Wood

Abstract: The recent commercial launch of twin-deck Very Large Transport Aircraft (VLTA) such as the Airbus A380 has raised questions concerning the speed at which they may be evacuated. The abnormal height of emergency exits on the upper deck has led to speculation that emotional factors such as fear may lead to door delay, and thus play a significant role in increasing overall evacuation time. Full-scale evacuation tests are financially expensive and potentially hazardous, and systematic studies of the evacuation of VLTA are rare. Here we present a computationally cheap agent-based framework for the general simulation of aircraft evacuation, and apply it to the particular case of the Airbus A380. In particular, we investigate the effect of door delay, and conclude that even a moderate average delay can lead to evacuation times that exceed the maximum for safety certification. The model suggests practical ways to minimise evacuation time, as well as providing a general framework for the simulation of evacuation.

Submitted 16 September, 2005; originally announced September 2005.

Comments: 8 pages, 2 figures

Showing 1–38 of 38 results for author: Wood, A