Search | arXiv e-print repository

Digital Twin Generators for Disease Modeling

Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh

Abstract: A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2403.04629 [pdf, other]

Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration

Authors: Julian Rodemann, Federico Croppi, Philipp Arens, Yusuf Sale, Julia Herbinger, Bernd Bischl, Eyke Hüllermeier, Thomas Augustin, Conor J. Walsh, Giuseppe Casalicchio

Abstract: Bayesian optimization (BO) with Gaussian processes (GP) has become an indispensable algorithm for black box optimization problems. Not without a dash of irony, BO is often considered a black box itself, lacking ways to provide reasons as to why certain parameters are proposed to be evaluated. This is particularly relevant in human-in-the-loop applications of BO, such as in robotics. We address this issue by proposing ShapleyBO, a framework for interpreting BO's proposals by game-theoretic Shapley values. They quantify each parameter's contribution to BO's acquisition function. Exploiting the linearity of Shapley values, we are further able to identify how strongly each parameter drives BO's exploration and exploitation for additive acquisition functions like the confidence bound. We also show that ShapleyBO can disentangle the contributions to exploration into those that explore aleatoric and epistemic uncertainty. Moreover, our method gives rise to a ShapleyBO-assisted human machine interface (HMI), allowing users to interfere with BO in case proposals do not align with human reasoning. We demonstrate this HMI's benefits for the use case of personalizing wearable robotic devices (assistive back exosuits) by human-in-the-loop BO. Results suggest human-BO teams with access to ShapleyBO can achieve lower regret than teams without.

Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Preprint. Copyright by the authors. 19 pages, 24 figures

ACM Class: I.2.6; I.2.9; F.2.2; J.6
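
The linearity property mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a confidence bound of the form mean + beta * std and a value function that masks absent parameters with a baseline; the toy mean and standard-deviation functions, PROPOSAL, BASELINE and BETA are hypothetical and are not taken from the paper or from any ShapleyBO code.

```python
from itertools import combinations
from math import factorial

# Hypothetical proposed configuration, reference point, and exploration weight.
PROPOSAL = (0.8, 0.3)
BASELINE = (0.0, 0.0)
BETA = 2.0


def shapley_values(value_fn, n_params):
    """Exact Shapley values for a value function defined on coalitions."""
    phi = [0.0] * n_params
    for j in range(n_params):
        others = [i for i in range(n_params) if i != j]
        for size in range(n_params):
            weight = (factorial(size) * factorial(n_params - size - 1)
                      / factorial(n_params))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return phi


def masked(coalition):
    """Parameters in the coalition take the proposed value, others the baseline."""
    return tuple(PROPOSAL[i] if i in coalition else BASELINE[i]
                 for i in range(len(PROPOSAL)))


def toy_mean(x):
    # Stand-in for a GP posterior mean (assumption, not a fitted model).
    return x[0] * x[1] + x[0]


def toy_std(x):
    # Stand-in for a GP posterior standard deviation (assumption).
    return 1.0 + x[1] ** 2


def mean_value(coalition):
    return toy_mean(masked(coalition))


def std_value(coalition):
    return toy_std(masked(coalition))


def cb_value(coalition):
    # Additive acquisition function: exploitation term plus exploration term.
    return mean_value(coalition) + BETA * std_value(coalition)


phi_mean = shapley_values(mean_value, 2)  # exploitation contributions
phi_std = shapley_values(std_value, 2)    # exploration contributions
phi_cb = shapley_values(cb_value, 2)      # contributions to the full bound
```

Because Shapley values are linear in the value function, each entry of phi_cb equals the matching entry of phi_mean plus BETA times the matching entry of phi_std, which is what lets a parameter's contribution be split into an exploitation part and an exploration part.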

arXiv:2402.10551 [pdf, other]

Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore and is being evaluated in a clinical trial.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2312.06845 [pdf, other]

High-Cadence Thermospheric Density Estimation enabled by Machine Learning on Solar Imagery

Authors: Shreshth A. Malik, James Walsh, Giacomo Acciarini, Thomas E. Berger, Atılım Güneş Baydin

Abstract: Accurate estimation of thermospheric density is critical for precise modeling of satellite drag forces in low Earth orbit (LEO). Improving this estimation is crucial to tasks such as state estimation, collision avoidance, and re-entry calculations. The largest source of uncertainty in determining thermospheric density is modeling the effects of space weather driven by solar and geomagnetic activity. Current operational models rely on ground-based proxy indices which imperfectly correlate with the complexity of solar outputs and geomagnetic responses. In this work, we directly incorporate NASA's Solar Dynamics Observatory (SDO) extreme ultraviolet (EUV) spectral images into a neural thermospheric density model to determine whether the predictive performance of the model is increased by using space-based EUV imagery data instead of, or in addition to, the ground-based proxy indices. We demonstrate that EUV imagery can enable predictions with much higher temporal resolution and replace ground-based proxies while significantly increasing performance relative to current operational models. Our method paves the way for assimilating EUV image data into operational thermospheric density forecasting models for use in LEO satellite navigation processes.

Submitted 12 November, 2023; originally announced December 2023.

Comments: Accepted at the Machine Learning and the Physical Sciences workshop, NeurIPS 2023

arXiv:2307.15816 [pdf]

Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum)

Authors: Guy RY Coleman, Matthew Kutugata, Michael J Walsh, Muthukumar Bagavathiannan

Abstract: Many advanced, image-based precision agricultural technologies for plant breeding, field crop research, and site-specific crop management hinge on the reliable detection and phenotyping of plants across highly variable morphological growth stages. Convolutional neural networks (CNNs) have shown promise for image-based plant phenotyping and weed recognition, but their ability to recognize growth stages, often with stark differences in appearance, is uncertain. Amaranthus palmeri (Palmer amaranth) is a particularly challenging weed plant in cotton (Gossypium hirsutum) production, exhibiting highly variable plant morphology both across growth stages over a growing season, as well as between plants at a given growth stage due to high genetic diversity. In this paper, we investigate eight-class growth stage recognition of A. palmeri in cotton as a challenging model for You Only Look Once (YOLO) architectures. We compare 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8 on an eight-class growth stage dataset of A. palmeri. The highest mAP@[0.5:0.95] for recognition of all growth stage classes was 47.34% achieved by v8-X, with inter-class confusion across visually similar growth stages. With all growth stages grouped as a single class, performance increased, with a maximum mean average precision (mAP@[0.5:0.95]) of 67.05% achieved by v7-Original. Single class recall of up to 81.42% was achieved by v5-X, and precision of up to 89.72% was achieved by v8-X. Class activation maps (CAM) were used to understand model attention on the complex dataset. Fewer classes, grouped by visual or size features, improved performance over the ground-truth eight-class dataset. Successful growth stage detection highlights the substantial opportunity for improving plant phenotyping and weed recognition technologies with open-source object detection architectures.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 27 pages, 10 figures, 5 tables

arXiv:2306.07372 [pdf, other]

Composing Efficient, Robust Tests for Policy Selection

Authors: Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone

Abstract: Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 26 pages, 13 figures. To appear in Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2023)

ACM Class: B.8.1; I.2.6

arXiv:2305.10311 [pdf]

Investigating image-based fallow weed detection performance on Raphanus sativus and Avena sativa at speeds up to 30 km h$^{-1}$

Authors: Guy R. Y. Coleman, Angus Macintyre, Michael J. Walsh, William T. Salter

Abstract: Site-specific weed control (SSWC) can provide considerable reductions in weed control costs and herbicide usage. Despite the promise of machine vision for SSWC systems and the importance of ground speed in weed control efficacy, there has been little investigation of the role of ground speed and camera characteristics on weed detection performance. Here, we compare the performance of four camera-software combinations using the open-source OpenWeedLocator platform - (1) default settings on a Raspberry Pi HQ camera, (2) optimised software settings on an HQ camera, (3) optimised software settings on the Raspberry Pi v2 camera, and (4) a global shutter Arducam AR0234 camera - at speeds ranging from 5 km h$^{-1}$ to 30 km h$^{-1}$. A combined excess green (ExG) and hue, saturation, value (HSV) thresholding algorithm was used for testing under fallow conditions using tillage radish (Raphanus sativus) and forage oats (Avena sativa) as representative broadleaf and grass weeds, respectively. ARD demonstrated the highest recall among camera systems, with up to 95.7% of weeds detected at 5 km h$^{-1}$ and 85.7% at 30 km h$^{-1}$. HQ1 and V2 cameras had the lowest recall of 31.1% and 26.0% at 30 km h$^{-1}$, respectively. All cameras experienced a decrease in recall as speed increased. The highest rate of decrease was observed for HQ1 with 1.12% and 0.90% reductions in recall for every km h$^{-1}$ increase in speed for tillage radish and forage oats, respectively. Detection of the grassy forage oats was worse (P<0.05) than the broadleaved tillage radish for all cameras. Despite the variations in recall, HQ1, HQ2, and V2 maintained near-perfect precision at all tested speeds. The variable effect of ground speed and camera system on detection performance of grass and broadleaf weeds indicates that careful hardware and software considerations must be made when developing SSWC systems.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 15 pages, 9 figures, 3 tables

ACM Class: C.3; I.4.8; J.3
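
As a rough illustration of the combined ExG and HSV thresholding named in the abstract above, the sketch below marks a pixel as vegetation only when both tests pass. It is a standard-library example under stated assumptions: the chromaticity normalisation, the function names, and every threshold value are hypothetical and are not the OpenWeedLocator settings used in the study.

```python
import colorsys


def exg(r, g, b):
    """Excess green index, 2g - r - b, on chromaticity-normalised channels."""
    total = r + g + b
    if total == 0:
        return 0.0
    return (2 * g - r - b) / total


def is_vegetation(pixel, exg_min=0.1, hue_range=(0.17, 0.50),
                  sat_min=0.2, val_min=0.15):
    """Return True when an 8-bit RGB pixel passes both the ExG and HSV tests.

    All threshold defaults are illustrative assumptions.
    """
    r, g, b = pixel
    # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    passes_exg = exg(r, g, b) > exg_min
    passes_hsv = (hue_range[0] <= h <= hue_range[1]
                  and s >= sat_min and v >= val_min)
    return passes_exg and passes_hsv


def vegetation_mask(image):
    """Apply the combined test to an image given as rows of RGB tuples."""
    return [[is_vegetation(px) for px in row] for row in image]


# Tiny synthetic image: leaf green, brown soil, dark green shadow, grey stone.
image = [
    [(40, 160, 50), (120, 90, 60)],
    [(10, 20, 10), (128, 128, 128)],
]
mask = vegetation_mask(image)
```

The dark green pixel has a high ExG value but fails the brightness test, which shows why the two criteria are combined rather than used alone.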

arXiv:2211.01885 [pdf, other]

Using U-Net Network for Efficient Brain Tumor Segmentation in MRI Images

Authors: Jason Walsh, Alice Othmani, Mayank Jain, Soumyabrata Dev

Abstract: Magnetic Resonance Imaging (MRI) is the most commonly used non-intrusive technique for medical image acquisition. Brain tumor segmentation is the process of algorithmically identifying tumors in brain MRI scans. While many approaches have been proposed in the literature for brain tumor segmentation, this paper proposes a lightweight implementation of U-Net. Apart from providing real-time segmentation of MRI scans, the proposed architecture does not need a large amount of data to train the proposed lightweight U-Net. Moreover, no additional data augmentation step is required. The lightweight U-Net shows very promising results on the BITE dataset and it achieves a mean intersection-over-union (IoU) of 89% while outperforming the standard benchmark algorithms. Additionally, this work demonstrates an effective use of the three perspective planes, instead of the original three-dimensional volumetric images, for simplified brain tumor segmentation.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Published in Healthcare Analytics, 2022

arXiv:2211.00576 [pdf, other]

Event Tables for Efficient Experience Replay

Authors: Varun Kompella, Thomas J. Walsh, Samuel Barrett, Peter Wurman, Peter Stone

Abstract: Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.

Submitted 21 April, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

Journal ref: Transactions on Machine Learning Research, 2023
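
The idea of drawing a minibatch in fixed proportions from separate tables, as described in the abstract above, might be sketched as follows. This is a loose standard-library illustration and not the paper's algorithm: the class name, the single event table, the rule of copying the last history_len transitions when an event fires, and the event_fraction parameter are all assumptions made for the example.

```python
import random
from collections import deque


class EventTableBuffer:
    """Replay buffer with one default table and one (hypothetical) event table."""

    def __init__(self, history_len, capacity, seed=0):
        self.default = deque(maxlen=capacity)      # every transition
        self.event_table = deque(maxlen=capacity)  # transitions leading to events
        self.recent = deque(maxlen=history_len)    # sliding window of history
        self.rng = random.Random(seed)

    def add(self, transition, event_occurred):
        self.default.append(transition)
        self.recent.append(transition)
        if event_occurred:
            # Store the subsequence that led up to the event, then start over.
            self.event_table.extend(self.recent)
            self.recent.clear()

    def sample(self, batch_size, event_fraction):
        """Stratified minibatch: a fixed share comes from the event table."""
        n_event = round(batch_size * event_fraction) if self.event_table else 0
        batch = []
        if n_event:
            batch += self.rng.choices(list(self.event_table), k=n_event)
        batch += self.rng.choices(list(self.default), k=batch_size - n_event)
        return batch


# Toy run: integer "transitions" with an event every 50th step.
buffer = EventTableBuffer(history_len=5, capacity=1000)
for t in range(100):
    buffer.add(t, event_occurred=(t % 50 == 49))
batch = buffer.sample(batch_size=8, event_fraction=0.5)
```

With uniform sampling the ten event-related transitions would make up about a tenth of each batch; here they fill half of it regardless of how rare the event is.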

arXiv:2208.05800 [pdf]

Regressing Relative Fine-Grained Change for Sub-Groups in Unreliable Heterogeneous Data Through Deep Multi-Task Metric Learning

Authors: Niall O' Mahony, Sean Campbell, Lenka Krpalkova, Joseph Walsh, Daniel Riordan

Abstract: Fine-Grained Change Detection and Regression Analysis are essential in many applications of Artificial Intelligence. In practice, this task is often challenging owing to the lack of reliable ground truth information and complexity arising from interactions between the many underlying factors affecting a system. Therefore, developing a framework which can represent the relatedness and reliability of multiple sources of information becomes critical. In this paper, we investigate how techniques in multi-task metric learning can be applied for the regression of fine-grained change in real data. The key idea is that if we incorporate the incremental change in a metric of interest between specific instances of an individual object as one of the tasks in a multi-task metric learning framework, then interpreting that dimension will allow the user to be alerted to fine-grained change invariant to what the overall metric is generalised to be. The techniques investigated are specifically tailored for handling heterogeneous data sources, i.e. the input data for each of the tasks might contain missing values, the scale and resolution of the values is not consistent across tasks and the data contains non-independent and identically distributed (non-IID) instances. We present the results of our initial experimental implementations of this idea and discuss related research in this domain which may offer direction for further research.

Submitted 11 August, 2022; originally announced August 2022.

Journal ref: Sensors and Transducers ISSN 2306-8515 e-ISSN 1726-5479 vol 252 pp 50-57 2021

arXiv:2206.06940 [pdf, other]

Generating Exact Optimal Designs via Particle Swarm Optimization: Assessing Efficacy and Efficiency via Case Study

Authors: Stephen J. Walsh, John J. Borkowski

Abstract: In this study we address existing deficiencies in the literature on applications of Particle Swarm Optimization to generate optimal designs. We present the results of a large computer study in which we benchmark both efficiency and efficacy of PSO to generate high quality candidate designs for small-exact response surface scenarios commonly encountered by industrial practitioners. A preferred version of PSO is demonstrated and recommended. Further, in contrast to popular local optimizers such as the coordinate exchange, PSO is demonstrated to, even in a single run, generate highly efficient designs with large probability at small computing cost. Therefore, it appears beneficial for more practitioners to adopt and use PSO as a tool for generating candidate experimental designs.

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2012.13455 [pdf, other]

Modeling Disease Progression in Mild Cognitive Impairment and Alzheimer's Disease with Digital Twins

Authors: Daniele Bertolini, Anton D. Loukianov, Aaron M. Smith, David Li-Bland, Yannick Pouliot, Jonathan R. Walsh, Charles K. Fisher

Abstract: Alzheimer's Disease (AD) is a neurodegenerative disease that affects subjects in a broad range of severity and is assessed in clinical trials with multiple cognitive and functional instruments. As clinical trials in AD increasingly focus on earlier stages of the disease, especially Mild Cognitive Impairment (MCI), the ability to model subject outcomes across the disease spectrum is extremely important. We use unsupervised machine learning models called Conditional Restricted Boltzmann Machines (CRBMs) to create Digital Twins of AD subjects. Digital Twins are simulated clinical records that share baseline data with actual subjects and comprehensively model their outcomes under standard-of-care. The CRBMs are trained on a large set of records from subjects in observational studies and the placebo arms of clinical trials across the AD spectrum. These data exhibit a challenging, but common, patchwork of measured and missing observations across subjects in the dataset, and we present a novel model architecture designed to learn effectively from it. We evaluate performance against a held-out test dataset and show how Digital Twins simultaneously capture the progression of a number of key endpoints in clinical trials across a broad spectrum of disease severity, including MCI and mild-to-moderate AD.

Submitted 24 December, 2020; originally announced December 2020.

arXiv:2012.09935 [pdf, ps, other]

Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score

Authors: Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher

Abstract: Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's Disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.

Submitted 2 December, 2021; v1 submitted 17 December, 2020; originally announced December 2020.
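
The two-step procedure described in the abstract above (fit a prognostic model on historical data, then regress the trial outcome on treatment and the prognostic score) can be sketched with simulated data. Everything here is an assumption made for illustration: the quadratic data-generating process, the polynomial prognostic model, the sample sizes, and the helper names; the paper's own models and data are not reproduced.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)
TRUE_EFFECT = 1.0  # simulated treatment effect (assumption)


def outcome(x, treated):
    # Hypothetical nonlinear outcome model with additive noise.
    return x * x + TRUE_EFFECT * treated + rng.gauss(0.0, 0.5)


# Step 1: train a prognostic model on historical, untreated subjects.
hist_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
hist_y = [outcome(x, 0) for x in hist_x]
coef = ols([[1.0, x, x * x] for x in hist_x], hist_y)


def prognostic_score(x):
    return coef[0] + coef[1] * x + coef[2] * x * x


# Step 2: in the randomized trial, adjust linearly for the prognostic score.
trial_x = [rng.gauss(0.0, 1.0) for _ in range(400)]
trial_w = [int(rng.random() < 0.5) for _ in range(400)]
trial_y = [outcome(x, w) for x, w in zip(trial_x, trial_w)]
design = [[1.0, w, prognostic_score(x)] for x, w in zip(trial_x, trial_w)]
effect_estimate = ols(design, trial_y)[1]  # coefficient on treatment
```

In this toy setting the raw covariate x has almost no linear relationship with the outcome, so adjusting for x alone would remove little variance, whereas the prognostic score captures the x squared dependence and leaves mostly noise for the treatment coefficient to be estimated against.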

arXiv:2012.07751 [pdf, other]

Near Real-Time Social Distance Estimation in London

Authors: James Walsh, Oluwafunmilola Kesa, Andrew Wang, Mihai Ilas, Patrick O'Hara, Oscar Giles, Neil Dhir, Mark Girolami, Theodoros Damoulas

Abstract: During the COVID-19 pandemic, policy makers at the Greater London Authority, the regional governance body of London, UK, are reliant upon prompt and accurate data sources. Large well-defined heterogeneous compositions of activity throughout the city are sometimes difficult to acquire, yet are a necessity in order to learn 'busyness' and consequently make safe policy decisions. One component of our project within this space is to utilise existing infrastructure to estimate social distancing adherence by the general public. Our method enables near immediate sampling and contextualisation of activity and physical distancing on the streets of London via live traffic camera feeds. We introduce a framework for inspecting and improving upon existing methods, whilst also describing its active deployment on over 900 real-time feeds.

Submitted 14 August, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: Version accepted by The Computer Journal

arXiv:2012.07574 [pdf, other]

An Expectation-Based Network Scan Statistic for a COVID-19 Early Warning System

Authors: Chance Haycock, Edward Thorpe-Woods, James Walsh, Patrick O'Hara, Oscar Giles, Neil Dhir, Theodoros Damoulas

Abstract: One of the Greater London Authority's (GLA) responses to the COVID-19 pandemic brings together multiple large-scale and heterogeneous datasets capturing mobility, transportation and traffic activity over the city of London to better understand 'busyness' and enable targeted interventions and effective policy-making. As part of Project Odysseus we describe an early-warning system and introduce an expectation-based scan statistic for networks to help the GLA and Transport for London understand the extent to which populations are following government COVID-19 guidelines. We explicitly treat the case of geographically fixed time-series data located on a (road) network and primarily focus on monitoring the dynamics across large regions of the capital. Additionally, we also focus on the detection and reporting of significant spatio-temporal regions. Our approach is extending the Network Based Scan Statistic (NBSS) by making it expectation-based (EBP) and by using stochastic processes for time-series forecasting, which enables us to quantify metric uncertainty in both the EBP and NBSS frameworks. We introduce a variant of the metric used in the EBP model which focuses on identifying space-time regions in which activity is quieter than expected.

Submitted 8 December, 2020; originally announced December 2020.

arXiv:2009.03820 [pdf]

doi 10.1007/978-3-030-55180-3_8

Understanding and Exploiting Dependent Variables with Deep Metric Learning

Authors: Niall O' Mahony, Sean Campbell, Anderson Carvalho, Lenka Krpalkova, Gustavo Velasco-Hernandez, Daniel Riordan, Joseph Walsh

Abstract: Deep Metric Learning (DML) approaches learn to represent inputs to a lower-dimensional latent space such that the distance between representations in this space corresponds with a predefined notion of similarity. This paper investigates how the mapping element of DML may be exploited in situations where the salient features in arbitrary classification problems vary over time or due to changing underlying variables. Examples of such variable features include seasonal and time-of-day variations in outdoor scenes in place recognition tasks for autonomous navigation and age/gender variations in human/animal subjects in classification tasks for medical/ethological studies. Through the use of visualisation tools for observing the distribution of DML representations per each query variable for which prior information is available, the influence of each variable on the classification task may be better understood. Based on these relationships, prior information on these salient background variables may be exploited at the inference stage of the DML approach by using a clustering algorithm to improve classification performance. This research proposes such a methodology establishing the saliency of query background variables and formulating clustering algorithms for better separating latent-space representations at run-time. The paper also discusses online management strategies to preserve the quality and diversity of data and the representation of each class in the gallery of embeddings in the DML approach. We also discuss latent works towards understanding the relevance of underlying/multiple variables with DML.

Submitted 8 September, 2020; originally announced September 2020.

Journal ref: Proceedings of the 2020 Intelligent Systems Conference (IntelliSys) Volume 1, B. R. Arai K., Kapoor S., Ed. Springer, Cham, 2020, pp. 97 to 113

arXiv:2007.00843 [pdf, other]

doi 10.1109/MLSP49062.2020.9231894

Low-light Environment Neural Surveillance

Authors: Michael Potter, Henry Gridley, Noah Lichtenstein, Kevin Hines, John Nguyen, Jacob Walsh

Abstract: We design and implement an end-to-end system for real-time crime detection in low-light environments. Unlike Closed-Circuit Television, which performs reactively, the Low-Light Environment Neural Surveillance provides real time crime alerts. The system uses a low-light video feed processed in real-time by an optical-flow network, spatial and temporal networks, and a Support Vector Machine to identify shootings, assaults, and thefts. We create a low-light action-recognition dataset, LENS-4, which will be publicly available. An IoT infrastructure set up via Amazon Web Services interprets messages from the local board hosting the camera for action recognition and parses the results in the cloud to relay messages. The system achieves 71.5% accuracy at 20 FPS. The user interface is a mobile app which allows local authorities to receive notifications and to view a video of the crime scene. Citizens have a public app which enables law enforcement to push crime alerts based on user proximity.

Submitted 1 July, 2020; originally announced July 2020.

Comments: Pre-print, accepted to IEEE International Workshop on Machine Learning for Signal Processing 2020 Conference Proceedings. Code and dataset are available at https://github.com/mcgridles/

ACM Class: I.4.9; I.5.4

arXiv:2006.16319 [pdf, ps, other]

Estimation and Decomposition of Rack Force for Driving on Uneven Roads

Authors: Akshay Bhardwaj, Daniel Slavin, John Walsh, James Freudenberg, R. Brent Gillespie

Abstract: The force transmitted from the front tires to the steering rack of a vehicle, called the rack force, plays an important role in the function of electric power steering (EPS) systems. Estimates of rack force can be used by EPS to attenuate road feedback and reduce driver effort. Further, estimates of the components of rack force (arising, for example, due to steering angle and road profile) can be used to separately compensate for each component and thereby enhance steering feel. In this paper, we present three vehicle and tire model-based rack force estimators that utilize sensed steering angle and road profile to estimate total rack force and individual components of rack force. We test and compare the real-time performance of the estimators by performing driving experiments with non-aggressive and aggressive steering maneuvers on roads with low and high frequency profile variations. The results indicate that for aggressive maneuvers the estimators using non-linear tire models produce more accurate rack force estimates. Moreover, only the estimator that incorporates a semi-empirical Rigid Ring tire model is able to capture rack force variation for driving on a road with high frequency profile variation. Finally, we present results from a simulation study to validate the component-wise estimates of rack force.

Submitted 12 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: 23 pages, 10 figures; fixed references

arXiv:2006.04944 [pdf, other]

A Machine Learning System for Retaining Patients in HIV Care

Authors: Avishek Kumar, Arthi Ramachandran, Adolfo De Unanue, Christina Sung, Joe Walsh, John Schneider, Jessica Ridgway, Stephanie Masiello Schuette, Jeff Lauritsen, Rayid Ghani

Abstract: Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. 51% of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (after a patient has dropped-out) and hence not very effective. We describe our system to predict who is most at risk to drop-out-of-care for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, stability over time, as well as fairness. Our system is applicable as a point-of-care system in a clinical setting as well as a batch prediction system to support regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than baseline for the city-wide model. The code has been released on github and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies in order to curb the HIV epidemic.

Submitted 31 May, 2020; originally announced June 2020.

arXiv:2003.04570 [pdf, other]

The Locus Algorithm IV: Performance metrics of a grid computing system used to create catalogues of optimised pointings

Authors: Oisín Creaner, John Walsh, Kevin Nolan, Eugene Hickey

Abstract: This paper discusses the requirements for and performance metrics of the Grid Computing system used to implement the Locus Algorithm to identify optimum pointings for differential photometry of 61,662,376 stars and 23,779 quasars. Initial operational tests indicated a need for a software system to analyse the data and a High Performance Computing system to run that software in a scalable manner. Practical assessments of the performance of the software in a serial computing environment were used to provide a benchmark against which the performance metrics of the HPC solution could be compared, as well as to indicate any bottlenecks in performance. These performance metrics indicated a distinct split in the performance dictated more by differences in the input data than by differences in the design of the systems used. This indicates a need for experimental analysis of system performance, and suggests that algorithmic complexity analyses may lead to incorrect or naive conclusions, especially in systems with high data I/O overhead such as grid computing. Further, it implies that systems which reduce or eliminate this bottleneck such as in-memory processing could lead to a substantial increase in performance.

Submitted 11 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: 6 Pages, 1 Figure, arxiv references updated with correct links

ACM Class: C.4

arXiv:2003.04565 [pdf, other]

The Locus Algorithm III: A Grid Computing system to generate catalogues of optimised pointings for Differential Photometry

Authors: Oisín Creaner, Kevin Nolan, John Walsh, Eugene Hickey

Abstract: This paper discusses the hardware and software components of the Grid Computing system used to implement the Locus Algorithm to identify optimum pointings for differential photometry of 61,662,376 stars and 23,799 quasars. The scale of the data, together with initial operational assessments demanded a High Performance Computing (HPC) system to complete the data analysis. Grid computing was chosen as the HPC solution as the optimum choice available within this project. The physical and logical structure of the National Grid computing Infrastructure informed the approach that was taken. That approach was one of layered separation of the different project components to enable maximum flexibility and extensibility.

Submitted 11 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

Comments: 12 Pages, 9 Figures, 1 reference corrected, cross-references for arxiv links updated

ACM Class: J.2; C.5

arXiv:2002.02779 [pdf, other]

Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks

Authors: Jonathan R. Walsh, Aaron M. Smith, Yannick Pouliot, David Li-Bland, Anton Loukianov, Charles K. Fisher

Abstract: Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures.

Submitted 19 April, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

arXiv:1910.13796 [pdf]

doi 10.1007/978-3-030-17795-9

Deep Learning vs. Traditional Computer Vision

Authors: Niall O' Mahony, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco-Hernandez, Lenka Krpalkova, Daniel Riordan, Joseph Walsh

Abstract: Deep Learning has pushed the limits of what was possible in the domain of Digital Image Processing. However, that is not to say that the traditional computer vision techniques which had been undergoing progressive development in years prior to the rise of DL have become obsolete. This paper will analyse the benefits and drawbacks of each approach. The aim of this paper is to promote a discussion on whether knowledge of classical computer vision techniques should be maintained. The paper will also explore how the two sides of computer vision can be combined. Several recent hybrid methodologies are reviewed which have demonstrated the ability to improve computer vision performance and to tackle problems not suited to Deep Learning. For example, combining traditional computer vision techniques with Deep Learning has been popular in emerging domains such as Panoramic Vision and 3D vision for which Deep Learning models have not yet been fully optimised.

Submitted 30 October, 2019; originally announced October 2019.

Journal ref: in Advances in Computer Vision Proceedings of the 2019 Computer Vision Conference (CVC). Springer Nature Switzerland AG, pp. 128-144

arXiv:1812.01106 [pdf, other]

doi 10.1007/978-3-030-29513-4_48

Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia

Authors: João Caldeira, Alex Fout, Aniket Kesari, Raesetje Sefala, Joseph Walsh, Katy Dupre, Muhammad Rizal Khaefi, Setiaji, George Hodge, Zakiya Aryana Pramestri, Muhammad Adib Imtiyazi

Abstract: This project presents the results of a partnership between the Data Science for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create a video analysis pipeline for the purpose of improving traffic safety in Jakarta. The pipeline transforms raw traffic video footage into databases that are ready to be used for traffic analysis. By analyzing these patterns, the city of Jakarta will better understand how human behavior and built infrastructure contribute to traffic challenges and safety risks. The results of this work should also be broadly applicable to smart city initiatives around the globe as they improve urban planning and sustainability through data science approaches.

Submitted 30 November, 2018; originally announced December 2018.

Comments: 6 pages; LaTeX; Presented at NeurIPS 2018 Workshop on Machine Learning for the Developing World; Presented at NeurIPS 2018 Workshop on AI for Social Good

Journal ref: Proceedings of the 2019 Intelligent Systems Conference (IntelliSys) Volume 2, 642-649

arXiv:1807.03876 [pdf, other]

doi 10.1038/s41598-019-49656-2

Deep learning for comprehensive forecasting of Alzheimer's Disease progression

Authors: Charles K. Fisher, Aaron M. Smith, Jonathan R. Walsh, the Coalition Against Major Diseases

Abstract: Most approaches to machine learning from electronic health data can only predict a single endpoint. Here, we present an alternative that uses unsupervised deep learning to simulate detailed patient trajectories. We use data comprising 18-month trajectories of 44 clinical variables from 1908 patients with Mild Cognitive Impairment or Alzheimer's Disease to train a model for personalized forecasting of disease progression. We simulate synthetic patient data including the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics, generating both predictions and their confidence intervals. Our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models and identifies sub-components associated with word recall as predictive of progression. The ability to simultaneously simulate dozens of patient characteristics is a crucial step towards personalized medicine for Alzheimer's Disease.

Submitted 7 November, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

arXiv:1804.08682 [pdf, other]

Boltzmann Encoded Adversarial Machines

Authors: Charles K. Fisher, Aaron M. Smith, Jonathan R. Walsh

Abstract: Restricted Boltzmann Machines (RBMs) are a class of generative neural network that are typically trained to maximize a log-likelihood objective function. We argue that likelihood-based training strategies may fail because the objective does not sufficiently penalize models that place a high probability in regions where the training data distribution has low probability. To overcome this problem, we introduce Boltzmann Encoded Adversarial Machines (BEAMs). A BEAM is an RBM trained against an adversary that uses the hidden layer activations of the RBM to discriminate between the training data and the probability distribution generated by the model. We present experiments demonstrating that BEAMs outperform RBMs and GANs on multiple benchmarks.

Submitted 23 April, 2018; originally announced April 2018.

arXiv:1705.06379 [pdf, ps, other]

General auction method for real-valued optimal transport

Authors: J. D. Walsh III, Luca Dieci

Abstract: Optimal transportation theory is an area of mathematics with real-world applications in fields ranging from economics to optimal control to machine learning. We propose a new algorithm for solving discrete transport (network flow) problems, based on classical auction methods. Auction methods were originally developed as an alternative to the Hungarian method for the assignment problem, so the classic auction-based algorithms solve integer-valued optimal transport by converting such problems into assignment problems. The general transport auction method we propose works directly on real-valued transport problems. Our results prove termination, bound the transport error, and relate our algorithm to the classic algorithms of Bertsekas and Castanon.

Submitted 1 May, 2019; v1 submitted 17 May, 2017; originally announced May 2017.

Comments: 36 pages

MSC Class: 49M20; 90C08; 90C46

arXiv:1704.08931 [pdf, other]

A Framework for Rate Efficient Control of Distributed Discrete Systems

Authors: Jie Ren, Solmaz Torabi, John MacLaren Walsh

Abstract: A key issue in the control of distributed discrete systems modeled as Markov decisions processes, is that often the state of the system is not directly observable at any single location in the system. The participants in the control scheme must share information with one another regarding the state of the system in order to collectively make informed control decisions, but this information sharing can be costly. Harnessing recent results from information theory regarding distributed function computation, in this paper we derive, for several information sharing model structures, the minimum amount of control information that must be exchanged to enable local participants to derive the same control decisions as an imaginary omniscient controller having full knowledge of the global state. Incorporating consideration for this amount of information that must be exchanged into the reward enables one to trade the competing objectives of minimizing this control information exchange and maximizing the performance of the controller. An alternating optimization framework is then provided to help find the efficient controllers and messaging schemes. A series of running examples from wireless resource allocation illustrate the ideas and design tradeoffs.

Submitted 28 April, 2017; originally announced April 2017.

arXiv:1704.01891 [pdf, other]

On Multi-source Networks: Enumeration, Rate Region Computation, and Hierarchy

Authors: Congduan Li, Steven Weber, John MacLaren Walsh

Abstract: Recent algorithmic developments have enabled computers to automatically determine and prove the capacity regions of small hypergraph networks under network coding. A structural theory relating network coding problems of different sizes is developed to make best use of this newfound computational capability. A formal notion of network minimality is developed which removes components of a network coding problem that are inessential to its core complexity. Equivalence between different network coding problems under relabeling is formalized via group actions, an algorithm which can directly list single representatives from each equivalence class of minimal networks up to a prescribed network size is presented. This algorithm, together with rate region software, is leveraged to create a database containing the rate regions for all minimal network coding problems with five or fewer sources and edges, a collection of 744119 equivalence classes representing more than 9 million networks. In order to best learn from this database, and to leverage it to infer rate regions and their characteristics of networks at scale, a hierarchy between different network coding problems is created with a new theory of combinations and embedding operators.

Submitted 6 April, 2017; originally announced April 2017.

Comments: 20 pages with double column, revision of previous submission arXiv:1507.05728

arXiv:1607.06833 [pdf, other]

Explicit Polyhedral Bounds on Network Coding Rate Regions via Entropy Function Region: Algorithms, Symmetry, and Computation

Authors: Jayant Apte, John MacLaren Walsh

Abstract: Automating the solutions of multiple network information theory problems, stretching from fundamental concerns such as determining all information inequalities and the limitations of linear codes, to applied ones such as designing coded networks, distributed storage systems, and caching systems, can be posed as polyhedral projections. These problems are demonstrated to exhibit multiple types of polyhedral symmetries. It is shown how these symmetries can be exploited to reduce the complexity of solving these problems through polyhedral projection.

Submitted 6 July, 2017; v1 submitted 22 July, 2016; originally announced July 2016.

Comments: 23 pages, 15 figures

arXiv:1605.04598 [pdf, other]

Constrained Linear Representability of Polymatroids and Algorithms for Computing Achievability Proofs in Network Coding

Authors: Jayant Apte, John MacLaren Walsh

Abstract: The constrained linear representability problem (CLRP) for polymatroids determines whether there exists a polymatroid that is linear over a specified field while satisfying a collection of constraints on the rank function. Using a computer to test whether a certain rate vector is achievable with vector linear network codes for a multi-source network coding instance and whether there exists a multi-linear secret sharing scheme achieving a specified information ratio for a given secret sharing instance are shown to be special cases of CLRP. Methods for solving CLRP built from group theoretic techniques for combinatorial generation are developed and described. These techniques form the core of an information theoretic achievability prover, an implementation accompanies the article, and several computational experiments with interesting instances of network coding and secret sharing demonstrating the utility of the method are provided.

Submitted 1 February, 2017; v1 submitted 15 May, 2016; originally announced May 2016.

Comments: submitted to IEEE Transactions on Information Theory, (this version: corrected figure 9)

arXiv:1605.01744 [pdf, other]

Improving Automated Patent Claim Parsing: Dataset, System, and Experiments

Authors: Mengke Hu, David Cinciruk, John MacLaren Walsh

Abstract: Off-the-shelf natural language processing software performs poorly when parsing patent claims owing to their use of irregular language relative to the corpora built from news articles and the web typically utilized to train this software. Stopping short of the extensive and expensive process of accumulating a large enough dataset to completely retrain parsers for patent claims, a method of adapting existing natural language processing software towards patent claims via forced part of speech tag correction is proposed. An Amazon Mechanical Turk collection campaign organized to generate a public corpus to train such an improved claim parsing system is discussed, identifying lessons learned during the campaign that can be of use in future NLP dataset collection campaigns with AMT. Experiments utilizing this corpus and other patent claim sets measure the parsing performance improvement garnered via the claim parsing system. Finally, the utility of the improved claim parsing system within other patent processing applications is demonstrated via experiments showing improved automated patent subject classification when the new claim parsing system is utilized to generate the features.

Submitted 5 May, 2016; originally announced May 2016.

arXiv:1512.03324 [pdf, other]

Mapping the Region of Entropic Vectors with Support Enumeration & Information Geometry

Authors: Yunshu Liu, John MacLaren Walsh

Abstract: The region of entropic vectors is a convex cone that has been shown to be at the core of many fundamental limits for problems in multiterminal data compression, network coding, and multimedia transmission. This cone has been shown to be non-polyhedral for four or more random variables, however its boundary remains unknown for four or more discrete random variables. Methods for specifying probability distributions that are in faces and on the boundary of the convex cone are derived, then utilized to map optimized inner bounds to the unknown part of the entropy region. The first method utilizes tools and algorithms from abstract algebra to efficiently determine those supports for the joint probability mass functions for four or more random variables that can, for some appropriate set of non-zero probabilities, yield entropic vectors in the gap between the best known inner and outer bounds. These supports are utilized, together with numerical optimization over non-zero probabilities, to provide inner bounds to the unknown part of the entropy region. Next, information geometry is utilized to parameterize and study the structure of probability distributions on these supports yielding entropic vectors in the faces of entropy and in the unknown part of the entropy region.

Submitted 10 December, 2015; originally announced December 2015.

arXiv:1507.05728 [pdf, other]

On Multi-source Networks: Enumeration, Rate Region Computation, and Hierarchy

Authors: Congduan Li, Steven Weber, John MacLaren Walsh

Abstract: This paper investigates the enumeration, rate region computation, and hierarchy of general multi-source multi-sink hyperedge networks under network coding, which includes multiple network models, such as independent distributed storage systems and index coding problems, as special cases. A notion of minimal networks and a notion of network equivalence under group action are defined. An efficient algorithm capable of directly listing single minimal canonical representatives from each network equivalence class is presented and utilized to list all minimal canonical networks with up to 5 sources and hyperedges. Computational tools are then applied to obtain the rate regions of all of these canonical networks, providing exact expressions for 744,119 newly solved network coding rate regions corresponding to more than 2 trillion isomorphic network coding problems. In order to better understand and analyze the huge repository of rate regions through hierarchy, several embedding and combination operations are defined so that the rate region of the network after operation can be derived from the rate regions of networks involved in the operation. The embedding operations enable the definition and determination of a list of forbidden network minors for the sufficiency of classes of linear codes. The combination operations enable the rate regions of some larger networks to be obtained as the combination of the rate regions of smaller networks. The integration of both the combinations and embedding operators is then shown to enable the calculation of rate regions for many networks not reachable via combination operations alone.

Submitted 21 July, 2015; originally announced July 2015.

Comments: 63 pages, submitted to TransIT

arXiv:1505.04202 [pdf, other]

doi 10.1109/TSP.2015.2483479

Interactive Scalar Quantization for Distributed Resource Allocation

Authors: Bradford D. Boyle, Jie Ren, John MacLaren Walsh, Steven Weber

Abstract: In many resource allocation problems, a centralized controller needs to award some resource to a user selected from a collection of distributed users with the goal of maximizing the utility the user would receive from the resource. This can be modeled as the controller computing an extremum of the distributed users' utilities. The overhead rate necessary to enable the controller to reproduce the users' local state can be prohibitively high. An approach to reduce this overhead is interactive communication wherein rate savings are achieved by tolerating an increase in delay. In this paper, we consider the design of a simple achievable scheme based on successive refinements of scalar quantization at each user. The optimal quantization policy is computed via a dynamic program and we demonstrate that tolerating a small increase in delay can yield significant rate savings. We then consider two simpler quantization policies to investigate the scaling properties of the rate-delay trade-offs. Using a combination of these simpler policies, the performance of the optimal policy can be closely approximated with lower computational costs.

Submitted 6 September, 2015; v1 submitted 15 May, 2015; originally announced May 2015.

Comments: 31 pages, 9 figures. Submitted on 2015-05-15 to IEEE Transactions on Signal Processing. Revised 2015-09-06

arXiv:1408.3661 [pdf, other]

Overhead Performance Tradeoffs - A Resource Allocation Perspective

Authors: Jie Ren, Bradford D. Boyle, Gwanmo Ku, Steven Weber, John MacLaren Walsh

Abstract: A key aspect of many resource allocation problems is the need for the resource controller to compute a function, such as the max or arg max, of the competing users' metrics. Information must be exchanged between the competing users and the resource controller in order for this function to be computed. In many practical resource controllers the competing users' metrics are communicated to the resource controller, which then computes the desired extremization function. However, in this paper it is shown that information rate savings can be obtained by recognizing that the controller only needs to determine the result of this extremization function. If the extremization function is to be computed losslessly, the rate savings are shown in most cases to be at most 2 bits independent of the number of competing users. Motivated by the small savings in the lossless case, simple achievable schemes for both the lossy and interactive variants of this problem are considered. It is shown that both of these approaches have the potential to realize large rate savings, especially in the case where the number of competing users is large. For the lossy variant, it is shown that the proposed simple achievable schemes are in fact close to the fundamental limit given by the rate distortion function.

Submitted 15 August, 2014; originally announced August 2014.

Comments: 70 pages, 18 figures, Submitted to IEEE Transactions on Information Theory on 2014-08-14

arXiv:1408.3469 [pdf, other]

doi 10.1109/TIT.2016.2640302

Properties of an Aloha-like stability region

Authors: Nan Xie, John MacLaren Walsh, Steven Weber

Abstract: A well-known inner bound on the stability region of the finite-user slotted Aloha protocol is the set of all arrival rates for which there exists some choice of the contention probabilities such that the associated worst-case service rate for each user exceeds the user's arrival rate, denoted $Λ$. Although testing membership in $Λ$ of a given arrival rate can be posed as a convex program, it is nonetheless of interest to understand the properties of this set. In this paper we develop new results of this nature, including (i) an equivalence between membership in $Λ$ and the existence of a positive root of a given polynomial, (ii) a method to construct a vector of contention probabilities to stabilize any stabilizable arrival rate vector, (iii) the volume of $Λ$, (iv) explicit polyhedral, spherical, and ellipsoid inner and outer bounds on $Λ$, and (v) characterization of the generalized convexity properties of a natural "excess rate" function associated with $Λ$, including the convexity of the set of contention probabilities that stabilize a given arrival rate vector.

Submitted 4 January, 2017; v1 submitted 15 August, 2014; originally announced August 2014.

Comments: 28 pages, 9 figures. Submitted August 15, 2014, revised September 21, 2015 and August 31, 2016, and accepted November 06, 2016 for publication in IEEE Transactions on Information Theory. Preliminary results presented at ISIT 2010, ITA 2010, and ITA 2011. DOI: 10.1109/TIT.2016.2640302. Copyright transferred to IEEE. This is last version uploaded by the authors prior to IEEE proofing process

arXiv:1407.5659 [pdf, other]

Multilevel Diversity Coding Systems: Rate Regions, Codes, Computation, & Forbidden Minors

Authors: Congduan Li, Steven Weber, John MacLaren Walsh

Abstract: The rate regions of multilevel diversity coding systems (MDCS), a sub-class of the broader family of multi-source multi-sink networks with special structure, are investigated. After showing how to enumerate all non-isomorphic MDCS instances of a given size, the Shannon outer bound and several achievable inner bounds based on linear codes are given for the rate region of each non-isomorphic instance. For thousands of MDCS instances, the bounds match, and hence exact rate regions are proven. Results gained from these computations are summarized in key statistics involving aspects such as the sufficiency of scalar binary codes, the necessary size of vector binary codes, etc. Also, it is shown how to generate computer-aided, human-readable converse proofs, as well as how to construct the codes for an achievability proof. Based on this large repository of rate regions, a series of results about general MDCS cases that they inspired are introduced and proved. In particular, a series of embedding operations that preserve the property of sufficiency of scalar or vector codes are presented. The utility of these operations is demonstrated by boiling the thousands of MDCS instances for which binary scalar codes are insufficient down to 12 forbidden smallest embedded MDCS instances.

Submitted 26 August, 2014; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: Submitted to IEEE Transactions on Information Theory, 52 pages

arXiv:1309.6831 [pdf]

Batch-iFDD for Representation Expansion in Large MDPs

Authors: Alborz Geramifard, Thomas J. Walsh, Nicholas Roy, Jonathan How

Abstract: Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm.

Submitted 26 September, 2013; originally announced September 2013.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-242-251

arXiv:1210.4918 [pdf]

Dynamic Teaching in Sequential Decision Making Environments

Authors: Thomas J. Walsh, Sergiu Goschin

Abstract: We describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision making environment. Unlike previous efforts that have optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised protocols, such as Teaching Dimension, extending them to handle noise and sequences of inputs encountered in an MDP. We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching.

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-863-872

arXiv:1209.0029 [pdf, ps, other]

Statistically adaptive learning for a general class of cost functions (SA L-BFGS)

Authors: Stephen Purpura, Dustin Hillard, Mark Hubenthal, Jim Walsh, Scott Golder, Scott Smith

Abstract: We present a system that enables rapid model experimentation for tera-scale machine learning with trillions of non-zero features, billions of training examples, and millions of parameters. Our contribution to the literature is a new method (SA L-BFGS) for changing batch L-BFGS to perform in near real-time by using statistical tools to balance the contributions of previous weights, old training examples, and new training examples to achieve fast convergence with few iterations. The result is, to our knowledge, the most scalable and flexible linear learning system reported in the literature, beating standard practice with the current best system (Vowpal Wabbit and AllReduce). Using the KDD Cup 2012 data set from Tencent, Inc., we provide experimental results to verify the performance of this method.

Submitted 5 September, 2012; v1 submitted 31 August, 2012; originally announced September 2012.

Comments: 7 pages, 2 tables

Report number: version 0.05

arXiv:1205.2606 [pdf]

Exploring compact reinforcement-learning representations with linear regression

Authors: Thomas J. Walsh, Istvan Szita, Carlos Diuk, Michael L. Littman

Abstract: This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.

Submitted 9 May, 2012; originally announced May 2012.

Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

Report number: UAI-P-2009-PG-591-598
