A Novel Approach To User Agent String Parsing For Vulnerability Analysis Using Mutli-Headed Attention

Authors: Dhruv Nandakumar, Sathvik Murli, Ankur Khosla, Kevin Choi, Abdul Rahman, Drew Walsh, Scott Riede, Eric Dull, Edward Bowen

Abstract: The increasing reliance on the internet has led to the proliferation of a diverse set of web-browsers and operating systems (OSs) capable of browsing the web. User agent strings (UASs) are a component of web browsing that are transmitted with every Hypertext Transfer Protocol (HTTP) request. They contain information about the client device and software, which is used by web servers for various purposes such as content negotiation and security. However, due to the proliferation of various browsers and devices, parsing UASs is a non-trivial task due to a lack of standardization of UAS formats. Current rules-based approaches are often brittle and can fail when encountering such non-standard formats. In this work, a novel methodology for parsing UASs using Multi-Headed Attention Based transformers is proposed. The proposed methodology exhibits strong performance in parsing a variety of UASs with differing formats. Furthermore, a framework to utilize parsed UASs to estimate the vulnerability scores for large sections of publicly visible IT networks or regions is also discussed. The methodology presented here can also be easily extended or deployed for real-time parsing of logs in enterprise settings.

Submitted 6 June, 2023; originally announced June 2023.

Comments: Accepted to the International Conference on Machine Learning and Cybernetics (ICMLC) 2023

arXiv:2305.02097 [pdf]

Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning

Authors: Carl Chalmers, Paul Fergus, Serge Wich, Steven N Longmore, Naomi Davies Walsh, Philip Stephens, Chris Sutherland, Naomi Matthews, Jens Mudde, Amira Nuseibeh

Abstract: Birds are important indicators for monitoring both biodiversity and habitat health; they also play a crucial role in ecosystem management. Decline in bird populations can result in reduced ecosystem services, including seed dispersal, pollination and pest control. Accurate and long-term monitoring of birds to identify species of concern while measuring the success of conservation interventions is essential for ecologists. However, monitoring is time consuming, costly and often difficult to manage over long durations and at meaningfully large spatial scales. Technology such as camera traps, acoustic monitors and drones provide methods for non-invasive monitoring. There are two main problems with using camera traps for monitoring: a) cameras generate many images, making it difficult to process and analyse the data in a timely manner; and b) the high proportion of false positives hinders the processing and analysis for reporting. In this paper, we outline an approach for overcoming these issues by utilising deep learning for real-time classification of bird species and automated removal of false positives in camera trap data. Images are classified in real-time using a Faster-RCNN architecture. Images are transmitted over 3/4G cameras and processed using Graphical Processing Units (GPUs) to provide conservationists with key detection metrics, therefore removing the requirement for manual observations. Our models achieved an average sensitivity of 88.79%, a specificity of 98.16% and accuracy of 96.71%. This demonstrates the effectiveness of using deep learning for automatic bird monitoring.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2204.06849 [pdf, other]

Ensuring accurate stain reproduction in deep generative networks for virtual immunohistochemistry

Authors: Christopher D. Walsh, Joanne Edwards, Robert H. Insall

Abstract: Immunohistochemistry is a valuable diagnostic tool for cancer pathology. However, it requires specialist labs and equipment, is time-intensive, and is difficult to reproduce. Consequently, a long-term aim is to provide a digital method of recreating physical immunohistochemical stains. Generative Adversarial Networks have become exceedingly advanced at mapping one image type to another and have shown promise at inferring immunostains from haematoxylin and eosin. However, they have a substantial weakness when used with pathology images as they can fabricate structures that are not present in the original data. CycleGANs can mitigate invented tissue structures in pathology image mapping but have a related disposition to generate areas of inaccurate staining. In this paper, we describe a modification to the loss function of a CycleGAN to improve its mapping ability for pathology images by enforcing realistic stain replication while retaining tissue structure. Our approach improves upon others by considering structure and staining during model training. We evaluated our network using the Fréchet Inception distance, coupled with a new technique that we propose to appraise the accuracy of virtual immunohistochemistry. This assesses the overlap between each stain component in the inferred and ground truth images through colour deconvolution, thresholding and the Sorensen-Dice coefficient. Our modified loss function resulted in a Dice coefficient for the virtual stain of 0.78 compared with the real AE1/AE3 slide. This was superior to the unaltered CycleGAN's score of 0.74. Additionally, our loss function improved the Fréchet Inception distance for the reconstruction to 74.54 from 76.47. We, therefore, describe an advance in virtual restaining that can extend to other immunostains and tumour types and deliver reproducible, fast and readily accessible immunohistochemistry worldwide.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Eighteen pages, six figures

ACM Class: I.4.3; I.4.5

arXiv:2012.09935 [pdf, ps, other]

Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score

Authors: Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher

Abstract: Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's Disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.

Submitted 2 December, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

arXiv:2009.09086 [pdf, other]

Focused Clinical Query Understanding and Retrieval of Medical Snippets powered through a Healthcare Knowledge Graph

Authors: Maulik R. Kamdar, Michael Carroll, Will Dowling, Linda Wogulis, Cailey Fitzgerald, Matt Corkum, Danielle Walsh, David Conrad, Craig E. Stanley, Jr., Steve Ross, Dru Henke, Mevan Samarasinghe

Abstract: Clinicians face several significant barriers to search and synthesize accurate, succinct, updated, and trustworthy medical information from several literature sources during the practice of medicine and patient care. In this talk, we will be presenting our research behind the development of a Focused Clinical Search Service, powered by a Healthcare Knowledge Graph, to interpret the query intent behind clinical search queries and retrieve relevant medical snippets from a diverse corpus of medical literature.

Submitted 17 September, 2020; originally announced September 2020.

Comments: Under Review as a Podium Talk at the AMIA Informatics Summit 2021

arXiv:1811.00143 [pdf, other]

Democratizing Production-Scale Distributed Deep Learning

Authors: Minghuang Ma, Hadi Pouransari, Daniel Chao, Saurabh Adya, Santiago Akle Serrano, Yi Qin, Dan Gimnicher, Dominic Walsh

Abstract: The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However, training them distributed and at scale remains difficult due to the complex ecosystem of tools and hardware involved. One consequence is that the responsibility of orchestrating these complex components is often left to one-off scripts and glue code customized for specific problems. To address these restrictions, we introduce Alchemist - an internal service built at Apple from the ground up for easy, fast, and scalable distributed training. We discuss its design, implementation, and examples of running different flavors of distributed training. We also present case studies of its internal adoption in the development of autonomous systems, where training times have been reduced by 10x to keep up with the ever-growing data collection.

Submitted 3 November, 2018; v1 submitted 31 October, 2018; originally announced November 2018.

arXiv:1705.06379 [pdf, ps, other]

General auction method for real-valued optimal transport

Authors: J. D. Walsh III, Luca Dieci

Abstract: Optimal transportation theory is an area of mathematics with real-world applications in fields ranging from economics to optimal control to machine learning. We propose a new algorithm for solving discrete transport (network flow) problems, based on classical auction methods. Auction methods were originally developed as an alternative to the Hungarian method for the assignment problem, so the classic auction-based algorithms solve integer-valued optimal transport by converting such problems into assignment problems. The general transport auction method we propose works directly on real-valued transport problems. Our results prove termination, bound the transport error, and relate our algorithm to the classic algorithms of Bertsekas and Castanon.

Submitted 1 May, 2019; v1 submitted 17 May, 2017; originally announced May 2017.

Comments: 36 pages

MSC Class: 49M20; 90C08; 90C46

Showing 1–7 of 7 results for author: Walsh, D