Search | arXiv e-print repository

arXiv:2305.02097 [pdf]

Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning

Authors: Carl Chalmers, Paul Fergus, Serge Wich, Steven N Longmore, Naomi Davies Walsh, Philip Stephens, Chris Sutherland, Naomi Matthews, Jens Mudde, Amira Nuseibeh

Abstract: Birds are important indicators for monitoring both biodiversity and habitat health; they also play a crucial role in ecosystem management. Decline in bird populations can result in reduced ecosystem services, including seed dispersal, pollination and pest control. Accurate and long-term monitoring of birds to identify species of concern while measuring the success of conservation interventions is essential for ecologists. However, monitoring is time-consuming, costly and often difficult to manage over long durations and at meaningfully large spatial scales. Technologies such as camera traps, acoustic monitors and drones provide methods for non-invasive monitoring. There are two main problems with using camera traps for monitoring: a) cameras generate many images, making it difficult to process and analyse the data in a timely manner; and b) the high proportion of false positives hinders the processing and analysis for reporting. In this paper, we outline an approach for overcoming these issues by utilising deep learning for real-time classification of bird species and automated removal of false positives in camera trap data. Images are classified in real-time using a Faster-RCNN architecture. Images are transmitted over 3/4G cameras and processed using Graphical Processing Units (GPUs) to provide conservationists with key detection metrics, thereby removing the requirement for manual observations. Our models achieved an average sensitivity of 88.79%, a specificity of 98.16% and accuracy of 96.71%. This demonstrates the effectiveness of using deep learning for automatic bird monitoring.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.12703 [pdf, other]

Empowering Wildlife Guardians: An Equitable Digital Stewardship and Reward System for Biodiversity Conservation using Deep Learning and 3/4G Camera Traps

Authors: Paul Fergus, Carl Chalmers, Steven Longmore, Serge Wich, Carmen Warmenhove, Jonathan Swart, Thuto Ngongwane, André Burger, Jonathan Ledgard, Erik Meijaard

Abstract: The biodiversity of our planet is under threat, with approximately one million species expected to become extinct within decades. The reason: negative human actions, which include hunting, overfishing, pollution, and the conversion of land for urbanisation and agricultural purposes. Despite significant investment from charities and governments for activities that benefit nature, global wildlife populations continue to decline. Local wildlife guardians have historically played a critical role in global conservation efforts and have shown their ability to achieve sustainability at various levels. In 2021, COP26 recognised their contributions and pledged US$1.7 billion per year; however, this is a fraction of the global biodiversity budget available (between US$124 billion and US$143 billion annually) given they protect 80% of the planet's biodiversity. This paper proposes a radical new solution based on "Interspecies Money," where animals own their own money. Creating a digital twin for each species allows animals to dispense funds to their guardians for the services they provide. For example, a rhinoceros may release a payment to its guardian each time it is detected in a camera trap as long as it remains alive and well. To test the efficacy of this approach, 27 camera traps were deployed over a 400 km² area in Welgevonden Game Reserve in Limpopo Province in South Africa. The motion-triggered camera traps were operational for ten months and, using deep learning, we managed to capture images of 12 distinct animal species. For each species, a makeshift bank account was set up and credited with £100. Each time an animal was captured on camera and successfully classified, 1 penny (an arbitrary amount; mechanisms still need to be developed to determine the real value of species) was transferred from the animal account to its associated guardian.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 28 pages and 12 figures
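
The payment scheme described in this abstract (a £100 account per species, with 1 penny moved to the guardian on each successful classification) can be illustrated with a minimal sketch. The names `SpeciesAccount` and `record_detection` are hypothetical and do not come from the paper, and the abstract does not say what happens when an account runs dry; this sketch assumes payments simply stop.

```python
from dataclasses import dataclass

# Amounts are held in whole pence to avoid floating-point rounding.
START_BALANCE_PENCE = 100 * 100  # £100 per species, as stated in the abstract
PAYMENT_PENCE = 1                # 1 penny per classified detection


@dataclass
class SpeciesAccount:
    """Hypothetical ledger entry for one species and its guardian."""
    species: str
    balance_pence: int = START_BALANCE_PENCE
    paid_to_guardian_pence: int = 0

    def record_detection(self) -> bool:
        """Move one payment to the guardian; return False if funds are exhausted."""
        if self.balance_pence < PAYMENT_PENCE:
            return False  # assumption: no overdraft, payments just stop
        self.balance_pence -= PAYMENT_PENCE
        self.paid_to_guardian_pence += PAYMENT_PENCE
        return True


# Three classified detections of the same species.
rhino = SpeciesAccount("rhinoceros")
for _ in range(3):
    rhino.record_detection()

print(rhino.balance_pence, rhino.paid_to_guardian_pence)
```

At 1 penny per detection, a £100 account covers 10,000 classified detections before it is exhausted.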

arXiv:2203.06248 [pdf, other]

Pressure Ulcer Categorisation using Deep Learning: A Clinical Trial to Evaluate Model Performance

Authors: Paul Fergus, Carl Chalmers, William Henderson, Danny Roberts, Atif Waraich

Abstract: Pressure ulcers are a challenge for patients and healthcare professionals. In the UK, 700,000 people are affected by pressure ulcers each year. Treating them costs the National Health Service £3.8 million every day. Their etiology is complex and multifactorial. However, evidence has shown a strong link between old age, disease-related sedentary lifestyles and unhealthy eating habits. Pressure ulcers are caused by direct skin contact with a bed or chair without frequent position changes. Urinary and faecal incontinence, diabetes, and injuries that restrict body position and nutrition are also known risk factors. Guidelines and treatments exist but their implementation and success vary across different healthcare settings. This is primarily because healthcare practitioners have a) minimal experience in dealing with pressure ulcers, and b) a general lack of understanding of pressure ulcer treatments. Poorly managed, pressure ulcers lead to severe pain, poor quality of life, and significant healthcare costs. In this paper, we report the findings of a clinical trial conducted by Mersey Care NHS Foundation Trust that evaluated the performance of a faster region-based convolutional neural network and mobile platform that categorised and documented pressure ulcers. The neural network classifies category I, II, III, and IV pressure ulcers, deep tissue injuries, and unstageable pressure ulcers. Photographs of pressure ulcers taken by district nurses are transmitted over 4/5G communications to an inferencing server for classification. Classified images are stored and reviewed to assess the model's predictions and relevance as a tool for clinical decision making and standardised reporting. The results from the study generated a mean average Precision=0.6796, Recall=0.6997 and F1-Score=0.6786, with 45 false positives at a 0.75 confidence score threshold.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 12 pages, 8 figures

arXiv:2202.02283 [pdf, ps, other]

Choosing an Appropriate Platform and Workflow for Processing Camera Trap Data using Artificial Intelligence

Authors: Juliana Vélez, Paula J. Castiblanco-Camacho, Michael A. Tabak, Carl Chalmers, Paul Fergus, John Fieberg

Abstract: Camera traps have transformed how ecologists study wildlife species distributions, activity patterns, and interspecific interactions. Although camera traps provide a cost-effective method for monitoring species, the time required for data processing can limit survey efficiency. Thus, the potential of Artificial Intelligence (AI), specifically Deep Learning (DL), to process camera-trap data has gained considerable attention. Using DL for these applications involves training algorithms, such as Convolutional Neural Networks (CNNs), to automatically detect objects and classify species. To overcome technical challenges associated with training CNNs, several research communities have recently developed platforms that incorporate DL in easy-to-use interfaces. We review key characteristics of four AI-powered platforms -- Wildlife Insights (WI), MegaDetector (MD), Machine Learning for Wildlife Image Classification (MLWIC2), and Conservation AI -- including data management tools and AI features. We also provide R code in an open-source GitBook to demonstrate how users can evaluate model performance and incorporate AI output in semi-automated workflows. We found that species classifications from WI and MLWIC2 generally had low recall values (animals that were present in the images often were not classified to the correct species). Yet, the precision of WI and MLWIC2 classifications for some species was high (i.e., when classifications were made, they were generally accurate). MD, which classifies images using broader categories (e.g., "blank" or "animal"), also performed well. Thus, we conclude that, although species classifiers were not accurate enough to automate image processing, DL could be used to improve efficiencies by accepting classifications with high confidence values for certain species or by filtering images containing blanks.

Submitted 4 February, 2022; originally announced February 2022.

Comments: 30 pages, 2 figures, 3 tables
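
The evaluation idea in this abstract (per-species precision and recall, plus accepting only high-confidence classifications) can be sketched as follows. This is not the R code from the authors' GitBook; it is an illustrative Python sketch in which the function names, the record layout, the 0.9 threshold and the toy labels are all assumptions.

```python
def precision_recall(truth, predicted, species):
    """Per-species precision and recall from paired label lists."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == species and p == species for t, p in pairs)
    fp = sum(t != species and p == species for t, p in pairs)
    fn = sum(t == species and p != species for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def split_by_confidence(records, threshold=0.9):
    """Accept confident classifications; send the rest to manual review."""
    accepted = [r for r in records if r["confidence"] >= threshold]
    review = [r for r in records if r["confidence"] < threshold]
    return accepted, review


# Toy data (hypothetical): the classifier is right whenever it says "deer",
# but it misses two of the three deer images.
truth = ["deer", "deer", "fox", "blank", "deer"]
predicted = ["deer", "blank", "fox", "blank", "fox"]
p, r = precision_recall(truth, predicted, "deer")

records = [
    {"image": "img1.jpg", "label": "deer", "confidence": 0.97},
    {"image": "img2.jpg", "label": "fox", "confidence": 0.55},
    {"image": "img3.jpg", "label": "blank", "confidence": 0.92},
]
accepted, review = split_by_confidence(records)
print(p, r, len(accepted), len(review))
```

The toy labels reproduce the pattern the abstract reports for some species: high precision alongside low recall, which is why a confidence filter with manual review of the remainder is a plausible semi-automated workflow.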

arXiv:2110.01447 [pdf]

Real-Time Predictive Maintenance using Autoencoder Reconstruction and Anomaly Detection

Authors: Sean Givnan, Carl Chalmers, Paul Fergus, Sandra Ortega, Tom Whalley

Abstract: Rotary machine breakdown detection systems are outdated and dependent upon routine testing to discover faults. This is costly and often reactive in nature. Real-time monitoring offers a solution for detecting faults without the need for manual observation. However, manual interpretation for threshold anomaly detection is often subjective and varies between industrial experts. This approach is rigid and prone to a large number of false positives. To address this issue, we propose a Machine Learning (ML) approach to model normal working operation and detect anomalies. The approach extracts key features from signals representing known normal operation to model machine behaviour and automatically identify anomalies. The ML model learns generalisations and generates thresholds based on fault severity. This provides engineers with a traffic light system where green is normal behaviour, amber is worrying and red signifies a machine fault. This scale allows engineers to undertake early intervention measures at the appropriate time. The approach is evaluated on windowed real machine sensor data to observe normal and abnormal behaviour. The results demonstrate that it is possible to detect anomalies within the amber range and raise alarms before machine failure.

Submitted 1 October, 2021; originally announced October 2021.
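
The traffic light scale in this abstract can be illustrated with a small sketch that maps an autoencoder's reconstruction error to green, amber or red. The abstract does not state how the thresholds are derived, so the rule used here (mean plus a multiple of the standard deviation of errors on known-normal data) is an assumption, as are the function names, the multipliers and the sample values. The autoencoder itself is not implemented; its reconstruction errors are taken as given.

```python
import statistics


def fit_thresholds(normal_errors, amber_k=3.0, red_k=6.0):
    """Derive amber/red thresholds from errors on known-normal operation.

    Assumption: thresholds sit amber_k and red_k standard deviations
    above the mean error seen during normal operation.
    """
    mean = statistics.fmean(normal_errors)
    sd = statistics.pstdev(normal_errors)
    return mean + amber_k * sd, mean + red_k * sd


def traffic_light(error, amber, red):
    """Classify one reconstruction error on the green/amber/red scale."""
    if error >= red:
        return "red"      # machine fault
    if error >= amber:
        return "amber"    # worrying, intervene early
    return "green"        # normal behaviour


# Hypothetical reconstruction errors from windows of normal operation.
normal_errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.08]
amber, red = fit_thresholds(normal_errors)

states = [traffic_light(e, amber, red) for e in (0.10, 0.15, 0.30)]
print(states)
```

Separating the threshold fit from the classification step means the thresholds can be re-estimated from fresh normal data without touching the alerting logic.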

arXiv:2103.07276 [pdf]

Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning

Authors: C. Chalmers, P. Fergus, S. Wich, S. N. Longmore

Abstract: For centuries researchers have used sound to monitor and study wildlife. Traditionally, conservationists have identified species by ear; however, it is now common to deploy audio recording technology to monitor animal and ecosystem sounds. Animals use sound for communication, mating, navigation and territorial defence. Animal sounds provide valuable information and help conservationists to quantify biodiversity. Acoustic monitoring has grown in popularity due to the availability of diverse sensor types which include camera traps, portable acoustic sensors, passive acoustic sensors, and even smartphones. Passive acoustic sensors are easy to deploy and can be left running for long durations to provide insights on habitat and the sounds made by animals and illegal activity. While this technology brings enormous benefits, the amount of data that is generated makes processing time-consuming for conservationists. Consequently, there is interest among conservationists in automatically processing acoustic data to help speed up biodiversity assessments. Processing these large data sources and extracting relevant sounds from background noise introduces significant challenges. In this paper we outline an approach for achieving this using state-of-the-art machine learning to automatically extract features from time-series audio signals and deep learning models to classify different bird species based on the sounds they make. The acquired bird songs are processed using mel-frequency cepstrum (MFC) to extract features which are later classified using a multilayer perceptron (MLP). Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.

Submitted 12 March, 2021; originally announced March 2021.

arXiv:2002.12899 [pdf]

BMI: A Behavior Measurement Indicator for Fuel Poverty Using Aggregated Load Readings from Smart Meters

Authors: P. Fergus, C. Chalmers

Abstract: Fuel poverty affects between 50 and 125 million households in Europe and is a significant issue for both developed and developing countries globally. This means that fuel poor residents are unable to adequately warm their home and run the necessary energy services needed for lighting, cooking, hot water, and electrical appliances. The problem is complex but is typically caused by three factors: low income, high energy costs, and energy inefficient homes. In the United Kingdom (UK), 4 million families are currently living in fuel poverty. Those in serious financial difficulty are either forced to self-disconnect or have their services terminated by energy providers. Fuel poverty contributed to 10,000 reported deaths in England in the winter of 2016-2017 due to homes being cold. While it is recognized by governments as a social, public health and environmental policy issue, the European Union (EU) has failed to provide a common definition of fuel poverty or a conventional set of indicators to measure it. This chapter discusses current fuel poverty strategies across the EU and proposes a new and foundational behavior measurement indicator designed to directly assess and monitor fuel poverty risks in households using smart meters, Consumer Access Device (CAD) data and machine learning. By detecting Activities of Daily Living (ADLs) through household appliance usage, it is possible to spot the early signs of financial difficulty and identify when support packages are required.

Submitted 16 February, 2020; originally announced February 2020.

Comments: 33 Pages, 12 Figures, Submitted as a book chapter to Springer

ACM Class: I.2; I.5

arXiv:2002.00833 [pdf]

Detection of Obstructive Sleep Apnoea Using Features Extracted from Segmented Time-Series ECG Signals Using a One Dimensional Convolutional Neural Network

Authors: Steven Thompson, Paul Fergus, Carl Chalmers, Denis Reilly

Abstract: The study in this paper presents a one-dimensional convolutional neural network (1DCNN) model, designed for the automated detection of obstructive sleep apnoea (OSA) captured from single-channel electrocardiogram (ECG) signals. The system provides mechanisms in clinical practice that help diagnose patients suffering with OSA. Using the state-of-the-art in 1DCNNs, a model is constructed using convolutional and max pooling layers and a fully connected Multilayer Perceptron (MLP) consisting of a hidden layer and SoftMax output for classification. The 1DCNN extracts prominent features, which are used to train an MLP. The model is trained using segmented ECG signals grouped into 5 unique datasets of set window sizes. 35 ECG signal recordings were selected from an annotated database containing 70 night-time ECG recordings (Group A = a01 to a20 (Apnoea breathing), Group B = b01 to b05 (moderate), and Group C = c01 to c10 (normal)). A total of 6514 minutes of Apnoea was recorded. Evaluation of the model is performed using a set of standard metrics which show the proposed model achieves high classification results in both training and validation using our windowing strategy, particularly W=500 (Sensitivity 0.9705, Specificity 0.9725, F1 Score 0.9717, Kappa Score 0.9430, Log Loss 0.0836, ROCAUC 0.9945). This demonstrates the model can identify the presence of Apnoea with a high degree of accuracy.

Submitted 3 February, 2020; originally announced February 2020.

Comments: 8 pages, 6 figures, 10 tables

ACM Class: I.2.0; I.5.1
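
The windowing step mentioned in this abstract (segmenting each ECG recording into windows of a set size before training) can be sketched as below. Only W=500 is named in the abstract; the function name `segment`, the choice of non-overlapping windows, and the decision to drop a trailing partial window are assumptions made for illustration.

```python
def segment(signal, window_size):
    """Split a 1-D signal into non-overlapping, fixed-size windows.

    Assumption: a trailing partial window is discarded so every
    training example has exactly window_size samples.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n_windows = len(signal) // window_size
    return [
        signal[i * window_size:(i + 1) * window_size]
        for i in range(n_windows)
    ]


# A stand-in "recording" of 1750 samples; real input would be ECG amplitudes.
signal = list(range(1750))
windows = segment(signal, 500)  # W=500, the size highlighted in the abstract

print(len(windows), len(windows[0]), windows[-1][-1])
```

Repeating the call with other window sizes would yield the separate datasets the abstract refers to, one per size.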

arXiv:1910.07360 [pdf]

Conservation AI: Live Stream Analysis for the Detection of Endangered Species Using Convolutional Neural Networks and Drone Technology

Authors: C. Chalmers, P. Fergus, Serge Wich, Aday Curbelo Montanez

Abstract: Many different species are adversely affected by poaching. In response to this escalating crisis, efforts to stop poaching using hidden cameras, drones and DNA tracking have been implemented with varying degrees of success. Limited resources, costs and logistical limitations are often the cause of most unsuccessful poaching interventions. The study presented in this paper outlines a flexible and interoperable framework for the automatic detection of animals and poaching activity to facilitate early intervention practices. Using a robust deep learning pipeline, a convolutional neural network is trained and implemented to detect rhinos and cars (considered an important tool in poaching for fast access and artefact transportation in natural habitats) that are found within live video streamed from drones. Transfer learning with the Faster RCNN Resnet 101 is performed to train a custom model with 350 images of rhinos and 350 images of cars. Inference is performed using a frame sampling technique to address the required trade-off between precision and processing speed and maintain synchronisation with the live feed. Inference models are hosted on a web platform using flask web serving, OpenCV and TensorFlow 1.13. Video streams are transmitted from a DJI Mavic Pro 2 drone using the Real-Time Messaging Protocol (RTMP). The best trained Faster RCNN model achieved a mAP of 0.83 @IOU 0.50 and 0.69 @IOU 0.75. In comparison, an SSD-MobileNet model trained under the same experimental conditions achieved a mAP of 0.55 @IOU 0.50 and 0.27 @IOU 0.75. The results demonstrate that using a FRCNN and off-the-shelf drones is a promising and scalable option for a range of conservation projects.

Submitted 16 October, 2019; originally announced October 2019.

Comments: The paper is 10 pages and contains 11 images and 1 table

arXiv:1908.10166 [pdf]

SAERMA: Stacked Autoencoder Rule Mining Algorithm for the Interpretation of Epistatic Interactions in GWAS for Extreme Obesity

Authors: Casimiro Aday Curbelo Montañez, Paul Fergus, Carl Chalmers, Nurul Ahamed Hassain Malim, Basma Abdulaimma, Denis Reilly, Francesco Falciani

Abstract: One of the most important challenges in the analysis of high-throughput genetic data is the development of efficient computational methods to identify statistically significant Single Nucleotide Polymorphisms (SNPs). Genome-wide association studies (GWAS) use single-locus analysis where each SNP is independently tested for association with phenotypes. The limitation with this approach, however, is its inability to explain genetic variation in complex diseases. Alternative approaches are required to model the intricate relationships between SNPs. Our proposed approach extends GWAS by combining deep learning stacked autoencoders (SAEs) and association rule mining (ARM) to identify epistatic interactions between SNPs. Following traditional GWAS quality control and association analysis, the most significant SNPs are selected and used in the subsequent analysis to investigate epistasis. SAERMA controls the classification results produced in the final fully connected multi-layer feedforward artificial neural network (MLP) by manipulating the interestingness measures, support and confidence, in the rule generation process. The best classification results were achieved with 204 SNPs compressed to 100 units (77% AUC, 77% SE, 68% SP, 53% Gini, logloss=0.58, and MSE=0.20), although it was possible to achieve 73% AUC (77% SE, 63% SP, 45% Gini, logloss=0.62, and MSE=0.21) with 50 hidden units - both supported by close model interpretation.

Submitted 27 August, 2019; originally announced August 2019.

Comments: 12 pages, 6 figures, 12 tables, 9 equations, journal

arXiv:1908.02338 [pdf, other]

Modelling Segmented Cardiotocography Time-Series Signals Using One-Dimensional Convolutional Neural Networks for the Early Detection of Abnormal Birth Outcomes

Authors: Paul Fergus, Carl Chalmers, Casimiro Curbelo Montanez, Denis Reilly, Paulo Lisboa, Beth Pineles

Abstract: Gynaecologists and obstetricians visually interpret cardiotocography (CTG) traces using the International Federation of Gynaecology and Obstetrics (FIGO) guidelines to assess the wellbeing of the foetus during antenatal care. This approach has raised concerns among professionals with regards to inter- and intra-variability where clinical diagnosis only has a 30% positive predictive value when classifying pathological outcomes. Machine learning models, trained with FIGO and other user derived features extracted from CTG traces, have been shown to increase positive predictive capacity and minimise variability. This is only possible, however, when class distributions are equal, which is rarely the case in clinical trials where case-control observations are heavily skewed in favour of normal outcomes. Classes can be balanced using either synthetic data derived from resampled case training data or by decreasing the number of control instances. However, this either introduces bias or removes valuable information. Concerns have also been raised regarding machine learning studies and their reliance on manually handcrafted features. While this has led to some interesting results, deriving an optimal set of features is considered to be an art as well as a science and is often an empirical and time-consuming process. In this paper, we address both of these issues and propose a novel CTG analysis methodology that a) splits CTG time-series signals into n-size windows with equal class distributions, and b) automatically extracts features from time-series windows using a one dimensional convolutional neural network (1DCNN) and multilayer perceptron (MLP) ensemble. Collectively, the proposed approach normally distributes classes and removes the need to handcraft features from CTG traces.

Submitted 22 August, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

Comments: 11 Pages, 12 Figures (excluding profile pictures), accepted for publication in IEEE Transactions in Emerging Topics in Computational Intelligence

arXiv:1903.12080 [pdf]

Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation

Authors: C. Chalmers, P. Fergus, C. Aday Curbelo Montanez, S. Sikdar, F. Ball, B. Kendall

Abstract: The emergence of an ageing population is a significant public health concern. This has led to an increase in the number of people living with progressive neurodegenerative disorders like dementia. Consequently, the strain this places on health and social care services means providing 24-hour monitoring is not sustainable. Technological intervention is being considered; however, no solution exists to non-intrusively monitor the independent living needs of patients with dementia. As a result, many patients hit crisis point before intervention and support is provided. In parallel, patient care relies on feedback from informal carers about significant behavioural changes. Yet, not all people have a social support network and early intervention in dementia care is often missed. The smart meter rollout has the potential to change this. Using machine learning and signal processing techniques, a home energy supply can be disaggregated to detect which home appliances are turned on and off. This will allow Activities of Daily Living (ADLs) to be assessed, such as eating and drinking, and observed changes in routine to be detected for early intervention. The primary aim is to help reduce deterioration and enable patients to stay in their homes for longer. A Support Vector Machine (SVM) and Random Decision Forest classifier are modelled using data from three test homes. The trained models are then used to monitor two patients with dementia during a six-month clinical trial undertaken in partnership with Mersey Care NHS Foundation Trust. In the case of load disaggregation for appliance detection, the SVM achieved AUC=0.86074, Sen=0.756 and Spec=0.92838, while the Decision Forest achieved AUC=0.9429, Sen=0.9634 and Spec=0.9634. ADLs are also analysed to identify the behavioural patterns of the occupant while detecting alterations in routine.

Submitted 18 March, 2019; originally announced March 2019.

arXiv:1808.09517 [pdf]

Extracting Epistatic Interactions in Type 2 Diabetes Genome-Wide Data Using Stacked Autoencoder

Authors: Basma Abdulaimma, Paul Fergus, Carl Chalmers

Abstract: Type 2 diabetes is a leading worldwide public health concern, and its increasing prevalence has significant health and economic importance in all nations. The condition is a multifactorial disorder with a complex aetiology. The genetic determinants remain largely elusive, with only a handful of identified candidate genes. Genome wide association studies (GWAS) promised to significantly enhance our understanding of genetic based determinants of common complex diseases. To date, 83 single nucleotide polymorphisms (SNPs) for type 2 diabetes have been identified using GWAS. Standard statistical tests for single and multi-locus analysis, such as logistic regression, have demonstrated little effect in understanding the genetic architecture of complex human diseases. Logistic regression is modelled to capture linear interactions but neglects the non-linear epistatic interactions present within genetic data. There is an urgent need to detect epistatic interactions in complex diseases as this may explain the remaining missing heritability in such diseases. In this paper, we present a novel framework based on deep learning algorithms that deal with non-linear epistatic interactions that exist in genome wide association data. Logistic association analysis under an additive genetic model, adjusted for genomic control inflation factor, is conducted to remove statistically improbable SNPs to minimize computational overheads.

Submitted 28 August, 2018; originally announced August 2018.

Comments: 9 pages, 1 figure

arXiv:1808.06503 [pdf]

Collaborative Pressure Ulcer Prevention: An Automated Skin Damage and Pressure Ulcer Assessment Tool for Nursing Professionals, Patients, Family Members and Carers

Authors: Paul Fergus, Carl Chalmers, David Tully

Abstract: This paper describes the Pressure Ulcers Online Website, which is a first step solution towards a new and innovative platform for helping people to detect, understand and manage pressure ulcers. It outlines the reasons why the project has been developed and provides a central point of contact for pressure ulcer analysis and ongoing research. Using state-of-the-art technologies in convolutional neural networks and transfer learning along with end-to-end web technologies, this platform allows pressure ulcers to be analysed and findings to be reported. As the system evolves through collaborative partnerships, future versions will provide decision support functions to describe the complex characteristics of pressure ulcers along with information on wound care across multiple user boundaries. This project is therefore intended to raise awareness and support for people suffering with or providing care for pressure ulcers.

Submitted 17 August, 2018; originally announced August 2018.

Comments: 5 Pages, 7 figures, Position Paper

arXiv:1804.06262 [pdf]

Analysis of Extremely Obese Individuals Using Deep Learning Stacked Autoencoders and Genome-Wide Genetic Data

Authors: Casimiro A. Curbelo Montañez, Paul Fergus, Carl Chalmers, Jade Hind

Abstract: The aetiology of polygenic obesity is multifactorial, which indicates that lifestyle and environmental factors may influence multiple genes to aggravate this disorder. Several low-risk single nucleotide polymorphisms (SNPs) have been associated with BMI. However, identified loci only explain a small proportion of the variation observed for this phenotype. The linear nature of genome wide association studies (GWAS) used to identify associations between genetic variants and the phenotype has had limited success in explaining the heritability variation of BMI and has shown low predictive capacity in classification studies. GWAS ignores the epistatic interactions that less significant variants have on the phenotypic outcome. In this paper we utilise a novel deep learning-based methodology to reduce the high dimensional space in GWAS and find epistatic interactions between SNPs for classification purposes. SNPs were filtered based on the effects associations have with BMI. Since Bonferroni adjustment for multiple testing is highly conservative, an important proportion of SNPs involved in SNP-SNP interactions are ignored. Therefore, only SNPs with p-values < 1x10^-2 were considered for subsequent epistasis analysis using stacked autoencoders (SAE). This allows the nonlinearity present in SNP-SNP interactions to be discovered through progressively smaller hidden layer units and to initialise a multi-layer feedforward artificial neural network (ANN) classifier. The classifier is fine-tuned to classify extremely obese and non-obese individuals.
The best results were obtained with 2000 compressed units (SE=0.949153, SP=0.933014, Gini=0.949936, Logloss=0.1956, AUC=0.97497 and MSE=0.054057). Using 50 compressed units it was possible to achieve (SE=0.785311, SP=0.799043, Gini=0.703566, Logloss=0.476864, AUC=0.85178 and MSE=0.156315).

Submitted 24 August, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

Comments: 13 pages, 4 figures, 13 equations, 2 tables, conference

arXiv:1804.03198 [pdf]

Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

Authors: Casimiro Adays Curbelo Montañez, Paul Fergus, Almudena Curbelo Montañez, Carl Chalmers

Abstract: In this paper, association results from genome-wide association studies (GWAS) are combined with a deep learning framework to test the predictive capacity of statistically significant single nucleotide polymorphisms (SNPs) associated with obesity phenotype. Our approach demonstrates the potential of deep learning as a powerful framework for GWAS analysis that can capture information about SNPs and the important interactions between them. Basic statistical methods and techniques for the analysis of genetic SNP data from population-based genome-wide studies have been considered. Statistical association testing between individual SNPs and obesity was conducted under an additive model using logistic regression. Four subsets of loci after quality-control (QC) and association analysis were selected: P-values lower than 1x10^-5 (5 SNPs), 1x10^-4 (32 SNPs), 1x10^-3 (248 SNPs) and 1x10^-2 (2465 SNPs). A deep learning classifier is initialised using these sets of SNPs and fine-tuned to classify obese and non-obese observations. Using a deep learning classifier model and genetic variants with P-value < 1x10^-2 (2465 SNPs) it was possible to obtain results (SE=0.9604, SP=0.9712, Gini=0.9817, LogLoss=0.1150, AUC=0.9908 and MSE=0.0300). As the P-value increased, an evident deterioration in performance was observed. Results demonstrate that single SNP analysis fails to capture the cumulative effect of less significant variants and their overall contribution to the outcome in disease prediction, which is captured using a deep learning framework.

Submitted 24 August, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: 8 pages, 2 figures, 4 tables, 9 equations, conference
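
The abstract above selects four nested SNP subsets by comparing association p-values against successively looser thresholds. A minimal sketch of that selection step follows, assuming the association results are available as a mapping from SNP identifier to p-value. The function name `select_snps`, the identifiers and the p-values are all hypothetical illustrations, not data or code from the paper.

```python
def select_snps(p_values, threshold):
    """Return the SNP ids whose association p-value is strictly below threshold."""
    return [snp for snp, p in p_values.items() if p < threshold]


# Hypothetical association results (SNP id -> p-value); not data from the paper.
p_values = {"rs_a": 3e-6, "rs_b": 5e-5, "rs_c": 4e-4, "rs_d": 6e-3, "rs_e": 0.2}

# The four thresholds named in the abstract.
thresholds = [1e-5, 1e-4, 1e-3, 1e-2]

# One subset per threshold; each looser threshold contains the stricter subsets.
subsets = {t: select_snps(p_values, t) for t in thresholds}

for t in thresholds:
    print(f"p < {t:g}: {len(subsets[t])} SNPs")
```

Each subset would then serve as the input feature set for a separate classifier, which is how the abstract compares performance across thresholds.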

arXiv:1801.02977 [pdf, other]

Utilising Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women

Authors: Paul Fergus, Casimiro Curbelo Montanez, Basma Abdulaimma, Paulo Lisboa, Carl Chalmers

Abstract: Genome Wide Association Studies (GWAS) are used to identify statistically significant genetic variants in case-control studies. GWAS typically use a p-value threshold of 5 x 10^-8 to identify highly ranked single nucleotide polymorphisms (SNPs). However, evidence has shown that many of these are, in fact, false positives. Using lower p-values it is possible to investigate the joint epistatic interactions between SNPs and provide better insights into phenotype expression. However, computational complexity is increased exponentially as a function of higher-order combinations. In this paper, we propose a novel framework, based on nonlinear transformations of combinatorically large SNP data, using stacked autoencoders, to identify higher-order SNP interactions. We focus on the challenging problem of classifying preterm births. Evidence suggests that this complex condition has a strong genetic component with unexplained heritability reportedly between 20%-40%. This claim is substantiated using a GWAS data set, obtained from dbGaP, which contains predominantly urban low-income African-American women who had normal deliveries (between 37 and 42 weeks of gestation) and preterm deliveries (less than 37 weeks of gestation). Latent representations from original SNP sequences are used to initialize a deep learning classifier before it is fine-tuned for classification tasks (term and preterm births). The complete network models the epistatic effects of major and minor SNP perturbations. All models are evaluated using standard binary classifier performance metrics.
The findings show that important information pertaining to SNPs and epistasis can be extracted from 4666 raw SNPs generated using logistic regression (p-value=5 x 10^-3) and used to fit a deep learning model and obtain results (Sen=0.9289, Spec=0.9591, Gini=0.9651, Logloss=0.3080, AUC=0.9825, MSE=0.0942) using 500 hidden nodes.

Submitted 6 January, 2018; originally announced January 2018.

Comments: 11 pages, 18 equations, four figures, journal paper
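
The abstracts in this listing report sensitivity, specificity, Gini and AUC for binary classifiers. As a hedged sketch of how these quantities relate, the snippet below uses the standard definitions from confusion-matrix counts and the common identity Gini = 2 x AUC - 1. That the papers computed Gini this way is an assumption, although the reported pairs (for example AUC=0.9825 with Gini=0.9651) are consistent with it up to rounding. The function names and confusion-matrix counts are hypothetical.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives correctly rejected."""
    return tn / (tn + fp)


def gini_from_auc(auc):
    """Gini coefficient under the common identity Gini = 2 * AUC - 1."""
    return 2 * auc - 1


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 90, 10, 95, 5
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"specificity = {specificity(tn, fp):.4f}")

# AUC value taken from the abstract above; the result is compared with
# the reported Gini only up to rounding.
print(f"Gini from AUC 0.9825 = {gini_from_auc(0.9825):.4f}")
```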
