Search | arXiv e-print repository

arXiv:2312.09387

High-Resolution Maps of Left Atrial Displacements and Strains Estimated with 3D CINE MRI and Unsupervised Neural Networks

Authors: Christoforos Galazis, Samuel Shepperd, Emma Brouwer, Sandro Queirós, Ebraham Alskaf, Mustafa Anjari, Amedeo Chiribiri, Jack Lee, Anil A. Bharath, Marta Varela

Abstract: The functional analysis of the left atrium (LA) is important for evaluating cardiac health and understanding diseases like atrial fibrillation. Cine MRI is ideally placed for the detailed 3D characterisation of LA motion and deformation, but it lacks appropriate acquisition and analysis tools. In this paper, we present Analysis for Left Atrial Displacements and Deformations using unsupervIsed neural Networks, Aladdin, to automatically and reliably characterise regional LA deformations from high-resolution 3D Cine MRI. The tool includes: an online few-shot segmentation network (Aladdin-S), an online unsupervised image registration network (Aladdin-R), and a strain calculations pipeline tailored to the LA. We create maps of LA Displacement Vector Field (DVF) magnitude and LA principal strain values from images of 10 healthy volunteers and 8 patients with cardiovascular disease (CVD). We additionally create an atlas of these biomarkers using the data from the healthy volunteers. Aladdin is able to accurately track the LA wall across the cardiac cycle and characterise its motion and deformation. The overall DVF magnitude and principal strain values are significantly higher in the healthy group vs CVD patients: $2.85 \pm 1.59$ mm and $0.09 \pm 0.05$ vs $1.96 \pm 0.74$ mm and $0.03 \pm 0.04$, respectively. The time course of these metrics is also different in the two groups, with a more marked active contraction phase observed in the healthy cohort. Finally, utilising the LA atlas allows us to identify regional deviations from the population distribution that may indicate focal tissue abnormalities. The proposed tool for the quantification of novel regional LA deformation biomarkers should have important clinical applications. The source code, anonymised images, generated maps and atlas are publicly available: https://github.com/cgalaz01/aladdin_cmr_la.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2309.02179

High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network

Authors: Christoforos Galazis, Anil Anthony Bharath, Marta Varela

Abstract: Functional analysis of the left atrium (LA) plays an increasingly important role in the prognosis and diagnosis of cardiovascular diseases. Echocardiography-based measurements of LA dimensions and strains are useful biomarkers, but they provide an incomplete picture of atrial deformations. High-resolution dynamic magnetic resonance images (Cine MRI) offer the opportunity to examine LA motion and deformation in 3D, at higher spatial resolution and with full LA coverage. However, there are no dedicated tools to automatically characterise LA motion in 3D. Thus, we propose a tool that automatically segments the LA and extracts the displacement fields across the cardiac cycle. The pipeline is able to accurately track the LA wall across the cardiac cycle with an average Hausdorff distance of $2.51 \pm 1.3$ mm and Dice score of $0.96 \pm 0.02$.
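The two overlap metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes binary masks stored as NumPy arrays and uses a brute-force Hausdorff distance over all foreground voxels; the function names and the `spacing` parameter are hypothetical and are not taken from the paper's code, which may, for example, compute the distance on surface points only.

```python
import numpy as np


def dice_score(a, b):
    """Dice overlap between two binary masks (1.0 means identical)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def hausdorff_distance(a, b, spacing=1.0):
    """Symmetric Hausdorff distance between the foreground voxels of two
    non-empty binary masks. `spacing` is an assumed isotropic voxel size
    (e.g. in mm). Brute force, so only suitable for small masks."""
    pa = np.argwhere(np.asarray(a, dtype=bool)) * spacing
    pb = np.argwhere(np.asarray(b, dtype=bool)) * spacing
    # Pairwise Euclidean distances between every voxel of a and of b.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())


# Toy example: a 2x2 square versus a 2x3 rectangle that contains it.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[1:3, 1:3] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[1:3, 1:4] = True
print(dice_score(mask_a, mask_b))
print(hausdorff_distance(mask_a, mask_b, spacing=2.0))
```

A Dice score near 1 indicates high volumetric overlap, while the Hausdorff distance reports the worst-case boundary disagreement, which is why the two are usually quoted together.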

Submitted 5 September, 2023; originally announced September 2023.

Journal ref: Medical Imaging with Deep Learning, short paper track, 2023

arXiv:2212.10877

doi 10.1016/j.compbiomed.2022.106422

TMS-Net: A Segmentation Network Coupled With A Run-time Quality Control Method For Robust Cardiac Image Segmentation

Authors: Fatmatulzehra Uslu, Anil A. Bharath

Abstract: Recently, deep networks have shown impressive performance for the segmentation of cardiac Magnetic Resonance Imaging (MRI) images. However, their achievement is proving slow to transition to widespread use in medical clinics because of robustness issues, which lead clinicians to place low trust in their results. Predicting the run-time quality of segmentation masks can be useful to warn clinicians against poor results. Despite its importance, there are few studies on this problem. To address this gap, we propose a quality control method based on the agreement across decoders of a multi-view network, TMS-Net, measured by the cosine similarity. The network takes three view inputs resliced from the same 3D image along different axes. Different from previous multi-view networks, TMS-Net has a single encoder and three decoders, leading to better noise robustness, segmentation performance and run-time quality estimation in our experiments on the segmentation of the left atrium on the STACOM 2013 and STACOM 2018 challenge datasets. We also present a way to generate poor segmentation masks by using noisy images generated with engineered noise and Rician noise to simulate undertraining, high anisotropy and poor imaging settings. Our run-time quality estimation method shows good classification of poor and good quality segmentation masks, with an AUC reaching 0.97 on STACOM 2018. We believe that TMS-Net and our run-time quality estimation method have high potential to increase the trust of clinicians in automatic image analysis tools.

Submitted 21 December, 2022; originally announced December 2022.

Journal ref: Computers in Biology and Medicine (2022): 106422

arXiv:2212.03012

Estimation of fibre architecture and scar in myocardial tissue using electrograms: an in-silico study

Authors: Konstantinos Ntagiantas, Eduardo Pignatelli, Nicholas S. Peters, Chris D. Cantwell, Rasheda A. Chowdhury, Anil A. Bharath

Abstract: Atrial Fibrillation (AF) is characterized by disorganised electrical activity in the atria and is known to be sustained by the presence of regions of fibrosis (scars) or functional cellular remodeling, both of which may lead to areas of slow conduction. Estimating the effective conductivity of the myocardium and identifying regions of abnormal propagation is therefore crucial for the effective treatment of AF. We hypothesise that the spatial distribution of tissue conductivity can be directly inferred from an array of concurrently acquired contact electrograms (EGMs). We generate a dataset of simulated cardiac action potential (AP) propagation using randomised scar distributions and a phenomenological cardiac model and calculate contact EGMs at various positions on the field. EGMs are enriched with noise extracted from biological data acquired in the lab. A deep neural network, based on a modified U-net architecture, is trained to estimate the location of the scar and quantify conductivity of the tissue with a Jaccard index of 91%. We adapt a wavelet-based surrogate testing analysis to confirm that the inferred conductivity distribution is an accurate representation of the ground truth input to the model. We find that the root mean square error (RMSE) between the ground truth and our predictions is significantly smaller ($p_{val}<0.01$) than the RMSE between the ground truth and surrogate samples.

Submitted 21 February, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 16 pages, 15 figures

arXiv:2205.07852

REMuS-GNN: A Rotation-Equivariant Model for Simulating Continuum Dynamics

Authors: Mario Lino, Stathi Fotiadis, Anil A. Bharath, Chris Cantwell

Abstract: Numerical simulation is an essential tool in many areas of science and engineering, but its performance often limits application in practice or when used to explore large parameter spaces. On the other hand, surrogate deep learning models, while accelerating simulations, often exhibit poor accuracy and ability to generalise. In order to improve these two factors, we introduce REMuS-GNN, a rotation-equivariant multi-scale model for simulating continuum dynamical systems encompassing a range of length scales. REMuS-GNN is designed to predict an output vector field from an input vector field on a physical domain discretised into an unstructured set of nodes. Equivariance to rotations of the domain is a desirable inductive bias that allows the network to learn the underlying physics more efficiently, leading to improved accuracy and generalisation compared with similar architectures that lack such symmetry. We demonstrate and evaluate this method on the incompressible flow around elliptical cylinders.

Submitted 5 May, 2022; originally announced May 2022.

Comments: Accepted at the ICLR 2022 Workshop on Geometrical and Topological Representation Learning

arXiv:2205.02637

Towards Fast Simulation of Environmental Fluid Mechanics with Multi-Scale Graph Neural Networks

Authors: Mario Lino, Stathi Fotiadis, Anil A. Bharath, Chris Cantwell

Abstract: Numerical simulators are essential tools in the study of natural fluid systems, but their performance often limits application in practice. Recent machine-learning approaches have demonstrated their ability to accelerate spatio-temporal predictions, although with only moderate accuracy in comparison. Here we introduce MultiScaleGNN, a novel multi-scale graph neural network model for learning to infer unsteady continuum mechanics in problems encompassing a range of length scales and complex boundary geometries. We demonstrate this method on advection problems and incompressible fluid dynamics, both fundamental phenomena in oceanic and atmospheric processes. Our results show good extrapolation to new domain geometries and parameters for long-term temporal simulations. Simulations obtained with MultiScaleGNN are between two and four orders of magnitude faster than those on which it was trained.

Submitted 5 May, 2022; originally announced May 2022.

Comments: Accepted at the ICLR 2022 Workshop on AI for Earth and Space Science. arXiv admin note: substantial text overlap with arXiv:2106.04900

arXiv:2203.00355

doi 10.1007/978-3-030-93722-5_29

Tempera: Spatial Transformer Feature Pyramid Network for Cardiac MRI Segmentation

Authors: Christoforos Galazis, Huiyi Wu, Zhuoyu Li, Camille Petri, Anil A. Bharath, Marta Varela

Abstract: Assessing the structure and function of the right ventricle (RV) is important in the diagnosis of several cardiac pathologies. However, it remains more challenging to segment the RV than the left ventricle (LV). In this paper, we focus on segmenting the RV in both short (SA) and long-axis (LA) cardiac MR images simultaneously. For this task, we propose a new multi-input/output architecture, hybrid 2D/3D geometric spatial TransformEr Multi-Pass fEature pyRAmid (Tempera). Our feature pyramid extends current designs by allowing not only a multi-scale feature output but multi-scale SA and LA input images as well. Tempera transfers learned features between SA and LA images via layer weight sharing and incorporates a geometric target transformer to map the predicted SA segmentation to LA space. Our model achieves an average Dice score of 0.836 and 0.798 for the SA and LA, respectively, and 26.31 mm and 31.19 mm Hausdorff distances. This opens up the potential for the incorporation of RV segmentation models into clinical workflows.

Submitted 1 March, 2022; originally announced March 2022.

Journal ref: Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge. STACOM 2021. Lecture Notes in Computer Science, vol 13131

arXiv:2108.11684

Disentangled Generative Models for Robust Prediction of System Dynamics

Authors: Stathi Fotiadis, Mario Lino, Shunlong Hu, Stef Garasto, Chris D Cantwell, Anil Anthony Bharath

Abstract: Deep neural networks have become increasingly of interest in dynamical system prediction, but out-of-distribution generalization and long-term stability remain challenging. In this work, we treat the domain parameters of dynamical systems as factors of variation of the data generating process. By leveraging ideas from supervised disentanglement and causal factorization, we aim to separate the domain parameters from the dynamics in the latent space of generative models. In our experiments we model dynamics both in phase space and in video sequences and conduct rigorous out-of-distribution (OOD) evaluations. Results indicate that disentangled VAEs adapt better to domain parameter spaces that were not present in the training data. At the same time, disentanglement can improve the long-term and out-of-distribution predictions of state-of-the-art models in video sequences.

Submitted 1 June, 2023; v1 submitted 26 August, 2021; originally announced August 2021.

arXiv:2108.07887

doi 10.1007/978-3-030-89370-5_3

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

Authors: Tianhong Dai, Hengyan Liu, Kai Arulkumaran, Guangyu Ren, Anil Anthony Bharath

Abstract: Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.

Submitted 8 November, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: Pacific Rim International Conference on Artificial Intelligence, 2021

arXiv:2106.04900

Simulating Continuum Mechanics with Multi-Scale Graph Neural Networks

Authors: Mario Lino, Chris Cantwell, Anil A. Bharath, Stathi Fotiadis

Abstract: Continuum mechanics simulators, numerically solving one or more partial differential equations, are essential tools in many areas of science and engineering, but their performance often limits application in practice. Recent machine learning approaches have demonstrated their ability to accelerate spatio-temporal predictions, although with only moderate accuracy in comparison. Here we introduce MultiScaleGNN, a novel multi-scale graph neural network model for learning to infer unsteady continuum mechanics. MultiScaleGNN represents the physical domain as an unstructured set of nodes, and it constructs one or more graphs, each of them encoding different scales of spatial resolution. Successive learnt message passing between these graphs improves the ability of GNNs to capture and forecast the system state in problems encompassing a range of length scales. Using graph representations, MultiScaleGNN can impose periodic boundary conditions as an inductive bias on the edges in the graphs, and achieve independence from the nodes' positions. We demonstrate this method on advection problems and incompressible fluid dynamics. Our results show that the proposed model can generalise from uniform advection fields to high-gradient fields on complex domains at test time and infer long-term Navier-Stokes solutions within a range of Reynolds numbers. Simulations obtained with MultiScaleGNN are between two and four orders of magnitude faster than the ones on which it was trained.

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2011.13467

Episodic Self-Imitation Learning with Hindsight

Authors: Tianhong Dai, Hengyan Liu, Anil Anthony Bharath

Abstract: Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transitions-based method which performs poorly in handling continuous control environments with sparse rewards. From the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving comparable performance to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent learning undesirable hindsight experiences. With the capability of solving sparse reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.

Submitted 26 November, 2020; originally announced November 2020.

arXiv:2002.08981

Comparing recurrent and convolutional neural networks for predicting wave propagation

Authors: Stathi Fotiadis, Eduardo Pignatelli, Mario Lino Valencia, Chris Cantwell, Amos Storkey, Anil A. Bharath

Abstract: Dynamical systems can be modelled by partial differential equations and numerical computations are used everywhere in science and engineering. In this work, we investigate the performance of recurrent and convolutional deep neural network architectures in predicting surface waves. The system is governed by the Saint-Venant equations. We improve on the long-term prediction over previous methods while keeping the inference time at a fraction of that of numerical simulations. We also show that convolutional networks perform at least as well as recurrent networks in this task. Finally, we assess the generalisation capability of each network by extrapolating in longer time-frames and in different physical settings.

Submitted 20 April, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

arXiv:1912.08324

doi 10.1016/j.neucom.2022.04.005

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation

Authors: Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath

Abstract: Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents, which are deployed in the real world, beyond task performance. In this work we examine such agents, through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different testing conditions. Finally, we investigate the internals of the trained agents by using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied by larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain randomised agents require higher sample complexity, can overfit and more heavily rely on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating the combination of inspection tools in order to provide sufficient insights into the behaviour of trained agents.

Submitted 17 February, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

arXiv:1911.09615

Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control

Authors: Marta Sarrico, Kai Arulkumaran, Andrea Agostinelli, Pierre Richemond, Anil Anthony Bharath

Abstract: Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
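As a rough illustration of a Boltzmann policy with a state-dependent temperature, the sketch below computes the mellowmax of a state's action values and then picks the inverse temperature so that the expected action value under the policy equals that mellowmax value. This selection rule is an assumption about how a maximum entropy mellowmax policy can be realised and is not taken from the MEMEC code; the function names, the bisection solver and the choice `omega > 0` are all hypothetical.

```python
import numpy as np


def mellowmax(q, omega):
    """Mellowmax operator: log(mean(exp(omega * q))) / omega, computed
    with the maximum subtracted first for numerical stability."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    return m + np.log(np.mean(np.exp(omega * (q - m)))) / omega


def boltzmann_policy(q, beta):
    """Action probabilities proportional to exp(beta * q)."""
    q = np.asarray(q, dtype=float)
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()


def max_entropy_beta(q, omega, iters=100):
    """Assumed rule: find beta >= 0 such that the Boltzmann policy's
    expected value equals mellowmax(q, omega). Requires omega > 0."""
    q = np.asarray(q, dtype=float)
    d = q - mellowmax(q, omega)
    if np.allclose(d, 0.0):
        # All actions equally good: the uniform policy (beta = 0) works.
        return 0.0

    def f(beta):
        # Same sign as sum(exp(beta * d) * d), rescaled to avoid overflow.
        return np.sum(np.exp(beta * (d - d.max())) * d)

    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:  # f is increasing in beta, so expand until positive
        hi *= 2.0
    for _ in range(iters):  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


q_values = np.array([1.0, 2.0, 3.0])  # hypothetical action values for one state
beta = max_entropy_beta(q_values, omega=1.0)
policy = boltzmann_policy(q_values, beta)
print(beta, policy)
```

Because `beta` is recomputed from each state's action values, the effective temperature varies from state to state, which is the property the abstract highlights.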

Submitted 21 November, 2019; originally announced November 2019.

Comments: Workshop on Biological and Artificial Reinforcement Learning, NeurIPS 2019

arXiv:1911.09560

Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means

Authors: Andrea Agostinelli, Kai Arulkumaran, Marta Sarrico, Pierre Richemond, Anil Anthony Bharath

Abstract: Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up the learning and avoid catastrophic forgetting. Unfortunately, EC methods have a large space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.

Submitted 21 November, 2019; originally announced November 2019.

Comments: Workshop on Biological and Artificial Reinforcement Learning, NeurIPS 2019

arXiv:1812.01609

Approximating the solution to wave propagation using deep neural networks

Authors: Wilhelm E. Sorteberg, Stef Garasto, Alison S. Pouplin, Chris D. Cantwell, Anil A. Bharath

Abstract: Humans gain an implicit understanding of physical laws through observing and interacting with the world. Endowing an autonomous agent with an understanding of physical laws through experience and observation is seldom practical: we should seek alternatives. Fortunately, many of the laws of behaviour of the physical world can be derived from prior knowledge of dynamical systems, expressed through the use of partial differential equations. In this work, we suggest a neural network capable of understanding a specific physical phenomenon: wave propagation in a two-dimensional medium. We define 'understanding' in this context as the ability to predict the future evolution of the spatial patterns of rendered wave amplitude from a relatively small set of initial observations. The inherent complexity of the wave equations, together with the existence of reflections and interference, makes the prediction problem non-trivial. A network capable of making approximate predictions also unlocks the opportunity to speed up numerical simulations for wave propagation. To this aim, we created a novel dataset of simulated wave motion and built a predictive deep neural network comprising three main blocks: an encoder, a propagator made of 3 LSTMs, and a decoder. Results show reasonable predictions for as long as 80 time steps into the future on a dataset not seen during training. Furthermore, the network is able to generalize to an initial condition that is qualitatively different from those seen during training.

Submitted 4 December, 2018; originally announced December 2018.

Comments: Accepted to the NeurIPS 2018 Workshop "Modeling the Physical World: Perception, Learning, and Control" (extended abstract)

arXiv:1810.04227

doi 10.1016/j.compbiomed.2018.10.015

Rethinking multiscale cardiac electrophysiology with machine learning and predictive modelling

Authors: Chris D. Cantwell, Yumnah Mohamied, Konstantinos N. Tzortzis, Stef Garasto, Charles Houston, Rasheda A. Chowdhury, Fu Siong Ng, Anil A. Bharath, Nicholas S. Peters

Abstract: We review some of the latest approaches to analysing cardiac electrophysiology data using machine learning and predictive modelling. Cardiac arrhythmias, particularly atrial fibrillation, are a major global healthcare challenge. Treatment is often through catheter ablation, which involves the targeted localized destruction of regions of the myocardium responsible for initiating or perpetuating the arrhythmia. Ablation targets are either anatomically defined, or identified based on their functional properties as determined through the analysis of contact intracardiac electrograms acquired with increasing spatial density by modern electroanatomic mapping systems. While numerous quantitative approaches have been investigated over the past decades for identifying these critical curative sites, few have provided a reliable and reproducible advance in success rates. Machine learning techniques, including recent deep-learning approaches, offer a potential route to gaining new insight from this wealth of highly complex spatio-temporal information that existing methods struggle to analyse. Coupled with predictive modelling, these techniques offer exciting opportunities to advance the field and produce more accurate diagnoses and robust personalised treatment. We outline some of these methods and illustrate their use in making predictions from the contact electrogram and augmenting predictive modelling tools, both by more rapidly predicting future states of the system and by inferring the parameters of these models from experimental observations.

Submitted 9 October, 2018; originally announced October 2018.

arXiv:1802.05701 [pdf, other]

Inverting The Generator Of A Generative Adversarial Network (II)

Authors: Antonia Creswell, Anil A Bharath

Abstract: Generative adversarial networks (GANs) learn a deep generative model that is able to synthesise novel, high-dimensional data samples. New data samples are synthesised by passing latent samples, drawn from a chosen prior distribution, through the generative model. Once trained, the latent space exhibits interesting properties that may be useful for downstream tasks such as classification or retrieval. Unfortunately, GANs do not offer an "inverse model", a mapping from data space back to latent space, making it difficult to infer a latent representation for a given data sample. In this paper, we introduce a technique, inversion, to project data samples, specifically images, to the latent space using a pre-trained GAN. Using our proposed inversion technique, we are able to identify which attributes of a dataset a trained GAN is able to model and quantify GAN performance, based on a reconstruction loss. We demonstrate how our proposed inversion technique may be used to quantitatively compare performance of various GAN models trained on three image datasets. We provide code for all of our experiments, https://github.com/ToniCreswell/InvertingGAN.

Submitted 15 February, 2018; originally announced February 2018.

Comments: Under review at IEEE TNNLS

arXiv:1801.00693 [pdf, other]

Denoising Adversarial Autoencoders: Classifying Skin Lesions Using Limited Labelled Training Data

Authors: Antonia Creswell, Alison Pouplin, Anil A Bharath

Abstract: We propose a novel deep learning model for classifying medical images in the setting where there is a large amount of unlabelled medical data available, but labelled data is in limited supply. We consider the specific case of classifying skin lesions as either malignant or benign. In this setting, the proposed approach -- the semi-supervised, denoising adversarial autoencoder -- is able to utilise vast amounts of unlabelled data to learn a representation for skin lesions, and small amounts of labelled data to assign class labels based on the learned representation. We analyse the contributions of both the adversarial and denoising components of the model and find that the combination yields superior classification performance in the setting of limited labelled training data.

Submitted 2 January, 2018; originally announced January 2018.

Comments: Under consideration for the IET Computer Vision Journal special issue on "Computer Vision in Cancer Data Analysis"

arXiv:1711.10521 [pdf, ps, other]

doi 10.1016/j.patcog.2018.10.017

A Recursive Bayesian Approach To Describe Retinal Vasculature Geometry

Authors: Fatmatulzehra Uslu, Anil Anthony Bharath

Abstract: Demographic studies suggest that changes in the retinal vasculature geometry, especially in vessel width, are associated with the incidence or progression of eye-related or systemic diseases. To date, the main information source for width estimation from fundus images has been the intensity profile between vessel edges. However, there are many factors affecting the intensity profile: pathologies, the central light reflex and local illumination levels, to name a few. In this study, we introduce three information sources for width estimation. These are the probability profiles of vessel interior, centreline and edge locations generated by a deep network. The probability profiles provide direct access to vessel geometry and are used in the likelihood calculation for a Bayesian method, particle filtering. We also introduce a geometric model which can handle non-ideal conditions of the probability profiles. Our experiments conducted on the REVIEW dataset yielded consistent estimates of vessel width, even in cases when one of the vessel edges is difficult to identify. Moreover, our results suggest that the method is better than human observers at locating edges of low contrast vessels.

Submitted 28 November, 2017; originally announced November 2017.

Comments: 26 pages, 13 figures, journal paper

arXiv:1711.05175 [pdf, other]

Adversarial Information Factorization

Authors: Antonia Creswell, Yumnah Mohamied, Biswa Sengupta, Anil A Bharath

Abstract: We propose a novel generative model architecture designed to learn representations for images that factor out a single attribute from the rest of the representation. A single object may have many attributes which, when altered, do not change the identity of the object itself. Consider the human face; the identity of a particular person is independent of whether or not they happen to be wearing glasses. The attribute of wearing glasses can be changed without changing the identity of the person. However, manipulating and altering image attributes without altering the object identity is not a trivial task. Here, we are interested in learning a representation of the image that separates the identity of an object (such as a human face) from an attribute (such as 'wearing glasses'). We demonstrate the success of our factorization approach by using the learned representation to synthesize the same face with and without a chosen attribute. We refer to this specific synthesis process as image attribute manipulation. We further demonstrate that our model achieves scores competitive with the state of the art on a facial attribute classification task.

Submitted 28 September, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

arXiv:1711.02879 [pdf, other]

LatentPoison - Adversarial Attacks On The Latent Space

Authors: Antonia Creswell, Anil A. Bharath, Biswa Sengupta

Abstract: Robustness and security of machine learning (ML) systems are intertwined, wherein a non-robust ML system (classifiers, regressors, etc.) can be subject to attacks using a wide variety of exploits. With the advent of scalable deep learning methodologies, a lot of emphasis has been put on the robustness of supervised, unsupervised and reinforcement learning algorithms. Here, we study the robustness of the latent space of a deep variational autoencoder (dVAE), an unsupervised generative framework, to show that it is indeed possible to perturb the latent space, flip the class predictions and keep the classification probability approximately equal before and after an attack. This means that an agent that looks at the outputs of a decoder would remain oblivious to an attack.

Submitted 8 November, 2017; originally announced November 2017.

Comments: Submitted to ICLR 2018

arXiv:1710.07035 [pdf, other]

doi 10.1109/MSP.2017.2765202

Generative Adversarial Networks: An Overview

Authors: Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, Anil A Bharath

Abstract: Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.

Submitted 19 October, 2017; originally announced October 2017.

Comments: Accepted in the IEEE Signal Processing Magazine Special Issue on Deep Learning for Visual Understanding

arXiv:1708.08487 [pdf, other]

On denoising autoencoders trained to minimise binary cross-entropy

Authors: Antonia Creswell, Kai Arulkumaran, Anil A. Bharath

Abstract: Denoising autoencoders (DAEs) are powerful deep learning models used for feature extraction, data generation and network pre-training. DAEs consist of an encoder and decoder which may be trained simultaneously to minimise a loss (function) between an input and the reconstruction of a corrupted version of the input. Two common loss functions used for training autoencoders are the mean-squared error (MSE) and the binary cross-entropy (BCE). When training autoencoders on image data a natural choice of loss function is BCE, since pixel values may be normalised to take values in [0,1] and the decoder model may be designed to generate samples that take values in (0,1). We show theoretically that DAEs trained to minimise BCE may be used to take gradient steps in the data space towards regions of high probability under the data-generating distribution. Previously this had only been shown for DAEs trained using MSE. As a consequence of the theory, iterative application of a trained DAE moves a data sample from regions of low probability to regions of higher probability under the data-generating distribution. Firstly, we validate the theory by showing that novel data samples, consistent with the training data, may be synthesised when the initial data samples are random noise. Secondly, we motivate the theory by showing that initial data samples synthesised via other methods may be improved via iterative application of a trained DAE to those initial samples.

Submitted 9 October, 2017; v1 submitted 28 August, 2017; originally announced August 2017.

Comments: Submitted to Pattern Recognition Letters

arXiv:1708.05866 [pdf, other]

doi 10.1109/MSP.2017.2743240

A Brief Survey of Deep Reinforcement Learning

Authors: Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath

Abstract: Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

Submitted 28 September, 2017; v1 submitted 19 August, 2017; originally announced August 2017.

Comments: IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding (arXiv extended version)

arXiv:1703.01220 [pdf, other]

Denoising Adversarial Autoencoders

Authors: Antonia Creswell, Anil Anthony Bharath

Abstract: Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabelled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabelled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularisation during training to shape the distribution of the encoded data in latent space. We suggest denoising adversarial autoencoders, which combine denoising and regularisation, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of adversarial autoencoders. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance, and can synthesise samples that are more consistent with the input data than those trained without a corruption process.

Submitted 4 January, 2018; v1 submitted 3 March, 2017; originally announced March 2017.

Comments: submitted to journal

arXiv:1611.05644 [pdf, other]

Inverting The Generator Of A Generative Adversarial Network

Authors: Antonia Creswell, Anil Anthony Bharath

Abstract: Generative adversarial networks (GANs) learn to synthesise new samples from a high-dimensional distribution by passing samples drawn from a latent space through a generative network. When the high-dimensional distribution describes images of a particular data set, the network should learn to generate visually similar image samples for latent variables that are close to each other in the latent space. For tasks such as image retrieval and image classification, it may be useful to exploit the arrangement of the latent space by projecting images into it, and using this as a representation for discriminative tasks. GANs often consist of multiple layers of non-linear computations, making them very difficult to invert. This paper introduces techniques for projecting image samples into the latent space using any pre-trained GAN, provided that the computational graph is available. We evaluate these techniques on both MNIST digits and Omniglot handwritten characters. In the case of MNIST digits, we show that projections into the latent space maintain information about the style and the identity of the digit. In the case of Omniglot characters, we show that even characters from alphabets that have not been seen during training may be projected well into the latent space; this suggests that this approach may have applications in one-shot learning.

Submitted 17 November, 2016; originally announced November 2016.

Comments: Accepted at NIPS 2016 Workshop on Adversarial Training

arXiv:1610.09296 [pdf, other]

Improving Sampling from Generative Autoencoders with Markov Chains

Authors: Antonia Creswell, Kai Arulkumaran, Anil Anthony Bharath

Abstract: We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples the learned latent distribution, which may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since the generative model learns to map from the learned latent distribution, rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion.

Submitted 12 January, 2017; v1 submitted 28 October, 2016; originally announced October 2016.

arXiv:1610.07570 [pdf, other]

A data augmentation methodology for training machine/deep learning gait recognition algorithms

Authors: Christoforos C. Charalambous, Anil A. Bharath

Abstract: There are several confounding factors that can reduce the accuracy of gait recognition systems. These factors can reduce the distinctiveness of, or alter, the features used to characterise gait; they include variations in clothing, lighting, pose and environment, such as the walking surface. Full invariance to all confounding factors is challenging in the absence of high-quality labelled training data. We introduce a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. With this methodology, we generated a multi-modal dataset. In addition, we supply simulation files that provide the ability to simultaneously sample from several confounding variables. The basis of the data is real motion capture data of subjects walking and running on a treadmill at different speeds. Results from gait recognition experiments suggest that information about the identity of subjects is retained within synthetically generated examples. The dataset and methodology allow studies into fully-invariant identity recognition spanning a far greater number of observation conditions than would otherwise be possible.

Submitted 24 October, 2016; originally announced October 2016.

Comments: The paper and supplementary material are available on http://www.bmva.org/bmvc/2016/papers/paper110/index.html Dataset is available on http://www.bicv.org/datasets/m Proceedings of the BMVC 2016

arXiv:1609.08661 [pdf, other]

Task Specific Adversarial Cost Function

Authors: Antonia Creswell, Anil A. Bharath

Abstract: The cost function used to train a generative model should fit the purpose of the model. If the model is intended for tasks such as generating perceptually correct samples, it is beneficial to maximise the likelihood of a sample drawn from the model, Q, coming from the same distribution as the training data, P. This is equivalent to minimising the Kullback-Leibler (KL) distance, KL[Q||P]. However, if the model is intended for tasks such as retrieval or classification it is beneficial to maximise the likelihood that a sample drawn from the training data is captured by the model, equivalent to minimising KL[P||Q]. The cost function used in adversarial training optimises the Jensen-Shannon entropy which can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here, we propose an alternative adversarial cost function which allows easy tuning of the model for either task. Our task specific cost function is evaluated on a dataset of hand-written characters in the following tasks: Generation, retrieval and one-shot learning.

Submitted 27 September, 2016; originally announced September 2016.

Comments: Submitted to TPAMI
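
As a rough illustration of the quantities named in the abstract above, the sketch below computes KL[Q||P], KL[P||Q] and the textbook Jensen-Shannon divergence for two small discrete distributions. The distributions `P` and `Q` and the helper names `kl` and `js` are hypothetical choices made here for illustration; this is the standard definition of each divergence, not the alternative cost function proposed in the paper.

```python
import math

def kl(p, q):
    """KL[p||q] in nats for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Standard Jensen-Shannon divergence, defined via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical toy distributions: P plays the role of the data, Q the model.
P = [0.5, 0.4, 0.1]
Q = [0.3, 0.3, 0.4]

print("KL[Q||P] =", kl(Q, P))
print("KL[P||Q] =", kl(P, Q))
print("JS(P, Q) =", js(P, Q))
```

The two KL directions generally differ, whereas the Jensen-Shannon divergence is symmetric in its arguments and bounded above by log 2.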

arXiv:1607.02748 [pdf, other]

doi 10.1007/978-3-319-46604-0_55

Adversarial Training For Sketch Retrieval

Authors: Antonia Creswell, Anil Anthony Bharath

Abstract: Generative Adversarial Networks (GAN) are able to learn excellent representations for unlabelled data which can be applied to image generation and scene classification. Representations learned by GANs have not yet been applied to retrieval. In this paper, we show that the representations learned by GANs can indeed be used for retrieval. We consider heritage documents that contain unlabelled Merchant Marks, sketch-like symbols that are similar to hieroglyphs. We introduce a novel GAN architecture with design features that make it suitable for sketch retrieval. The performance of this sketch-GAN is compared to a modified version of the original GAN architecture with respect to simple invariance properties. Experiments suggest that sketch-GANs learn representations that are suitable for retrieval and which also have increased stability to rotation, scale and translation compared to the standard GAN architecture.

Submitted 23 August, 2016; v1 submitted 10 July, 2016; originally announced July 2016.

Comments: Accepted to ECCV2016 VisArt Workshop

arXiv:1604.08153 [pdf, other]

Classifying Options for Deep Reinforcement Learning

Authors: Kai Arulkumaran, Nat Dilokthanakul, Murray Shanahan, Anil Anthony Bharath

Abstract: In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.

Submitted 19 June, 2017; v1 submitted 27 April, 2016; originally announced April 2016.

Comments: IJCAI 2016 Workshop on Deep Reinforcement Learning: Frontiers and Challenges

arXiv:1503.03514 [pdf, other]

doi 10.1016/j.patrec.2015.03.003

Appearance-based indoor localization: A comparison of patch descriptor performance

Authors: Jose Rivera-Rubio, Ioannis Alexiou, Anil A. Bharath

Abstract: Vision is one of the most important of the senses, and humans use it extensively during navigation. We evaluated different types of image and video frame descriptors that could be used to determine distinctive visual landmarks for localizing a person based on what is seen by a camera that they carry. To do this, we created a database containing over 3 km of video-sequences with ground-truth in the form of distance travelled along different corridors. Using this database, the accuracy of localization - both in terms of knowing which route a user is on - and in terms of position along a certain route, can be evaluated. For each type of descriptor, we also tested different techniques to encode visual structure and to search between journeys to estimate a user's position. The techniques include single-frame descriptors, those using sequences of frames, and both colour and achromatic descriptors. We found that single-frame indexing worked better within this particular dataset. This might be because the motion of the person holding the camera makes the video too dependent on individual steps and motions of one particular journey. Our results suggest that appearance-based information could be an additional source of navigational data indoors, augmenting that provided by, say, radio signal strength indicators (RSSIs). Such visual information could be collected by crowdsourcing low-resolution video feeds, allowing journeys made by different users to be associated with each other, and location to be inferred without requiring explicit mapping. This offers a complementary approach to methods based on simultaneous localization and mapping (SLAM) algorithms.

Submitted 11 March, 2015; originally announced March 2015.

Comments: Accepted for publication in Pattern Recognition Letters

MSC Class: 68T45; 68T40

Showing 1–33 of 33 results for author: Bharath, A A