Search | arXiv e-print repository

ReCoRe: Regularized Contrastive Representation Learning of World Model

Authors: Rudra P. K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla

Abstract: While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a wor… ▽ More While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a world model that learns invariant features using (i) contrastive unsupervised learning and (ii) an intervention-invariant regularizer. Learning an explicit representation of the world dynamics i.e. a world model, improves sample efficiency while contrastive learning implicitly enforces learning of invariant features, which improves generalization. However, the naïve integration of contrastive loss to world models is not good enough, as world-model-based RL methods independently optimize representation learning and agent policy. To overcome this issue, we propose an intervention-invariant regularizer in the form of an auxiliary task such as depth prediction, image denoising, image segmentation, etc., that explicitly enforces invariance to style interventions. Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark. With only visual observations, we further demonstrate that our approach outperforms recent language-guided foundation models for point navigation, which is essential for deployment on robots with limited computation capabilities. Finally, we demonstrate that our proposed model excels at the sim-to-real transfer of its perception module on the Gibson benchmark. △ Less

Submitted 3 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted at CVPR 2024. arXiv admin note: text overlap with arXiv:2209.14932

arXiv:2311.17593 [pdf, other]

LanGWM: Language Grounded World Model

Authors: Rudra P. K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla

Abstract: Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language. Building upon recent success of th… ▽ More Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language. Building upon recent success of the large language models, our main objective is to improve the state abstraction technique in reinforcement learning by leveraging language for robust action selection. Specifically, we focus on learning language-grounded visual features to enhance the world model learning, a model-based reinforcement learning technique. To enforce our hypothesis explicitly, we mask out the bounding boxes of a few objects in the image observation and provide the text prompt as descriptions for these masked objects. Subsequently, we predict the masked objects along with the surrounding regions as pixel reconstruction, similar to the transformer-based masked autoencoder approach. Our proposed LanGWM: Language Grounded World Model achieves state-of-the-art performance in out-of-distribution test at the 100K interaction steps benchmarks of iGibson point navigation tasks. Furthermore, our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction because our extracted visual features are language grounded. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2310.09650 [pdf]

Multimodal Federated Learning in Healthcare: a Review

Authors: Jacob Thrasher, Alina Devkota, Prasiddha Siwakotai, Rohit Chivukula, Pranav Poudel, Chaunbo Hu, Binod Bhattarai, Prashnna Gyawali

Abstract: Recent advancements in multimodal machine learning have empowered the development of accurate and robust AI systems in the medical domain, especially within centralized database systems. Simultaneously, Federated Learning (FL) has progressed, providing a decentralized mechanism where data need not be consolidated, thereby enhancing the privacy and security of sensitive healthcare data. The integra… ▽ More Recent advancements in multimodal machine learning have empowered the development of accurate and robust AI systems in the medical domain, especially within centralized database systems. Simultaneously, Federated Learning (FL) has progressed, providing a decentralized mechanism where data need not be consolidated, thereby enhancing the privacy and security of sensitive healthcare data. The integration of these two concepts supports the ongoing progress of multimodal learning in healthcare while ensuring the security and privacy of patient records within local data-holding agencies. This paper offers a concise overview of the significance of FL in healthcare and outlines the current state-of-the-art approaches to Multimodal Federated Learning (MMFL) within the healthcare domain. It comprehensively examines the existing challenges in the field, shedding light on the limitations of present models. Finally, the paper outlines potential directions for future advancements in the field, aiming to bridge the gap between cutting-edge AI technology and the imperative need for patient data privacy in healthcare applications. △ Less

Submitted 27 February, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: 28 pages, 5 figures

arXiv:2306.13203 [pdf, other]

Neural Network Pruning for Real-time Polyp Segmentation

Authors: Suman Sapkota, Pranav Poudel, Sudarshan Regmi, Bibek Panthi, Binod Bhattarai

Abstract: Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance by compromising… ▽ More Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance by compromising model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make the inference faster. In this study, we show an application of neural network pruning in polyp segmentation. We compute the importance score of convolutional filters and remove the filters having the least scores, which to some value of pruning does not degrade the performance. For computing the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output for the removal of certain filters. Specifically, we employ a gradient-normalized backpropagation for the computation of the importance score. Through experiments in the polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs retaining similar performance. △ Less

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai , et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier effor… ▽ More Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery. △ Less

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2209.14932 [pdf, other]

Contrastive Unsupervised Learning of World Model with Invariant Causal Features

Authors: Rudra P. K. Poudel, Harit Pandya, Roberto Cipolla

Abstract: In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and th… ▽ More In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and the policy. Thus naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart. △ Less

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2208.07359 [pdf, ps, other]

Stable Scheduling in Transactional Memory

Authors: Costas Busch, Bogdan S. Chlebus, Dariusz R. Kowalski, Pavan Poudel

Abstract: We study computer systems with transactions executed on a set of shared objects. Transactions arrive continually subjects to constrains that are framed as an adversarial model and impose limits on the average rate of transaction generation and the number of objects that transactions use. We show that no deterministic distributed scheduler in the queue-free model of transaction autonomy can provide… ▽ More We study computer systems with transactions executed on a set of shared objects. Transactions arrive continually subjects to constrains that are framed as an adversarial model and impose limits on the average rate of transaction generation and the number of objects that transactions use. We show that no deterministic distributed scheduler in the queue-free model of transaction autonomy can provide stability for any positive rate of transaction generation. Let a system consist of $m$ shared objects and an adversary be constrained such that each transaction may access at most $k$ shared objects. We prove that no scheduler can be stable if a generation rate is greater than $\max\bigl\{\frac{2}{k+1},\frac{2}{\lfloor \sqrt{2m} \rfloor}\bigr\}$. We develop a centralized scheduler that is stable if a transaction generation rate is at most $\max\bigl\{\frac{1}{4k}, \frac{1}{4\lceil\sqrt{m}\rceil} \bigr\}$. We design a distributed scheduler in the queue-based model of transaction autonomy, in which a transaction is assigned to an individual processor, that guarantees stability if the rate of transaction generation is less than $\max\bigl\{ \frac{1}{6k},\frac{1}{6\lceil\sqrt{m}\rceil}\bigr\}$. For each of the schedulers we give upper bounds on the queue size and transaction latency in the range of rates of transaction generation for which the scheduler is stable. △ Less

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2204.03440 [pdf, other]

Task-Aware Active Learning for Endoscopic Image Analysis

Authors: Shrawan Kumar Thapa, Pranav Poudel, Binod Bhattarai, Danail Stoyanov

Abstract: Semantic segmentation of polyps and depth estimation are two important research problems in endoscopic image analysis. One of the main obstacles to conduct research on these research problems is lack of annotated data. Endoscopic annotations necessitate the specialist knowledge of expert endoscopists and due to this, it can be difficult to organise, expensive and time consuming. To address this pr… ▽ More Semantic segmentation of polyps and depth estimation are two important research problems in endoscopic image analysis. One of the main obstacles to conduct research on these research problems is lack of annotated data. Endoscopic annotations necessitate the specialist knowledge of expert endoscopists and due to this, it can be difficult to organise, expensive and time consuming. To address this problem, we investigate an active learning paradigm to reduce the number of training examples by selecting the most discriminative and diverse unlabelled examples for the task taken into consideration. Most of the existing active learning pipelines are task-agnostic in nature and are often sub-optimal to the end task. In this paper, we propose a novel task-aware active learning pipeline and applied for two important tasks in endoscopic image analysis: semantic segmentation and depth estimation. We compared our method with the competitive baselines. From the experimental results, we observe a substantial improvement over the compared baselines. Codes are available at https://github.com/thetna/endo-active-learn. △ Less

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2201.12678 [pdf, ps, other]

A Stochastic Bundle Method for Interpolating Networks

Authors: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar

Abstract: We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This… ▽ More We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). In order to operationalise BORAT, we design a novel algorithm for optimising the bundle approximation efficiently at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. △ Less

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2009.05429 [pdf, other]

doi 10.1109/LRA.2020.3048662

Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments

Authors: Steven D. Morad, Roberto Mecca, Rudra P. K. Poudel, Stephan Liwicki, Roberto Cipolla

Abstract: We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigat… ▽ More We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images. Obstacle-avoiding policies and frozen feature networks support transfer to unseen real-world environments, without any modification or retraining requirements. We evaluate our policies in simulation, and in the real world on a ground robot and a quadrotor drone. Videos of real-world results are available in the supplementary material. △ Less

Submitted 6 January, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

arXiv:1902.04502 [pdf, other]

Fast-SCNN: Fast Semantic Segmentation Network

Authors: Rudra P K Poudel, Stephan Liwicki, Roberto Cipolla

Abstract: The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded… ▽ More The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our `learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications. △ Less

Submitted 12 February, 2019; originally announced February 2019.

arXiv:1805.04554 [pdf, other]

ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time

Authors: Rudra P K Poudel, Ujwal Bonde, Stephan Liwicki, Christopher Zach

Abstract: Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep n… ▽ More Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. ContextNet combines a deep network branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. We analyse our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.3 frames per second at full (1024x2048) resolution (41.9 fps with pipelined computations for streamed data). △ Less

Submitted 5 November, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: Published as a conference paper at British Machine Vision Conference (BMVC), 2018

arXiv:1612.02572 [pdf]

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

Authors: James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana

Abstract: Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on… ▽ More Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on deep learning, and specifically convolutional neural networks (CNN), and applied to both pre-processed and raw T1-weighted MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted age using a large dataset of healthy adults (N = 2001). Next, we sought to establish the heritability of brain-predicted age using a sample of monozygotic and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and multi-centre reliability of brain-predicted age using two samples (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were generated and compared to a Gaussian Process Regression (GPR) approach, on all datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps generated by Statistical Parametric Map** (SPM) or raw data. Brain-predicted age represents an accurate, highly reliable and genetically-valid phenotype, that has potential to be used as a biomarker of brain ageing. Moreover, age predictions can be accurately generated on raw T1-MRI data, substantially reducing computation time for novel data, bringing the process closer to giving real-time information on brain health in clinical settings. △ Less

Submitted 8 December, 2016; originally announced December 2016.

arXiv:1608.03974 [pdf, other]

Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation

Authors: Rudra P K Poudel, Pablo Lamata, Giovanni Montana

Abstract: In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial depende… ▽ More In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial dependences through internal memory units. RFCN combines anatomical detection and segmentation into a single architecture that is trained end-to-end thus significantly reducing computational time, simplifying the segmentation pipeline, and potentially enabling real-time applications. We report on an investigation of RFCN using two datasets, including the publicly available MICCAI 2009 Challenge dataset. Comparisons have been carried out between fully convolutional networks and deep restricted Boltzmann machines, including a recurrent version that leverages inter-slice spatial correlation. Our studies suggest that RFCN produces state-of-the-art results and can substantially improve the delineation of contours near the apex of the heart. △ Less

Submitted 13 August, 2016; originally announced August 2016.

Comments: MICCAI Workshop RAMBO 2016

Showing 1–14 of 14 results for author: Poudel, P