Search | arXiv e-print repository

L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning

Authors: Yasar Abbas Ur Rehman, Yan Gao, Pedro Porto Buarque de Gusmão, Mina Alibeigi, Jiajun Shen, Nicholas D. Lane

Abstract: The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of the learned visual representations without needing to move data around. However, cli… ▽ More The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of the learned visual representations without needing to move data around. However, client bias and divergence during FL aggregation caused by data heterogeneity limits the performance of learned visual representations on downstream tasks. In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. The proposed method aggregates weights at the layer-level according to the measure of angular divergence between the clients' model and the global model. Extensive experiments with cross-silo and cross-device settings on CIFAR-10/100 and Tiny ImageNet datasets demonstrate that our methods are effective and obtain new SOTA performance on both contrastive and non-contrastive SSL approaches. △ Less

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.17453 [pdf, other]

Pollen: High-throughput Federated Learning Simulation via Resource-Aware Client Placement

Authors: Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane

Abstract: Federated Learning (FL) is a privacy-focused machine learning paradigm that collaboratively trains models directly on edge devices. Simulation plays an essential role in FL adoption, hel** develop novel aggregation and client sampling strategies. However, current simulators cannot emulate large-scale systems in a time-efficient manner, which limits their utility and casts doubts on generalizabil… ▽ More Federated Learning (FL) is a privacy-focused machine learning paradigm that collaboratively trains models directly on edge devices. Simulation plays an essential role in FL adoption, hel** develop novel aggregation and client sampling strategies. However, current simulators cannot emulate large-scale systems in a time-efficient manner, which limits their utility and casts doubts on generalizability. This work proposes Pollen, a novel resource-aware system for speeding up simulations. Pollen addresses two limiting factors from existing simulators: (a) communication inefficiency derived from pull-based client execution and (b) inadequate load balance when using heterogeneous hardware. Pollen executes high-throughput FL simulations at scale by (a) using a push-based client placement system, (b) learning how an adaptable scheduling of clients based on hardware statistics (c) estimating the optimal number of concurrent workers per GPU. We evaluate Pollen on four representative FL tasks and show that Pollen's placement model increases GPU utilization and reduces idle time. We compare Pollen to Flower, Flute, FedScale, Parrot, and pfl and show experimental speed-ups of days or weeks. △ Less

Submitted 20 May, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: 22 pages, 22 figures, 9 tables, under review

arXiv:2306.04040 [pdf, other]

FedVal: Different good or different bad in federated learning

Authors: Viktor Valadi, Xinchi Qiu, Pedro Porto Buarque de Gusmão, Nicholas D. Lane, Mina Alibeigi

Abstract: Federated learning (FL) systems are susceptible to attacks from malicious actors who might attempt to corrupt the training model through various poisoning attacks. FL also poses new challenges in addressing group bias, such as ensuring fair performance for different demographic groups. Traditional methods used to address such biases require centralized access to the data, which FL systems do not h… ▽ More Federated learning (FL) systems are susceptible to attacks from malicious actors who might attempt to corrupt the training model through various poisoning attacks. FL also poses new challenges in addressing group bias, such as ensuring fair performance for different demographic groups. Traditional methods used to address such biases require centralized access to the data, which FL systems do not have. In this paper, we present a novel approach FedVal for both robustness and fairness that does not require any additional information from clients that could raise privacy concerns and consequently compromise the integrity of the FL system. To this end, we propose an innovative score function based on a server-side validation method that assesses client updates and determines the optimal aggregation balance between locally-trained models. Our research shows that this approach not only provides solid protection against poisoning attacks but can also be used to reduce group bias and subsequently promote fairness while maintaining the system's capability for differential privacy. Extensive experiments on the CIFAR-10, FEMNIST, and PUMS ACSIncome datasets in different configurations demonstrate the effectiveness of our method, resulting in state-of-the-art performances. We have proven robustness in situations where 80% of participating clients are malicious. Additionally, we have shown a significant increase in accuracy for underrepresented labels from 32% to 53%, and increase in recall rate for underrepresented features from 19% to 50%. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: To appear in the proceedings of the USENIX Security Symposium 2023

arXiv:2305.11236 [pdf, other]

Efficient Vertical Federated Learning with Secure Aggregation

Authors: Xinchi Qiu, Heng Pan, Wanru Zhao, Chenyang Ma, Pedro Porto Buarque de Gusmão, Nicholas D. Lane

Abstract: The majority of work in privacy-preserving federated learning (FL) has been focusing on horizontally partitioned datasets where clients share the same sets of features and can train complete models independently. However, in many interesting problems, such as financial fraud detection and disease detection, individual data points are scattered across different clients/organizations in vertical fed… ▽ More The majority of work in privacy-preserving federated learning (FL) has been focusing on horizontally partitioned datasets where clients share the same sets of features and can train complete models independently. However, in many interesting problems, such as financial fraud detection and disease detection, individual data points are scattered across different clients/organizations in vertical federated learning. Solutions for this type of FL require the exchange of gradients between participants and rarely consider privacy and security concerns, posing a potential risk of privacy leakage. In this work, we present a novel design for training vertical FL securely and efficiently using state-of-the-art security modules for secure aggregation. We demonstrate empirically that our method does not impact training performance whilst obtaining 9.1e2 ~3.8e4 speedup compared to homomorphic encryption (HE). △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: Federated Learning Systems (FLSys) Workshop @ MLSys 2023

arXiv:2209.15575 [pdf, other]

Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio

Authors: Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane

Abstract: Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train as it requires manipulating long input sequences that can only be handled by powerful centralised servers. Surprisingly, despite many attempts to increase trai… ▽ More Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train as it requires manipulating long input sequences that can only be handled by powerful centralised servers. Surprisingly, despite many attempts to increase training efficiency through model compression, the effects of truncating input sequence lengths to reduce computation have not been studied. In this paper, we provide the first empirical study of SSL pre-training for different specified sequence lengths and link this to various downstream tasks. We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks. This simple one-line change would promote the migration of SSL training from data centres to user-end edge devices for more realistic and personalised applications. △ Less

Submitted 22 November, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

arXiv:2207.01975 [pdf, other]

Federated Self-supervised Learning for Video Understanding

Authors: Yasar Abbas Ur Rehman, Yan Gao, Jiajun Shen, Pedro Porto Buarque de Gusmao, Nicholas Lane

Abstract: The ubiquity of camera-enabled mobile devices has lead to large amounts of unlabelled video data being produced at the edge. Although various self-supervised learning (SSL) methods have been proposed to harvest their latent spatio-temporal representations for task-specific training, practical challenges including privacy concerns and communication costs prevent SSL from being deployed at large sca… ▽ More The ubiquity of camera-enabled mobile devices has lead to large amounts of unlabelled video data being produced at the edge. Although various self-supervised learning (SSL) methods have been proposed to harvest their latent spatio-temporal representations for task-specific training, practical challenges including privacy concerns and communication costs prevent SSL from being deployed at large scales. To mitigate these issues, we propose the use of Federated Learning (FL) to the task of video SSL. In this work, we evaluate the performance of current state-of-the-art (SOTA) video-SSL techniques and identify their shortcomings when integrated into the large-scale FL setting simulated with kinetics-400 dataset. We follow by proposing a novel federated SSL framework for video, dubbed FedVSSL, that integrates different aggregation strategies and partial weight updating. Extensive experiments demonstrate the effectiveness and significance of FedVSSL as it outperforms the centralized SOTA for the downstream retrieval task by 6.66% on UCF-101 and 5.13% on HMDB-51. △ Less

Submitted 19 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2207.01053 [pdf, other]

doi 10.1145/3556557.3557950

Protea: Client Profiling within Federated Systems using Flower

Authors: Wanru Zhao, Xinchi Qiu, Javier Fernandez-Marques, Pedro P. B. de Gusmão, Nicholas D. Lane

Abstract: Federated Learning (FL) has emerged as a prospective solution that facilitates the training of a high-performing centralised model without compromising the privacy of users. While successful, research is currently limited by the possibility of establishing a realistic large-scale FL system at the early stages of experimentation. Simulation can help accelerate this process. To facilitate efficient… ▽ More Federated Learning (FL) has emerged as a prospective solution that facilitates the training of a high-performing centralised model without compromising the privacy of users. While successful, research is currently limited by the possibility of establishing a realistic large-scale FL system at the early stages of experimentation. Simulation can help accelerate this process. To facilitate efficient scalable FL simulation of heterogeneous clients, we design and implement Protea, a flexible and lightweight client profiling component within federated systems using the FL framework Flower. It allows automatically collecting system-level statistics and estimating the resources needed for each client, thus running the simulation in a resource-aware fashion. The results show that our design successfully increases parallelism for 1.66 $\times$ faster wall-clock time and 2.6$\times$ better GPU utilisation, which enables large-scale experiments on heterogeneous clients. △ Less

Submitted 31 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: 6 pages, 5 figures, Accepted at ACM MobiCom FedEdge Workshop, 2022

Journal ref: 1st ACM Workshop on Data Privacy and Federated Learning Technologies for Mobile Edge Network (FedEdge'22), October 17,2022,Sydney, NSW, Australia

arXiv:2205.06117 [pdf, other]

Secure Aggregation for Federated Learning in Flower

Authors: Kwing Hei Li, Pedro Porto Buarque de Gusmão, Daniel J. Beutel, Nicholas D. Lane

Abstract: Federated Learning (FL) allows parties to learn a shared prediction model by delegating the training computation to clients and aggregating all the separately trained models on the server. To prevent private information being inferred from local models, Secure Aggregation (SA) protocols are used to ensure that the server is unable to inspect individual trained models as it aggregates them. However… ▽ More Federated Learning (FL) allows parties to learn a shared prediction model by delegating the training computation to clients and aggregating all the separately trained models on the server. To prevent private information being inferred from local models, Secure Aggregation (SA) protocols are used to ensure that the server is unable to inspect individual trained models as it aggregates them. However, current implementations of SA in FL frameworks have limitations, including vulnerability to client dropouts or configuration difficulties. In this paper, we present Salvia, an implementation of SA for Python users in the Flower FL framework. Based on the SecAgg(+) protocols for a semi-honest threat model, Salvia is robust against client dropouts and exposes a flexible and easy-to-use API that is compatible with various machine learning frameworks. We show that Salvia's experimental performance is consistent with SecAgg(+)'s theoretical computation and communication complexities. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted to appear in the 2nd International Workshop on Distributed Machine Learning

arXiv:2104.14297 [pdf, other]

End-to-End Speech Recognition from Federated Acoustic Models

Authors: Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

Abstract: Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data di… ▽ More Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. We present the first empirical study on attention-based sequence-to-sequence End-to-End (E2E) ASR model with three aggregation weighting strategies -- standard FedAvg, loss-based aggregation and a novel word error rate (WER)-based aggregation, compared in two realistic FL scenarios: cross-silo with 10 clients and cross-device with 2K and 4K clients. Our analysis on E2E ASR from heterogeneous and realistic federated acoustic models provides the foundations for future research and development of realistic FL-based ASR applications. △ Less

Submitted 9 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

arXiv:2104.07196 [pdf, other]

Graph-based Thermal-Inertial SLAM with Probabilistic Neural Networks

Authors: Muhamad Risqi U. Saputra, Chris Xiaoxuan Lu, Pedro P. B. de Gusmao, Bing Wang, Andrew Markham, Niki Trigoni

Abstract: Simultaneous Localization and Map** (SLAM) system typically employ vision-based sensors to observe the surrounding environment. However, the performance of such systems highly depends on the ambient illumination conditions. In scenarios with adverse visibility or in the presence of airborne particulates (e.g. smoke, dust, etc.), alternative modalities such as those based on thermal imaging and i… ▽ More Simultaneous Localization and Map** (SLAM) system typically employ vision-based sensors to observe the surrounding environment. However, the performance of such systems highly depends on the ambient illumination conditions. In scenarios with adverse visibility or in the presence of airborne particulates (e.g. smoke, dust, etc.), alternative modalities such as those based on thermal imaging and inertial sensors are more promising. In this paper, we propose the first complete thermal-inertial SLAM system which combines neural abstraction in the SLAM front end with robust pose graph optimization in the SLAM back end. We model the sensor abstraction in the front end by employing probabilistic deep learning parameterized by Mixture Density Networks (MDN). Our key strategies to successfully model this encoding from thermal imagery are the usage of normalized 14-bit radiometric data, the incorporation of hallucinated visual (RGB) features, and the inclusion of feature selection to estimate the MDN parameters. To enable a full SLAM system, we also design an efficient global image descriptor which is able to detect loop closures from thermal embedding vectors. We performed extensive experiments and analysis using three datasets, namely self-collected ground robot and handheld data taken in indoor environment, and one public dataset (SubT-tunnel) collected in underground tunnel. Finally, we demonstrate that an accurate thermal-inertial SLAM system can be realized in conditions of both benign and adverse visibility. △ Less

Submitted 29 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted to IEEE Transactions on Robotics

arXiv:2104.03042 [pdf, other]

On-device Federated Learning with Flower

Authors: Akhil Mathur, Daniel J. Beutel, Pedro Porto Buarque de Gusmão, Javier Fernandez-Marques, Taner Topal, Xinchi Qiu, Titouan Parcollet, Yan Gao, Nicholas D. Lane

Abstract: Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while kee** their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. Despite the algorithmic advancements in FL, the support for on-device training of FL algorithms on edge devices remains poor. In this paper, we present an explo… ▽ More Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while kee** their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. Despite the algorithmic advancements in FL, the support for on-device training of FL algorithms on edge devices remains poor. In this paper, we present an exploration of on-device FL on various smartphones and embedded devices using the Flower framework. We also evaluate the system costs of on-device FL and discuss how this quantification could be used to design more efficient FL algorithms. △ Less

Submitted 7 April, 2021; originally announced April 2021.

Comments: Accepted at the 2nd On-device Intelligence Workshop @ MLSys 2021. arXiv admin note: substantial text overlap with arXiv:2007.14390

ACM Class: I.0

Journal ref: On-device Intelligence Workshop at the Fourth Conference on Machine Learning and Systems (MLSys), April 9, 2021

arXiv:2102.07627 [pdf, other]

A first look into the carbon footprint of federated learning

Authors: Xinchi Qiu, Titouan Parcollet, Javier Fernandez-Marques, Pedro Porto Buarque de Gusmao, Yan Gao, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane

Abstract: Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands an… ▽ More Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. \textit{However, the potential environmental impact related to FL remains unclear and unexplored. This paper offers the first-ever systematic study of the carbon footprint of FL.} First, we propose a rigorous model to quantify the carbon footprint, hence facilitating the investigation of the relationship between FL design and carbon emissions. Then, we compare the carbon footprint of FL to traditional centralized learning. Our findings show that, depending on the configuration, FL can emit up to two order of magnitude more carbon than centralized machine learning. However, in certain settings, it can be comparable to centralized learning due to the reduced energy consumption of embedded devices. We performed extensive experiments across different types of datasets, settings and various deep learning models with FL. Finally, we highlight and connect the reported results to the future challenges and trends in FL to reduce its environmental impact, including algorithms efficiency, hardware capabilities, and stronger industry transparency. △ Less

Submitted 22 May, 2023; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2010.06537

arXiv:2007.14390 [pdf, other]

Flower: A Friendly Federated Learning Research Framework

Authors: Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane

Abstract: Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while kee** their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are… ▽ More Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while kee** their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are a number of research frameworks available to simulate FL algorithms, they do not support the study of scalable FL workloads on heterogeneous edge devices. In this paper, we present Flower -- a comprehensive FL framework that distinguishes itself from existing platforms by offering new facilities to execute large-scale FL experiments and consider richly heterogeneous FL device scenarios. Our experiments show Flower can perform FL experiments up to 15M in client size using only a pair of high-end GPUs. Researchers can then seamlessly migrate experiments to real devices to examine other parts of the design space. We believe Flower provides the community with a critical new tool for FL study and development. △ Less

Submitted 5 March, 2022; v1 submitted 28 July, 2020; originally announced July 2020.

Comments: Open-Source, mobile-friendly Federated Learning framework

arXiv:2006.02266 [pdf, other]

milliEgo: Single-chip mmWave Radar Aided Egomotion Estimation via Deep Sensor Fusion

Authors: Chris Xiaoxuan Lu, Muhamad Risqi U. Saputra, Peijun Zhao, Yasin Almalioglu, Pedro P. B. de Gusmao, Changhao Chen, Ke Sun, Niki Trigoni, Andrew Markham

Abstract: Robust and accurate trajectory estimation of mobile agents such as people and robots is a key requirement for providing spatial awareness for emerging capabilities such as augmented reality or autonomous interaction. Although currently dominated by optical techniques e.g., visual-inertial odometry, these suffer from challenges with scene illumination or featureless surfaces. As an alternative, we… ▽ More Robust and accurate trajectory estimation of mobile agents such as people and robots is a key requirement for providing spatial awareness for emerging capabilities such as augmented reality or autonomous interaction. Although currently dominated by optical techniques e.g., visual-inertial odometry, these suffer from challenges with scene illumination or featureless surfaces. As an alternative, we propose milliEgo, a novel deep-learning approach to robust egomotion estimation which exploits the capabilities of low-cost mmWave radar. Although mmWave radar has a fundamental advantage over monocular cameras of being metric i.e., providing absolute scale or depth, current single chip solutions have limited and sparse imaging resolution, making existing point-cloud registration techniques brittle. We propose a new architecture that is optimized for solving this challenging pose transformation problem. Secondly, to robustly fuse mmWave pose estimates with additional sensors, e.g. inertial or visual sensors we introduce a mixed attention approach to deep fusion. Through extensive experiments, we demonstrate our proposed system is able to achieve 1.3% 3D error drift and generalizes well to unseen environments. We also show that the neural architecture can be made highly efficient and suitable for real-time embedded applications. △ Less

Submitted 19 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Appear at the ACM Conference on Embedded Networked Sensor Systems (SenSys 2020)

arXiv:1911.09968 [pdf, other]

SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation

Authors: Yasin Almalioglu, Mehmet Turan, Alp Eren Sari, Muhamad Risqi U. Saputra, Pedro P. B. de Gusmão, Andrew Markham, Niki Trigoni

Abstract: In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a nove… ▽ More In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns to jointly estimate 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabeled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without the need for IMU intrinsic parameters and/or the extrinsic calibration between the IMU and the camera. estimation and single-view depth recovery network. We provide comprehensive quantitative and qualitative evaluations of the proposed framework comparing its performance with state-of-the-art VIO, VO, and visual simultaneous localization and map** (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons prove that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature. △ Less

Submitted 23 July, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: 15 pages, submitted to The IEEE Transactions on Robotics (T-RO) journal, under review

arXiv:1909.07231 [pdf, other]

DeepTIO: A Deep Thermal-Inertial Odometry with Visual Hallucination

Authors: Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Chris Xiaoxuan Lu, Yasin Almalioglu, Stefano Rosa, Changhao Chen, Johan Wahlström, Wei Wang, Andrew Markham, Niki Trigoni

Abstract: Visual odometry shows excellent performance in a wide range of environments. However, in visually-denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features. In part, this is a… ▽ More Visual odometry shows excellent performance in a wide range of environments. However, in visually-denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features. In part, this is as a result of the sensor measuring the ambient temperature profile rather than scene appearance and geometry. To overcome this issue, we propose a Deep Neural Network model for thermal-inertial odometry (DeepTIO) by incorporating a visual hallucination network to provide the thermal network with complementary information. The hallucination network is taught to predict fake visual features from thermal images by using Huber loss. We also employ selective fusion to attentively fuse the features from three different modalities, i.e thermal, hallucination, and inertial features. Extensive experiments are performed in hand-held and mobile robot data in benign and smoke-filled environments, showing the efficacy of the proposed model. △ Less

Submitted 19 January, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: Accepted to IEEE Robotics and Automation Letters (RAL)

arXiv:1908.00858 [pdf, other]

Distilling Knowledge From a Deep Pose Regressor Network

Authors: Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Yasin Almalioglu, Andrew Markham, Niki Trigoni

Abstract: This paper presents a novel method to distill knowledge from a deep pose regressor network for efficient Visual Odometry (VO). Standard distillation relies on "dark knowledge" for successful knowledge transfer. As this knowledge is not available in pose regression and the teacher prediction is not always accurate, we propose to emphasize the knowledge transfer only when we trust the teacher. We ac… ▽ More This paper presents a novel method to distill knowledge from a deep pose regressor network for efficient Visual Odometry (VO). Standard distillation relies on "dark knowledge" for successful knowledge transfer. As this knowledge is not available in pose regression and the teacher prediction is not always accurate, we propose to emphasize the knowledge transfer only when we trust the teacher. We achieve this by using teacher loss as a confidence score which places variable relative importance on the teacher prediction. We inject this confidence score to the main training task via Attentive Imitation Loss (AIL) and when learning the intermediate representation of the teacher through Attentive Hint Training (AHT) approach. To the best of our knowledge, this is the first work which successfully distill the knowledge from a deep pose regression network. Our evaluation on the KITTI and Malaga dataset shows that we can keep the student prediction close to the teacher with up to 92.95% parameter reduction and 2.12x faster in computation time. △ Less

Submitted 2 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019

arXiv:1903.10543 [pdf, other]

Learning Monocular Visual Odometry through Geometry-Aware Curriculum Learning

Authors: Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Sen Wang, Andrew Markham, Niki Trigoni

Abstract: Inspired by the cognitive process of humans and animals, Curriculum Learning (CL) trains a model by gradually increasing the difficulty of the training data. In this paper, we study whether CL can be applied to complex geometry problems like estimating monocular Visual Odometry (VO). Unlike existing CL approaches, we present a novel CL strategy for learning the geometry of monocular VO by graduall… ▽ More Inspired by the cognitive process of humans and animals, Curriculum Learning (CL) trains a model by gradually increasing the difficulty of the training data. In this paper, we study whether CL can be applied to complex geometry problems like estimating monocular Visual Odometry (VO). Unlike existing CL approaches, we present a novel CL strategy for learning the geometry of monocular VO by gradually making the learning objective more difficult during training. To this end, we propose a novel geometry-aware objective function by jointly optimizing relative and composite transformations over small windows via bounded pose regression loss. A cascade optical flow network followed by recurrent network with a differentiable windowed composition layer, termed CL-VO, is devised to learn the proposed objective. Evaluation on three real-world datasets shows superior performance of CL-VO over state-of-the-art feature-based and learning-based VO. △ Less

Submitted 5 November, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: accepted in IEEE ICRA 2019

arXiv:1809.05786 [pdf, other]

doi 10.1109/ICRA.2019.8793512

GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks

Authors: Yasin Almalioglu, Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Andrew Markham, Niki Trigoni

Abstract: In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications, which is not feasible in environments where labelled data is not abundant. On the other hand, unsupervised deep learning approaches for localization and map** in unknown environments from unlabelled data have received comparatively less attention in VO research. In this st… ▽ More In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications, which is not feasible in environments where labelled data is not abundant. On the other hand, unsupervised deep learning approaches for localization and map** in unknown environments from unlabelled data have received comparatively less attention in VO research. In this study, we propose a generative unsupervised learning framework that predicts 6-DoF pose camera motion and monocular depth map of the scene from unlabelled RGB image sequences, using deep convolutional Generative Adversarial Networks (GANs). We create a supervisory signal by war** view sequences and assigning the re-projection minimization to the objective loss function that is adopted in multi-view pose estimation and single-view depth generation network. Detailed quantitative and qualitative evaluations of the proposed framework on the KITTI and Cityscapes datasets show that the proposed method outperforms both existing traditional and unsupervised deep VO methods providing better results for both pose estimation and depth recovery. △ Less

Submitted 5 March, 2019; v1 submitted 15 September, 2018; originally announced September 2018.

Comments: ICRA 2019 - accepted

Journal ref: 2019 International Conference on Robotics and Automation (ICRA)

arXiv:1610.03623 [pdf, other]

Fast Training of Convolutional Neural Networks via Kernel Rescaling

Authors: Pedro Porto Buarque de Gusmão, Gianluca Francini, Skjalg Lepsøy, Enrico Magli

Abstract: Training deep Convolutional Neural Networks (CNN) is a time consuming task that may take weeks to complete. In this article we propose a novel, theoretically founded method for reducing CNN training time without incurring any loss in accuracy. The basic idea is to begin training with a pre-train network using lower-resolution kernels and input images, and then refine the results at the full resolu… ▽ More Training deep Convolutional Neural Networks (CNN) is a time consuming task that may take weeks to complete. In this article we propose a novel, theoretically founded method for reducing CNN training time without incurring any loss in accuracy. The basic idea is to begin training with a pre-train network using lower-resolution kernels and input images, and then refine the results at the full resolution by exploiting the spatial scaling property of convolutions. We apply our method to the ImageNet winner OverFeat and to the more recent ResNet architecture and show a reduction in training time of nearly 20% while test set accuracy is preserved in both cases. △ Less

Submitted 12 October, 2016; originally announced October 2016.

Showing 1–20 of 20 results for author: de Gusmao, P P B