
Autonomous robotic re-alignment for face-to-face underwater human-robot interaction

Authors: Demetrious T. Kutzke, Ashwin Wariar, Junaed Sattar

Abstract: The use of autonomous underwater vehicles (AUVs) to accomplish traditionally challenging and dangerous tasks has proliferated thanks to advances in sensing, navigation, manipulation, and on-board computing technologies. Utilizing AUVs in underwater human-robot interaction (UHRI) has seen comparatively less growth due to limitations in bi-directional communication and significant technical hurdles in bridging the gap between terrestrial interaction strategies and those that are possible in the underwater domain. A necessary component of UHRI is a system that lets the robot safely approach a diver to establish face-to-face communication, even when the diver's body pose is non-standard. In this work, we introduce a stereo vision system for enhancing UHRI that uses three-dimensional reconstruction from stereo image pairs and machine learning to estimate the locations of human joints. We then establish a coordinate-system convention that encodes the direction the human is facing with respect to the camera coordinate frame. This allows automatic setpoint computation that preserves human body scale and can be used as input to an image-based visual servo control scheme. We show that our setpoint computations tend to agree both quantitatively and qualitatively with experimental setpoint baselines. The methodology shows promise for enhancing UHRI by improving robotic perception of human orientation underwater.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Submitted to the Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA)
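The abstract does not spell out its facing-direction convention. As a hypothetical sketch only, assuming triangulated 3D shoulder and hip joints expressed in an OpenCV-style camera frame (x right, y down, z forward), a facing vector can be taken as the normal of the torso plane, and its yaw measured against the direction pointing back at the camera. The function names and the choice of joints are illustrative assumptions, not the authors' convention.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _mid(a: Vec3, b: Vec3) -> Vec3:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0.0:
        raise ValueError("degenerate joint configuration")
    return (v[0] / n, v[1] / n, v[2] / n)


def facing_direction(l_sh: Vec3, r_sh: Vec3, l_hip: Vec3, r_hip: Vec3) -> Vec3:
    """Unit vector in the camera frame pointing out of the diver's chest."""
    left = _sub(l_sh, r_sh)                          # diver's right-to-left axis
    up = _sub(_mid(l_sh, r_sh), _mid(l_hip, r_hip))  # hips-to-shoulders axis
    return _unit(_cross(left, up))                   # forward = left x up (right-handed body frame)


def yaw_to_camera(facing: Vec3) -> float:
    """Yaw (rad) of the facing vector relative to -z, the direction back toward
    the camera; 0 means the diver faces the camera squarely."""
    return math.atan2(facing[0], -facing[2])
```

A yaw near 0 indicates a face-to-face configuration; a controller could drive this angle toward 0 before servoing on image features.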

arXiv:2311.14848

Robotic Detection and Estimation of Single Scuba Diver Respiration Rate from Underwater Video

Authors: Demetrious T. Kutzke, Junaed Sattar

Abstract: Human respiration rate (HRR) is an important physiological metric for diagnosing a variety of health conditions, from stress levels to heart conditions. Estimation of HRR is well studied in controlled terrestrial environments, yet robotic estimation of HRR as an indicator of diver stress in underwater human-robot interaction (UHRI) scenarios is, to our knowledge, unexplored. We introduce a novel system for robotic estimation of HRR from underwater visual data that uses bubbles from exhalation cycles in scuba diving to time respiration. We introduce a fuzzy labeling system that uses audio information to label a diverse dataset of diver breathing data, on which we compare four different methods for characterizing the presence of bubbles in images. Ultimately, we show that our method is effective at estimating HRR by comparing its respiration rate output with that of human analysts.

Submitted 24 November, 2023; originally announced November 2023.
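The abstract times respiration from exhalation bubbles but does not give the counting rule. Assuming a per-frame classifier (for instance, any of the four bubble-characterization methods) already outputs a bubble present/absent flag, a minimal hypothetical sketch counts exhalation onsets and converts them to breaths per minute. The debounce interval `min_gap_s` is an illustrative assumption.

```python
def respiration_rate_bpm(bubbles: list[bool], fps: float, min_gap_s: float = 2.0) -> float:
    """Estimate breaths per minute from per-frame bubble-presence flags.

    Each rising edge (no bubbles -> bubbles) is treated as the start of one
    exhalation. An onset closer than min_gap_s to the last counted onset is
    ignored, so an exhalation whose detection flickers off for a frame or two
    is counted only once.
    """
    if fps <= 0 or not bubbles:
        raise ValueError("need a positive frame rate and at least one frame")
    min_gap_frames = min_gap_s * fps
    onsets: list[int] = []
    previous = False
    for i, present in enumerate(bubbles):
        if present and not previous:
            if not onsets or i - onsets[-1] >= min_gap_frames:
                onsets.append(i)
        previous = present
    minutes = len(bubbles) / fps / 60.0
    return len(onsets) / minutes
```

With the 2 s default gap, rates above 30 breaths per minute would be undercounted, so the value would need tuning to the expected breathing range.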

arXiv:2310.11536

Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication

Authors: Chelsey Edge, Demetrious Kutzke, Megdalia Bromhal, Junaed Sattar

Abstract: This paper presents Diver Interest via Pointing in Three Dimensions (DIP-3D), a pointing-based method for relaying a diver's object of interest to an autonomous underwater vehicle (AUV) that incorporates three-dimensional distance information to discriminate between multiple objects in the AUV's camera image. Traditional dense stereo vision for distance estimation underwater is challenging because of the relative lack of saliency of scene features and degraded lighting conditions. Yet, including distance information is necessary for robotic perception of diver pointing when multiple objects appear within the robot's image plane. We circumvent the challenges of underwater distance estimation by using sparse reconstruction of keypoints to perform pose estimation on both the left and right images from the robot's stereo camera. Triangulated pose keypoints, along with a classical object detection method, enable DIP-3D to infer the location of an object of interest when multiple objects are in the AUV's field of view. By allowing the scuba diver to point at an arbitrary object of interest and enabling the AUV to autonomously decide which object the diver is pointing to, this method will permit more natural interaction between AUVs and human scuba divers in underwater human-robot collaborative tasks.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Under review at the International Conference on Robotics and Automation (ICRA) 2024
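As an illustration of sparse, keypoint-only triangulation (rather than dense stereo), the standard rectified-stereo relation Z = fB/d recovers depth for each pose keypoint matched across the left and right images. The object-selection rule below, which picks the detected object nearest to the elbow-to-wrist ray, is a hypothetical stand-in for DIP-3D's inference step; the camera parameters and helper names are assumptions.

```python
import math

Vec3 = tuple[float, float, float]


def triangulate(u_left: float, v: float, u_right: float,
                f: float, cx: float, cy: float, baseline: float) -> Vec3:
    """Back-project one keypoint matched across a rectified stereo pair.

    Assumes square pixels (fx == fy == f) and rectification, so the keypoint
    lies on the same image row v in both views.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("keypoint must have positive disparity")
    z = f * baseline / disparity
    return ((u_left - cx) * z / f, (v - cy) * z / f, z)


def distance_to_ray(origin: Vec3, direction: Vec3, p: Vec3) -> float:
    """Distance from p to the half-line origin + t * direction, t >= 0."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("pointing direction has zero length")
    d = [c / norm for c in direction]
    w = [pc - oc for pc, oc in zip(p, origin)]
    t = max(0.0, sum(wc * dc for wc, dc in zip(w, d)))  # points behind the hand clamp to it
    closest = tuple(oc + t * dc for oc, dc in zip(origin, d))
    return math.dist(p, closest)


def pointed_object(elbow: Vec3, wrist: Vec3, objects: dict[str, Vec3]) -> str:
    """Pick the object whose 3D centroid lies closest to the forearm ray."""
    direction = tuple(w - e for w, e in zip(wrist, elbow))
    return min(objects, key=lambda name: distance_to_ray(wrist, direction, objects[name]))
```

Because only a handful of keypoints are triangulated, the cost is independent of image texture, which is the property that makes sparse reconstruction attractive in low-saliency underwater scenes.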

arXiv:2310.02944

Adaptive Landmark Color for AUV Docking in Visually Dynamic Environments

Authors: Corey Knutson, Zhipeng Cao, Junaed Sattar

Abstract: Autonomous Underwater Vehicles (AUVs) conduct missions underwater without the need for human intervention. A docking station (DS) can extend the mission time of an AUV by providing a location for the AUV to recharge its batteries and receive updated mission information. Various methods for locating and tracking a DS exist, but most either rely on expensive acoustic sensors or are vision-based and therefore significantly affected by water quality. In this work, we present a vision-based method that uses adaptive-color LED markers and dynamic color filtering to maximize landmark visibility in varying water conditions. Both the AUV and the DS use cameras to determine the water background color in order to calculate the desired marker color. No communication between the AUV and the DS is needed to determine the marker color. Experiments conducted in a pool and a lake show that our method performs 10 times better than static color thresholding methods as the background color varies. DS detection is possible at a range of 5 meters in clear water with minimal false positives.

Submitted 19 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Submitted to ICRA 2024 for review
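The abstract does not give the color rule. A minimal sketch, assuming the marker takes the hue opposite the mean background hue (a hypothetical rule, not necessarily the paper's), shows why no AUV-DS communication is needed: if both sides observe a similar background and apply the same deterministic function, they arrive at the same marker color.

```python
import colorsys

RGB = tuple[int, int, int]


def mean_background(pixels: list[RGB]) -> tuple[float, float, float]:
    """Average background color, scaled to [0, 1] per channel."""
    if not pixels:
        raise ValueError("need at least one background pixel")
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / (255.0 * n) for c in range(3))


def marker_color(pixels: list[RGB]) -> RGB:
    """Fully saturated LED color whose hue is opposite the background hue."""
    h, _, _ = colorsys.rgb_to_hsv(*mean_background(pixels))
    r, g, b = colorsys.hsv_to_rgb((h + 0.5) % 1.0, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

For example, a cyan-green background yields a red marker. A gray background has no defined hue (`rgb_to_hsv` reports 0), so a real system would need a fallback for low-saturation water.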

arXiv:2310.00146

Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration

Authors: Jungseok Hong, Sadman Sakib Enan, Junaed Sattar

Abstract: Recent advances in efficient design, perception algorithms, and computing hardware have made it possible to create improved human-robot interaction (HRI) capabilities for autonomous underwater vehicles (AUVs). To conduct secure missions as underwater human-robot teams, AUVs require the ability to accurately identify divers. However, this remains an open problem due to divers' challenging visual features, mainly caused by similar-looking scuba gear. In this paper, we present a novel algorithm that can perform diver identification using either pre-trained models or models trained during deployment. We exploit anthropometric data obtained from diver pose estimates to generate robust features that are invariant to changes in distance and photometric conditions. We also propose an embedding network that maximizes inter-class distances in the feature space and minimizes those for the intra-class features, which significantly improves classification performance. Furthermore, we present an end-to-end diver identification framework that operates on an AUV and evaluate the accuracy of the proposed algorithm. Quantitative results in controlled-water experiments show that our algorithm achieves a high level of accuracy in diver identification.

Submitted 29 September, 2023; originally announced October 2023.
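Ratios of body-segment lengths illustrate why anthropometric features can be invariant to distance: moving the diver farther from the camera scales every 2D segment by roughly the same factor, which cancels in a ratio. The segment list below is a hypothetical choice, and the sketch ignores viewpoint changes, which do not cancel; it is not the paper's feature set or embedding network.

```python
import math
from itertools import combinations

Point = tuple[float, float]

# Hypothetical body segments as pairs of keypoint names.
SEGMENTS = [
    ("l_shoulder", "r_shoulder"),
    ("l_shoulder", "l_elbow"),
    ("l_elbow", "l_wrist"),
    ("l_shoulder", "l_hip"),
    ("l_hip", "l_knee"),
]


def segment_lengths(kp: dict[str, Point]) -> list[float]:
    """Pixel length of each segment in SEGMENTS."""
    return [math.dist(kp[a], kp[b]) for a, b in SEGMENTS]


def ratio_features(kp: dict[str, Point]) -> list[float]:
    """All pairwise ratios of segment lengths; a uniform scale change cancels out."""
    lengths = segment_lengths(kp)
    return [a / b for a, b in combinations(lengths, 2)]
```

Five segments give ten pairwise ratios; these could then feed a classifier or an embedding network.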

arXiv:2212.01205

Diver Interest via Pointing: Human-Directed Object Inspection for AUVs

Authors: Chelsey Edge, Junaed Sattar

Abstract: In this paper, we present the Diver Interest via Pointing (DIP) algorithm, a highly modular method for conveying a diver's area of interest to an autonomous underwater vehicle (AUV) using pointing gestures for underwater human-robot collaborative tasks. DIP uses a single monocular camera and exploits human body pose, even with complete dive gear, to extract underwater human pointing gesture poses and their directions. By extracting 2D scene geometry based on the human body pose and density of salient feature points along the direction of pointing, using a low-level feature detector, the DIP algorithm is able to locate objects of interest as indicated by the diver.

Submitted 2 December, 2022; originally announced December 2022.

Comments: Under submission at ICRA23

arXiv:2211.02946

HREyes: Design, Development, and Evaluation of a Novel Method for AUVs to Communicate Information and Gaze Direction

Authors: Michael Fulton, Aditya Prabhu, Junaed Sattar

Abstract: We present the design, development, and evaluation of HREyes: biomimetic communication devices which use light to communicate information and, for the first time, gaze direction from AUVs to humans. First, we introduce two types of information displays using the HREye devices: active lucemes and ocular lucemes. Active lucemes communicate information explicitly through animations, while ocular lucemes communicate gaze direction implicitly by mimicking human eyes. We present a human study in which our system is compared to the use of an embedded digital display that explicitly communicates information to a diver by displaying text. Our results demonstrate accurate recognition of active lucemes for trained interactants, limited intuitive understanding of these lucemes for untrained interactants, and relatively accurate perception of gaze direction for all interactants. The results on active luceme recognition demonstrate more accurate recognition than previous light-based communication systems for AUVs (albeit with different phrase sets). Additionally, the ocular lucemes we introduce in this work represent the first method for communicating gaze direction from an AUV, a critical aspect of nonverbal communication used in collaborative work. With readily available hardware as well as open-source and easily re-configurable programming, HREyes can be easily integrated into any AUV with the physical space for the devices and used to communicate effectively with divers in any underwater environment with appropriate visibility.

Submitted 5 November, 2022; originally announced November 2022.

Comments: Under submission at ICRA23

arXiv:2209.14447

Visual Detection of Diver Attentiveness for Underwater Human-Robot Interaction

Authors: Sadman Sakib Enan, Junaed Sattar

Abstract: Many underwater tasks, such as cable-and-wreckage inspection and search-and-rescue, benefit from robust human-robot interaction (HRI) capabilities. With recent advancements in vision-based underwater HRI methods, autonomous underwater vehicles (AUVs) can communicate with their human partners even during a mission. However, these interactions usually require active participation, especially from humans (e.g., one must keep looking at the robot during an interaction). Therefore, an AUV must know when to start interacting with a human partner, i.e., whether or not the human is paying attention to the AUV. In this paper, we present a diver attention estimation framework for AUVs to autonomously detect the attentiveness of a diver and then navigate and reorient themselves, if required, with respect to the diver to initiate an interaction. The core element of the framework is a deep neural network (called DATT-Net) that exploits the geometric relations among 10 facial keypoints of the divers to determine their head orientation. Our on-the-bench experimental evaluations (using unseen data) demonstrate that the proposed DATT-Net architecture can determine the attentiveness of human divers with promising accuracy. Our real-world experiments also confirm the efficacy of DATT-Net, which enables real-time inference and allows the AUV to position itself for an AUV-diver interaction.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 7 pages, 6 figures, 2 tables

arXiv:2207.05331

Robotic Detection of a Human-Comprehensible Gestural Language for Underwater Multi-Human-Robot Collaboration

Authors: Sadman Sakib Enan, Michael Fulton, Junaed Sattar

Abstract: In this paper, we present a motion-based robotic communication framework that enables non-verbal communication among autonomous underwater vehicles (AUVs) and human divers. We design a gestural language for AUV-to-AUV communication which, unlike typical radio-frequency, light, or audio-based AUV communication, can be easily understood by divers observing the conversation. To allow AUVs to visually understand a gesture from another AUV, we propose a deep network (RRCommNet) which exploits a self-attention mechanism to learn to recognize each message by extracting maximally discriminative spatio-temporal features. We train this network on diverse simulated and real-world data. Our experimental evaluations, both in simulation and in closed-water robot trials, demonstrate that the proposed RRCommNet architecture is able to decipher gesture-based messages with an average accuracy of 88-94% on simulated data and 73-83% on real data (depending on the version of the model used). Further, by performing a message transcription study with human participants, we also show that the proposed language can be understood by humans, with an overall transcription accuracy of 88%. Finally, we discuss the inference runtime of RRCommNet on embedded GPU hardware for real-time use on board AUVs in the field.

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2112.01890

Fast Direct Stereo Visual SLAM

Authors: Jiawei Mo, Md Jahidul Islam, Junaed Sattar

Abstract: We propose a novel approach for fast and accurate stereo visual Simultaneous Localization and Mapping (SLAM) independent of feature detection and matching. We extend monocular Direct Sparse Odometry (DSO) to a stereo system by optimizing the scale of the 3D points to minimize photometric error for the stereo configuration, which yields a computationally efficient and robust method compared to conventional stereo matching. We further extend it to a full SLAM system with loop closure to reduce accumulated errors. With the assumption of forward camera motion, we imitate a LiDAR scan using the 3D points obtained from the visual odometry and adapt a LiDAR descriptor for place recognition to facilitate more efficient detection of loop closures. Afterward, we estimate the relative pose using direct alignment by minimizing the photometric error for potential loop closures. Optionally, further improvement over direct alignment is achieved by using the Iterative Closest Point (ICP) algorithm. Lastly, we optimize a pose graph to improve SLAM accuracy globally. By avoiding feature detection or matching in our SLAM system, we ensure high computational efficiency and robustness. Thorough experimental validations on public datasets demonstrate its effectiveness compared to the state-of-the-art approaches.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.03712

Using Monocular Vision and Human Body Priors for AUVs to Autonomously Approach Divers

Authors: Michael Fulton, Jungseok Hong, Junaed Sattar

Abstract: Direct communication between humans and autonomous underwater vehicles (AUVs) is a relatively underexplored area in human-robot interaction (HRI) research, although many tasks (e.g., surveillance, inspection, and search-and-rescue) require close diver-robot collaboration. Many core functionalities in this domain are in need of further study to improve robotic capabilities for ease of interaction. One of these is the challenge of autonomous robots approaching and positioning themselves relative to divers to initiate and facilitate interactions. Suboptimal AUV positioning can lead to poor-quality interaction and to excessive cognitive and physical load for divers. In this paper, we introduce a novel method for AUVs to autonomously navigate and achieve diver-relative positioning to begin interaction. Our method is based only on monocular vision, requires no global localization, and is computationally efficient. We present our algorithm along with its implementation on board both a simulated and a physical AUV, and perform extensive evaluations in the form of closed-water tests in a controlled pool. Analysis of our results shows that the proposed monocular vision-based algorithm performs reliably and efficiently while operating entirely on board the AUV.

Submitted 5 November, 2021; originally announced November 2021.

Comments: 14 pages, under review for ICRA22-RAL
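A human body prior makes monocular range estimation possible through the pinhole relation Z = fS/s, where a body part of known real size S spans s pixels. The sketch below assumes a known diver shoulder width and a simple proportional controller; the prior, gains, and function names are illustrative assumptions, not the controller described in the paper.

```python
import math


def range_from_prior(pixel_extent: float, real_extent_m: float, focal_px: float) -> float:
    """Pinhole range: a body part of real size S spanning s pixels lies at Z = f * S / s."""
    if pixel_extent <= 0:
        raise ValueError("pixel extent must be positive")
    return focal_px * real_extent_m / pixel_extent


def bearing(u: float, cx: float, focal_px: float) -> float:
    """Horizontal angle (rad) of image column u from the optical axis."""
    return math.atan2(u - cx, focal_px)


def approach_command(pixel_extent: float, u: float, *, real_extent_m: float,
                     focal_px: float, cx: float, target_range_m: float,
                     k_forward: float = 0.5, k_yaw: float = 1.0) -> tuple[float, float]:
    """Proportional (forward, yaw-rate) command that closes range and centers the diver."""
    z = range_from_prior(pixel_extent, real_extent_m, focal_px)
    return (k_forward * (z - target_range_m), k_yaw * bearing(u, cx, focal_px))
```

For example, a 0.45 m shoulder width spanning 90 px with a 600 px focal length puts the diver about 3 m away; with a 2 m target range the forward command is positive until the gap closes. No global localization is needed, since every quantity comes from the current image.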

arXiv:2109.09035

Continuous-Time Spline Visual-Inertial Odometry

Authors: Jiawei Mo, Junaed Sattar

Abstract: We propose a continuous-time spline-based formulation for visual-inertial odometry (VIO). Specifically, we model the poses as a cubic spline, whose temporal derivatives are used to synthesize linear acceleration and angular velocity, which are compared to the measurements from the inertial measurement unit (IMU) for optimal state estimation. The spline boundary conditions create constraints between the camera and the IMU, with which we formulate VIO as a constrained nonlinear optimization problem. Continuous-time pose representation makes it possible to address many VIO challenges, e.g., rolling shutter distortion and sensors that may lack synchronization. We conduct experiments on two publicly available datasets that demonstrate the state-of-the-art accuracy and real-time computational efficiency of our method.

Submitted 18 February, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: ICRA 2022
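The abstract models poses as a cubic spline whose time derivatives synthesize IMU-like measurements. A uniform cubic B-spline is one common choice (an assumption here, since the abstract does not name the basis). This one-axis sketch shows how differentiating the basis gives velocity and acceleration that can be compared with an accelerometer reading; a full VIO system would also handle rotation, gravity, and sensor biases.

```python
from typing import Sequence


def _b(u: float) -> tuple[float, float, float, float]:
    # Uniform cubic B-spline basis functions.
    return ((1 - u) ** 3 / 6,
            (3 * u**3 - 6 * u**2 + 4) / 6,
            (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
            u**3 / 6)


def _db(u: float) -> tuple[float, float, float, float]:
    # First derivatives of the basis with respect to u.
    return (-(1 - u) ** 2 / 2,
            (3 * u**2 - 4 * u) / 2,
            (-3 * u**2 + 2 * u + 1) / 2,
            u**2 / 2)


def _ddb(u: float) -> tuple[float, float, float, float]:
    # Second derivatives of the basis with respect to u.
    return (1 - u, 3 * u - 2, 1 - 3 * u, u)


def evaluate(ctrl: Sequence[float], u: float, dt: float) -> tuple[float, float, float]:
    """Position, velocity, and acceleration of one spline segment.

    ctrl: four consecutive control points (one axis); u in [0, 1] is the
    normalized time within the segment; dt is the uniform knot spacing (s).
    """
    if len(ctrl) != 4 or not 0.0 <= u <= 1.0:
        raise ValueError("need 4 control points and u in [0, 1]")
    pos = sum(w * c for w, c in zip(_b(u), ctrl))
    vel = sum(w * c for w, c in zip(_db(u), ctrl)) / dt       # chain rule: du/dt = 1/dt
    acc = sum(w * c for w, c in zip(_ddb(u), ctrl)) / dt**2
    return pos, vel, acc


def accel_residual(ctrl: Sequence[float], u: float, dt: float, measured: float) -> float:
    """Synthesized minus measured linear acceleration (gravity, bias, rotation ignored)."""
    return evaluate(ctrl, u, dt)[2] - measured
```

Because the pose is defined at every instant, the residual can be evaluated at each IMU timestamp regardless of when camera frames arrive. This is what lets a continuous-time formulation absorb unsynchronized sensors.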

arXiv:2109.07134

ROW-SLAM: Under-Canopy Cornfield Semantic SLAM

Authors: Jiacheng Yuan, Jungseok Hong, Junaed Sattar, Volkan Isler

Abstract: We study a semantic SLAM problem faced by a robot tasked with autonomous weeding under the corn canopy. The goal is to detect corn stalks and localize them in a global coordinate frame. This is a challenging setup for existing algorithms because there is very little space between the camera and the plants, and the camera motion is primarily restricted to be along the row. To overcome these challenges, we present a multi-camera system where a side camera (facing the plants) is used for detection whereas front and back cameras are used for motion estimation. Next, we show how semantic features in the environment (corn stalks, ground, and crop planes) can be used to develop a robust semantic SLAM solution and present results from field trials performed throughout the growing season across various cornfields.

Submitted 15 September, 2021; originally announced September 2021.

Comments: 7 pages, 6 figures

arXiv:2107.06401

Semantically-Aware Strategies for Stereo-Visual Robotic Obstacle Avoidance

Authors: Jungseok Hong, Karin de Langis, Cole Wyeth, Christopher Walaszek, Junaed Sattar

Abstract: Mobile robots in unstructured, mapless environments must rely on an obstacle avoidance module to navigate safely. The standard avoidance techniques estimate the locations of obstacles with respect to the robot but are unaware of the obstacles' identities. Consequently, the robot cannot take advantage of semantic information about obstacles when making decisions about how to navigate. We propose an obstacle avoidance module that combines visual instance segmentation with a depth map to classify and localize objects in the scene. The system avoids obstacles differentially, based on the identity of the objects: for example, the system is more cautious in response to unpredictable objects such as humans. The system can also navigate closer to harmless obstacles and ignore obstacles that pose no collision danger, enabling it to navigate more efficiently. We validate our approach in two simulated environments: one terrestrial and one underwater. Results indicate that our approach is feasible and can enable more efficient navigation strategies.

Submitted 13 July, 2021; originally announced July 2021.
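Differential avoidance can be pictured as a per-class clearance table applied to (label, distance) pairs produced by instance segmentation plus depth. The classes, clearances, and default below are hypothetical values chosen to mirror the abstract's examples (wider berth for humans, closer passes near harmless objects, ignored non-obstacles); they are not the paper's parameters.

```python
from typing import Optional

# Hypothetical per-class clearances in meters; None marks classes that pose
# no collision danger and can be ignored entirely.
CLEARANCE_M: dict[str, Optional[float]] = {
    "human": 3.0,    # unpredictable: keep the widest berth
    "robot": 2.0,
    "rock": 1.0,
    "seaweed": 0.3,  # harmless: may pass close
    "bubbles": None,
}
DEFAULT_CLEARANCE_M = 1.5  # unrecognized classes get a conservative default


def must_avoid(label: str, distance_m: float) -> bool:
    """True if an obstacle of this class is inside its class-specific clearance."""
    clearance = CLEARANCE_M.get(label, DEFAULT_CLEARANCE_M)
    return clearance is not None and distance_m < clearance


def obstacles_to_avoid(detections: list[tuple[str, float]]) -> list[str]:
    """Filter (label, distance) pairs from segmentation plus depth down to real threats."""
    return [label for label, d in detections if must_avoid(label, d)]
```

A geometry-only avoider corresponds to giving every class the same clearance. The table is where semantic awareness buys efficiency.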

arXiv:2012.05990

A Generative Approach for Detection-driven Underwater Image Enhancement

Authors: Chelsey Edge, Md Jahidul Islam, Christopher Morse, Junaed Sattar

Abstract: In this paper, we introduce a generative model for image enhancement specifically for improving diver detection in the underwater domain. In particular, we present a model that integrates generative adversarial network (GAN)-based image enhancement with the diver detection task. Our proposed approach restructures the GAN objective function to include information from a pre-trained diver detector, with the goal of generating images that would enhance the accuracy of the detector in adverse visual conditions. By incorporating the detector output into both the generator and discriminator networks, our model is able to focus on enhancing images beyond aesthetic qualities and specifically to improve robotic detection of scuba divers. We train our network on a large dataset of scuba divers, using a state-of-the-art diver detector, and demonstrate its utility on images collected from oceanic explorations of human-robot teams. Experimental evaluations demonstrate that our approach significantly improves diver detection performance over raw, unenhanced images, and even outperforms detection performance on the output of state-of-the-art underwater image enhancement algorithms. Finally, we demonstrate the inference performance of our network on embedded devices to highlight the feasibility of operating on board mobile robotic platforms.

Submitted 10 December, 2020; originally announced December 2020.

Comments: Under review for ICRA 2021

arXiv:2012.05701

An Analysis of Deep Object Detectors For Diver Detection

Authors: Karin de Langis, Michael Fulton, Junaed Sattar

Abstract: With the end goal of selecting and using diver detection models to support human-robot collaboration capabilities such as diver following, we thoroughly analyze a large set of deep neural networks for diver detection. We begin by producing a dataset of approximately 105,000 annotated images of divers sourced from videos -- one of the largest and most varied diver detection datasets ever created. Using this dataset, we train a variety of state-of-the-art deep neural networks for object detection, including SSD with Mobilenet, Faster R-CNN, and YOLO. Along with these single-frame detectors, we also train networks designed for detection of objects in a video stream, using temporal information as well as single-frame image information. We evaluate these networks on typical accuracy and efficiency metrics, as well as on the temporal stability of their detections. Finally, we analyze the failures of these detectors, pointing out the most common scenarios of failure. Based on our results, we recommend SSDs or Tiny-YOLOv4 for real-time applications on robots and recommend further investigation of video object detection methods.

Submitted 24 November, 2020; originally announced December 2020.

Comments: 14 pages, submitted for ICRA 21

arXiv:2011.09556

Visual Diver Face Recognition for Underwater Human-Robot Interaction

Authors: Jungseok Hong, Sadman Sakib Enan, Christopher Morse, Junaed Sattar

Abstract: This paper presents a deep-learned facial recognition method for underwater robots to identify scuba divers. Specifically, the proposed method is able to recognize divers underwater with faces heavily obscured by scuba masks and breathing apparatus. Our contribution in this research is towards robust facial identification of individuals under significant occlusion of facial features and image degradation from underwater optical distortions. With the ability to correctly recognize divers, autonomous underwater vehicles (AUVs) will be able to engage in collaborative tasks with the correct person in human-robot teams and ensure that instructions are accepted only from those authorized to command the robots. We demonstrate that our proposed framework is able to learn discriminative features from real-world diver faces through different data augmentation and generation techniques. Experimental evaluations show that this framework achieves a 3-fold increase in prediction accuracy compared to the state-of-the-art (SOTA) algorithms and is well-suited for embedded inference on robotic platforms.

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2011.06252

SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots

Authors: Md Jahidul Islam, Ruobing Wang, Junaed Sattar

Abstract: This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots. Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images. The SVAM-Net architecture is configured in a unique way to jointly accommodate bottom-up and top-down learning within two separate branches of the network while sharing the same encoding layers. We design dedicated spatial attention modules (SAMs) along these learning pathways to exploit the coarse-level and fine-level semantic features for SOD at four stages of abstraction. The bottom-up branch performs a rough yet reasonably accurate saliency estimation at a fast rate, whereas the deeper top-down branch incorporates a residual refinement module (RRM) that provides fine-grained localization of the salient objects. Extensive performance evaluation of SVAM-Net on benchmark datasets clearly demonstrates its effectiveness for underwater SOD. We also validate its generalization performance using data from several ocean trials that include test images of diverse underwater scenes and waterbodies, as well as images with unseen natural objects. Moreover, we analyze its computational feasibility for robotic deployments and demonstrate its utility in several important use cases of visual attention modeling.

Submitted 14 April, 2022; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.03106 [pdf, other]

IMU-Assisted Learning of Single-View Rolling Shutter Correction

Authors: Jiawei Mo, Md Jahidul Islam, Junaed Sattar

Abstract: Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because pixels may be captured at different times and poses. In this paper, we propose a deep neural network to predict depth and row-wise pose from a single image for rolling shutter correction. Our contribution in this work is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which greatly enhances pose prediction compared to the state of the art. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to include real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of the Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show that DSO performs markedly better on corrected imagery than on uncorrected imagery, validating the proposed approach.

Submitted 14 September, 2021; v1 submitted 5 November, 2020; originally announced November 2020.
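
A rolling shutter reads rows out one after another, so each row has its own capture time and, when the camera moves, its own pose. The sketch below illustrates only that geometric idea. It is not the paper's network. It assumes a constant per-row readout delay and a constant angular velocity about a single axis, as might be integrated from gyroscope readings. The parameter names `line_delay_s` and `omega_rad_s` are hypothetical.

```python
def row_capture_times(num_rows, t0, line_delay_s):
    """Capture timestamp of each image row for a rolling shutter sensor."""
    return [t0 + r * line_delay_s for r in range(num_rows)]


def row_rotation_angles(num_rows, line_delay_s, omega_rad_s):
    """Rotation (rad) of each row relative to row 0, assuming a constant
    angular velocity about one axis over the frame (simplifying assumption)."""
    return [omega_rad_s * r * line_delay_s for r in range(num_rows)]


# Example: 480 rows read out over 30 ms while the camera yaws at 1 rad/s.
angles = row_rotation_angles(480, line_delay_s=30e-3 / 480, omega_rad_s=1.0)
print(f"last row rotated by {angles[-1]:.4f} rad")  # last row rotated by 0.0299 rad
```

Even this small rotation accumulates across the frame. That per-row variation is the effect a row-wise pose estimate has to model.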

arXiv:2007.08097 [pdf, other]

TrashCan: A Semantically-Segmented Dataset towards Visual Detection of Marine Debris

Authors: Jungseok Hong, Michael Fulton, Junaed Sattar

Abstract: This paper presents TrashCan, a large dataset composed of images of underwater trash collected from a variety of sources and annotated with both bounding boxes and segmentation labels, for the development of robust detectors of marine debris. The dataset has two versions, TrashCan-Material and TrashCan-Instance, corresponding to different object class configurations. The eventual goal is to develop efficient and accurate trash detection methods suitable for onboard robot deployment. Along with information about the construction and sourcing of the TrashCan dataset, we present initial results of instance segmentation from Mask R-CNN and object detection from Faster R-CNN. These do not represent the best possible detection results but provide an initial baseline for future work in instance segmentation and object detection on the TrashCan dataset.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2004.01241 [pdf, other]

Semantic Segmentation of Underwater Imagery: Dataset and Benchmark

Authors: Md Jahidul Islam, Chelsey Edge, Yuyang Xiao, Peigen Luo, Muntaqim Mehtaz, Christopher Morse, Sadman Sakib Enan, Junaed Sattar

Abstract: In this paper, we present the first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics. In addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that balances the trade-off between performance and computational efficiency. It offers competitive performance while ensuring fast end-to-end inference, which is essential for its use in the autonomy pipeline of visually-guided underwater robots. In particular, we demonstrate its usability benefits for visual servoing, saliency prediction, and detailed scene understanding. With a variety of use cases, the proposed model and benchmark dataset open up promising opportunities for future research in underwater robot vision.

Submitted 13 September, 2020; v1 submitted 2 April, 2020; originally announced April 2020.
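
Semantic segmentation benchmarks commonly report per-class intersection-over-union (IoU) and its mean. The abstract does not say which "standard performance metrics" were used, so the following is only a minimal sketch of how per-class IoU is typically computed. It assumes integer label masks flattened into equal-length lists.

```python
def per_class_iou(pred, target, num_classes):
    """Per-class IoU for two equally sized label masks (flat sequences of ints).
    Classes absent from both masks get None so they can be excluded from the mean."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious):
    """Mean over classes that actually occur in either mask."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, target, num_classes=3)  # about [0.333, 0.667, 0.5]
print(mean_iou(ious))  # 0.5
```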

arXiv:2003.09041 [pdf, other]

Design and Experiments with LoCO AUV: A Low Cost Open-Source Autonomous Underwater Vehicle

Authors: Chelsey Edge, Sadman Sakib Enan, Michael Fulton, Jungseok Hong, Jiawei Mo, Kimberly Barthelemy, Hunter Bashaw, Berik Kallevig, Corey Knutson, Kevin Orpen, Junaed Sattar

Abstract: In this paper we present LoCO AUV, a Low-Cost, Open Autonomous Underwater Vehicle. LoCO is a general-purpose, single-person-deployable, vision-guided AUV, rated to a depth of 100 meters. We discuss the open and expandable design of this underwater robot, as well as the design of a simulator in Gazebo. Additionally, we explore the platform's preliminary local motion control and state estimation abilities, which enable it to perform maneuvers autonomously. To demonstrate its usefulness for a variety of tasks, we implement several of our previously presented human-robot interaction capabilities on LoCO, including gestural control, diver following, and robot communication via motion. Finally, we discuss the practical concerns of deployment and our experiences in using this robot in pools, lakes, and the ocean. All design details, instructions on assembly, and code will be released under a permissive, open-source license.

Submitted 19 March, 2020; originally announced March 2020.

Comments: 13 pages, 11 figures

arXiv:2002.01155 [pdf, other]

Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception

Authors: Md Jahidul Islam, Peigen Luo, Junaed Sattar

Abstract: In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications. We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image quality at 2x, 3x, or 4x higher spatial resolution. We supervise its training by formulating a multi-modal objective function that addresses the chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. It is also supervised to learn salient foreground regions in the image, which in turn guides the network to learn global contrast enhancement. We design an end-to-end training pipeline to jointly learn the saliency prediction and SESR on a shared hierarchical feature space for fast inference. Moreover, we present UFO-120, the first dataset to facilitate large-scale SESR learning; it contains over 1500 training samples and a benchmark test set of 120 samples. Based on thorough experimental evaluation on UFO-120 and other standard datasets, we demonstrate that Deep SESR outperforms the existing solutions for underwater image enhancement and super-resolution. We also validate its generalization performance on several test cases that include underwater images with diverse spectral and spatial degradation levels, as well as terrestrial images with unseen natural objects. Lastly, we analyze its computational feasibility for single-board deployments and demonstrate its operational benefits for visually-guided underwater robots. The model and dataset information will be available at: https://github.com/xahidbuffon/Deep-SESR.

Submitted 4 February, 2020; originally announced February 2020.

arXiv:1910.09636 [pdf, other]

Real-Time Multi-Diver Tracking and Re-identification for Underwater Human-Robot Collaboration

Authors: Karin de Langis, Junaed Sattar

Abstract: Autonomous underwater robots working with teams of human divers may need to distinguish between different divers, e.g., to recognize a lead diver or to follow a specific team member. This paper describes a technique that enables autonomous underwater robots to track divers in real time as well as to re-identify them. The approach is an extension of Simple Online and Realtime Tracking (SORT) with an appearance metric (deep SORT). Initial diver detection is performed with a custom CNN designed for real-time diver detection, and appearance features are subsequently extracted for each detected diver. Next, real-time tracking-by-detection is performed with an extension of the deep SORT algorithm. We evaluate this technique on a series of videos of divers performing human-robot collaborative tasks and show that our methods result in more divers being accurately identified during tracking. We also discuss the practical considerations of applying multi-person tracking to on-board autonomous robot operations, and we consider how failure cases can be addressed during on-board tracking.

Submitted 21 October, 2019; originally announced October 2019.
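
In tracking-by-detection, SORT links each frame's detections to existing tracks by bounding-box overlap. It does this with the Hungarian algorithm on an IoU cost matrix, and deep SORT adds an appearance term to that association. The sketch below is a simplified stand-in, not the paper's tracker. It uses greedy highest-IoU-first matching with a hypothetical threshold `min_iou`, and it omits the Kalman-filter motion model and appearance features.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(tracks, detections, min_iou=0.3):
    """Match track boxes to detection boxes, highest IoU first.
    Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    pairs = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same diver
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches, unmatched_tracks, unmatched_dets = greedy_associate(tracks, detections)
```

Here the third detection matches no track. A tracker would typically start a new tentative track for it.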

arXiv:1910.04754 [pdf, other]

A Generative Approach Towards Improved Robotic Detection of Marine Litter

Authors: Jungseok Hong, Michael Fulton, Junaed Sattar

Abstract: This paper presents an approach to address data scarcity problems in underwater image datasets for visual detection of marine debris. The proposed approach relies on a two-stage variational autoencoder (VAE) to generate imagery and a binary classifier to evaluate that imagery for quality and realism. From the images generated by the two-stage VAE, the binary classifier selects "good quality" images and augments the given dataset with them. Lastly, a multi-class classifier is used to evaluate the impact of the augmentation process by measuring the accuracy of an object detector trained on combinations of real and generated trash images. Our results show that the classifier trained with the augmented data outperforms the one trained only with the real data. This approach should be applicable not only to the underwater trash classification problem presented in this paper but also to any data-dependent task for which collecting more images is challenging or infeasible.

Submitted 10 October, 2019; originally announced October 2019.

arXiv:1909.09437 [pdf, other]

Underwater Image Super-Resolution using Deep Residual Multipliers

Authors: Md Jahidul Islam, Sadman Sakib Enan, Peigen Luo, Junaed Sattar

Abstract: We present a deep residual network-based generative model for single image super-resolution (SISR) of underwater imagery for use by autonomous underwater robots. We also provide an adversarial training pipeline for learning SISR from paired data. To supervise the training, we formulate an objective function that evaluates the perceptual quality of an image based on its global content, color, and local style information. Additionally, we present USR-248, a large-scale dataset of three sets of underwater images of 'high' (640x480) and 'low' (80x60, 160x120, and 320x240) spatial resolution. USR-248 contains paired instances for supervised training of 2x, 4x, or 8x SISR models. Furthermore, we validate the effectiveness of our proposed model through qualitative and quantitative experiments and compare the results with the performance of several state-of-the-art models. We also analyze its practical feasibility for applications such as scene understanding and attention modeling in noisy visual conditions.

Submitted 24 February, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

arXiv:1909.07267 [pdf, other]

A Fast and Robust Place Recognition Approach for Stereo Visual Odometry Using LiDAR Descriptors

Authors: Jiawei Mo, Junaed Sattar

Abstract: Place recognition is a core component of Simultaneous Localization and Mapping (SLAM) algorithms. Particularly in visual SLAM systems, previously visited places are recognized by measuring the appearance similarity between images representing these locations. However, such approaches are sensitive to visual appearance change and can also be computationally expensive. In this paper, we propose an alternative approach that adapts LiDAR descriptors to 3D points obtained from stereo visual odometry for place recognition. 3D points are potentially more robust than 2D visual cues (e.g., 2D features) to environmental changes (e.g., variable illumination), which may benefit visual SLAM systems in long-term deployment scenarios. Stereo visual odometry generates 3D points with an absolute scale, which enables us to use LiDAR descriptors for place recognition with high computational efficiency. Through extensive evaluations on standard benchmark datasets, we demonstrate the accuracy, efficiency, and robustness of using 3D points for place recognition over 2D methods.

Submitted 26 July, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: Accepted by IROS2020

arXiv:1905.12723 [pdf, other]

Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization

Authors: Jiawei Mo, Junaed Sattar

Abstract: This paper proposes a novel approach for extending monocular visual odometry to a stereo camera system. The proposed method uses an additional camera to accurately estimate and optimize the scale of the monocular visual odometry, rather than triangulating 3D points from stereo matching. Specifically, the 3D points generated by the monocular visual odometry are projected onto the other camera of the stereo pair, and the scale is recovered and optimized by directly minimizing the photometric error. It is computationally efficient, adding minimal overhead to the stereo vision system compared to straightforward stereo matching, and is robust to repetitive texture. Additionally, direct scale optimization enables stereo visual odometry to be purely based on the direct method. Extensive evaluation on public datasets (e.g., KITTI), and outdoor environments (both terrestrial and underwater) demonstrates the accuracy and efficiency of a stereo visual odometry approach extended by scale optimization, and its robustness in environments with challenging textures.

Submitted 17 September, 2019; v1 submitted 29 May, 2019; originally announced May 2019.
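
The central idea is to recover the unknown monocular scale by projecting points into the second camera and choosing the scale that minimizes photometric error. A one-dimensional toy makes this concrete. The sketch below is a simplified illustration, not the authors' method. It assumes rectified cameras, a synthetic intensity profile `texture`, a fronto-parallel plane, and hypothetical focal length and baseline values. It also uses a brute-force search over candidate scales in place of the paper's direct optimization.

```python
import math


def texture(u):
    """Synthetic 1D intensity profile (stand-in for real image content)."""
    return math.sin(0.3 * u) + 0.5 * math.sin(0.07 * u)


def photometric_error(scale, pixels, mono_depths, focal, baseline, left, right):
    """Sum of squared intensity differences after projecting left-image points
    into the right camera using depths multiplied by `scale` (rectified stereo)."""
    err = 0.0
    for u, z in zip(pixels, mono_depths):
        disparity = focal * baseline / (scale * z)
        err += (right(u - disparity) - left(u)) ** 2
    return err


def estimate_scale(pixels, mono_depths, focal, baseline, left, right, candidates):
    """Brute-force search for the scale with the lowest photometric error."""
    return min(candidates,
               key=lambda s: photometric_error(s, pixels, mono_depths, focal, baseline, left, right))


# Toy scene: a plane at metric depth 5 m, viewed with f = 500 px and b = 0.1 m.
focal, baseline, true_depth, true_scale = 500.0, 0.1, 5.0, 2.5
true_disp = focal * baseline / true_depth  # 10 px


def left(u):
    return texture(u)


def right(u):
    # A point at left pixel u appears at u - disparity in the right image.
    return texture(u + true_disp)


pixels = list(range(0, 200, 5))
mono_depths = [true_depth / true_scale] * len(pixels)  # monocular depths, off by an unknown scale
candidates = [1.0 + 0.01 * k for k in range(401)]      # candidate scales 1.0 .. 5.0
scale = estimate_scale(pixels, mono_depths, focal, baseline, left, right, candidates)
print(scale)
```

A periodic texture creates additional error minima. This is one reason the search range, or the initialization of a local optimizer, matters in practice.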

arXiv:1903.09766 [pdf, other]

Fast Underwater Image Enhancement for Improved Visual Perception

Authors: Md Jahidul Islam, Youya Xia, Junaed Sattar

Abstract: In this paper, we present a conditional generative adversarial network-based model for real-time underwater image enhancement. To supervise the adversarial training, we formulate an objective function that evaluates the perceptual image quality based on its global content, color, local texture, and style information. We also present EUVP, a large-scale dataset of paired and unpaired underwater images (of 'poor' and 'good' quality) captured using seven different cameras over various visibility conditions during oceanic explorations and human-robot collaborative experiments. In addition, we perform several qualitative and quantitative evaluations, which suggest that the proposed model can learn to enhance underwater image quality from both paired and unpaired training. More importantly, the enhanced images improve the performance of standard models for underwater object detection, human pose estimation, and saliency prediction. These results indicate that the model is suitable for real-time preprocessing in the autonomy pipeline of visually-guided underwater robots. The model and associated training pipelines are available at https://github.com/xahidbuffon/funie-gan.

Submitted 8 February, 2020; v1 submitted 23 March, 2019; originally announced March 2019.

arXiv:1903.03134 [pdf, other]

By Land, Air, or Sea: Multi-Domain Robot Communication Via Motion

Authors: Michael Fulton, Mustaf Ahmed, Junaed Sattar

Abstract: In this paper, we explore the use of motion for robot-to-human communication on three robotic platforms: the 5 degrees-of-freedom (DOF) Aqua autonomous underwater vehicle (AUV), a 3-DOF camera gimbal mounted on a Matrice 100 drone, and a 3-DOF Turtlebot2 terrestrial robot. While we previously explored the use of body language-like motion (called kinemes) versus other methods of communication for the Aqua AUV, we now extend those concepts to robots in two new and different domains. We evaluate all three platforms using a small interaction study where participants use gestures to communicate with the robot, receive information from the robot via kinemes, and then take actions based on the information. To compare the three domains we consider the accuracy of these interactions, the time it takes to complete them, and how confident users feel in the success of their interactions. The kineme systems perform with reasonable accuracy for all robots and experience gained in this study is used to form a set of prescriptions for further development of kineme systems.

Submitted 7 March, 2019; originally announced March 2019.

Comments: 15 pages, submitted for publication at IROS 2019

arXiv:1903.00820 [pdf, other]

Robot-to-Robot Relative Pose Estimation using Humans as Markers

Authors: Md Jahidul Islam, Jiawei Mo, Junaed Sattar

Abstract: In this paper, we propose a method to determine the 3D relative pose of pairs of communicating robots by using human pose-based key-points as correspondences. We adopt a 'leader-follower' framework, where the leader robot first visually detects and triangulates the key-points using the state-of-the-art pose detector named OpenPose. Afterward, the follower robots match the corresponding 2D projections on their respective calibrated cameras and find their relative poses by solving the perspective-n-point (PnP) problem. In the proposed method, we design an efficient person re-identification technique for associating the mutually visible humans in the scene. Additionally, we present an iterative optimization algorithm to refine the associated key-points based on their local structural properties in the image space. We demonstrate that these refinement processes are essential to establish accurate key-point correspondences across viewpoints. Furthermore, we evaluate the performance of the proposed relative pose estimation system through several experiments conducted in terrestrial and underwater environments. Finally, we discuss the relevant operational challenges of this approach and analyze its feasibility for multi-robot cooperative systems in human-dominated social settings and feature-deprived environments such as underwater.

Submitted 6 September, 2020; v1 submitted 2 March, 2019; originally announced March 2019.

arXiv:1810.03963

DSVO: Direct Stereo Visual Odometry

Authors: Jiawei Mo, Junaed Sattar

Abstract: This paper proposes a novel approach to stereo visual odometry without stereo matching. It is particularly robust in scenes of repetitive high-frequency textures. Referred to as DSVO (Direct Stereo Visual Odometry), it operates directly on pixel intensities, without any explicit feature matching, and is thus efficient and more accurate than the state-of-the-art stereo-matching-based methods. It applies a semi-direct monocular visual odometry running on one camera of the stereo pair, tracking the camera pose and mapping the environment simultaneously; the other camera is used to optimize the scale of monocular visual odometry. We evaluate DSVO in a number of challenging scenes and present comparisons with state-of-the-art stereo visual odometry algorithms.

Submitted 16 September, 2019; v1 submitted 19 September, 2018; originally announced October 2018.

Comments: Rewritten to "Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization" arXiv:1905.12723

arXiv:1809.10201 [pdf, other]

Visual Diver Recognition for Underwater Human-Robot Collaboration

Authors: Youya Xia, Junaed Sattar

Abstract: This paper presents an approach for autonomous underwater robots to visually detect and identify divers. The proposed approach enables an autonomous underwater robot to detect multiple divers in a visual scene and distinguish between them. Such methods are useful for robots to identify a human leader, for example, in multi-human/robot teams where only designated individuals are allowed to command or lead a team of robots. Initial diver identification is performed using the Faster R-CNN algorithm with a region proposal network, which produces bounding boxes around the divers' locations. Subsequently, a suite of spatial and frequency domain descriptors is extracted from the bounding boxes to create a feature vector. A K-Means clustering algorithm, with k set to the number of detected bounding boxes, then identifies the detected divers based on these feature vectors. We evaluate the performance of the proposed approach on video footage of divers swimming in front of a mobile robot and demonstrate its accuracy.

Submitted 19 September, 2018; originally announced September 2018.

Comments: submitted for ICRA 2019
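
The identification step clusters per-box feature vectors, with k equal to the number of detected divers. The following is a minimal plain-Python sketch of Lloyd's k-means for that step. The two-dimensional "descriptor" vectors in the example are invented for illustration. The deterministic farthest-point initialization is an assumption chosen for reproducibility, not necessarily what the paper uses.

```python
import math


def kmeans(points, k, iters=50):
    """Lloyd's k-means on a list of equal-length feature vectors.
    Initialization: deterministic farthest-point seeding (illustrative choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each feature vector goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its assigned vectors.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical descriptors for two divers seen in two frames each; k = 2 detections per frame.
features = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80], [0.88, 0.85]]
labels, centers = kmeans(features, k=2)
print(labels)  # [0, 0, 1, 1]
```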

arXiv:1809.08076 [pdf, ps, other]

An Evaluation of Bayesian Methods for Bathymetry-based Localization of Autonomous Underwater Robots

Authors: Jungseok Hong, Michael Fulton, Junaed Sattar

Abstract: This paper presents an evaluation of a number of probabilistic algorithms for localization of autonomous underwater vehicles (AUVs) using bathymetry data. The algorithms, based on the principles of the Bayes filter, work by fusing bathymetry information with depth and altitude data from an AUV. The localization algorithms are built on four different Bayes filter variants: the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), Particle Filter (PF), and Marginalized Particle Filter (MPF). We evaluate the performance of these four Bayesian bathymetry-based AUV localization approaches under varying conditions and levels of available computational resources. The localization algorithms overcome unique challenges of the underwater domain, including visual distortion and radio frequency (RF) signal attenuation, which often make landmark-based localization infeasible. Evaluation results on real-world bathymetric data show the effectiveness of each algorithm under a variety of conditions, with the MPF being the most accurate.

Submitted 10 October, 2019; v1 submitted 21 September, 2018; originally announced September 2018.
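
Of the four filters compared, the particle filter is the easiest to illustrate in a few lines. Below is a minimal one-dimensional sketch, not the paper's implementation. It assumes:

- a hypothetical, strictly monotone bathymetry profile `seafloor_depth(x)`;
- an odometry motion model with Gaussian noise;
- a Gaussian likelihood on the measured water column (vehicle depth plus altitude), with systematic resampling.

All noise values are illustrative.

```python
import math
import random


def seafloor_depth(x):
    """Hypothetical bathymetry: seafloor depth (m) along a 1D transect at position x (m)."""
    return 40.0 + 0.3 * x + math.sin(0.1 * x)


def particle_filter_step(particles, control, measured_depth, rng,
                         motion_sigma=0.3, meas_sigma=0.5):
    # Predict: apply odometry with additive Gaussian noise.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Update: weight by likelihood of the measured seafloor depth (vehicle depth + altitude).
    weights = [math.exp(-0.5 * ((measured_depth - seafloor_depth(p)) / meas_sigma) ** 2)
               for p in moved]
    total = sum(weights)
    if total == 0.0:  # every particle inconsistent: fall back to uniform weights
        weights, total = [1.0] * len(moved), float(len(moved))
    weights = [w / total for w in weights]
    # Resample: systematic resampling with one random offset.
    n = len(moved)
    step = 1.0 / n
    start = rng.uniform(0.0, step)
    resampled, cumulative, i = [], weights[0], 0
    for m in range(n):
        target = start + m * step
        while target > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(moved[i])
    return resampled


rng = random.Random(0)
particles = [rng.uniform(0.0, 200.0) for _ in range(500)]  # unknown start on a 200 m transect
true_x = 30.0
for _ in range(20):
    true_x += 2.0                                   # vehicle advances 2 m per step
    z = seafloor_depth(true_x) + rng.gauss(0.0, 0.5)
    particles = particle_filter_step(particles, 2.0, z, rng)
estimate = sum(particles) / len(particles)
print(f"true {true_x:.1f} m, estimate {estimate:.1f} m")
```

With a rippled but non-monotone seafloor, a single depth reading is ambiguous. Several steps of motion are then needed before the particle cloud collapses to one mode.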

arXiv:1809.07948 [pdf, other]

Robot Communication Via Motion: Closing the Underwater Human-Robot Interaction Loop

Authors: Michael Fulton, Chelsey Edge, Junaed Sattar

Abstract: In this paper, we propose a novel method for underwater robot-to-human communication using the motion of the robot as "body language". To evaluate this system, we develop simulated examples of the system's body language gestures, called kinemes, and compare them to a baseline system using flashing colored lights through a user study. Our work shows evidence that motion can be used as a successful communication vector which is accurate, easy to learn, and quick enough to be used, all without requiring any additional hardware to be added to our platform. We thus contribute to "closing the loop" for human-robot interaction underwater by proposing and testing this system, suggesting a library of possible body language gestures for underwater robots, and offering insight on the design of nonverbal robot-to-human communication methods.

Submitted 21 September, 2018; originally announced September 2018.

Comments: Under review for ICRA 2019

arXiv:1809.06849 [pdf, other]

Towards a Generic Diver-Following Algorithm: Balancing Robustness and Efficiency in Deep Visual Detection

Authors: Md Jahidul Islam, Michael Fulton, Junaed Sattar

Abstract: This paper explores the design and development of a class of robust diver-following algorithms for autonomous underwater robots. By considering the operational challenges for underwater visual tracking in diverse real-world settings, we formulate a set of desired features of a generic diver-following algorithm. We attempt to accommodate these features and maximize general tracking performance by exploiting the state-of-the-art deep object detection models. We fine-tune the building blocks of these models with a goal of balancing the trade-off between robustness and efficiency in an onboard setting under real-time constraints. Subsequently, we design an architecturally simple Convolutional Neural Network (CNN)-based diver-detection model that is much faster than the state-of-the-art deep models yet provides comparable detection performance. In addition, we validate the performance and effectiveness of the proposed diver-following modules through a number of field experiments in closed-water and open-water environments.

Submitted 18 September, 2018; originally announced September 2018.

arXiv:1807.11575

SafeDrive: Enhancing Lane Appearance for Autonomous and Assisted Driving Under Limited Visibility

Authors: Jiawei Mo, Junaed Sattar

Abstract: Autonomous detection of lane markers improves road safety, and purely visual tracking is desirable for widespread vehicle compatibility and for reducing sensor intrusion, cost, and energy consumption. However, visual approaches are often ineffective because of a number of factors, e.g., occlusion, poor weather conditions, and paint wear-off. We present an approach to enhance lane marker appearance for assisted and autonomous driving, particularly under poor visibility. Our method, named SafeDrive, attempts to improve visual lane detection approaches in drastically degraded visual conditions. SafeDrive finds lane markers in alternate imagery of the road at the vehicle's location and reconstructs a sparse 3D model of the surroundings. By estimating the geometric relationship between this 3D model and the current view, the lane markers are projected onto the visual scene; any lane detection algorithm can subsequently be used to detect lanes in the resulting image. SafeDrive requires no sensors other than vision and location data. We demonstrate the effectiveness of our approach on a number of test cases obtained from actual driving data recorded in urban settings.

Submitted 23 July, 2018; originally announced July 2018.

Comments: arXiv admin note: text overlap with arXiv:1701.08449
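The SafeDrive abstract above states that lane markers from a sparse 3D model are projected onto the current view once the geometric relationship between the model and that view is estimated. The abstract does not give the camera model, so the following is only a minimal sketch assuming a standard pinhole camera with a known intrinsic matrix K and an estimated world-to-camera pose (R, t); the function names matvec and project_points are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def matvec(M: Mat3, v: Point3) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector (hypothetical helper)."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def project_points(points_world: Sequence[Point3], R: Mat3, t: Point3,
                   K: Mat3) -> List[Tuple[float, float]]:
    """Project 3D points to pixel coordinates under an ASSUMED pinhole model.

    R (3x3) and t (3-vector) map world coordinates into the camera frame;
    K is the 3x3 intrinsic matrix. Points at or behind the camera are skipped.
    """
    pixels = []
    for p in points_world:
        rotated = matvec(R, p)
        x, y, z = (rotated[i] + t[i] for i in range(3))  # world -> camera frame
        if z <= 0:
            continue  # not visible in the current view
        u, v, w = matvec(K, (x, y, z))  # apply intrinsics (homogeneous pixel)
        pixels.append((u / w, v / w))  # perspective division
    return pixels
```

For example, with K = [[100, 0, 50], [0, 100, 40], [0, 0, 1]], an identity rotation, and zero translation, the point (1, 2, 10) projects to (60.0, 60.0), while a point with negative depth is dropped.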

arXiv:1804.01079

Robotic Detection of Marine Litter Using Deep Visual Detection Models

Authors: Michael Fulton, Jungseok Hong, Md Jahidul Islam, Junaed Sattar

Abstract: Trash deposits in aquatic environments have a destructive effect on marine ecosystems and pose a long-term economic and environmental threat. Autonomous underwater vehicles (AUVs) could very well contribute to the solution of this problem by finding and eventually removing trash. This paper evaluates a number of deep-learning algorithms performing the task of visually detecting trash in realistic underwater environments, with the eventual goal of exploration, mapping, and extraction of such debris by using AUVs. A large and publicly available dataset of actual debris in open-water locations is annotated for training a number of convolutional neural network architectures for object detection. The trained networks are then evaluated on a set of images from other portions of that dataset, providing insight into approaches for developing the detection capabilities of an AUV for underwater trash removal. In addition, the evaluation is performed on three different platforms of varying processing power, which serves to assess these algorithms' fitness for real-time applications.

Submitted 21 September, 2018; v1 submitted 3 April, 2018; originally announced April 2018.

Comments: Under review for ICRA 2019

arXiv:1803.08202

Person Following by Autonomous Robots: A Categorical Overview

Authors: Md Jahidul Islam, Jungseok Hong, Junaed Sattar

Abstract: A wide range of human-robot collaborative applications in diverse domains, such as manufacturing, health care, the entertainment industry, and social interactions, require an autonomous robot to follow its human companion. Different working environments and applications pose diverse challenges by adding constraints on the choice of sensors, the degree of autonomy, and the dynamics of a person-following robot. Researchers have addressed these challenges in many ways and contributed to the development of a large body of literature. This paper provides a comprehensive overview of the literature by categorizing different aspects of person-following by autonomous robots. Also, the corresponding operational challenges are identified based on various design choices for ground, underwater, and aerial scenarios. In addition, state-of-the-art methods for perception, planning, control, and interaction are discussed in detail, and their applicability in varied operational scenarios is presented. Then, some of the prominent methods are qualitatively compared, corresponding practicalities are illustrated, and their feasibility is analyzed for various use cases. Furthermore, several prospective application areas are identified, and open problems are highlighted for future research.

Submitted 17 September, 2019; v1 submitted 21 March, 2018; originally announced March 2018.

arXiv:1801.04011

Enhancing Underwater Imagery using Generative Adversarial Networks

Authors: Cameron Fabbri, Md Jahidul Islam, Junaed Sattar

Abstract: Autonomous underwater vehicles (AUVs) rely on a variety of sensors (acoustic, inertial, and visual) for intelligent decision making. Due to its non-intrusive, passive nature and high information content, vision is an attractive sensing modality, particularly at shallower depths. However, factors such as light refraction and absorption, suspended particles in the water, and color distortion affect the quality of visual data, resulting in noisy and distorted images. AUVs that rely on visual sensing thus face difficult challenges and consequently exhibit poor performance on vision-driven tasks. This paper proposes a method to improve the quality of visual underwater scenes using Generative Adversarial Networks (GANs), with the goal of improving input to vision-driven behaviors further down the autonomy pipeline. Furthermore, we show how recently proposed methods are able to generate a dataset for the purpose of such underwater image restoration. For any visually guided underwater robot, this improvement can result in increased safety and reliability through robust visual perception. To that effect, we present quantitative and qualitative data demonstrating that images corrected through the proposed approach are more visually appealing and also provide increased accuracy for a diver-tracking algorithm.

Submitted 11 January, 2018; originally announced January 2018.

Comments: Submitted to ICRA 2018

arXiv:1709.08772

Dynamic Reconfiguration of Mission Parameters in Underwater Human-Robot Collaboration

Authors: Md Jahidul Islam, Marc Ho, Junaed Sattar

Abstract: This paper presents a real-time programming and parameter reconfiguration method for autonomous underwater robots in human-robot collaborative tasks. Using a set of intuitive and meaningful hand gestures, we develop a syntactically simple framework that is computationally more efficient than a complex, grammar-based approach. In the proposed framework, a convolutional neural network is trained to provide accurate hand gesture recognition; subsequently, a deterministic model based on a finite-state machine performs efficient gesture-to-instruction mapping and further improves the robustness of the interaction scheme. The key aspect of this framework is that divers can easily adopt it to communicate simple instructions to underwater robots without using artificial tags such as fiducial markers or having to memorize a potentially complex set of language rules. Extensive experiments are performed both on field-trial data and through simulation, which demonstrate the robustness, efficiency, and portability of this framework in a number of different scenarios. Finally, a user interaction study is presented that illustrates the gain in usability of our proposed interaction framework compared to existing methods for underwater domains.

Submitted 20 February, 2018; v1 submitted 25 September, 2017; originally announced September 2017.
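The abstract above describes the gesture-to-instruction step only as a deterministic model based on a finite-state machine. As an illustration of that general idea, and not the paper's actual gesture grammar, the sketch below parses a stream of already-recognized gesture tokens into instructions. The vocabulary (start, end, ascend, descend, hover, follow, one, two, three) and the function parse_gestures are hypothetical. Any unexpected token sends the machine back to the idle state, which is one simple way a deterministic layer can discard malformed sequences produced by a noisy classifier.

```python
from enum import Enum, auto
from typing import Iterable, List, Optional, Tuple


class State(Enum):
    IDLE = auto()
    AWAIT_COMMAND = auto()
    AWAIT_PARAM = auto()
    AWAIT_END = auto()


# Hypothetical vocabulary; the paper's real gesture set is not given in the abstract.
COMMANDS = {"ascend", "descend", "hover", "follow"}
PARAMETRIZED = {"ascend", "descend"}  # commands that expect a numeric argument
DIGITS = {"one": 1, "two": 2, "three": 3}


def parse_gestures(tokens: Iterable[str]) -> List[Tuple[str, Optional[int]]]:
    """Map gesture tokens to (command, parameter) instructions via a small FSM."""
    state = State.IDLE
    command: Optional[str] = None
    param: Optional[int] = None
    instructions: List[Tuple[str, Optional[int]]] = []
    for tok in tokens:
        if state is State.IDLE:
            if tok == "start":  # everything before a start gesture is ignored
                state, command, param = State.AWAIT_COMMAND, None, None
        elif state is State.AWAIT_COMMAND:
            if tok in COMMANDS:
                command = tok
                state = State.AWAIT_PARAM if tok in PARAMETRIZED else State.AWAIT_END
            else:
                state = State.IDLE  # malformed: drop the partial instruction
        elif state is State.AWAIT_PARAM:
            if tok in DIGITS:
                param = DIGITS[tok]
                state = State.AWAIT_END
            else:
                state = State.IDLE
        elif state is State.AWAIT_END:
            if tok == "end":
                instructions.append((command, param))
            state = State.IDLE  # either emitted or discarded; wait for next start
    return instructions
```

With this vocabulary, the token stream start, ascend, two, end, start, hover, end yields the instructions ("ascend", 2) and ("hover", None), whereas a parametrized command closed without its number is silently dropped.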

arXiv:1709.08292

Underwater Multi-Robot Convoying using Visual Tracking by Detection

Authors: Florian Shkurti, Wei-Di Chang, Peter Henderson, Md Jahidul Islam, Juan Camilo Gamboa Higuera, Jimmy Li, Travis Manderson, Anqi Xu, Gregory Dudek, Junaed Sattar

Abstract: We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on the idea of tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of image-based bounding box estimates. This approach has the important advantage of mitigating tracking drift (i.e., drifting away from the target object), which is a common symptom of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings and hand-annotated its location in each frame. Based on this dataset, we present an empirical comparison of multiple tracker variants, including the use of several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot's independent motion.

Submitted 24 September, 2017; originally announced September 2017.

Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017
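Tracking-by-detection, as summarized in the convoying abstract above, anchors the target estimate to detector output and smooths it over time. The abstract does not name the temporal filter, so the sketch below substitutes a simple exponential moving average over bounding-box corners and coasts on the last estimate for a few frames when the detector misses. SmoothedTracker, track, alpha, and max_missed are hypothetical names, and the detector itself is not modeled: each frame's detection (or None) is passed in directly.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Box = Tuple[float, ...]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class SmoothedTracker:
    alpha: float = 0.5     # weight given to the newest detection (0 < alpha <= 1)
    max_missed: int = 3    # frames to coast on the last estimate before declaring loss
    estimate: Optional[Box] = None
    missed: int = 0

    def update(self, detection: Optional[Box]) -> Optional[Box]:
        """Fuse one frame's detection (or None if nothing was detected)."""
        if detection is None:
            self.missed += 1
            if self.missed > self.max_missed:
                self.estimate = None  # target lost; wait for a fresh detection
            return self.estimate
        self.missed = 0
        if self.estimate is None:
            self.estimate = tuple(detection)  # (re)initialize from the detector
        else:
            # Exponential moving average: pull the estimate toward the detection.
            self.estimate = tuple(
                self.alpha * d + (1 - self.alpha) * e
                for d, e in zip(detection, self.estimate)
            )
        return self.estimate


def track(detections: Iterable[Optional[Box]], **kwargs) -> List[Optional[Box]]:
    """Run the tracker over a sequence of per-frame detections."""
    tracker = SmoothedTracker(**kwargs)
    return [tracker.update(d) for d in detections]
```

Because every fresh detection pulls the estimate back toward the detector's output, the filter does not accumulate drift frame over frame the way a purely model-free tracker can; the trade-off is that the estimate is only as reliable as the detector when detections resume.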

arXiv:1701.08449

SafeDrive: A Robust Lane Tracking System for Autonomous and Assisted Driving Under Limited Visibility

Authors: Junaed Sattar, Jiawei Mo

Abstract: We present an approach towards robust lane tracking for assisted and autonomous driving, particularly under poor visibility. Autonomous detection of lane markers improves road safety, and purely visual tracking is desirable for widespread vehicle compatibility and reducing sensor intrusion, cost, and energy consumption. However, visual approaches are often ineffective because of a number of factors, including but not limited to occlusion, poor weather conditions, and paint wear-off. Our method, named SafeDrive, attempts to improve visual lane detection approaches in drastically degraded visual conditions without relying on additional active sensors. In scenarios where visual lane detection algorithms are unable to detect lane markers, the proposed approach uses location information of the vehicle to locate and access alternate imagery of the road and attempts detection on this secondary image. Subsequently, by using a combination of feature-based and pixel-based alignment, an estimated location of the lane marker is found in the current scene. We demonstrate the effectiveness of our system on actual driving data from locations in the United States with Google Street View as the source of alternate imagery.

Submitted 29 January, 2017; originally announced January 2017.
