Search | arXiv e-print repository

arXiv:2407.02099 [pdf, other]

Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior

Authors: Pedro Henrique Luz de Araujo, Benjamin Roth

Abstract: One way to personalize and steer generations from large language models (LLMs) is to assign a persona: a role that describes how the user expects the LLM to behave (e.g., a helpful assistant, a teacher, a woman). This paper investigates how personas affect diverse aspects of model behavior. We assign to seven LLMs 162 personas from 12 categories spanning variables like gender, sexual orientation, and occupation. We prompt them to answer questions from five datasets covering objective (e.g., questions about math and history) and subjective tasks (e.g., questions about beliefs and values). We also compare personas' generations to two baseline settings: a control persona setting with 30 paraphrases of "a helpful assistant" to control for models' prompt sensitivity, and an empty persona setting where no persona is assigned. We find that for all models and datasets, personas show greater variability than the control setting and that some measures of persona behavior generalize across models.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 20 pages, 12 figures

arXiv:2406.18589 [pdf, other]

Text-Guided Alternative Image Clustering

Authors: Andreas Stephan, Lukas Miklautz, Collin Leiber, Pedro Henrique Luz de Araujo, Dominik Répás, Claudia Plant, Benjamin Roth

Abstract: Traditional image clustering techniques only find a single grouping within visual data. In particular, they do not provide a possibility to explicitly define multiple types of clustering. This work explores the potential of large vision-language models to facilitate alternative image clustering. We propose Text-Guided Alternative Image Consensus Clustering (TGAICC), a novel approach that leverages user-specified interests via prompts to guide the discovery of diverse clusterings. To achieve this, it generates a clustering for each prompt, groups them using hierarchical clustering, and then aggregates them using consensus clustering. TGAICC outperforms image- and text-based baselines on four alternative image clustering benchmark datasets. Furthermore, using count-based word statistics, we are able to obtain text-based explanations of the alternative clusterings. In conclusion, our research illustrates how contemporary large vision-language models can transform explanatory data analysis, enabling the generation of insightful, customizable, and diverse image clusterings.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.03004 [pdf, other]

Exploring prompts to elicit memorization in masked language model-based named entity recognition

Authors: Yuxi Xia, Anastasiia Sedova, Pedro Henrique Luz de Araujo, Vasiliki Kougia, Lisa Nußbaumer, Benjamin Roth

Abstract: Training data memorization in language models impacts model capability (generalization) and safety (privacy risk). This paper focuses on analyzing prompts' impact on detecting the memorization of 6 masked language model-based named entity recognition models. Specifically, we employ a diverse set of 400 automatically generated prompts, and a pairwise dataset where each pair consists of one person's name from the training set and another name out of the set. A prompt completed with a person's name serves as input for getting the model's confidence in predicting this name. Finally, the prompt performance of detecting model memorization is quantified by the percentage of name pairs for which the model has higher confidence for the name from the training set. We show that the performance of different prompts varies by as much as 16 percentage points on the same model, and prompt engineering further increases the gap. Moreover, our experiments demonstrate that prompt performance is model-dependent but does generalize across different name sets. A comprehensive analysis indicates how prompt performance is influenced by prompt properties, contained tokens, and the model's self-attention weights on the prompt.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2403.08425 [pdf, other]

Specification Overfitting in Artificial Intelligence

Authors: Benjamin Roth, Pedro Henrique Luz de Araujo, Yuxi Xia, Saskia Kaltenbrunner, Christoph Korab

Abstract: Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle with containing this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics, imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics in system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification metrics in several AI fields (e.g., natural language processing, computer vision, reinforcement learning). Using a keyword-based search on papers from major AI conferences and journals between 2018 and mid-2023, we identify and analyze 74 papers that propose or optimize specification metrics. We find that although most papers implicitly address specification overfitting (e.g., by reporting more than one specification metric), they rarely discuss which role specification metrics should play in system development or explicitly define the scope and assumptions behind metric formulations.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 40 pages, 2 figures

arXiv:2402.07586 [pdf, other]

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors: Teresa Salazar, João Gama, Helder Araújo, Pedro Henriques Abreu

Abstract: In the evolving field of machine learning, ensuring fairness has become a critical concern, prompting the development of algorithms designed to mitigate discriminatory outcomes in decision-making processes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time while another does not, leading to a decrease in fairness even if accuracy remains fairly stable. Within the framework of federated learning, where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. One of the significant contributions of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the realm of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift which utilizes a multi-model approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.

Submitted 13 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

MSC Class: 68T01; ACM Class: I.2.m

arXiv:2311.08481 [pdf, other]

Functionality learning through specification instructions

Authors: Pedro Henrique Luz de Araujo, Benjamin Roth

Abstract: Test suites assess natural language processing models' performance on specific functionalities: cases of interest involving model robustness, fairness, or particular linguistic capabilities. They enable fine-grained evaluations of model aspects that would otherwise go unnoticed in standard evaluation datasets, but they do not address the problem of how to fix the failure cases. Previous work has explored functionality learning by fine-tuning models on suite data. While this improves performance on seen functionalities, it often does not generalize to unseen ones and can harm general performance. This paper analyses a fine-tuning-free approach to functionality learning. For each functionality in a suite, we generate a specification instruction that encodes it. We combine the obtained specification instructions to create specification-augmented prompts, which we feed to language models pre-trained on natural instruction data to generate suite predictions. A core aspect of our analysis is to measure the effect that including a set of specifications has on a held-out set of unseen, qualitatively different specifications. Our experiments across four tasks and models ranging from 80M to 175B parameters show that smaller models struggle to follow specification instructions. However, larger models (> 3B params.) can benefit from specifications and even generalize desirable behaviors across functionalities.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 33 pages, 8 figures

arXiv:2305.12951 [pdf, other]

doi 10.1162/tacl_a_00590

Cross-functional Analysis of Generalisation in Behavioural Learning

Authors: Pedro Henrique Luz de Araujo, Benjamin Roth

Abstract: In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) and compare the impact of a diverse set of regularisation and domain generalisation methods on generalisation performance.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 16 pages, 1 figure. To be published in the Transactions of the Association for Computational Linguistics (TACL). This preprint is a pre-MIT Press publication version

Journal ref: Transactions of the Association for Computational Linguistics 11, 2023, 1066-1081

arXiv:2210.15365 [pdf, other]

Li3DeTr: A LiDAR based 3D Detection Transformer

Authors: Gopi Krishna Erabati, Helder Araujo

Abstract: Inspired by recent advances in vision transformers for object detection, we propose Li3DeTr, an end-to-end LiDAR based 3D Detection Transformer for autonomous driving, that inputs LiDAR point clouds and regresses 3D bounding boxes. The LiDAR local and global features are encoded using sparse convolution and multi-scale deformable attention respectively. In the decoder head, firstly, in the novel Li3DeTr cross-attention block, we link the LiDAR global features to 3D predictions leveraging the sparse set of object queries learnt from the data. Secondly, the object query interactions are formulated using multi-head self-attention. Finally, the decoder layer is repeated $L_{dec}$ number of times to refine the object queries. Inspired by DETR, we employ set-to-set loss to train the Li3DeTr network. Without bells and whistles, the Li3DeTr network achieves 61.3% mAP and 67.6% NDS surpassing the state-of-the-art methods with non-maximum suppression (NMS) on the nuScenes dataset and it also achieves competitive performance on the KITTI dataset. We also employ knowledge distillation (KD) using a teacher and student model that slightly improves the performance of our network.

Submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2210.15316 [pdf, other]

MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving

Authors: Gopi Krishna Erabati, Helder Araujo

Abstract: 3D object detection is a significant task for autonomous driving. Recently, with the progress of vision transformers, the 2D object detection problem is being treated with the set-to-set loss. Inspired by these approaches on 2D object detection and an approach for multi-view 3D object detection, DETR3D, we propose MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer architecture to fuse image and LiDAR features to improve the detection accuracy. Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes. Firstly, we link the object queries learnt from data to the image and LiDAR features using a novel MSF3DDETR cross-attention block. Secondly, the object queries interact with each other in a multi-head self-attention block. Finally, the MSF3DDETR block is repeated $L$ times to refine the object queries. The MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian algorithm based bipartite matching and set-to-set loss inspired by DETR. We present both quantitative and qualitative results which are competitive with the state-of-the-art approaches.

Submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted at the ICPR 2022 Workshop DLVDR2022

arXiv:2209.13678 [pdf, other]

doi 10.1007/978-3-031-35995-8_37

FAIR-FATE: Fair Federated Learning with Momentum

Authors: Teresa Salazar, Miguel Fernandes, Helder Araujo, Pedro Henriques Abreu

Abstract: While fairness-aware machine learning algorithms have been receiving increasing attention, the focus has been on centralized machine learning, leaving decentralized methods underexplored. Federated Learning is a decentralized form of machine learning where clients train local models with a server aggregating them to obtain a shared global model. Data heterogeneity amongst clients is a common characteristic of Federated Learning, which may induce or exacerbate discrimination of unprivileged groups defined by sensitive attributes such as race or gender. In this work we propose FAIR-FATE: a novel FAIR FederATEd Learning algorithm that aims to achieve group fairness while maintaining high utility via a fairness-aware aggregation method that computes the global model by taking into account the fairness of the clients. To achieve that, the global model update is computed by estimating a fair model update using a Momentum term that helps to overcome the oscillations of non-fair gradients. To the best of our knowledge, this is the first approach in machine learning that aims to achieve fairness using a fair Momentum estimate. Experimental results on real-world datasets demonstrate that FAIR-FATE outperforms state-of-the-art fair Federated Learning algorithms under different levels of data heterogeneity.

Submitted 2 July, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in ICCS 2023 - Lecture Notes in Computer Science, vol 14073, Springer, and is available online at https://doi.org/10.1007/978-3-031-35995-8_37

MSC Class: 68T07; ACM Class: I.2.m

Journal ref: Computational Science - ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14073. Springer, Cham

arXiv:2207.00748 [pdf, other]

doi 10.1007/s10032-022-00406-7

Sequence-aware multimodal page classification of Brazilian legal documents

Authors: Pedro H. Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio A. Braz, Nilton C. da Silva, Flavio de Barros Vidal, Teofilo E. de Campos

Abstract: The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.

Submitted 15 July, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 11 pages, 6 figures. This preprint, which was originally written on 8 April 2021, has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in the International Journal on Document Analysis and Recognition, and is available online at https://doi.org/10.1007/s10032-022-00406-7 and https://rdcu.be/cRvvV

Journal ref: International Journal on Document Analysis and Recognition, 2022

arXiv:2204.04042 [pdf, other]

doi 10.18653/v1/2022.nlppower-1.8

Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection

Authors: Pedro Henrique Luz de Araujo, Benjamin Roth

Abstract: Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard approach: computing metrics on held-out data. While behavioural tests capture human prior knowledge and insights, there has been little exploration on how to leverage them for model training and development. With this in mind, we explore behaviour-aware learning by examining several fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems. To address potential pitfalls of training on data originally intended for evaluation, we train and evaluate models on different configurations of HateCheck by holding out categories of test cases, which enables us to estimate performance on potentially overlooked system properties. The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups, suggesting that models can potentially generalise to overlooked functionalities. However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class and that the procedure led to overfitting to the HateCheck data distribution.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 9 pages, 5 figures. Accepted at the First Workshop on Efficient Benchmarking in NLP (NLP Power!)

Journal ref: In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, 2022, pages 75-83, Dublin, Ireland. Association for Computational Linguistics

arXiv:2006.16670 [pdf, other]

EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner

Authors: Kutsev Bengisu Ozyoruk, Guliz Irem Gokceler, Gulfize Coskun, Kagan Incetan, Yasin Almalioglu, Faisal Mahmood, Eva Curto, Luis Perdigoto, Marina Oliveira, Hasan Sahin, Helder Araujo, Henrique Alexandrino, Nicholas J. Durr, Hunter B. Gilbert, Mehmet Turan

Abstract: Deep learning techniques hold promise to develop dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings as well as synthetically generated data. A Panda robotic arm, two commercially available capsule endoscopes, two conventional endoscopes with different camera properties, and two high precision 3D scanners were employed to collect data from 8 ex-vivo porcine gastrointestinal (GI)-tract organs. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for colon, 12 sub-datasets for stomach and 5 sub-datasets for small intestine, while four of these contain polyp-mimicking elevations carried out by an expert gastroenterologist. Synthetic capsule endoscopy frames from GI-tract with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propound Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with spatial attention module in order to dictate the network to focus on distinguishable and highly textured tissue regions. The proposed approach makes use of a brightness-aware photometric loss to improve the robustness under fast frame-to-frame illumination changes. To exemplify the use-case of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state-of-the-art. The codes and the link for the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible through https://www.youtube.com/watch?v=G_LCe0aWWdQ.

Submitted 1 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

Comments: 27 pages, 16 figures

arXiv:1803.01048 [pdf, other]

Magnetic-Visual Sensor Fusion-based Dense 3D Reconstruction and Localization for Endoscopic Capsule Robots

Authors: Mehmet Turan, Yasin Almalioglu, Evin Pinar Ornek, Helder Araujo, Mehmet Fatih Yanik, Metin Sitti

Abstract: Reliable and real-time 3D reconstruction and localization functionality is a crucial prerequisite for the navigation of actively controlled capsule endoscopic robots as an emerging, minimally invasive diagnostic and therapeutic technology for use in the gastrointestinal (GI) tract. In this study, we propose a fully dense, non-rigidly deformable, strictly real-time, intraoperative map fusion approach for actively controlled endoscopic capsule robot applications which combines magnetic and vision-based localization, with non-rigid deformations based frame-to-model map fusion. The performance of the proposed method is demonstrated using four different ex-vivo porcine stomach models. Across different trajectories of varying speed and complexity, and four different endoscopic cameras, the root mean square surface reconstruction errors range from 1.58 to 2.17 cm.

Submitted 2 March, 2018; originally announced March 2018.

Comments: submitted to IROS 2018

arXiv:1709.06451 [pdf, other]

3D Reconstruction with Low Resolution, Small Baseline and High Radial Distortion Stereo Images

Authors: Tiago Dias, Helder Araujo, Pedro Miraldo

Abstract: In this paper we analyze and compare approaches for 3D reconstruction from low-resolution (250x250), high radial distortion stereo images, which are acquired with small baseline (approximately 1mm). These images are acquired with the system NanEye Stereo manufactured by CMOSIS/AWAIBA. These stereo cameras have also small apertures, which means that high levels of illumination are required. The goal was to develop an approach yielding accurate reconstructions, with a low computational cost, i.e., avoiding non-linear numerical optimization algorithms. In particular we focused on the analysis and comparison of radial distortion models. To perform the analysis and comparison, we defined a baseline method based on available software and methods, such as the Bouguet toolbox [2] or the Computer Vision Toolbox from Matlab. The approaches tested were based on the use of the polynomial model of radial distortion, and on the application of the division model. The issue of the center of distortion was also addressed within the framework of the application of the division model. We concluded that the division model with a single radial distortion parameter has limitations.

Submitted 19 September, 2017; originally announced September 2017.

Journal ref: ACM Int'l Conf. Distributed Smart Cameras (ICDSC), 2016

arXiv:1709.03401 [pdf, other]

EndoSensorFusion: Particle Filtering-Based Multi-sensory Data Fusion with Switching State-Space Model for Endoscopic Capsule Robots

Authors: Mehmet Turan, Yasin Almalioglu, Hunter Gilbert, Helder Araujo, Taylan Cemgil, Metin Sitti

Abstract: A reliable, real-time multi-sensor fusion functionality is crucial for localization of actively controlled capsule endoscopy robots, which are an emerging, minimally invasive diagnostic and therapeutic technology for the gastrointestinal (GI) tract. In this study, we propose a novel multi-sensor fusion approach based on a particle filter that incorporates an online estimation of sensor reliability and a non-linear kinematic model learned by a recurrent neural network. Our method sequentially estimates the true robot pose from noisy pose observations delivered by multiple sensors. We experimentally test the method using 5 degree-of-freedom (5-DoF) absolute pose measurement by a magnetic localization system and a 6-DoF relative pose measurement by visual odometry. In addition, the proposed method is capable of detecting and handling sensor failures by ignoring corrupted data, providing the robustness expected of a medical device. Detailed analyses and evaluations using ex-vivo experiments on a porcine stomach model show that our system achieves high translational and rotational accuracies for different types of endoscopic capsule robot trajectories.

Submitted 25 September, 2017; v1 submitted 8 September, 2017; originally announced September 2017.

Comments: submitted to ICRA 2018. arXiv admin note: text overlap with arXiv:1705.06196

arXiv:1708.09740 [pdf, other]

doi 10.1007/s00138-017-0905-8

Sparse-then-Dense Alignment based 3D Map Reconstruction Method for Endoscopic Capsule Robots

Authors: Mehmet Turan, Yusuf Yigit Pilavci, Ipek Ganiyusufoglu, Helder Araujo, Ender Konukoglu, Metin Sitti

Abstract: Since the development of capsule endoscopy technology, substantial progress has been made in converting passive capsule endoscopes to robotic active capsule endoscopes which can be controlled by the doctor. However, robotic capsule endoscopy still has some challenges. In particular, the use of such devices to generate a precise and globally consistent three-dimensional (3D) map of the entire inner organ remains an unsolved problem. Such global 3D maps of inner organs would help doctors to detect the location and size of diseased areas more accurately, precisely, and intuitively, thus permitting more accurate and intuitive diagnoses. The proposed 3D reconstruction system is built in a modular fashion including preprocessing, frame stitching, and shading-based 3D reconstruction modules. We propose an efficient scheme to automatically select the key frames out of the huge quantity of raw endoscopic images. Together with a bundle fusion approach that aligns all the selected key frames jointly in a globally consistent way, a significant improvement of the mosaic and 3D map accuracy was reached. To the best of our knowledge, this framework is the first complete pipeline for endoscopic capsule robot based 3D map reconstruction containing all of the necessary steps for a reliable and accurate endoscopic 3D map. For the qualitative evaluations, a real pig stomach is employed. Moreover, for the first time in the literature, a detailed and comprehensive quantitative analysis of each proposed pipeline module is performed using a non-rigid esophagus gastro duodenoscopy simulator, four different endoscopic cameras, a magnetically activated soft capsule robot (MASCE), a sub-millimeter precise optical motion tracker and a fine-scale 3D optical scanner.

Submitted 29 August, 2017; originally announced August 2017.

Comments: arXiv admin note: text overlap with arXiv:1705.06524

arXiv:1708.06822 [pdf, other]

doi 10.1016/j.neucom.2017.10.014

Deep EndoVO: A Recurrent Convolutional Neural Network (RCNN) based Visual Odometry Approach for Endoscopic Capsule Robots

Authors: Mehmet Turan, Yasin Almalioglu, Helder Araujo, Ender Konukoglu, Metin Sitti

Abstract: Ingestible wireless capsule endoscopy is an emerging minimally invasive diagnostic technology for inspection of the GI tract and diagnosis of a wide range of diseases and pathologies. Medical device companies and many research groups have recently made substantial progress in converting passive capsule endoscopes to active capsule robots, enabling more accurate, precise, and intuitive detection of the location and size of the diseased areas. Since a reliable real-time pose estimation functionality is crucial for actively controlled endoscopic capsule robots, in this study, we propose a monocular visual odometry (VO) method for endoscopic capsule robot operations. Our method relies on the application of deep Recurrent Convolutional Neural Networks (RCNNs) to the visual odometry task, where Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are used for feature extraction and inference of dynamics across the frames, respectively. Detailed analyses and evaluations on a real pig stomach dataset show that our system achieves high translational and rotational accuracies for different types of endoscopic capsule robot trajectories.

Submitted 8 September, 2017; v1 submitted 22 August, 2017; originally announced August 2017.

arXiv:1705.06524 [pdf, other]

A fully dense and globally consistent 3D map reconstruction approach for GI tract to enhance therapeutic relevance of the endoscopic capsule robot

Authors: Mehmet Turan, Yusuf Yigit Pilavci, Redhwan Jamiruddin, Helder Araujo, Ender Konukoglu, Metin Sitti

Abstract: In the gastrointestinal (GI) tract endoscopy field, ingestible wireless capsule endoscopy is emerging as a novel, minimally invasive diagnostic technology for inspection of the GI tract and diagnosis of a wide range of diseases and pathologies. Since the development of this technology, medical device companies and many research groups have made substantial progress in converting passive capsule endoscopes to robotic active capsule endoscopes with most of the functionality of current active flexible endoscopes. However, robotic capsule endoscopy still has some challenges. In particular, the use of such devices to generate a precise three-dimensional (3D) mapping of the entire inner organ remains an unsolved problem. Such global 3D maps of inner organs would help doctors to detect the location and size of diseased areas more accurately and intuitively, thus permitting more reliable diagnoses. To our knowledge, this paper presents the first complete pipeline for a full 3D visual map reconstruction of the stomach. The proposed pipeline is modular and includes a preprocessing module, an image registration module, and a final shape-from-shading-based 3D reconstruction module; the 3D map is primarily generated by a combination of image stitching and shape-from-shading techniques, and is updated in a frame-by-frame iterative fashion via capsule motion inside the stomach. A comprehensive quantitative analysis of the proposed 3D reconstruction method is performed using an esophagus gastro duodenoscopy simulator, three different endoscopic cameras, and a 3D optical scanner.

Submitted 18 May, 2017; originally announced May 2017.

arXiv:1705.06196 [pdf, other]

Magnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule Robot

Authors: Mehmet Turan, Yasin Almalioglu, Hunter Gilbert, Helder Araujo, Ender Konukoglu, Metin Sitti

Abstract: A reliable, real-time simultaneous localization and mapping (SLAM) method is crucial for the navigation of actively controlled capsule endoscopy robots. These robots are an emerging, minimally invasive diagnostic and therapeutic technology for use in the gastrointestinal (GI) tract. In this study, we propose a dense, non-rigidly deformable, and real-time map fusion approach for actively controlled endoscopic capsule robot applications. The method combines magnetic and vision-based localization, and makes use of frame-to-model fusion and model-to-model loop closure. The performance of the method is demonstrated using an ex-vivo porcine stomach model. Across four trajectories of varying speed and complexity, and across three cameras, the root mean square localization errors range from 0.42 to 1.92 cm, and the root mean square surface reconstruction errors range from 1.23 to 2.39 cm.

Submitted 5 November, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

arXiv:1705.05444 [pdf, other]

doi 10.1007/s41315-017-0036-4

A Non-Rigid Map Fusion-Based RGB-Depth SLAM Method for Endoscopic Capsule Robots

Authors: Mehmet Turan, Yasin Almalioglu, Helder Araujo, Ender Konukoglu, Metin Sitti

Abstract: In the gastrointestinal (GI) tract endoscopy field, ingestible wireless capsule endoscopy is considered a novel, minimally invasive diagnostic technology to inspect the entire GI tract and to diagnose various diseases and pathologies. Since the development of this technology, medical device companies and many groups have made significant progress in turning such passive capsule endoscopes into robotic active capsule endoscopes that achieve almost all functions of current active flexible endoscopes. However, the use of robotic capsule endoscopy still has some challenges. One such challenge is the precise localization of such active devices in the 3D world, which is essential for a precise three-dimensional (3D) mapping of the inner organ. A reliable 3D map of the explored inner organ could help doctors make more intuitive and correct diagnoses. In this paper, we propose, to our knowledge for the first time in the literature, a visual simultaneous localization and mapping (SLAM) method specifically developed for endoscopic capsule robots. The proposed RGB-Depth SLAM method is capable of capturing comprehensive, dense, globally consistent surfel-based maps of the inner organs explored by an endoscopic capsule robot in real time. This is achieved by using dense frame-to-model camera tracking and windowed surfel-based fusion coupled with frequent model refinement through non-rigid surface deformations.

Submitted 15 May, 2017; originally announced May 2017.

arXiv:1602.05990 [pdf, ps, other]

Plücker Correction Problem: Analysis and Improvements in Efficiency

Authors: João R. Cardoso, Pedro Miraldo, Helder Araujo

Abstract: A given six-dimensional vector represents a 3D straight line in Plücker coordinates if its coordinates satisfy the Klein quadric constraint. In many problems that aim to find the Plücker coordinates of lines, noise in the data and other types of errors yield 6D vectors that do not correspond to lines because they violate that constraint. A common procedure to overcome this drawback is to find the Plücker coordinates of the lines that are closest to those vectors. This is known as the Plücker correction problem. In this article we propose a simple, closed-form, and global solution for this problem. Compared with the state-of-the-art method, our algorithm is simpler and requires far fewer operations than previous techniques (it does not require Singular Value Decomposition techniques).

Submitted 18 February, 2016; originally announced February 2016.
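
Illustration (not from the paper): the Klein quadric constraint mentioned in the abstract above can be checked in a few lines. This is a minimal sketch assuming the common convention of a 6D vector (d, m), with d the line direction and m the moment, for which the constraint reads d · m = 0. The function names are hypothetical, and the sketch only tests the constraint; it does not implement the paper's correction algorithm.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as sequences."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_from_points(p, q):
    """Plucker coordinates (d, m) of the line through points p and q.

    Assumed convention: d = q - p (direction), m = p x q (moment).
    """
    d = tuple(qi - pi for pi, qi in zip(p, q))
    m = cross(p, q)
    return d + m


def klein_residual(line):
    """Value of d . m; it is zero exactly when the 6-vector is a valid line."""
    d, m = line[:3], line[3:]
    return sum(di * mi for di, mi in zip(d, m))


def is_line(line, tol=1e-9):
    """True if the 6-vector satisfies the Klein quadric constraint within tol."""
    return abs(klein_residual(line)) <= tol


# A vector built from two points satisfies the constraint by construction.
line = plucker_from_points((1.0, 2.0, 3.0), (4.0, 6.0, 8.0))

# Perturbing one moment coordinate (simulated noise) breaks the constraint,
# which is the situation the Plucker correction problem addresses.
noisy = line[:3] + (line[3] + 0.1,) + line[4:]
```

A vector such as `noisy` is the kind of input that a correction method would map back to the nearest valid line.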

Showing 1–22 of 22 results for author: Araujo, H