Search | arXiv e-print repository

doi 10.1117/12.3006855

Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation

Authors: Marcos Fernández-Rodríguez, Bruno Silva, Sandro Queirós, Helena R. Torres, Bruno Oliveira, Pedro Morais, Lukas R. Buschle, Jorge Correia-Pinto, Estevão Lima, João L. Vilaça

Abstract: Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite the Deep Learning progress in recent years, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. The nnU-Net framework excelled in semantic segmentation analyzing single frames without temporal information. The framework's ease of use, including its ability to be automatically configured, and its low expertise requirements, have made it a popular base framework for comparisons. Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame, containing temporal information. This work seeks to employ OF maps as an additional input to the nnU-Net architecture to improve its performance in the surgical instrument segmentation task, taking advantage of the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component would be indirectly added without modifying the architecture. Using CholecSeg8k dataset, three different representations of movement were estimated and used as new inputs, comparing them with a baseline model. Results showed that the use of OF maps improves the detection of classes with high movement, even when these are scarce in the dataset. To further improve performance, future work may focus on implementing other OF-preserving augmentations.

Submitted 15 March, 2024; originally announced March 2024.

Journal ref: Proceedings Volume 12928, Medical Imaging 2024: Image-Guided Procedures, Robotic Interventions, and Modeling; 1292827 (2024)

arXiv:2403.05756 [pdf, other]

Model-Free Local Recalibration of Neural Networks

Authors: R. Torres, D. J. Nott, S. A. Sisson, T. Rodrigues, J. G. Reis, G. S. Rodrigues

Abstract: Artificial neural networks (ANNs) are highly flexible predictive models. However, reliably quantifying uncertainty for their predictions is a continuing challenge. There has been much recent work on "recalibration" of predictive distributions for ANNs, so that forecast probabilities for events of interest are consistent with certain frequency evaluations of them. Uncalibrated probabilistic forecasts are of limited use for many important decision-making tasks. To address this issue, we propose a localized recalibration of ANN predictive distributions using the dimension-reduced representation of the input provided by the ANN hidden layers. Our novel method draws inspiration from recalibration techniques used in the literature on approximate Bayesian computation and likelihood-free inference methods. Most existing calibration methods for ANNs can be thought of as calibrating either on the input layer, which is difficult when the input is high-dimensional, or the output layer, which may not be sufficiently flexible. Through a simulation study, we demonstrate that our method has good performance compared to alternative approaches, and explore the benefits that can be achieved by localizing the calibration based on different layers of the network. Finally, we apply our proposed method to a diamond price prediction problem, demonstrating the potential of our approach to improve prediction and uncertainty quantification in real-world applications.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 25 pages, 5 figures

MSC Class: 62G07 (Primary); 68T07; 68T37 (Secondary); 68Q10 ACM Class: G.3; I.5.1; I.6.4

arXiv:2309.13920 [pdf]

Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions

Authors: Alberto Pacheco-Gonzalez, Raymundo Torres, Raul Chacon, Isidro Robledo

Abstract: In emergency situations, the high-speed movement of an ambulance through the city streets can be hindered by vehicular traffic. This work presents a method for detecting emergency vehicle sirens in real time. To obtain the audio fingerprint of a Hi-Lo siren, DSP and signal symbolization techniques were applied, which were contrasted against an audio classifier based on a deep neural network, using the same dataset of 280 ambient sound recordings and 52 Hi-Lo siren recordings. For both methods, classification accuracy metrics were evaluated based on their confusion matrices. The DSP algorithm has a slightly lower accuracy than the DNN model; however, it is self-explanatory, adjustable, and portable, and offers high performance and lower energy consumption, making it a more viable, lower-cost ADAS implementation for identifying Hi-Lo sirens in real time.

Submitted 23 June, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: in Spanish language

ACM Class: I.5.5

Journal ref: Revista Electro, Vol. 45, pp. 184-189, 2023

arXiv:2308.05254 [pdf, other]

Data-driven Intra-Autonomous Systems Graph Generator

Authors: Caio Vinicius Dadauto, Nelson Luis Saldanha da Fonseca, Ricardo da Silva Torres

Abstract: Accurate modeling of realistic network topologies is essential for evaluating novel Internet solutions. Current topology generators, notably scale-free-based models, fail to capture multiple properties of intra-AS topologies. While scale-free networks encode node-degree distribution, they overlook crucial graph properties like betweenness, clustering, and assortativity. The limitations of existing generators pose challenges for training and evaluating deep learning models in communication networks, emphasizing the need for advanced topology generators encompassing diverse Internet topology characteristics. This paper introduces a novel deep-learning-based generator of synthetic graphs representing intra-autonomous systems in the Internet, named Deep-Generative Graphs for the Internet (DGGI). It also presents a novel massive dataset of real intra-AS graphs extracted from the project ITDK, called IGraphs. It is shown that DGGI creates synthetic graphs that accurately reproduce the properties of centrality, clustering, assortativity, and node degree. The DGGI generator outperforms existing Internet topology generators. On average, DGGI improves the MMD metric by $84.4\%$, $95.1\%$, $97.9\%$, and $94.7\%$ for assortativity, betweenness, clustering, and node degree, respectively.

Submitted 26 February, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: 14 pages, 15 figures

arXiv:2308.01666 [pdf, other]

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

Authors: Ivo S. Fins, Heather Davies, Sean Farrell, Jose R. Torres, Gina Pinchbeck, Alan D. Radford, Peter-John Noble

Abstract: Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either RegexT or by appending the narrative to a prompt sent to ChatGPT coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large language models create diverse opportunities and, whilst complex, present an intuitive interface to information but require careful implementation to avoid unpredictable errors.

Submitted 3 August, 2023; originally announced August 2023.

Comments: Supplementary Material: The data that support the findings of this study are available in the ancillary files of this submission. 5 pages, 2 figures (textboxes)

arXiv:2306.02172 [pdf, other]

On the Generalized Mean Densest Subgraph Problem: Complexity and Algorithms

Authors: Chandra Chekuri, Manuel R. Torres

Abstract: Dense subgraph discovery is an important problem in graph mining and network analysis with several applications. Two canonical problems here are to find a maxcore (subgraph of maximum min degree) and to find a densest subgraph (subgraph of maximum average degree). Both of these problems can be solved in polynomial time. Veldt, Benson, and Kleinberg [VBK21] introduced the generalized $p$-mean densest subgraph problem which captures the maxcore problem when $p=-\infty$ and the densest subgraph problem when $p=1$. They observed that the objective leads to a supermodular function when $p \ge 1$ and hence can be solved in polynomial time; for this case, they also developed a simple greedy peeling algorithm with a bounded approximation ratio. In this paper, we make several contributions. First, we prove that for any $p \in (-\frac{1}{8}, 0) \cup (0, \frac{1}{4})$ the problem is NP-Hard and for any $p \in (-3,0) \cup (0,1)$ the weighted version of the problem is NP-Hard, partly resolving a question left open in [VBK21]. Second, we describe two simple $1/2$-approximation algorithms for all $p < 1$, and show that our analysis of these algorithms is tight. For $p > 1$ we develop a fast near-linear time implementation of the greedy peeling algorithm from [VBK21]. This allows us to plug it into the iterative peeling algorithm that was shown to converge to an optimum solution [CQT22]. We demonstrate the efficacy of our algorithms by running extensive experiments on large graphs.
Together, our results provide a comprehensive understanding of the complexity of the $p$-mean densest subgraph problem and lead to fast and provably good algorithms for the full range of $p$.

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2303.17719 [pdf, other]

Why is the winner the best?

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, et al. (100 additional authors not shown)

Abstract: International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem.
The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.

Submitted 30 March, 2023; originally announced March 2023.

Comments: accepted to CVPR 2023

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2207.00709 [pdf, other]

Language statistics at different spatial, temporal, and grammatical scales

Authors: Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Carlos Pineda, Pedro Juan Rivera Torres, Carlos Gershenson

Abstract: Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3 to 96 hour intervals), spatial (from 3km to 3000+km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently of the values of other scales, languages, and countries. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales, as well as on the language and country. We also study the statistics of Twitter-specific tokens: emojis, hashtags, and user mentions. These particular types of tokens show a sigmoid kind of behaviour as a rank diversity function. Our results are helpful to quantify aspects of language statistics that seem universal and what may lead to variations.

Submitted 26 July, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

arXiv:2204.09573 [pdf]

doi 10.1016/j.media.2023.102833

Fetal Brain Tissue Annotation and Segmentation Challenge Results

Authors: Kelly Payette, Hongwei Li, Priscille de Dumast, Roxane Licandro, Hui Ji, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Hao Liu, Yuchen Pei, Lisheng Wang, Ying Peng, Juanying Xie, Huiquan Zhang, Guiming Dong, Hao Fu, Guotai Wang, ZunHyan Rieu, Donghyeon Kim, Hyun Gi Kim, Davood Karimi, Ali Gholipour, Helena R. Torres, Bruno Oliveira, João L. Vilaça, et al. (33 additional authors not shown)

Abstract: In-utero fetal MRI is emerging as an important tool in the diagnosis and analysis of the developing human brain. Automatic segmentation of the developing fetal brain is a vital step in the quantitative analysis of prenatal neurodevelopment both in the research and clinical context. However, manual segmentation of cerebral structures is time-consuming and prone to error and inter-observer variability. Therefore, we organized the Fetal Tissue Annotation (FeTA) Challenge in 2021 in order to encourage the development of automatic segmentation algorithms on an international level. The challenge utilized FeTA Dataset, an open dataset of fetal brain MRI reconstructions segmented into seven different tissues (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, brainstem, deep grey matter). 20 international teams participated in this challenge, submitting a total of 21 algorithms for evaluation. In this paper, we provide a detailed analysis of the results from both a technical and clinical perspective. All participants relied on deep learning methods, mainly U-Nets, with some variability present in the network architecture, optimization, and image pre- and post-processing. The majority of teams used existing medical imaging deep learning frameworks. The main differences between the submissions were the fine tuning done during training, and the specific pre- and post-processing steps performed. The challenge results showed that almost all submissions performed similarly. Four of the top five teams used ensemble learning methods.
However, one team's algorithm performed significantly better than the other submissions, and consisted of an asymmetrical U-Net network architecture. This paper provides a first of its kind benchmark for future automatic multi-tissue segmentation algorithms for the developing human brain in utero.

Submitted 20 April, 2022; originally announced April 2022.

Comments: Results from FeTA Challenge 2021, held at MICCAI; Manuscript submitted

arXiv:2204.04746 [pdf, other]

doi 10.1016/j.media.2023.102803

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Authors: Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang, Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla, Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao, et al. (37 additional authors not shown)

Abstract: Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition.
Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.

Submitted 29 December, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: CholecTriplet2021 challenge report. Paper accepted at Elsevier journal of Medical Image Analysis. 22 pages, 8 figures, 11 tables. Challenge website: https://cholectriplet2021.grand-challenge.org

Journal ref: Medical Image Analysis 86 (2023) 102803

arXiv:2201.06444 [pdf, other]

doi 10.1007/s00521-022-08100-9

Black-box Error Diagnosis in Deep Neural Networks for Computer Vision: a Survey of Tools

Authors: Piero Fraternali, Federico Milani, Rocio Nahime Torres, Niccolò Zangrando

Abstract: The application of Deep Neural Networks (DNNs) to a broad variety of tasks demands methods for coping with the complex and opaque nature of these architectures. When a gold standard is available, performance assessment treats the DNN as a black box and computes standard metrics based on the comparison of the predictions with the ground truth. A deeper understanding of performances requires going beyond such evaluation metrics to diagnose the model behavior and the prediction errors. This goal can be pursued in two complementary ways. On one side, model interpretation techniques "open the box" and assess the relationship between the input, the inner layers and the output, so as to identify the architecture modules most likely to cause the performance loss. On the other hand, black-box error diagnosis techniques study the correlation between the model response and some properties of the input not used for training, so as to identify the features of the inputs that make the model fail. Both approaches give hints on how to improve the architecture and/or the training process. This paper focuses on the application of DNNs to Computer Vision (CV) tasks and presents a survey of the tools that support the black-box performance diagnosis paradigm. It illustrates the features and gaps of the current proposals, discusses the relevant research directions and provides a brief overview of the diagnosis tools in sectors other than CV.

Submitted 22 December, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: Published in Springer Neural Computing and Applications, https://link.springer.com/article/10.1007/s00521-022-08100-9

arXiv:2106.06505 [pdf, other]

Efficient Deep Learning Architectures for Fast Identification of Bacterial Strains in Resource-Constrained Devices

Authors: R. Gallardo García, S. Jarquín Rodríguez, B. Beltrán Martínez, C. Hernández Gracidas, R. Martínez Torres

Abstract: This work presents twelve fine-tuned deep learning architectures to solve the bacterial classification problem over the Digital Image of Bacterial Species Dataset. The base architectures were mainly published as mobile or efficient solutions to the ImageNet challenge, and all experiments presented in this work consisted of making several modifications to the original designs, in order to make them able to solve the bacterial classification problem by using fine-tuning and transfer learning techniques. This work also proposes a novel data augmentation technique for this dataset, which is based on the idea of artificial zooming, strongly increasing the performance of every tested architecture, even doubling it in some cases. In order to get robust and complete evaluations, all experiments were performed with 10-fold cross-validation and evaluated with five different metrics: top-1 and top-5 accuracy, precision, recall, and F1 score. This paper presents a complete comparison of the twelve different architectures, cross-validated with the original and the augmented version of the dataset; the results are also compared with several literature methods. Overall, eight of the eleven architectures surpassed a score of 0.95 in top-1 accuracy with our data augmentation method, with 0.9738 being the highest top-1 accuracy. The impact of the data augmentation technique is reported with relative improvement scores.

Submitted 11 June, 2021; originally announced June 2021.

Comments: 22 pages, 2 figures, 5 tables. Submitted to Multimedia Tools and Applications, issue 1218 - Engineering Tools and Applications in Medical Imaging (currently in reviewing process)

MSC Class: 68T07 (Primary); 68U10 (Secondary) ACM Class: I.4; J.3

arXiv:2104.10345 [pdf, other]

doi 10.1109/JSTARS.2021.3094053

Measuring economic activity from space: a case study using flying airplanes and COVID-19

Authors: Mauricio Pamplona Segundo, Allan Pinto, Rodrigo Minetto, Ricardo da Silva Torres, Sudeep Sarkar

Abstract: This work introduces a novel solution to measure economic activity through remote sensing for a wide range of spatial areas. We hypothesized that disturbances in human behavior caused by major life-changing events leave signatures in satellite imagery that allow devising relevant image-based indicators to estimate their impacts and support decision-makers. We present a case study for the COVID-19 coronavirus outbreak, which imposed severe mobility restrictions and caused worldwide disruptions, using flying airplane detection around the 30 busiest airports in Europe to quantify and analyze the lockdown's effects and post-lockdown recovery. Our solution won the Rapid Action Coronavirus Earth observation (RACE) upscaling challenge, sponsored by the European Space Agency and the European Commission, and is now integrated into the RACE dashboard. This platform combines satellite data and artificial intelligence to promote a progressive and safe reopening of essential activities. Code and CNN models are available at https://github.com/maups/covid19-custom-script-contest

Submitted 21 April, 2021; originally announced April 2021.

Comments: 11 pages, 11 figures

arXiv:2102.01297 [pdf]

Reinforcement Learning with Probabilistic Boolean Network Models of Smart Grid Devices

Authors: Pedro J. Rivera Torres, Carlos Gershenson García, Samir Kanaan Izquierdo

Abstract: The area of Smart Power Grids needs to constantly improve its efficiency and resilience, to provide high quality electrical power, in a resistant grid, managing faults and avoiding failures. Achieving this requires high component reliability, adequate maintenance, and a studied failure occurrence. Correct system operation involves those activities, and novel methodologies to detect, classify, and isolate faults and failures, model and simulate processes with predictive algorithms and analytics (using data analysis and asset condition to plan and perform activities). We showcase the application of a complex-adaptive, self-organizing modeling method, Probabilistic Boolean Networks (PBN), as a way towards the understanding of the dynamics of smart grid devices, and to model and characterize their behavior. This work demonstrates that PBNs are equivalent to the standard Reinforcement Learning Cycle, in which the agent/model has an interaction with its environment and receives feedback from it in the form of a reward signal. Different reward structures were created in order to characterize preferred behavior. This information can be used to guide the PBN to avoid fault conditions and failures.

Submitted 1 February, 2021; originally announced February 2021.

arXiv:2011.05127 [pdf, other]

doi 10.3390/rs12142267

A Soft Computing Approach for Selecting and Combining Spectral Bands

Authors: Juan F. H. Albarracín, Rafael S. Oliveira, Marina Hirota, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: We introduce a soft computing approach for automatically selecting and combining indices from remote sensing multispectral images that can be used for classification tasks. The proposed approach is based on a Genetic-Programming (GP) framework, a technique successfully used in a wide variety of optimization problems. Through GP, it is possible to learn indices that maximize the separability of samples from two different classes. Once the indices specialized for all the pairs of classes are obtained, they are used in pixelwise classification tasks. We used the GP-based solution to evaluate complex classification problems, such as those that are related to the discrimination of vegetation types within and between tropical biomes. Using time series defined in terms of the learned spectral indices, we show that the GP framework leads to better results than other indices that are used to discriminate and classify tropical biomes.

Submitted 10 November, 2020; originally announced November 2020.

Comments: MDPI Remote Sensing - Special Issue "Current Limits and New Challenges and Opportunities in Soft Computing, Machine Learning and Computational Intelligence for Remote Sensing"

Journal ref: Remote Sens. 2020, 12(14), 2267

arXiv:2011.03194 [pdf, ps, other]

Fast Approximation Algorithms for Bounded Degree and Crossing Spanning Tree Problems

Authors: Chandra Chekuri, Kent Quanrud, Manuel R. Torres

Abstract: We develop fast approximation algorithms for the minimum-cost version of the Bounded-Degree MST problem (BD-MST) and its generalization the Crossing Spanning Tree problem (Crossing-ST). We solve the underlying LP to within a $(1+ε)$ approximation factor in near-linear time via the multiplicative weight update (MWU) technique. This yields, in particular, a near-linear time algorithm that outputs an estimate $B$ such that $B \le B^* \le \lceil (1+ε)B \rceil +1$ where $B^*$ is the minimum-degree of a spanning tree of a given graph. To round the fractional solution, in our main technical contribution, we describe a fast near-linear time implementation of swap-rounding in the spanning tree polytope of a graph. The fractional solution can also be used to sparsify the input graph that can in turn be used to speed up existing combinatorial algorithms. Together, these ideas lead to significantly faster approximation algorithms than known before for the two problems of interest. In addition, a fast algorithm for swap rounding in the graphic matroid is a generic tool that has other applications, including to TSP and submodular function maximization.

Submitted 17 May, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: Updated to reflect overlap in results with arXiv:1811.07464

arXiv:2010.12059 [pdf, other]

doi 10.1007/978-3-030-86520-7_8

Principled Interpolation in Normalizing Flows

Authors: Samuel G. Fadel, Sebastian Mair, Ricardo da S. Torres, Ulf Brefeld

Abstract: Generative models based on normalizing flows are very successful in modeling complex data distributions using simpler ones. However, straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed. This is caused by the standard choice of Gaussian base distributions and can be seen in the norms of the interpolated samples. This observation suggests that correcting the norm should generally result in better interpolations, but it is not clear how to correct the norm in an unambiguous way. In this paper, we solve this issue by enforcing a fixed norm and, hence, change the base distribution, to allow for a principled way of interpolation. Specifically, we use the Dirichlet and von Mises-Fisher base distributions. Our experimental results show superior performance in terms of bits per dimension, Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) scores for interpolation, while maintaining the same generative performance.

Submitted 22 October, 2020; originally announced October 2020.

Comments: Under review
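
The norm effect mentioned in the abstract above can be reproduced with a small, hedged sketch. It assumes a standard Gaussian base distribution in 1000 dimensions and compares the norm of a linear midpoint with the norms of its endpoints. The helper `rescaled_lerp` is a hypothetical heuristic introduced only for illustration; it is not the paper's method, which changes the base distribution instead.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    """Plain linear interpolation between vectors a and b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def rescaled_lerp(a, b, t):
    """Linear interpolation rescaled to the interpolated endpoint norm.

    Illustrative heuristic only (an assumption of this sketch), not the
    approach described in the abstract.
    """
    point = lerp(a, b, t)
    target = (1 - t) * norm(a) + t * norm(b)
    scale = target / norm(point)
    return [scale * x for x in point]


random.seed(0)
d = 1000  # dimensionality of the latent space (arbitrary choice)
a = [random.gauss(0, 1) for _ in range(d)]
b = [random.gauss(0, 1) for _ in range(d)]

mid = lerp(a, b, 0.5)
fixed = rescaled_lerp(a, b, 0.5)

# Gaussian samples have norm close to sqrt(d); the linear midpoint of two
# nearly orthogonal samples has norm close to sqrt(d / 2), so it lies in a
# region where samples are rarely observed.
print(norm(a), norm(b), norm(mid), norm(fixed))
```

In high dimensions, independent Gaussian samples are almost orthogonal, which is why the midpoint norm shrinks by roughly a factor of the square root of two.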

arXiv:2010.02680 [pdf, other]

doi 10.1109/ICIP40778.2020.9191168

Parallax Motion Effect Generation Through Instance Segmentation And Depth Estimation

Authors: Allan Pinto, Manuel A. Córdova, Luis G. L. Decker, Jose L. Flores-Campana, Marcos R. Souza, Andreza A. dos Santos, Jhonatas S. Conceição, Henrique F. Gagliardi, Diogo C. Luvizon, Ricardo da S. Torres, Helio Pedrini

Abstract: Stereo vision is a growing topic in computer vision due to the innumerable opportunities and applications this technology offers for the development of modern solutions, such as virtual and augmented reality applications. To enhance the user's experience in three-dimensional virtual environments, the motion parallax estimation is a promising technique to achieve this objective. In this paper, we propose an algorithm for generating parallax motion effects from a single image, taking advantage of state-of-the-art instance segmentation and depth estimation approaches. This work also presents a comparison against such algorithms to investigate the trade-off between efficiency and quality of the parallax motion effects, taking into consideration a multi-task learning network capable of estimating instance segmentation and depth estimation at once. Experimental results and visual quality assessment indicate that the PyD-Net network (depth estimation) combined with Mask R-CNN or FBNet networks (instance segmentation) can produce parallax motion effects with good visual quality.

Submitted 6 October, 2020; originally announced October 2020.

Comments: 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates

Journal ref: 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2020, pp. 1621-1625

arXiv:2006.00100 [pdf, other]

Automated Neuron Shape Analysis from Electron Microscopy

Authors: Sharmishtaa Seshamani, Leila Elabbady, Casey Schneider-Mizell, Gayathri Mahalingam, Sven Dorkenwald, Agnes Bodor, Thomas Macrina, Daniel Bumbarger, JoAnn Buchanan, Marc Takeno, Wenjing Yin, Derrick Brittain, Russel Torres, Daniel Kapner, Kisuk Lee, Ran Lu, Jingpeng Wu, Nuno daCosta, Clay Reid, Forrest Collman

Abstract: Morphology based analysis of cell types has been an area of great interest to the neuroscience community for several decades. Recently, high resolution electron microscopy (EM) datasets of the mouse brain have opened up opportunities for data analysis at a level of detail that was previously impossible. These datasets are very large in nature and thus, manual analysis is not a practical solution. Of particular interest are details to the level of post synaptic structures. This paper proposes a fully automated framework for analysis of post-synaptic structure based neuron analysis from EM data. The processing framework involves shape extraction, representation with an autoencoder, and whole cell modeling and analysis based on shape distributions. We apply our novel framework on a dataset of 1031 neurons obtained from imaging a 1mm x 1mm x 40 micrometer volume of the mouse visual cortex and show the strength of our method in clustering and classification of neuronal shapes.

Submitted 29 May, 2020; originally announced June 2020.

Comments: 9 pages, 4 figures

ACM Class: I.4.9; I.5.3; J.3

arXiv:1912.10314 [pdf, other]

Multimodal Prediction based on Graph Representations

Authors: Icaro Cavalcante Dourado, Salvatore Tabbone, Ricardo da Silva Torres

Abstract: This paper proposes a learning model, based on rank-fusion graphs, for general applicability in multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple descriptors and retrieval models, thus being able to capture underlying relationships between modalities, samples, and the collection itself. The solution is based on the encoding of multiple ranks for a query (or test sample), defined according to different criteria, into a graph. Later, we project the generated graph into an induced vector space, creating fusion vectors, targeting broader generality and efficiency. A fusion vector estimator is then built to infer whether a multimodal input object refers to a class or not. Our method is capable of promoting a fusion model better than early-fusion and late-fusion alternatives. Performed experiments in the context of multiple multimodal and visual datasets, as well as several descriptors and retrieval models, demonstrate that our learning model is highly effective for different prediction scenarios involving visual, textual, and multimodal features, yielding better effectiveness than state-of-the-art methods.

Submitted 3 July, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

arXiv:1906.06011 [pdf, other]

Fusion vectors: Embedding Graph Fusions for Efficient Unsupervised Rank Aggregation

Authors: Icaro Cavalcante Dourado, Ricardo da Silva Torres

Abstract: The vast increase in amount and complexity of digital content led to a wide interest in ad-hoc retrieval systems in recent years. Complementarily, the existence of heterogeneous data sources and retrieval models stimulated the proliferation of increasingly ingenious and effective rank aggregation functions. Although recently proposed rank aggregation functions are promising with respect to effectiveness, existing proposals in the area usually overlook efficiency aspects. We propose an innovative rank aggregation function that is unsupervised, intrinsically multimodal, and targeted for fast retrieval and top effectiveness performance. We introduce the concepts of embedding and indexing of graph-based rank-aggregation representation models, and their application for search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of fusion vectors, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness performance among state-of-the-art related work, while bringing novel aspects of multimodality and effectiveness. Consistent speedups are achieved against the recent baselines in all datasets considered.

Submitted 1 July, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1903.00774 [pdf, other]

doi 10.1109/LGRS.2019.2903194

Spatio-Temporal Vegetation Pixel Classification By Using Convolutional Networks

Authors: Keiller Nogueira, Jefersson A. dos Santos, Nathalia Menini, Thiago S. F. Silva, Leonor Patricia C. Morellato, Ricardo da S. Torres

Abstract: Plant phenology studies rely on long-term monitoring of life cycles of plants. High-resolution unmanned aerial vehicles (UAVs) and near-surface technologies have been used for plant monitoring, demanding the creation of methods capable of locating and identifying plant species through time and space. However, this is a challenging task given the high volume of data, the data constantly missing from temporal datasets, the heterogeneity of temporal profiles, the variety of plant visual patterns, and the unclear definition of individuals' boundaries in plant communities. In this letter, we propose a novel method, suitable for phenological monitoring, based on Convolutional Networks (ConvNets) to perform spatio-temporal vegetation pixel-classification on high resolution images. We conducted a systematic evaluation using high-resolution vegetation image datasets associated with the Brazilian Cerrado biome. Experimental results show that the proposed approach is effective, outperforming other spatio-temporal pixel-classification strategies.

Submitted 2 March, 2019; originally announced March 2019.

arXiv:1902.08698 [pdf, ps, other]

$\ell_1$-sparsity Approximation Bounds for Packing Integer Programs

Authors: Chandra Chekuri, Kent Quanrud, Manuel R. Torres

Abstract: We consider approximation algorithms for packing integer programs (PIPs) of the form $\max\{\langle c, x\rangle : Ax \le b, x \in \{0,1\}^n\}$ where $c$, $A$, and $b$ are nonnegative. We let $W = \min_{i,j} b_i / A_{i,j}$ denote the width of $A$ which is at least $1$. Previous work by Bansal et al. \cite{bansal-sparse} obtained an $Ω(\frac{1}{Δ_0^{1/\lfloor W \rfloor}})$-approximation ratio where $Δ_0$ is the maximum number of nonzeroes in any column of $A$ (in other words the $\ell_0$-column sparsity of $A$). They raised the question of obtaining approximation ratios based on the $\ell_1$-column sparsity of $A$ (denoted by $Δ_1$) which can be much smaller than $Δ_0$. Motivated by recent work on covering integer programs (CIPs) \cite{cq,chs-16} we show that simple algorithms based on randomized rounding followed by alteration, similar to those of Bansal et al. \cite{bansal-sparse} (but with a twist), yield approximation ratios for PIPs based on $Δ_1$. First, following an integrality gap example from \cite{bansal-sparse}, we observe that the case of $W=1$ is as hard as maximum independent set even when $Δ_1 \le 2$. In sharp contrast to this negative result, as soon as width is strictly larger than one, we obtain positive results via the natural LP relaxation. For PIPs with width $W = 1 + ε$ where $ε\in (0,1]$, we obtain an $Ω(ε^2/Δ_1)$-approximation. In the large width regime, when $W \ge 2$, we obtain an $Ω((\frac{1}{1 + Δ_1/W})^{1/(W-1)})$-approximation. We also obtain a $(1-ε)$-approximation when $W = Ω(\frac{\log (Δ_1/ε)}{ε^2})$.

Submitted 22 February, 2019; originally announced February 2019.

Comments: To appear in IPCO 2019
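
To make the quantities in the abstract above concrete, here is a minimal sketch that computes the width $W$ and the two column sparsities for a small constraint matrix. It rests on assumptions not stated in the abstract: the minimum defining $W$ is taken over nonzero entries only, the entries of $A$ are assumed to be scaled into $[0,1]$, and $Δ_1$ is taken to be the largest column sum. The function names and the example instance are hypothetical.

```python
def width(A, b):
    """W = min over nonzero entries A[i][j] of b[i] / A[i][j]."""
    return min(b[i] / a for i, row in enumerate(A) for a in row if a > 0)


def l0_column_sparsity(A):
    """Maximum number of nonzero entries in any column of A."""
    return max(sum(1 for row in A if row[j] > 0) for j in range(len(A[0])))


def l1_column_sparsity(A):
    """Maximum column sum of A (assumes entries scaled into [0, 1])."""
    return max(sum(row[j] for row in A) for j in range(len(A[0])))


# Hypothetical instance with three constraints and three variables.
A = [
    [1.0, 0.5, 0.0],
    [0.25, 0.0, 1.0],
    [0.25, 0.5, 0.0],
]
b = [2.0, 2.0, 1.0]

W = width(A, b)                # 2.0
delta0 = l0_column_sparsity(A)  # 3 (first column has three nonzeros)
delta1 = l1_column_sparsity(A)  # 1.5 (first column sums to 1.5)
print(W, delta0, delta1)
```

In this instance the first column has three nonzero entries but sums to only 1.5, which illustrates how $Δ_1$ can be smaller than $Δ_0$.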

arXiv:1901.05743 [pdf, other]

doi 10.1016/j.ipm.2019.03.008

Unsupervised Graph-based Rank Aggregation for Improved Retrieval

Authors: Icaro Cavalcante Dourado, Daniel Carlos Guimarães Pedronette, Ricardo da Silva Torres

Abstract: This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. Performed experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions.

Submitted 18 March, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

arXiv:1901.05517 [pdf]

Survey of Bayesian Networks Applications to Intelligent Autonomous Vehicles

Authors: Rocío Díaz de León Torres, Martín Molina, Pascual Campoy

Abstract: This article reviews the applications of Bayesian Networks to Intelligent Autonomous Vehicles (IAV) from the decision making point of view, which represents the final step for fully Autonomous Vehicles (currently under discussion). Until now, when it comes to making high level decisions for Autonomous Vehicles (AVs), humans have the last word. Based on the works cited in this article and analysis done here, the modules of a general decision making framework and its variables are inferred. Many efforts have been made in the labs showing Bayesian Networks as a promising computer model for decision making. Further research should go into the direction of testing Bayesian Network models in real situations. In addition to the applications, Bayesian Network fundamentals are introduced as elements to consider when developing IAVs with the potential of making high level judgement calls.

Submitted 21 February, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

Comments: 34 pages, 2 figures, 3 tables

arXiv:1811.07174 [pdf, other]

Link Prediction in Dynamic Graphs for Recommendation

Authors: Samuel G. Fadel, Ricardo da S. Torres

Abstract: Recent advances in employing neural networks on graph domains helped push the state of the art in link prediction tasks, particularly in recommendation services. However, the use of temporal contextual information, often modeled as dynamic graphs that encode the evolution of user-item relationships over time, has been overlooked in link prediction problems. In this paper, we consider the hypothesis that leveraging such information enables models to make better predictions, proposing a new neural network approach for this. Our experiments, performed on the widely used ML-100k and ML-1M datasets, show that our approach produces better predictions in scenarios where the pattern of user-item relationships changes over time. In addition, they suggest that existing approaches are significantly impacted by those changes.

Submitted 17 November, 2018; originally announced November 2018.

Comments: Workshop on Relational Representation Learning (R2L), NIPS 2018

arXiv:1808.00561 [pdf, other]

Geometric Fingerprint Recognition via Oriented Point-Set Pattern Matching

Authors: David Eppstein, Michael T. Goodrich, Jordan Jorgensen, Manuel R. Torres

Abstract: Motivated by the problem of fingerprint matching, we present geometric approximation algorithms for matching a pattern point set against a background point set, where the points have angular orientations in addition to their positions.

Submitted 1 August, 2018; originally announced August 2018.

Comments: 15 pages, 12 figures, to be presented at CCCG18

arXiv:1711.06809 [pdf, other]

A Genetic Algorithm Approach for Image Representation Learning through Color Quantization

Authors: Érico M. Pereira, Ricardo da S. Torres, Jefersson A. dos Santos

Abstract: Over the last decades, hand-crafted feature extractors have been used to encode image visual properties into feature vectors. Recently, data-driven feature learning approaches have been successfully explored as alternatives for producing more representative visual features. In this work, we combine both research venues, focusing on the color quantization problem. We propose two data-driven approaches to learn image representations through the search for optimized quantization schemes, which lead to more effective feature extraction algorithms and compact representations. Our strategy employs Genetic Algorithm, a soft-computing apparatus successfully utilized in Information-retrieval-related optimization problems. We hypothesize that changing the quantization affects the quality of image description approaches, leading to effective and efficient representations. We evaluate our approaches in content-based image retrieval tasks, considering eight well-known datasets with different visual properties. Results indicate that the approach focused on representation effectiveness outperformed baselines in all tested scenarios. The other approach, which also considers the size of created representations, produced competitive results keeping or even reducing the dimensionality of feature vectors up to 25%.

Submitted 20 November, 2020; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: Submitted to Multimedia Tools and Applications

Report number: MTAP-D-19-02724R2

arXiv:1711.03564 [pdf, other]

doi 10.1109/LGRS.2018.2845549

Exploiting ConvNet Diversity for Flooding Identification

Authors: Keiller Nogueira, Samuel G. Fadel, Ícaro C. Dourado, Rafael de O. Werneck, Javier A. V. Muñoz, Otávio A. B. Penatti, Rodrigo T. Calumby, Lin Tzy Li, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: Flooding is the world's most costly type of natural disaster in terms of both economic losses and human casualties. A first and essential procedure towards flood monitoring is based on identifying the area most vulnerable to flooding, which gives authorities relevant regions to focus on. In this work, we propose several methods to perform flooding identification in high-resolution remote sensing images using deep learning. Specifically, some proposed techniques are based upon unique networks, such as dilated and deconvolutional ones, while others were conceived to exploit diversity of distinct networks in order to extract the maximum performance of each classifier. Evaluation of the proposed algorithms was conducted in a high-resolution remote sensing dataset. Results show that the proposed algorithms outperformed several state-of-the-art baselines, providing improvements ranging from 1 to 4% in terms of the Jaccard Index.

Submitted 5 June, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: Work winner of the Flood-Detection in Satellite Images, a subtask of 2017 Multimedia Satellite Task (MediaEval Benchmark). Accepted for publication in the Geoscience and Remote Sensing Letters (GRSL)
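
The Jaccard Index used as the evaluation metric in the abstract above is the ratio of intersection to union of the predicted and reference regions. The following is a minimal sketch for binary masks stored as flat lists of 0 and 1; the function name, the toy masks, and the convention that two empty masks score 1.0 are assumptions of this example, not details taken from the paper.

```python
def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy flood masks: 1 marks a flooded pixel, 0 marks background.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]

score = jaccard_index(pred, target)
print(score)  # 2 shared pixels out of 4 in the union -> 0.5
```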

arXiv:1709.01433 [pdf, ps, other]

doi 10.1007/s10878-018-0254-1

An Exact Approach for the Balanced k-Way Partitioning Problem with Weight Constraints and its Application to Sports Team Realignment

Authors: Diego Recalde, Daniel Severín, Ramiro Torres, Polo Vaca

Abstract: In this work a balanced k-way partitioning problem with weight constraints is defined to model the sports team realignment. Sports teams must be partitioned into a fixed number of groups according to some regulations, where the total distance of the road trips that all teams must travel to play a Double Round Robin Tournament in each group is minimized. Two integer programming formulations for this problem are introduced, and the validity of three families of inequalities associated to the polytope of these formulations is proved. The performance of a tabu search procedure and a Branch & Cut algorithm, which uses the valid inequalities as cuts, is evaluated over simulated and real-world instances. In particular, an optimal solution for the realignment of the Ecuadorian Football league is reported and the methodology can be suitably adapted for the realignment of other sports leagues.

Submitted 5 September, 2017; originally announced September 2017.

Comments: A preliminary version of this paper appeared at ISCO 2016

Journal ref: Journal of Combinatorial Optimization 36 (2018) 916-936

arXiv:1609.07239 [pdf, other]

doi 10.1145/2996913.2996976

A Topological Algorithm for Determining How Road Networks Evolve Over Time

Authors: M T Goodrich, Siddharth Gupta, Manuel R. Torres

Abstract: We provide an efficient algorithm for determining how a road network has evolved over time, given two snapshot instances from different dates. To allow for such determinations across different databases and even against hand-drawn maps, we take a strictly topological approach in this paper, so that we compare road networks based strictly on graph-theoretic properties. Given two road networks of the same region from two different dates, our approach allows one to match road network portions that remain intact and also point out added or removed portions. We analyze our algorithm both theoretically, showing that it runs in polynomial time for non-degenerate road networks even though a related problem is NP-complete, and experimentally, using dated road networks from the TIGER/Line archive of the U.S. Census Bureau.

Submitted 23 September, 2016; originally announced September 2016.

arXiv:1603.03627 [pdf, other]

Learning from Imbalanced Multiclass Sequential Data Streams Using Dynamically Weighted Conditional Random Fields

Authors: Roberto L. Shinmoto Torres, Damith C. Ranasinghe, Qinfeng Shi, Anton van den Hengel

Abstract: The present study introduces a method for improving the classification performance of imbalanced multiclass data streams from wireless body-worn sensors. Data imbalance is an inherent problem in activity recognition caused by the irregular time distribution of activities, which are sequential and dependent on previous movements. We use conditional random fields (CRF), a graphical model for structured classification, to take advantage of dependencies between activities in a sequence. However, CRFs do not consider the negative effects of class imbalance during training. We propose a class-wise dynamically weighted CRF (dWCRF) where weights are automatically determined during training by maximizing the expected overall F-score. Our results, based on three case studies from a healthcare application using a batteryless body-worn sensor, demonstrate that our method, in general, improves overall and minority class F-score when compared to other CRF-based classifiers and achieves similar or better overall and class-wise performance when compared to SVM-based classifiers under conditions of limited training data. We also confirm the performance of our approach using an additional battery-powered body-worn sensor dataset, achieving similar results in cases of high class imbalance.

Submitted 11 March, 2016; originally announced March 2016.

Comments: 28 pages, 8 figures, 1 table

arXiv:1511.06704 [pdf, other]

Semantic Diversity versus Visual Diversity in Visual Dictionaries

Authors: Otávio A. B. Penatti, Sandra Avila, Eduardo Valle, Ricardo da S. Torres

Abstract: Visual dictionaries are a critical component for image classification/retrieval systems based on the bag-of-visual-words (BoVW) model. Dictionaries are usually learned without supervision from a training set of images sampled from the collection of interest. However, for large, general-purpose, dynamic image collections (e.g., the Web), obtaining a representative sample in terms of semantic concepts is not straightforward. In this paper, we evaluate the impact of semantics on dictionary quality, aiming at verifying the importance of semantic diversity in relation to visual diversity for visual dictionaries. In the experiments, we vary the number of classes used for creating the dictionary and then compute different BoVW descriptors, using multiple codebook sizes and different coding and pooling methods (standard BoVW and Fisher Vectors). Results for image classification show that as visual dictionaries are based on low-level visual appearances, visual diversity is more important than semantic diversity. Our conclusions open the opportunity to alleviate the burden in generating visual dictionaries as we need only a visually diverse set of images instead of the whole collection to create a good dictionary.

Submitted 20 November, 2015; originally announced November 2015.

arXiv:1209.0410 [pdf, other]

Approximate Similarity Search for Online Multimedia Services on Distributed CPU-GPU Platforms

Authors: George Teodoro, Eduardo Valle, Nathan Mariano, Ricardo Torres, Wagner Meira Jr, Joel H. Saltz

Abstract: Similarity search in high-dimensional spaces is a pivotal operation found in a variety of database applications. Recently, there has been increased interest in similarity search for online content-based multimedia services. Those services, however, introduce new challenges with respect to the very large volumes of data that have to be indexed/searched, and the need to minimize response times observed by the end-users. Additionally, those users dynamically interact with the systems, creating fluctuating query request rates and requiring the search algorithm to adapt in order to better utilize the underlying hardware to reduce response times. In order to address these challenges, we introduce Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries for very large multimedia databases, aiming at online content-based multimedia services. Hypercurves executes on hybrid CPU-GPU environments, and is able to employ those devices cooperatively to support massive query request rates. In order to keep the response times optimal as the request rates vary, it employs a novel dynamic scheduler to partition the work between CPU and GPU. Hypercurves was thoroughly evaluated using a large database of multimedia descriptors. Its cooperative CPU-GPU execution achieved performance improvements of up to 30x when compared to the single CPU-core version. The dynamic work partition mechanism reduces the observed query response times by about 50% when compared to the best static CPU-GPU task partition configuration. In addition, Hypercurves achieves superlinear scalability in distributed (multi-node) executions, while keeping a high guarantee of equivalence with its sequential version, thanks to the proof of probabilistic equivalence, which supported its aggressive parallelization design.

Submitted 3 September, 2012; originally announced September 2012.

Comments: 25 pages

arXiv:1205.2663 [pdf, ps, other]

Are visual dictionaries generalizable?

Authors: Otavio A. B. Penatti, Eduardo Valle, Ricardo da S. Torres

Abstract: Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments.

Submitted 11 May, 2012; originally announced May 2012.

arXiv:1104.4723 [pdf, other]

doi 10.1145/2324796.2324815

Bayesian approach for near-duplicate image detection

Authors: Lucas Moutinho Bueno, Eduardo Valle, Ricardo da Silva Torres

Abstract: In this paper we propose a Bayesian approach for near-duplicate image detection, and investigate how different probabilistic models affect the performance obtained. The task of identifying an image whose metadata are missing is often demanded for a myriad of applications: metadata retrieval in cultural institutions, detection of copyright violations, investigation of latent cross-links in archives and libraries, duplicate elimination in storage management, etc. The majority of current solutions are based either on voting algorithms, which are very precise but expensive, or on the use of visual dictionaries, which are efficient but less precise. Our approach uses local descriptors in a novel way, which, by a careful application of decision theory, allows a very fine control of the compromise between precision and efficiency. In addition, the method attains a great compromise between those two axes, with more than 99% accuracy with fewer than 10 database operations.

Submitted 25 April, 2011; originally announced April 2011.

Showing 1–37 of 37 results for author: Torres, R