-
Towards a Unified Framework for Evaluating Explanations
Authors:
Juan D. Pinto,
Luc Paquette
Abstract:
The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability,…
▽ More
The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework of evaluation criteria and lay the groundwork for such a framework by articulating the relationships between existing criteria. We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models analyzed via post-hoc techniques. We further argue that useful explanations require both faithfulness and intelligibility. Explanation plausibility is a prerequisite for intelligibility, while stability is a prerequisite for explanation faithfulness. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Physics-Informed Neural Network for Multirotor Slung Load Systems Modeling
Authors:
Gil Serrano,
Marcelo Jacinto,
Jose Ribeiro-Gomes,
Joao Pinto,
Bruno J. Guerreiro,
Alexandre Bernardino,
Rita Cunha
Abstract:
Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not dens…
▽ More
Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not densely represented in training data. In this work, we explore the use of physics informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of the future system states. An LSTM encoder decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee the cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles together with slack variables that allow for a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first principles physical model and a comparable neural network model trained without the physics regularization proposed.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Deep Learning for Educational Data Science
Authors:
Juan D. Pinto,
Luc Paquette
Abstract:
With the ever-growing presence of deep artificial neural networks in every facet of modern life, a growing body of researchers in educational data science -- a field consisting of various interrelated research communities -- have turned their attention to leveraging these powerful algorithms within the domain of education. Use cases range from advanced knowledge tracing models that can leverage op…
▽ More
With the ever-growing presence of deep artificial neural networks in every facet of modern life, a growing body of researchers in educational data science -- a field consisting of various interrelated research communities -- have turned their attention to leveraging these powerful algorithms within the domain of education. Use cases range from advanced knowledge tracing models that can leverage open-ended student essays or snippets of code to automatic affect and behavior detectors that can identify when a student is frustrated or aimlessly trying to solve problems unproductively -- and much more. This chapter provides a brief introduction to deep learning, describes some of its advantages and limitations, presents a survey of its many uses in education, and discusses how it may further come to shape the field of educational data science.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions
Authors:
Daniel de S. Moraes,
Pedro T. C. Santos,
Polyana B. da Costa,
Matheus A. S. Pinto,
Ivan de J. P. Pinto,
Álvaro M. G. da Veiga,
Sergio Colcher,
Antonio J. G. Busson,
Rafael H. Rocha,
Rennan Gaio,
Rafael Miceli,
Gabriela Tourinho,
Marcos Rabaioli,
Leandro Santos,
Fellipe Marques,
David Favaro
Abstract:
This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot promp…
▽ More
This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. The taxonomies' expansion with LLMs also showed exciting results for parent node prediction, with an f1-score above 70% in our taxonomies.
△ Less
Submitted 11 February, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Authors:
Juliano Pinto,
Georg Hess,
Yuxuan Xia,
Henk Wymeersch,
Lennart Svensson
Abstract:
Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational com…
▽ More
Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning based algorithms are a possible venue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state-of-the-art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Authors:
Sander Schulhoff,
Jeremy Pinto,
Anaum Khan,
Louis-François Bouchard,
Chenglei Si,
Svetlina Anati,
Valen Tagliabue,
Anson Liu Kost,
Christopher Carnahan,
Jordan Boyd-Graber
Abstract:
Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant securit…
▽ More
Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.
△ Less
Submitted 2 March, 2024; v1 submitted 24 October, 2023;
originally announced November 2023.
-
Pegasus Simulator: An Isaac Sim Framework for Multiple Aerial Vehicles Simulation
Authors:
Marcelo Jacinto,
João Pinto,
Jay Patrikar,
John Keller,
Rita Cunha,
Sebastian Scherer,
António Pascoal
Abstract:
Develo** and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extensio…
▽ More
Develo** and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extension that enables real-time simulation of multiple multirotor vehicles in photo-realistic environments, while providing out-of-the-box integration with the widely adopted PX4-Autopilot and ROS2 through its modular implementation and intuitive graphical user interface. To demonstrate some of its capabilities, a nonlinear controller was implemented and simulation results for two drones performing aggressive flight maneuvers are presented. Code and documentation for this framework are also provided as supplementary material.
△ Less
Submitted 15 April, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring
Authors:
João Ribeiro Pinto
Abstract:
Artificially intelligent perception is increasingly present in the lives of every one of us. Vehicles are no exception, (...) In the near future, pattern recognition will have an even stronger role in vehicles, as self-driving cars will require automated ways to understand what is happening around (and within) them and act accordingly. (...) This doctoral work focused on advancing in-vehicle sensi…
▽ More
Artificially intelligent perception is increasingly present in the lives of every one of us. Vehicles are no exception, (...) In the near future, pattern recognition will have an even stronger role in vehicles, as self-driving cars will require automated ways to understand what is happening around (and within) them and act accordingly. (...) This doctoral work focused on advancing in-vehicle sensing through the research of novel computer vision and pattern recognition methodologies for both biometrics and wellbeing monitoring. The main focus has been on electrocardiogram (ECG) biometrics, a trait well-known for its potential for seamless driver monitoring. Major efforts were devoted to achieving improved performance in identification and identity verification in off-the-person scenarios, well-known for increased noise and variability. Here, end-to-end deep learning ECG biometric solutions were proposed and important topics were addressed such as cross-database and long-term performance, waveform relevance through explainability, and interlead conversion. Face biometrics, a natural complement to the ECG in seamless unconstrained scenarios, was also studied in this work. The open challenges of masked face recognition and interpretability in biometrics were tackled in an effort to evolve towards algorithms that are more transparent, trustworthy, and robust to significant occlusions. Within the topic of wellbeing monitoring, improved solutions to multimodal emotion recognition in groups of people and activity/violence recognition in in-vehicle scenarios were proposed. At last, we also proposed a novel way to learn template security within end-to-end models, dismissing additional separate encryption processes, and a self-supervised learning approach tailored to sequential data, in order to ensure data security and optimal performance. (...)
△ Less
Submitted 19 January, 2023; v1 submitted 8 January, 2023;
originally announced January 2023.
-
Causality-Inspired Taxonomy for Explainable Artificial Intelligence
Authors:
Pedro C. Neto,
Tiago Gonçalves,
João Ribeiro Pinto,
Wilson Silva,
Ana F. Sequeira,
Arun Ross,
Jaime S. Cardoso
Abstract:
As two sides of the same coin, causality and explainable artificial intelligence (xAI) were initially proposed and developed with different goals. However, the latter can only be complete when seen through the lens of the causality framework. As such, we propose a novel causality-inspired framework for xAI that creates an environment for the development of xAI approaches. To show its applicability…
▽ More
As two sides of the same coin, causality and explainable artificial intelligence (xAI) were initially proposed and developed with different goals. However, the latter can only be complete when seen through the lens of the causality framework. As such, we propose a novel causality-inspired framework for xAI that creates an environment for the development of xAI approaches. To show its applicability, biometrics was used as case study. For this, we have analysed 81 research papers on a myriad of biometric modalities and different tasks. We have categorised each of these methods according to our novel xAI Ladder and discussed the future directions of the field.
△ Less
Submitted 4 March, 2024; v1 submitted 19 August, 2022;
originally announced August 2022.
-
OCFR 2022: Competition on Occluded Face Recognition From Synthetically Generated Structure-Aware Occlusions
Authors:
Pedro C. Neto,
Fadi Boutros,
Joao Ribeiro Pinto,
Naser Damer,
Ana F. Sequeira,
Jaime S. Cardoso,
Messaoud Bengherabi,
Abderaouf Bousnat,
Sana Boucheta,
Nesrine Hebbadj,
Mustafa Ekrem Erakın,
Uğur Demir,
Hazım Kemal Ekenel,
Pedro Beber de Queiroz Vidal,
David Menotti
Abstract:
This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022) embraced by the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams, from academia. Eventually, six valid submissions were submitted and then evaluated by the organizers. The competition was held to address the challenge of face recognition in th…
▽ More
This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022) embraced by the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams, from academia. Eventually, six valid submissions were submitted and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. The participants were free to use any training data and the testing data was built by the organisers by synthetically occluding parts of the face images using a well-known dataset. The submitted solutions presented innovations and performed very competitively with the considered baseline. A major output of this competition is a challenging, realistic, and diverse, and publicly available occluded face recognition benchmark with well defined evaluation protocols.
△ Less
Submitted 15 August, 2022; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Can Deep Learning be Applied to Model-Based Multi-Object Tracking?
Authors:
Juliano Pinto,
Georg Hess,
William Ljungbergh,
Yuxuan Xia,
Henk Wymeersch,
Lennart Svensson
Abstract:
Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements, with important applications such as autonomous driving, tracking animal behavior, defense systems, and others. In recent years, deep learning (DL) has been increasingly used in MOT for improving tracking performance, but mostly in settings where the measuremen…
▽ More
Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements, with important applications such as autonomous driving, tracking animal behavior, defense systems, and others. In recent years, deep learning (DL) has been increasingly used in MOT for improving tracking performance, but mostly in settings where the measurements are high-dimensional and there are no available models of the measurement likelihood and the object dynamics. The model-based setting instead has not attracted as much attention, and it is still unclear if DL methods can outperform traditional model-based Bayesian methods, which are the state of the art (SOTA) in this context. In this paper, we propose a Transformer-based DL tracker and evaluate its performance in the model-based setting, comparing it to SOTA model-based Bayesian methods in a variety of different tasks. Our results show that the proposed DL method can match the performance of the model-based methods in simple tasks, while outperforming them when the task gets more complicated, either due to an increase in the data association complexity, or to stronger nonlinearities of the models of the environment.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
FocusFace: Multi-task Contrastive Learning for Masked Face Recognition
Authors:
Pedro C. Neto,
Fadi Boutros,
João Ribeiro Pinto,
Naser Damer,
Ana F. Sequeira,
Jaime S. Cardoso
Abstract:
SARS-CoV-2 has presented direct and indirect challenges to the scientific community. One of the most prominent indirect challenges advents from the mandatory use of face masks in a large number of countries. Face recognition methods struggle to perform identity verification with similar accuracy on masked and unmasked individuals. It has been shown that the performance of these methods drops consi…
▽ More
SARS-CoV-2 has presented direct and indirect challenges to the scientific community. One of the most prominent indirect challenges advents from the mandatory use of face masks in a large number of countries. Face recognition methods struggle to perform identity verification with similar accuracy on masked and unmasked individuals. It has been shown that the performance of these methods drops considerably in the presence of face masks, especially if the reference image is unmasked. We propose FocusFace, a multi-task architecture that uses contrastive learning to be able to accurately perform masked face recognition. The proposed architecture is designed to be trained from scratch or to work on top of state-of-the-art face recognition methods without sacrificing the capabilities of a existing models in conventional face recognition tasks. We also explore different approaches to design the contrastive learning module. Results are presented in terms of masked-masked (M-M) and unmasked-masked (U-M) face verification performance. For both settings, the results are on par with published methods, but for M-M specifically, the proposed method was able to outperform all the solutions that it was compared to. We further show that when using our method on top of already existing methods the training computational costs decrease significantly while retaining similar performances. The implementation and the trained models are available at GitHub.
△ Less
Submitted 1 November, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
My Eyes Are Up Here: Promoting Focus on Uncovered Regions in Masked Face Recognition
Authors:
Pedro C. Neto,
Fadi Boutros,
João Ribeiro Pinto,
Mohsen Saffari,
Naser Damer,
Ana F. Sequeira,
Jaime S. Cardoso
Abstract:
The recent Covid-19 pandemic and the fact that wearing masks in public is now mandatory in several countries, created challenges in the use of face recognition systems (FRS). In this work, we address the challenge of masked face recognition (MFR) and focus on evaluating the verification performance in FRS when verifying masked vs unmasked faces compared to verifying only unmasked faces. We propose…
▽ More
The recent Covid-19 pandemic and the fact that wearing masks in public is now mandatory in several countries, created challenges in the use of face recognition systems (FRS). In this work, we address the challenge of masked face recognition (MFR) and focus on evaluating the verification performance in FRS when verifying masked vs unmasked faces compared to verifying only unmasked faces. We propose a methodology that combines the traditional triplet loss and the mean squared error (MSE) intending to improve the robustness of an MFR system in the masked-unmasked comparison mode. The results obtained by our proposed method show improvements in a detailed step-wise ablation study. The conducted study showed significant performance gains induced by our proposed training paradigm and modified triplet loss on two evaluation databases.
△ Less
Submitted 18 August, 2021; v1 submitted 2 August, 2021;
originally announced August 2021.
-
MFR 2021: Masked Face Recognition Competition
Authors:
Fadi Boutros,
Naser Damer,
Jan Niklas Kolf,
Kiran Raja,
Florian Kirchbuchner,
Raghavendra Ramachandra,
Arjan Kuijper,
Pengcheng Fang,
Chao Zhang,
Fei Wang,
David Montero,
Naiara Aginako,
Basilio Sierra,
Marcos Nieto,
Mustafa Ekrem Erakin,
Ugur Demir,
Hazim Kemal,
Ekenel,
Asaki Kataoka,
Kohei Ichikawa,
Shizuma Kubo,
Jie Zhang,
Mingjie He,
Dan Han,
Shiguang Shan
, et al. (10 additional authors not shown)
Abstract:
This paper presents a summary of the Masked Face Recognition Competitions (MFR) held within the 2021 International Joint Conference on Biometrics (IJCB 2021). The competition attracted a total of 10 participating teams with valid submissions. The affiliations of these teams are diverse and associated with academia and industry in nine different countries. These teams successfully submitted 18 vali…
▽ More
This paper presents a summary of the Masked Face Recognition Competitions (MFR) held within the 2021 International Joint Conference on Biometrics (IJCB 2021). The competition attracted a total of 10 participating teams with valid submissions. The affiliations of these teams are diverse and associated with academia and industry in nine different countries. These teams successfully submitted 18 valid solutions. The competition is designed to motivate solutions aiming at enhancing the face recognition accuracy of masked faces. Moreover, the competition considered the deployability of the proposed solutions by taking the compactness of the face recognition models into account. A private dataset representing a collaborative, multi-session, real masked, capture scenario is used to evaluate the submitted solutions. In comparison to one of the top-performing academic face recognition solutions, 10 out of the 18 submitted solutions did score higher masked face verification accuracy.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Encoder-Decoder Architectures for Clinically Relevant Coronary Artery Segmentation
Authors:
João Lourenço Silva,
Miguel Nobre Menezes,
Tiago Rodrigues,
Beatriz Silva,
Fausto J. Pinto,
Arlindo L. Oliveira
Abstract:
Coronary X-ray angiography is a crucial clinical procedure for the diagnosis and treatment of coronary artery disease, which accounts for roughly 16% of global deaths every year. However, the images acquired in these procedures have low resolution and poor contrast, making lesion detection and assessment challenging. Accurate coronary artery segmentation not only helps mitigate these problems, but…
▽ More
Coronary X-ray angiography is a crucial clinical procedure for the diagnosis and treatment of coronary artery disease, which accounts for roughly 16% of global deaths every year. However, the images acquired in these procedures have low resolution and poor contrast, making lesion detection and assessment challenging. Accurate coronary artery segmentation not only helps mitigate these problems, but also allows the extraction of relevant anatomical features for further analysis by quantitative methods. Although automated segmentation of coronary arteries has been proposed before, previous approaches have used non-optimal segmentation criteria, leading to less useful results. Most methods either segment only the major vessel, discarding important information from the remaining ones, or segment the whole coronary tree based mostly on contrast information, producing a noisy output that includes vessels that are not relevant for diagnosis. We adopt a better-suited clinical criterion and segment vessels according to their clinical relevance. Additionally, we simultaneously perform catheter segmentation, which may be useful for diagnosis due to the scale factor provided by the catheter's known diameter, and is a task that has not yet been performed with good results. To derive the optimal approach, we conducted an extensive comparative study of encoder-decoder architectures trained on a combination of focal loss and a variant of generalized dice loss. Based on the EfficientNet and the UNet++ architectures, we propose a line of efficient and high-performance segmentation models using a new decoder architecture, the EfficientUNet++, whose best-performing version achieved average dice scores of 0.8904 and 0.7526 for the artery and catheter classes, respectively, and an average generalized dice score of 0.9234.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
Next Generation Multitarget Trackers: Random Finite Set Methods vs Transformer-based Deep Learning
Authors:
Juliano Pinto,
Georg Hess,
William Ljungbergh,
Yuxuan Xia,
Lennart Svensson,
Henk Wymeersch
Abstract:
Multitarget Tracking (MTT) is the problem of tracking the states of an unknown number of objects using noisy measurements, with important applications to autonomous driving, surveillance, robotics, and others. In the model-based Bayesian setting, there are conjugate priors that enable us to express the multi-object posterior in closed form, which could theoretically provide Bayes-optimal estimates…
▽ More
Multitarget Tracking (MTT) is the problem of tracking the states of an unknown number of objects using noisy measurements, with important applications to autonomous driving, surveillance, robotics, and others. In the model-based Bayesian setting, there are conjugate priors that enable us to express the multi-object posterior in closed form, which could theoretically provide Bayes-optimal estimates. However, the posterior involves a super-exponential growth of the number of hypotheses over time, forcing state-of-the-art methods to resort to approximations for remaining tractable, which can impact their performance in complex scenarios. Model-free methods based on deep-learning provide an attractive alternative, as they can, in principle, learn the optimal filter from data, but to the best of our knowledge were never compared to current state-of-the-art Bayesian filters, specially not in contexts where accurate models are available. In this paper, we propose a high-performing deep-learning method for MTT based on the Transformer architecture and compare it to two state-of-the-art Bayesian filters, in a setting where we assume the correct model is provided. Although this gives an edge to the model-based filters, it also allows us to generate unlimited training data. We show that the proposed model outperforms state-of-the-art Bayesian filters in complex scenarios, while matching their performance in simpler cases, which validates the applicability of deep-learning also in the model-based regime. The code for all our implementations is made available at https://github.com/JulianoLagana/MT3 .
△ Less
Submitted 4 June, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Automated Detection of Coronary Artery Stenosis in X-ray Angiography using Deep Neural Networks
Authors:
Dinis L. Rodrigues,
Miguel Nobre Menezes,
Fausto J. Pinto,
Arlindo L. Oliveira
Abstract:
Coronary artery disease leading up to stenosis, the partial or total blocking of coronary arteries, is a severe condition that affects millions of patients each year. Automated identification and classification of stenosis severity from minimally invasive procedures would be of great clinical value, but existing methods do not match the accuracy of experienced cardiologists, due to the complexity…
▽ More
Coronary artery disease leading up to stenosis, the partial or total blocking of coronary arteries, is a severe condition that affects millions of patients each year. Automated identification and classification of stenosis severity from minimally invasive procedures would be of great clinical value, but existing methods do not match the accuracy of experienced cardiologists, due to the complexity of the task. Although a number of computational approaches for quantitative assessment of stenosis have been proposed to date, the performance of these methods is still far from the required levels for clinical applications. In this paper, we propose a two-step deep-learning framework to partially automate the detection of stenosis from X-ray coronary angiography images. In the two steps, we used two distinct convolutional neural network architectures, one to automatically identify and classify the angle of view, and another to determine the bounding boxes of the regions of interest in frames where stenosis is visible. Transfer learning and data augmentation techniques were used to boost the performance of the system in both tasks. We achieved a 0.97 accuracy on the task of classifying the Left/Right Coronary Artery (LCA/RCA) angle view and 0.68/0.73 recall on the determination of the regions of interest, for LCA and RCA, respectively. These results compare favorably with previous results obtained using related approaches, and open the way to a fully automated method for the identification of stenosis severity from X-ray angiographies.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.
-
Fast Solver for Quasi-Periodic 2D-Helmholtz Scattering in Layered Media
Authors:
José Pinto,
Rubén Aylwin,
Carlos Jerez-Hanckes
Abstract:
We present a fast spectral Galerkin scheme for the discretization of boundary integral equations arising from two-dimensional Helmholtz transmission problems in multi-layered periodic structures or gratings. Employing suitably parametrized Fourier basis and excluding Rayleigh-Wood anomalies, we rigorously establish the well-posedness of both continuous and discrete problems, and prove super-algebr…
▽ More
We present a fast spectral Galerkin scheme for the discretization of boundary integral equations arising from two-dimensional Helmholtz transmission problems in multi-layered periodic structures or gratings. Employing suitably parametrized Fourier basis and excluding Rayleigh-Wood anomalies, we rigorously establish the well-posedness of both continuous and discrete problems, and prove super-algebraic error convergence rates for the proposed scheme. Through several numerical examples, we confirm our findings and show performances competitive to those attained via Nyström methods.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
-
DACS: Domain Adaptation via Cross-domain Mixed Sampling
Authors:
Wilhelm Tranheden,
Viktor Olsson,
Juliano Pinto,
Lennart Svensson
Abstract:
Semantic segmentation models based on convolutional neural networks have recently displayed remarkable performance for a multitude of applications. However, these models typically do not generalize well when applied on new domains, especially when going from synthetic to real data. In this paper we address the problem of unsupervised domain adaptation (UDA), which attempts to train on labelled dat…
▽ More
Semantic segmentation models based on convolutional neural networks have recently displayed remarkable performance for a multitude of applications. However, these models typically do not generalize well when applied on new domains, especially when going from synthetic to real data. In this paper we address the problem of unsupervised domain adaptation (UDA), which attempts to train on labelled data from one domain (source domain), and simultaneously learn from unlabelled data in the domain of interest (target domain). Existing methods have seen success by training on pseudo-labels for these unlabelled images. Multiple techniques have been proposed to mitigate low-quality pseudo-labels arising from the domain shift, with varying degrees of success. We propose DACS: Domain Adaptation via Cross-domain mixed Sampling, which mixes images from the two domains along with the corresponding labels and pseudo-labels. These mixed samples are then trained on, in addition to the labelled data itself. We demonstrate the effectiveness of our solution by achieving state-of-the-art results for GTA5 to Cityscapes, a common synthetic-to-real semantic segmentation benchmark for UDA.
△ Less
Submitted 29 November, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning
Authors:
Viktor Olsson,
Wilhelm Tranheden,
Juliano Pinto,
Lennart Svensson
Abstract:
The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for training, which sometimes requires hours of manual labor for a single image. Because of this, semi-supervised methods have been applied to this task, with varying d…
▽ More
The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for training, which sometimes requires hours of manual labor for a single image. Because of this, semi-supervised methods have been applied to this task, with varying degrees of success. A key challenge is that common augmentations used in semi-supervised classification are less effective for semantic segmentation. We propose a novel data augmentation mechanism called ClassMix, which generates augmentations by mixing unlabelled samples, by leveraging on the network's predictions for respecting object boundaries. We evaluate this augmentation technique on two common semi-supervised semantic segmentation benchmarks, showing that it attains state-of-the-art results. Lastly, we also provide extensive ablation studies comparing different design decisions and training regimes.
△ Less
Submitted 29 November, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Bayesian Linear Regression on Deep Representations
Authors:
John Moberg,
Lennart Svensson,
Juliano Pinto,
Henk Wymeersch
Abstract:
A simple approach to obtaining uncertainty-aware neural networks for regression is to do Bayesian linear regression (BLR) on the representation from the last hidden layer. Recent work [Riquelme et al., 2018, Azizzadenesheli et al., 2018] indicates that the method is promising, though it has been limited to homoscedastic noise. In this paper, we propose a novel variation that enables the method to…
▽ More
A simple approach to obtaining uncertainty-aware neural networks for regression is to do Bayesian linear regression (BLR) on the representation from the last hidden layer. Recent work [Riquelme et al., 2018, Azizzadenesheli et al., 2018] indicates that the method is promising, though it has been limited to homoscedastic noise. In this paper, we propose a novel variation that enables the method to flexibly model heteroscedastic noise. The method is benchmarked against two prominent alternative methods on a set of standard datasets, and finally evaluated as an uncertainty-aware model in model-based reinforcement learning. Our experiments indicate that the method is competitive with standard ensembling, and ensembles of BLR outperforms the methods we compared to.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
Authors:
Siddharth Gururani,
Kilol Gupta,
Dhaval Shah,
Zahra Shakeri,
Jervis Pinto
Abstract:
This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features are…
▽ More
This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features are extracted from reference audio and then used to condition a TC2 synthesizer. The trained model is evaluated using subjective listening tests and a novel objective evaluation of prosody transfer is proposed. Listening tests show that the synthesized speech is rated as highly natural and that prosody is successfully transferred from the reference speech signal to the synthesized signal.
△ Less
Submitted 15 May, 2020; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer
Authors:
Marcelo Ponce,
Ramses van Zon,
Scott Northrup,
Daniel Gruner,
Joseph Chen,
Fatih Ertinaz,
Alexey Fedoseev,
Leslie Groer,
Fei Mao,
Bruno C. Mundim,
Mike Nolta,
Jaime Pinto,
Marco Saldarriaga,
Vladimir Slavnic,
Erik Spence,
Ching-Hsing Yu,
W. Richard Peltier
Abstract:
Niagara is currently the fastest supercomputer accessible to academics in Canada. It was deployed at the beginning of 2018 and has been serving the research community ever since. This homogeneous 60,000-core cluster, owned by the University of Toronto and operated by SciNet, was intended to enable large parallel jobs and has a measured performance of 3.02 petaflops, debuting at #53 in the June 201…
▽ More
Niagara is currently the fastest supercomputer accessible to academics in Canada. It was deployed at the beginning of 2018 and has been serving the research community ever since. This homogeneous 60,000-core cluster, owned by the University of Toronto and operated by SciNet, was intended to enable large parallel jobs and has a measured performance of 3.02 petaflops, debuting at #53 in the June 2018 TOP500 list. It was designed to optimize throughput of a range of scientific codes running at scale, energy efficiency, and network and storage performance and capacity. It replaced two systems that SciNet operated for over 8 years, the Tightly Coupled System (TCS) and the General Purpose Cluster (GPC). In this paper we describe the transition process from these two systems, the procurement and deployment processes, as well as the unique features that make Niagara a one-of-a-kind machine in Canada.
△ Less
Submitted 31 July, 2019;
originally announced July 2019.
-
DeepAAA: clinically applicable and generalizable detection of abdominal aortic aneurysm using deep learning
Authors:
Jen-Tang Lu,
Rupert Brooks,
Stefan Hahn,
** Chen,
Varun Buch,
Gopal Kotecha,
Katherine P. Andriole,
Brian Ghoshhajra,
Joel Pinto,
Paul Vozila,
Mark Michalski,
Neil A. Tenenholtz
Abstract:
We propose a deep learning-based technique for detection and quantification of abdominal aortic aneurysms (AAAs). The condition, which leads to more than 10,000 deaths per year in the United States, is asymptomatic, often detected incidentally, and often missed by radiologists. Our model architecture is a modified 3D U-Net combined with ellipse fitting that performs aorta segmentation and AAA dete…
▽ More
We propose a deep learning-based technique for detection and quantification of abdominal aortic aneurysms (AAAs). The condition, which leads to more than 10,000 deaths per year in the United States, is asymptomatic, often detected incidentally, and often missed by radiologists. Our model architecture is a modified 3D U-Net combined with ellipse fitting that performs aorta segmentation and AAA detection. The study uses 321 abdominal-pelvic CT examinations performed by Massachusetts General Hospital Department of Radiology for training and validation. The model is then further tested for generalizability on a separate set of 57 examinations with differing patient demographics and acquisition characteristics than the original dataset. DeepAAA achieves high performance on both sets of data (sensitivity/specificity 0.91/0.95 and 0.85 / 1.0 respectively), on contrast and non-contrast CT scans and works with image volumes with varying numbers of images. We find that DeepAAA exceeds literature-reported performance of radiologists on incidental AAA detection. It is expected that the model can serve as an effective background detector in routine CT examinations to prevent incidental AAAs from being missed.
△ Less
Submitted 4 July, 2019;
originally announced July 2019.
-
Winning Isn't Everything: Enhancing Game Development with Intelligent Agents
Authors:
Yunqi Zhao,
Igor Borovikov,
Fernando de Mesentier Silva,
Ahmad Beirami,
Jason Rupert,
Caedmon Somers,
Jesse Harder,
John Kolen,
Jervis Pinto,
Reza Pourabolghasem,
James Pestrak,
Harold Chaput,
Mohsen Sardari,
Long Lin,
Sundeep Narravula,
Navid Aghdaie,
Kazi Zaman
Abstract:
Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. In this paper, we study the problem of training intelligent agents in service of game development. Unlike the agents built to "beat the game", our agents aim to produce human-like behavior to help with game evaluation and balancing. We discuss two fundamental metrics based on…
▽ More
Recently, there have been several high-profile achievements of agents learning to play games against humans and beat them. In this paper, we study the problem of training intelligent agents in service of game development. Unlike the agents built to "beat the game", our agents aim to produce human-like behavior to help with game evaluation and balancing. We discuss two fundamental metrics based on which we measure the human-likeness of agents, namely skill and style, which are multi-faceted concepts with practical implications outlined in this paper. We report four case studies in which the style and skill requirements inform the choice of algorithms and metrics used to train agents; ranging from A* search to state-of-the-art deep reinforcement learning. We, further, show that the learning potential of state-of-the-art deep RL models does not seamlessly transfer from the benchmark environments to target ones without heavily tuning their hyperparameters, leading to linear scaling of the engineering efforts and computational cost with the number of target domains.
△ Less
Submitted 27 April, 2020; v1 submitted 25 March, 2019;
originally announced March 2019.
-
Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures
Authors:
Tiehang Duan,
José P. Pinto,
Xiaohui Xie
Abstract:
Motivation: With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (1) the clu…
▽ More
Motivation: With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (1) the clustering quality still needs to be improved; (2) most models need prior knowledge on number of clusters, which is not always available; (3) there is a demand for faster computational speed. Results: We propose to tackle these challenges with Parallel Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive clustering on huge datasets. Experiment results show the model outperforms current widely used models in both clustering quality and computational speed. Availability: Source code is publicly available on https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package
△ Less
Submitted 25 December, 2018;
originally announced December 2018.
-
Dolphin: a task orchestration language for autonomous vehicle networks
Authors:
Keila Lima,
Eduardo R. B. Marques,
José Pinto,
João B. Sousa
Abstract:
We present Dolphin, an extensible programming language for autonomous vehicle networks. A Dolphin program expresses an orchestrated execution of tasks defined compositionally for multiple vehicles. Building upon the base case of elementary one-vehicle tasks, the built-in operators include support for composing tasks in several forms, for instance according to concurrent, sequential, or event-based…
▽ More
We present Dolphin, an extensible programming language for autonomous vehicle networks. A Dolphin program expresses an orchestrated execution of tasks defined compositionally for multiple vehicles. Building upon the base case of elementary one-vehicle tasks, the built-in operators include support for composing tasks in several forms, for instance according to concurrent, sequential, or event-based task flow. The language is implemented as a Groovy DSL, facilitating extension and integration with external software packages, in particular robotic toolkits. The paper describes the Dolphin language, its integration with an open-source toolchain for autonomous vehicles, and results from field tests using unmanned underwater vehicles (UUVs) and unmanned aerial vehicles (UAVs).
△ Less
Submitted 26 July, 2018; v1 submitted 2 March, 2018;
originally announced March 2018.
-
Modelling the Impact of Organization Structure and Whistle Blowers on Intra-Organizational Corruption Contagion
Authors:
Maziar Nekovee,
Jonathan Pinto
Abstract:
We complement the rich conceptual work on organizational corruption by quantitatively modelling the spread of corruption within organizations. We systematically vary four organizational culture-related parameters, i.e., organization structure, location of bad apple, employees propensity to become corrupted (corruption probability), and number of whistle-blowers. Our simulation studies find that in…
▽ More
We complement the rich conceptual work on organizational corruption by quantitatively modelling the spread of corruption within organizations. We systematically vary four organizational culture-related parameters, i.e., organization structure, location of bad apple, employees propensity to become corrupted (corruption probability), and number of whistle-blowers. Our simulation studies find that in organizations with flatter structures, corruption permeates the organization at a lower threshold value of corruption probability compared to those with taller structures. However, the final proportion of corrupted individuals is higher in the latter as compared to the former. Also, we find that for a 1,000 strong organization, 5 percent of the workforce is a critical threshold in terms of the number of whistle-blowers needed to constrain the spread of corruption, and if this number is around 25 percent, the corruption contagion is negligible. Implications of our results are discussed.
△ Less
Submitted 18 February, 2019; v1 submitted 4 November, 2017;
originally announced December 2017.
-
A Gossip Algorithm based Clock Synchronization Scheme for Smart Grid Applications
Authors:
Imtiaz Parvez,
Arif I. Sarwat,
Jonathan Pinto,
Zakaria Parvez,
Mohammad Aqib Khandaker
Abstract:
The uprising interest in multi-agent based networked system, and the numerous number of applications in the distributed control of the smart grid leads us to address the problem of time synchronization in the smart grid. Utility companies look for new packet based time synchronization solutions with Global Positioning System (GPS) level accuracies beyond traditional packet methods such as Network…
▽ More
The uprising interest in multi-agent based networked system, and the numerous number of applications in the distributed control of the smart grid leads us to address the problem of time synchronization in the smart grid. Utility companies look for new packet based time synchronization solutions with Global Positioning System (GPS) level accuracies beyond traditional packet methods such as Network Time Proto- col (NTP). However GPS based solutions have poor reception in indoor environments and dense urban canyons as well as GPS antenna installation might be costly. Some smart grid nodes such as Phasor Measurement Units (PMUs), fault detection, Wide Area Measurement Systems (WAMS) etc., requires synchronous accuracy as low as 1 ms. On the other hand, 1 sec accuracy is acceptable in management information domain. Acknowledging this, in this study, we introduce gossip algorithm based clock synchronization method among network entities from the decision control and communication point of view. Our method synchronizes clock within dense network with a bandwidth limited environment. Our technique has been tested in different kinds of network topologies- complete, star and random geometric network and demonstrated satisfactory performance.
△ Less
Submitted 25 July, 2017;
originally announced July 2017.
-
A Single-Assignment Translation for Annotated Programs
Authors:
Cláudio Belo Lourenço,
Maria João Frade,
Jorge Sousa Pinto
Abstract:
We present a translation of While programs annotated with loop invariants into a dynamic single-assignment language with a dedicated iterating construct. We prove that the translation is sound and complete. This is a companion report to our paper Formalizing Single-assignment Program Verification: an Adaptation-complete Approach [6].
We present a translation of While programs annotated with loop invariants into a dynamic single-assignment language with a dedicated iterating construct. We prove that the translation is sound and complete. This is a companion report to our paper Formalizing Single-assignment Program Verification: an Adaptation-complete Approach [6].
△ Less
Submitted 5 May, 2016; v1 submitted 4 January, 2016;
originally announced January 2016.
-
On Termination of Integer Linear Loops
Authors:
Joël Ouaknine,
João Sousa Pinto,
James Worrell
Abstract:
A fundamental problem in program verification concerns the termination of simple linear loops of the form x := u ; while Bx >= b do {x := Ax + a} where x is a vector of variables, u, a, and c are integer vectors, and A and B are integer matrices. Assuming the matrix A is diagonalisable, we give a decision procedure for the problem of whether, for all initial integer vectors u, such a loop terminat…
▽ More
A fundamental problem in program verification concerns the termination of simple linear loops of the form x := u ; while Bx >= b do {x := Ax + a} where x is a vector of variables, u, a, and c are integer vectors, and A and B are integer matrices. Assuming the matrix A is diagonalisable, we give a decision procedure for the problem of whether, for all initial integer vectors u, such a loop terminates. The correctness of our algorithm relies on sophisticated tools from algebraic and analytic number theory, Diophantine geometry, and real algebraic geometry. To the best of our knowledge, this is the first substantial advance on a 10-year-old open problem of Tiwari (2004) and Braverman (2006).
△ Less
Submitted 13 October, 2014; v1 submitted 7 July, 2014;
originally announced July 2014.
-
Adaptation-Based Programming in Haskell
Authors:
Tim Bauer,
Martin Erwig,
Alan Fern,
Jervis Pinto
Abstract:
We present an embedded DSL to support adaptation-based programming (ABP) in Haskell. ABP is an abstract model for defining adaptive values, called adaptives, which adapt in response to some associated feedback. We show how our design choices in Haskell motivate higher-level combinators and constructs and help us derive more complicated compositional adaptives.
We also show an important specializ…
▽ More
We present an embedded DSL to support adaptation-based programming (ABP) in Haskell. ABP is an abstract model for defining adaptive values, called adaptives, which adapt in response to some associated feedback. We show how our design choices in Haskell motivate higher-level combinators and constructs and help us derive more complicated compositional adaptives.
We also show an important specialization of ABP is in support of reinforcement learning constructs, which optimize adaptive values based on a programmer-specified objective function. This permits ABP users to easily define adaptive values that express uncertainty anywhere in their programs. Over repeated executions, these adaptive values adjust to more efficient ones and enable the user's programs to self optimize.
The design of our DSL depends significantly on the use of type classes. We will illustrate, along with presenting our DSL, how the use of type classes can support the gradual evolution of DSLs.
△ Less
Submitted 4 September, 2011;
originally announced September 2011.
-
Iterators, Recursors and Interaction Nets
Authors:
Ian Mackie,
Jorge Sousa Pinto,
Miguel Vilaca
Abstract:
We propose a method for encoding iterators (and recursion operators in general) using interaction nets (INs). There are two main applications for this: the method can be used to obtain a visual nota- tion for functional programs; and it can be used to extend the existing translations of the lambda-calculus into INs to languages with recursive types.
We propose a method for encoding iterators (and recursion operators in general) using interaction nets (INs). There are two main applications for this: the method can be used to obtain a visual nota- tion for functional programs; and it can be used to extend the existing translations of the lambda-calculus into INs to languages with recursive types.
△ Less
Submitted 17 October, 2009;
originally announced October 2009.
-
Lissom, a Source Level Proof Carrying Code Platform
Authors:
Joao Gomes,
Daniel Martins,
Simao Melo de Sousa,
Jorge Sousa Pinto
Abstract:
This paper introduces a proposal for a Proof Carrying Code (PCC) architecture called Lissom. Started as a challenge for final year Computing students, Lissom was thought as a mean to prove to a sceptic community, and in particular to students, that formal verification tools can be put to practice in a realistic environment, and be used to solve complex and concrete problems. The attractiveness o…
▽ More
This paper introduces a proposal for a Proof Carrying Code (PCC) architecture called Lissom. Started as a challenge for final year Computing students, Lissom was thought as a mean to prove to a sceptic community, and in particular to students, that formal verification tools can be put to practice in a realistic environment, and be used to solve complex and concrete problems. The attractiveness of the problems that PCC addresses has already brought students to show interest in this project.
△ Less
Submitted 15 March, 2008;
originally announced March 2008.
-
Deriving Sorting Algorithms
Authors:
José Bacelar Almeida,
Jorge Sousa Pinto
Abstract:
This paper proposes new derivations of three well-known sorting algorithms, in their functional formulation. The approach we use is based on three main ingredients: first, the algorithms are derived from a simpler algorithm, i.e. the specification is already a solution to the problem (in this sense our derivations are program transformations). Secondly, a mixture of inductive and coinductive arg…
▽ More
This paper proposes new derivations of three well-known sorting algorithms, in their functional formulation. The approach we use is based on three main ingredients: first, the algorithms are derived from a simpler algorithm, i.e. the specification is already a solution to the problem (in this sense our derivations are program transformations). Secondly, a mixture of inductive and coinductive arguments are used in a uniform, algebraic style in our reasoning. Finally, the approach uses structural invariants so as to strengthen the equational reasoning with logical arguments that cannot be captured in the algebraic framework.
△ Less
Submitted 26 February, 2008;
originally announced February 2008.
-
Line and Word Matching in Old Documents
Authors:
A. Marcolino,
Vitorino Ramos,
Mario Ramalho,
J. R. Caldas Pinto
Abstract:
This paper is concerned with the problem of establishing an index based on word matching. It is assumed that the book was digitised as better as possible and some pre-processing techniques were already applied as line orientation correction and some noise removal. However two main factor are responsible for being not possible to apply ordinary optical character recognition techniques (OCR): the…
▽ More
This paper is concerned with the problem of establishing an index based on word matching. It is assumed that the book was digitised as better as possible and some pre-processing techniques were already applied as line orientation correction and some noise removal. However two main factor are responsible for being not possible to apply ordinary optical character recognition techniques (OCR): the presence of antique fonts and the degraded state of many characters due to unrecoverable original time degradation. In this paper we make a short introduction to word segmentation that involves finding the lines that characterise a word. After we discuss different approaches for word matching and how they can be combined to obtain an ordered list for candidate words for the matching. This discussion will be illustrated by examples.
△ Less
Submitted 17 December, 2004;
originally announced December 2004.