Search | arXiv e-print repository

Never mind the metrics -- what about the uncertainty? Visualising confusion matrix metric distributions

Authors: David Lovell, Dimity Miller, Jaiden Capra, Andrew Bradley

Abstract: There are strong incentives to build models that demonstrate outstanding predictive performance on various datasets and benchmarks. We believe these incentives risk a narrow focus on models and on the performance metrics used to evaluate and compare them -- resulting in a growing body of literature to evaluate and compare metrics. This paper strives for a more balanced perspective on classifier performance metrics by highlighting their distributions under different models of uncertainty and showing how this uncertainty can easily eclipse differences in the empirical performance of classifiers. We begin by emphasising the fundamentally discrete nature of empirical confusion matrices and show how binary matrices can be meaningfully represented in a three dimensional compositional lattice, whose cross-sections form the basis of the space of receiver operating characteristic (ROC) curves. We develop equations, animations and interactive visualisations of the contours of performance metrics within (and beyond) this ROC space, showing how some are affected by class imbalance. We provide interactive visualisations that show the discrete posterior predictive probability mass functions of true and false positive rates in ROC space, and how these relate to uncertainty in performance metrics such as Balanced Accuracy (BA) and the Matthews Correlation Coefficient (MCC).
Our hope is that these insights and visualisations will raise greater awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that classification model performance claims should be tempered by this understanding.

Submitted 5 June, 2022; originally announced June 2022.

Comments: 60 pages, 45 figures
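
Illustrative note on the abstract above: the following is a minimal sketch, not the authors' method, of how uncertainty in the true and false positive rates can carry through to Balanced Accuracy. It assumes independent uniform Beta(1, 1) priors on the two rates and samples from the resulting Beta posteriors, whereas the paper works with discrete posterior predictive distributions. The function names and the example counts are hypothetical.

```python
import math
import random


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def mcc(tp, fn, fp, tn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def ba_interval(tp, fn, fp, tn, draws=10000, level=0.95, seed=0):
    """Approximate credible interval for Balanced Accuracy.

    Assumption (not from the paper): independent Beta(1, 1) priors, so
    TPR ~ Beta(tp + 1, fn + 1) and TNR ~ Beta(tn + 1, fp + 1).
    """
    rng = random.Random(seed)
    samples = sorted(
        (rng.betavariate(tp + 1, fn + 1) + rng.betavariate(tn + 1, fp + 1)) / 2
        for _ in range(draws)
    )
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical confusion matrix from a test set of only 20 examples.
    counts = dict(tp=8, fn=2, fp=3, tn=7)
    print("BA :", balanced_accuracy(**counts))
    print("MCC:", mcc(**counts))
    print("95% interval for BA:", ba_interval(**counts))
```

With a test set this small, the interval for Balanced Accuracy is wide, which is the kind of uncertainty the abstract argues should temper performance claims.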

arXiv:2111.14366 [pdf, other]

doi 10.1080/20961790.2021.2023418

Exploring technologies to better link physical evidence and digital information for disaster victim identification

Authors: David Lovell, Kellie Vella, Diego Muñoz, Matt McKague, Margot Brereton, Peter Ellis

Abstract: Disaster victim identification (DVI) entails a protracted process of evidence collection and data matching to reconcile physical remains with victim identity. Technology is critical to DVI by enabling the linkage of physical evidence to information. However, labelling physical remains and collecting data at the scene are dominated by low-technology paper-based practices. We ask, how can technology help us tag and track the victims of disaster? Our response has two parts. First, we conducted a human-computer interaction led investigation into the systematic factors impacting DVI tagging and tracking processes. Through interviews with Australian DVI practitioners, we explored how technologies to improve linkage might fit with prevailing work practices and preferences; practical and social considerations; and existing systems and processes. Using insights from these interviews and relevant literature, we identified four critical themes: protocols and training; stress and stressors; the plurality of information capture and management systems; and practicalities and constraints. Second, we applied the themes identified in the first part of the investigation to critically review technologies that could support DVI practitioners by enhancing DVI processes that link physical evidence to information. This resulted in an overview of candidate technologies matched with consideration of their key attributes. This study recognises the importance of considering human factors that can affect technology adoption into existing practices.
We provide a searchable table (Supplementary Information) that relates technologies to the key attributes relevant to DVI practice, for the reader to apply to their own context. While this research directly contributes to DVI, it also has applications to other domains in which a physical/digital linkage is required, particularly within high-stress environments.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 27 pages, 2 figures

Journal ref: Forensic Sciences Research 2022

arXiv:1709.02039 [pdf, other]

doi 10.1371/journal.pone.0094346

Capturing natural-colour 3D models of insects for species discovery

Authors: Chuong V. Nguyen, David R. Lovell, Matt Adcock, John La Salle

Abstract: Collections of biological specimens are fundamental to scientific understanding and characterization of natural diversity. This paper presents a system for liberating useful information from physical collections by bringing specimens into the digital domain so they can be more readily shared, analyzed, annotated and compared. It focuses on insects and is strongly motivated by the desire to accelerate and augment current practices in insect taxonomy which predominantly use text, 2D diagrams and images to describe and characterize species. While these traditional kinds of descriptions are informative and useful, they cannot cover insect specimens "from all angles" and precious specimens are still exchanged between researchers and collections for this reason. Furthermore, insects can be complex in structure and pose many challenges to computer vision systems. We present a new prototype for a practical, cost-effective system of off-the-shelf components to acquire natural-colour 3D models of insects from around 3mm to 30mm in length. Colour images are captured from different angles and focal depths using a digital single lens reflex (DSLR) camera rig and two-axis turntable. These 2D images are processed into 3D reconstructions using software based on a visual hull algorithm. The resulting models are compact (around 10 megabytes), afford excellent optical resolution, and can be readily embedded into documents and web pages, as well as viewed on mobile devices.
The system is portable, safe, relatively affordable, and complements the sort of volumetric data that can be acquired by computed tomography. This system provides a new way to augment the description and documentation of insect species holotypes, reducing the need to handle or ship specimens. It opens up new opportunities to collect data for research, education, art, entertainment, biodiversity assessment and biosecurity control.

Submitted 6 September, 2017; originally announced September 2017.

Comments: 24 pages, 17 figures, PLOS ONE journal

Journal ref: published 2014

arXiv:1709.02033 [pdf, other]

Towards high-throughput 3D insect capture for species discovery and diagnostics

Authors: Chuong Nguyen, Matt Adcock, Stuart Anderson, David Lovell, Nicole Fisher, John La Salle

Abstract: Digitisation of natural history collections not only preserves precious information about biological diversity, it also enables us to share, analyse, annotate and compare specimens to gain new insights. High-resolution, full-colour 3D capture of biological specimens yields color and geometry information complementary to other techniques (e.g., 2D capture, electron scanning and micro computed tomography). However 3D colour capture of small specimens is slow for reasons including specimen handling, the narrow depth of field of high magnification optics, and the large number of images required to resolve complex shapes of specimens. In this paper, we outline techniques to accelerate 3D image capture, including using a desktop robotic arm to automate the insect handling process; using a calibrated pan-tilt rig to avoid attaching calibration targets to specimens; using light field cameras to capture images at an extended depth of field in one shot; and using 3D Web and mixed reality tools to facilitate the annotation, distribution and visualisation of 3D digital models.

Submitted 6 September, 2017; originally announced September 2017.

Comments: 2 pages, 1 figure, for BigDig workshop at 2017 eScience conference

arXiv:1702.08112 [pdf, other]

3D Scanning System for Automatic High-Resolution Plant Phenotyping

Authors: Chuong V Nguyen, Jurgen Fripp, David R Lovell, Robert Furbank, Peter Kuffner, Helen Daily, Xavier Sirault

Abstract: Thin leaves, fine stems, self-occlusion, non-rigid and slowly changing structures make plants difficult for three-dimensional (3D) scanning and reconstruction -- two critical steps in automated visual phenotyping. Many current solutions such as laser scanning, structured light, and multiview stereo can struggle to acquire usable 3D models because of limitations in scanning resolution and calibration accuracy. In response, we have developed a fast, low-cost, 3D scanning platform to image plants on a rotating stage with two tilting DSLR cameras centred on the plant. This uses new methods of camera calibration and background removal to achieve high-accuracy 3D reconstruction. We assessed the system's accuracy using a 3D visual hull reconstruction algorithm applied on 2 plastic models of dicotyledonous plants, 2 sorghum plants and 2 wheat plants across different sets of tilt angles. Scan times ranged from 3 minutes (to capture 72 images using 2 tilt angles), to 30 minutes (to capture 360 images using 10 tilt angles). The leaf lengths, widths, areas and perimeters of the plastic models were measured manually and compared to measurements from the scanning system: results were within 3-4% of each other. The 3D reconstructions obtained with the scanning system show excellent geometric agreement with all six plant specimens, even plants with thin leaves and fine stems.

Submitted 26 February, 2017; originally announced February 2017.

Comments: 8 pages, DICTA 2016

Journal ref: In Digital Image Computing: Techniques and Applications (DICTA), 2016 International Conference on, pp. 1-8. IEEE, 2016

arXiv:1304.2302 [pdf, other]

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

Authors: Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

Abstract: The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous approaches, our technique does not require alteration of the model and leaves the true posterior distribution invariant. It also naturally lends itself to a distributed software implementation in terms of Map-Reduce, which we test in cluster configurations of over 50 machines and 100 cores. We present experiments exploring the parallel efficiency and convergence properties of our approach on both synthetic and real-world data, including runs on 1MM data vectors in 256 dimensions.

Submitted 8 April, 2013; originally announced April 2013.

Comments: 12 pages, 10 figures. Submitted to ICML 2013 during third submission cycle

Showing 1–6 of 6 results for author: Lovell, D