Use of Equivalent Relative Utility (ERU) to Evaluate Artificial Intelligence-Enabled Rule-Out Devices
Authors:
Kwok Lung Fan,
Yee Lam Elim Thompson,
Weijie Chen,
Craig K. Abbey,
Frank W Samuelson
Abstract:
We investigated the use of equivalent relative utility (ERU) to evaluate the effectiveness of artificial intelligence (AI)-enabled rule-out devices, which use AI to identify non-cancer patient images and autonomously remove them from radiologist review in screening mammography. We reviewed two performance metrics that can be used to compare the diagnostic performance of the radiologist-with-rule-out-device and radiologist-without-device workflows: positive/negative predictive values (PPV/NPV) and equivalent relative utility (ERU). To demonstrate the use of the two evaluation metrics, we applied both methods to a recent US-based study that retrospectively applied its AI algorithm to a large mammography dataset and reported improved performance for the radiologist-with-device workflow compared with the workflow without the device. We further applied the ERU method to a European study, using its reported recall rates and cancer detection rates at different thresholds of its AI algorithm to compare the potential utility across thresholds. For the study using US data, neither the PPV/NPV nor the ERU method can conclude a significant improvement in diagnostic performance at any of the reported algorithm thresholds. For the study using European data, ERU values at lower AI thresholds are higher than at a higher threshold, because more false-negative cases would be ruled out at the higher threshold, reducing overall diagnostic performance. Both the PPV/NPV and ERU methods can be used to compare the diagnostic performance of the radiologist-with-device workflow with that of the workflow without the device. One limitation of the ERU method is the need to measure the baseline, standard-of-care relative utility (RU) value for mammography screening in the US. Once the baseline value is known, the ERU method can be applied to large US datasets without knowing the true prevalence of the dataset.
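To make the PPV/NPV comparison concrete, the sketch below tallies recall outcomes for the two workflows from per-case records and computes PPV and NPV for each. It is a minimal illustration, not the authors' analysis: it assumes (hypothetically) that each case is a tuple `(has_cancer, ai_rules_out, radiologist_recalls)`, that a ruled-out case is never recalled, and that the radiologist's decisions on the remaining cases are unchanged by the device, as in a retrospective simulation. The names `Counts`, `ppv`, `npv`, and `workflow_counts` are illustrative, the data are toy values, and no significance testing is performed.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Counts:
    tp: int  # recalled, cancer
    fp: int  # recalled, no cancer
    tn: int  # not recalled, no cancer
    fn: int  # not recalled, cancer


def ppv(c: Counts) -> float:
    """Fraction of recalled cases that have cancer."""
    return c.tp / (c.tp + c.fp)


def npv(c: Counts) -> float:
    """Fraction of non-recalled cases that are cancer-free."""
    return c.tn / (c.tn + c.fn)


def workflow_counts(cases: Iterable[Tuple[bool, bool, bool]], use_rule_out: bool) -> Counts:
    """Tally outcomes; each case is (has_cancer, ai_rules_out, radiologist_recalls)."""
    tp = fp = tn = fn = 0
    for has_cancer, ai_rules_out, radiologist_recalls in cases:
        # Assumption: a ruled-out case never reaches the radiologist, so it is not recalled;
        # all other cases keep the radiologist's original decision.
        recalled = radiologist_recalls and not (use_rule_out and ai_rules_out)
        if recalled and has_cancer:
            tp += 1
        elif recalled:
            fp += 1
        elif has_cancer:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, tn, fn)


# Toy, hypothetical cases: (has_cancer, ai_rules_out, radiologist_recalls)
cases = [
    (True, False, True),
    (True, True, True),    # a cancer the AI would rule out
    (False, True, True),
    (False, False, True),
    (False, True, False),
    (False, False, False),
]
without_device = workflow_counts(cases, use_rule_out=False)
with_device = workflow_counts(cases, use_rule_out=True)
print(ppv(without_device), npv(without_device))
print(ppv(with_device), npv(with_device))
```

In this toy set, the device removes one false positive but also rules out one cancer, so PPV stays at 0.5 while NPV drops from 1.0 to 0.75. A real comparison would also need confidence intervals or a hypothesis test on the differences.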
Submitted 13 November, 2023;
originally announced November 2023.
Evaluation of wait time saving effectiveness of triage algorithms
Authors:
Yee Lam Elim Thompson,
Gary M Levine,
Weijie Chen,
Berkman Sahiner,
Qin Li,
Nicholas Petrick,
Jana G Delfino,
Miguel A Lago,
Qian Cao,
Qin Li,
Frank W Samuelson
Abstract:
Over the past decade, artificial intelligence (AI) algorithms have shown promise for transforming many aspects of healthcare. One application is triaging patients' radiological images based on an algorithm's binary outputs. Such AI-based prioritization software is known as computer-aided triage and notification (CADt). Its main benefit is to speed up radiological review of images with time-sensitive findings. However, although CADt devices are becoming more common in clinical workflows, quantitative methods for evaluating a device's effectiveness in reducing patients' waiting times are still lacking. In this paper, we present a mathematical framework based on queueing theory to calculate the average waiting time per patient image before and after a CADt device is used. We study four workflow models with multiple radiologists (servers) and priority classes over a range of AI diagnostic performance levels, radiologist reading rates, and patient image (customer) arrival rates. Because of the model complexity, an approximation method known as the Recursive Dimensionality Reduction technique is applied. We define a performance metric to measure the device's time-saving effectiveness. A software tool is developed to simulate the clinical workflow of image review/interpretation, to verify the theoretical results, and to provide confidence intervals for the performance metric we defined. We show quantitatively that a triage device is more effective in a busy, short-staffed setting, which is consistent with clinical intuition and our simulation results. Although this work is motivated by the need to evaluate CADt devices, the framework presented in this paper can be applied to any algorithm that prioritizes customers based on its binary outputs.
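The core idea (an AI flag moves images into a higher-priority queue, and the waiting time of diseased images is compared before and after) can be sketched with a much simpler queue than the multi-radiologist models studied here. The code below is a minimal sketch under stated assumptions: a single radiologist (M/M/1), Poisson arrivals, exponential reading times with the same rate for every image, and non-preemptive priority, for which Cobham's classical formula gives per-class mean waits. It does not use the Recursive Dimensionality Reduction technique, and `diseased_time_saving` is a hypothetical metric (FIFO wait minus the mean wait of diseased images with the device), not necessarily the metric defined in the paper.

```python
from typing import List, Sequence


def fifo_wait(lam: float, mu: float) -> float:
    """Mean time in queue (excluding reading time) for an M/M/1 FIFO queue."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("unstable queue: need lam < mu")
    return rho / (mu * (1 - rho))


def priority_waits(lams: Sequence[float], mu: float) -> List[float]:
    """Per-class mean queue wait, M/M/1 non-preemptive priority (Cobham's formula).

    lams is ordered from highest to lowest priority; all classes share rate mu.
    """
    rho_total = sum(lams) / mu
    if rho_total >= 1:
        raise ValueError("unstable queue: need sum(lams) < mu")
    # Mean residual work of the image being read: sum(lam_i * E[S^2]) / 2,
    # which equals rho_total / mu for exponential reading times (E[S^2] = 2 / mu^2).
    w0 = rho_total / mu
    waits = []
    sigma_prev = 0.0
    for lam in lams:
        sigma = sigma_prev + lam / mu
        waits.append(w0 / ((1 - sigma_prev) * (1 - sigma)))
        sigma_prev = sigma
    return waits


def diseased_time_saving(lam: float, mu: float, prevalence: float, se: float, sp: float) -> float:
    """Hypothetical metric: FIFO wait minus mean wait of diseased images with CADt."""
    # Images flagged positive (true and false positives) form the high-priority class.
    lam_high = lam * (prevalence * se + (1 - prevalence) * (1 - sp))
    lam_low = lam - lam_high
    w_high, w_low = priority_waits([lam_high, lam_low], mu)
    # A diseased image is flagged (high priority) with probability se, otherwise low.
    wait_with_cadt = se * w_high + (1 - se) * w_low
    return fifo_wait(lam, mu) - wait_with_cadt


# Busy vs. quiet setting with the same (illustrative) AI performance.
print(round(diseased_time_saving(0.8, 1.0, prevalence=0.1, se=0.9, sp=0.9), 3))  # 2.692
print(round(diseased_time_saving(0.2, 1.0, prevalence=0.1, se=0.9, sp=0.9), 3))  # 0.037
```

With the same illustrative sensitivity and specificity, the saving for diseased images is much larger at utilization 0.8 than at 0.2, in line with the busier-setting finding. Because this sketch has only one server, it says nothing about staffing levels. Since all images share one reading rate, the arrival-weighted average of the two class waits equals the FIFO wait, so prioritization only redistributes waiting time rather than reducing the total.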
Submitted 13 March, 2023;
originally announced March 2023.