Teaser figure: The visualization interface for tooth segmentation on panoramic radiograph. (A) Model Explorer: the dataset panel displays basic information about the unlabeled panoramic radiograph, and the control panel sets model parameters such as the number of training iterations and the learning rate. (A1) A line chart showing the model optimization process. (A2) A bar chart showing the manual correction time per image. (B) Radiographic Feature Explorer: (B1) the Panoramic View shows the segmentation masks on the panoramic radiograph; (B2) the Glyph View reveals the features of the tooth segmentation. (C) Extracted Feature Explorer: (C1) the Scatterplot View provides an overview of relationships among the tooth samples; (C2) the Zoomed View supports a more detailed exploration of extracted features; (C3) the Reference Sample View illustrates the attributes of similar instances.

ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

Shenji Zhu (Hangzhou Dianzi University)
Miaoxin Hu (Hangzhou Dianzi University)
Tianya Pan (Hangzhou Dianzi University)
Yue Hong (Department of Stomatology, First Affiliated Hospital Zhejiang University, School of Medicine Zhejiang University)
Bin Li (Department of Stomatology, Shengzhou People’s Hospital)
Zhiguang Zhou (Hangzhou Dianzi University)
Ting Xu (Department of Stomatology, First Affiliated Hospital Zhejiang University)

Abstract

Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, accurate tooth segmentation remains difficult due to complex tooth shapes, diverse tooth categories and incomplete sample sets for machine learning. In this paper, we propose ViSTooth, a visualization framework for tooth segmentation on dental panoramic radiograph. First, we employ Mask R-CNN to conduct preliminary tooth segmentation, and propose a set of domain metrics to estimate the accuracy of the segmented teeth, including tooth shape, tooth position and tooth angle. Then, we represent the teeth with high-dimensional vectors and visualize their distribution in a low-dimensional space, in which experts can easily observe teeth with specific metrics. Further, we expand the sample set with the expert-specified teeth and train the tooth segmentation model iteratively. Finally, we conduct a case study and an expert study to demonstrate the effectiveness and usability of ViSTooth in aiding experts to implement accurate tooth segmentation guided by expert knowledge.

keywords:

Tooth segmentation, panoramic radiograph, visualization, visual analytics, human computer collaboration.

Introduction

The panoramic radiograph is a widely used imaging modality for dental examination in stomatology. It provides a visual representation of all teeth within the dental cavity and helps doctors examine pathological conditions such as dental calculus, dental malformations and caries[1]. Tooth segmentation is a pivotal step for computer aided diagnosis of tooth-related disorders[2]. However, manual annotation is a laborious and time-consuming task, especially when there are overlapping shadows or low contrast.

In recent years, numerous machine learning models have been proposed for tooth segmentation on dental panoramic radiograph[3, 4, 5], encompassing both unsupervised and supervised methods. Unsupervised methods include threshold-based segmentation[6, 7], edge detection[8, 9], and graph theory[10], while supervised methods rely on labeled data for training[11]. Deep learning-based approaches, such as U-Net[12, 13], Faster R-CNN[14, 15] and PANet[16], fall under the category of supervised methods. However, due to the substantial variations in tooth shape and types, such a strategy is still unable to fundamentally solve the problem of accuracy and robustness of automatic segmentation algorithms, which brings uncertainty to the subsequent diagnosis[17, 11, 18].

Extensive discussions with professional dental experts and computer experts have led to the consensus that traditional AI segmentation methods exhibit significant limitations when applied to tooth segmentation on panoramic radiograph. Two primary questions have been identified. Q1. The model has a primary focus on pixel features but lacks essential dental expertise like tooth shape and angle, hindering its contextual understanding and ability to discern intricate details and nuances specific to dental conditions. This limitation becomes particularly evident when the training sample is unable to cover the full spectrum of dental conditions, causing the model to struggle in accurately segmenting complex or special teeth. Q2. In the process of segmentation, complete dependence on automated algorithms may lead to suboptimal tooth segmentation results in certain cases, as the model lacks the ability to dynamically adapt and refine its segmentation outputs in response to the nuances of individual cases.

As such, we develop a human-machine collaboration framework, ViSTooth (see the teaser figure), which comprehensively considers tooth features and introduces expertise into the segmentation process to optimize the model outcomes. Firstly, we use a fine-tuned Mask R-CNN model[19] to achieve preliminary tooth segmentation, which focuses on the features involving dental expertise. A glyph representation is generated to visualize these features (Q1). For cases such as structural abnormalities or blurred tooth contours, human expert judgment and intervention are needed to improve segmentation quality[20]. To address this, we develop an interactive tool that allows experts to correct the outcomes of the initial segmentation. Further, views at different levels of detail are incorporated to assist experts in screening out high-quality segmentations, and expert-specified teeth are piped into the model for interactive optimization (Q2). Finally, to demonstrate the effectiveness and usability of ViSTooth in addressing the issue of tooth segmentation, we conduct a case study and an expert study.

The major contributions of this paper are listed as follows:

$\bullet$ A set of feature metrics is proposed to assess the segmentation according to dental expertise.

$\bullet$ A novel visual analytics system is implemented to summarize and compare the tooth segmentation at different levels of detail.

$\bullet$ A new human-machine collaboration workflow leveraging advanced machine learning algorithms and human expertise is implemented to improve the accuracy and efficiency of tooth segmentation.

1 RELATED WORK

1.1 Tooth Segmentation

Tooth segmentation on panoramic radiograph is a critical task, addressed through two primary methodologies: unsupervised and supervised approaches. In the unsupervised category, various strategies have been developed. Modi et al.[21] proposed a region-based method to identify regions of interest for gap valley and tooth isolation using binary edge intensity integral curves. Indraswari et al.[6] employed a three-step process involving directional image formation using DDFBT, enhancement for edge reinforcement and noise removal, and MAT with Sauvola Local Thresholding for segmentation. Alsmad et al.[22] utilized a cluster-based approach, while Hasan et al.[23] focused on jaw segmentation using gradient information in a four-step method comprising k-means clustering, point detection around the jaw, gradient vector flow snakes, and shape correction for the segmented area. Li et al.[24] introduced a new watershed algorithm based on mathematical morphology, specifically tailored for dental X-ray image segmentation. Fariza et al.[25] employed a method to extract different dental structures using conditional spatial fuzzy C-means clustering.

In contrast, supervised methods leverage deep learning models trained on annotated data to improve segmentation accuracy and stability. Jader et al.[26] are credited as the pioneers who detected and segmented each tooth on panoramic radiographs. Almalki et al.[27] applied two self-supervised learning methods, SimMIM and UM-MAE, to Swin Transformer on dental panoramic radiographs. Zhang et al.[28] proposed a novel method that uses a label tree with a cascade network structure, combining several key strategies for teeth recognition, which can deal with many complex cases. Helli et al.[29] employed a two-step method in which a U-Net creates the prediction and post-processing operations then produce the segmentation. Leite et al.[18] proposed a CNN-based solution for determining tooth contours using semantic segmentation, further refined by a Fully Convolutional Network (FCN). These methods were evaluated using metrics such as average Intersection over Union (IoU) and Hausdorff distance, compared against manual annotations and medical software. Tuzoff et al.[30] applied the Faster R-CNN object detection model to generate tooth borders, enhancing the output through integration with the VGG16 classification network and heuristic rules of dentition arrangement. Additionally, Mask R-CNN[19], a deep learning-based method, offers simultaneous object detection and segmentation, incorporating RoIAlign for improved accuracy. Despite the higher precision of supervised methods, challenges persist when training data is insufficient or inaccurately annotated, underscoring the ongoing need for accurate segmentation on panoramic radiograph[31].

In this paper, we select an appropriate neural network and incorporate dental expertise when using Mask R-CNN for tooth segmentation. Further, we propose a novel human-computer interaction system that allows experts to interactively refine segmentation outputs, thereby enhancing the accuracy and reliability of the final results.

1.2 Visualization for Artificial Intelligence

In the domain of artificial intelligence, the growing complexity of models calls for advanced methods to explain their inner workings. Visualization tools play an instrumental role in this context, aiding in the comprehension of training data, model architecture, and output[32, 33]. A notable contribution in this field is OoDAnalyzer[34] by Chen et al., which presents an interactive visual method for the identification and explanation of Out-of-Distribution (OoD) samples. Kandel et al.[35] proposed Profiler, a tool designed to assess quality issues in tabular data. Anomaly detection methods are employed to detect and categorize data anomalies, and visual summaries aid the evaluation of potential anomalies and their causes. Liu et al.[36] developed a visual analytics approach using time series data to represent training dynamics of Deep Generative Models (DGMs). It includes a novel blue noise line sampling scheme and a credit assignment algorithm for improved understanding and diagnosis of DGM training processes. Cao et al.[37] presented a visual analysis tool, AEVis, to explain why adversarial examples are misclassified. The contribution analysis and rich interactions further enable users to trace the root cause of the misclassification of adversarial examples. Wang et al.[38] presented CNN EXPLAINER, an interactive visualization tool designed for non-experts to learn and examine convolutional neural networks. Through smooth transitions across levels of abstraction, users can inspect the interplay between operations and outcomes. Mahendran et al.[39] introduced the Deep Visualization Toolbox (DeepVis) to visualize and interpret CNN features by synthesizing input images that maximally activate specific neurons. Selvaraju et al.[40] introduced Grad-CAM, which has since been widely adopted for interpreting CNN-based models in various domains, including medical imaging and natural language processing. Chen et al.[41] introduced Uni-Evaluator, an open-source visual analytic tool for model evaluation tasks like target detection. It represents predictions as probability distributions across tasks, using matrices, tables, and grids for comprehensive evaluation from a global to sample level. Humans can also monitor the learning process and evaluate the effectiveness of AI models at any time through visualization[32]. Ahn et al.[42] proposed a visual analytic system, FairSight, to capture both global and instance-level fairness with evidence of potential unfair outcomes.

Collectively, these developments underscore the pivotal role of visual analytics in the interpretation, evaluation, and refinement of complex AI models within the scientific community. In contrast, we apply interactive visual analytics to the detection and correction of mask errors in the process of automatic segmentation, aiming to produce high-quality outputs.

2 TASK ANALYSIS AND SYSTEM OVERVIEW

In this section, we provide a summary of analysis tasks(T1-T4) identified through interviews with domain experts and subsequently present the pipeline of the proposed visual analysis system.

2.1 Task Analysis

Our system was developed through a collaborative effort involving experts in dental examination (E1 and E2) and an expert in graphics and visualization (E3). E1 and E2 are highly experienced oral and maxillofacial radiologists, each with over 5 years of expertise. E3 is a seasoned professor specializing in data visual analysis. In the early stages of our collaboration, we held weekly meetings with these three experts and reviewed the literature to seek opportunities to optimize the process of tooth segmentation. According to the experts, the diversity of teeth presents significant challenges to current AutoML approaches. To ensure the accuracy of tooth segmentation, further expert judgment and correction are deemed necessary. Consequently, we delved into the design requirements of a human-machine collaborative system. From these discussions, we derived four key analytical tasks, summarized as follows:

T1. Integration of dental expertise into the segmentation process. General segmentation methods consider only pixel features[43]. However, dental expertise, such as the regularity in the physiological structure and arrangement characteristics of teeth, can provide valuable information for segmentation. Hence, the feature extraction network of the model should be adept at identifying the intricate structures on panoramic images[31]. The workflow should also take arrangement features into account when determining tooth labels.

T2. Assessment of Automatic Tooth Segmentation Accuracy. Once preliminary automated outputs are obtained, the subsequent step is to examine and modify potentially incorrect segmentation masks. To facilitate this correction process, it is necessary to propose quantifiable metrics to assess the accuracy of automatic tooth segmentation results. Additionally, a clear visual cue should be displayed to guide the experts in reviewing and manually correcting the results.

T3. Model Optimization through Valuable Instance Sampling. The diversity of tooth types presents a challenge for the machine learning model. However, manual corrections capture expert knowledge on the accurate delineation of tooth boundaries, which can serve as high-quality labeled data to retrain the ML model. Thus, it becomes necessary to incorporate valuable expert corrections as a complement to the training set to optimize the model, especially when the initial sample set can hardly encompass all possible variations.

T4. Development of an Interactive Tooth Segmentation Tool. To ensure the effectiveness of subsequent work, it is deemed crucial to develop an interactive tool that can implement accurate tooth segmentation with continuous iterative optimization. To the best of our knowledge, our work is the first attempt to provide a human-machine collaborative tooth segmentation tool.

2.2 System Overview

Motivated by the identified tasks, we propose a visualization framework enabling experts to efficiently achieve high-quality tooth segmentation on panoramic radiograph. The system pipeline is depicted in Figure 1. Initially, the Mask R-CNN model is trained with a certain amount of manually labeled data, categorizing teeth into five classes: incisor, canine, 1st, 2nd, and 3rd molar. To estimate the outputs of the model, we propose several quantifiable metrics including tooth shape, tooth position and tooth angle. Concurrently, we devise a glyph-based visualization scheme to represent this information, thereby offering experts a comprehensive set of evaluation criteria (T1). We develop a scatterplot view to provide an overview of relationships among the tooth samples, so that possibly inaccurate results can be identified by their abnormal distribution (T2). We provide experts with a visual interface that shows the initial segmentation of the model, together with interactive tools for error correction. The corrected high-quality tooth samples, selected by experts, are then fed back into the model for adaptive iterative optimization (T3). Ultimately, a human-machine collaborative visual tool is developed for the segmentation of teeth (T4).

Figure 1: The pipeline of ViSTooth for tooth segmentation on panoramic radiograph.

3 VISTOOTH

We propose a visualization framework, ViSTooth, that integrates automatic technologies and interactive visualization to support human-machine collaboration for accurate tooth segmentation. This section introduces four key components: data labeling, tooth segmentation model, visualization design and model optimization.

3.1 Data Labeling

The panoramic radiographs used in this study were selected from a patient image database at the hospital. The patients gave their informed consent before any panoramic radiographs were taken, and their privacy was protected when using the data for medical research. The dataset comprises 521 panoramic radiographs. We randomly selected 300 images for experts to mark ground truth segmentation labels, while the remaining 221 images were used as a test set. This process was supervised by two dentists (E1 and E2) using a tagging tool developed with the Python programming language. We attended weekly meetings where related issues were discussed and the labels were reviewed to assure quality. In the end, the 300 images with ground truth segmentation labels were divided into a training set (240 images) and a validation set (60 images). The study was approved by the Ethics Committee of The First Affiliated Hospital, Zhejiang University School of Medicine (approval no. 20230785).
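The split sizes above can be reproduced with a short sketch. The function name, the integer image ids and the fixed seed are illustrative assumptions; the source does not describe how the random selection was implemented.

```python
import random

def split_dataset(image_ids, n_labeled=300, n_val=60, seed=0):
    """Randomly split image ids into train / validation / test subsets.

    Hypothetical helper: 300 of the 521 images are labeled (240 train,
    60 validation) and the remaining 221 form the test set.
    """
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    labeled, test = ids[:n_labeled], ids[n_labeled:]
    val, train = labeled[:n_val], labeled[n_val:]
    return train, val, test

train, val, test = split_dataset(range(521))
print(len(train), len(val), len(test))  # 240 60 221
```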

3.2 Tooth Segmentation Model

In this paper, we employ the Mask R-CNN model for teeth segmentation on panoramic radiograph. Mask R-CNN is a two-stage instance segmentation framework, as depicted in Figure 2. Specifically, the first stage proposes candidate tooth bounding boxes regardless of categories. Firstly, the panoramic radiograph is fed into the backbone to extract features. Then the features form a feature pyramid network (FPN) to generate candidate regions with the potential to contain tooth structures. Since Mask R-CNN is a flexible framework, we tried changing the feature extraction network in the backbone to make the model more suitable for panoramic segmentation tasks, including ResNet networks with 50, 101 and 152 layers[44] and the VGG16 network[45]. As shown in Table 1, we find that ResNet50 is the optimal choice for panoramic radiograph, since its fewer layers help avoid overfitting, and its overall IoU score reaches 75.14%.

Table 1: Comparison of evaluation metrics with different backbones.

Model	Backbone	IoU(%)	Precision(%)	Recall(%)	F1-score(%)
Mask R-CNN	ResNet-50	75.1	75.7	83.5	79.4
Mask R-CNN	ResNet-101	65.3	65.9	73.7	69.6
Mask R-CNN	ResNet-152	53.4	53.9	58.1	55.9
Mask R-CNN	VGG16	71.2	71.8	81.1	76.2
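As a rough illustration of how the columns of Table 1 relate to each other, the sketch below computes the four metrics for a pair of binary masks. It assumes pixel-level counting, which the source does not specify (the reported values may have been computed per instance), and the function names are hypothetical.

```python
import numpy as np

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def mask_metrics(pred, truth):
    """Pixel-level IoU, precision, recall and F1 for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(pred, truth).sum())   # predicted and true
    fp = int(np.logical_and(pred, ~truth).sum())  # predicted but not true
    fn = int(np.logical_and(~pred, truth).sum())  # true but missed
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"iou": iou, "precision": precision, "recall": recall,
            "f1": f1_score(precision, recall)}

# Consistency check against the ResNet-50 row of Table 1.
print(round(f1_score(75.7, 83.5), 1))  # 79.4
```

The F1 column of every row in Table 1 agrees with the harmonic mean of that row's precision and recall.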

The second stage is termed the R-CNN stage, which extracts features using RoIAlign[19] for each proposal and performs proposal classification, bounding box regression and mask prediction. This involves mapping each pixel of the original panoramic radiograph to the feature map and matching it with preset fixed features. Subsequently, the model conducts multi-class classification on these candidate regions, generating masks to complete the segmentation task. During the training stage, we classified sample teeth into five categories: incisors, cuspids, 1st and 2nd molars, and 3rd molars. In the process of classification, we guide the model to consider not only the image features of the segmented targets but also heuristic rules based on the order of tooth arrangement. When image features are blurred and difficult to discern, priority is given to the segmentation category determined by the arrangement order.
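The arrangement-order rule can be pictured with the following minimal sketch. It is an assumption-laden illustration, not the rule set used in ViSTooth: we assume that teeth of one quadrant are sorted from the midline outward, that the five categories follow the fixed order below, and that a hypothetical confidence threshold `min_score` decides when image features count as "difficult to discern".

```python
# Assumed category order from the midline outward (illustrative only).
ORDER = ["incisor", "cuspid", "1st molar", "2nd molar", "3rd molar"]
RANK = {name: i for i, name in enumerate(ORDER)}

def refine_labels(labels, scores, min_score=0.5):
    """Clamp low-confidence labels into the range allowed by confident neighbours.

    labels, scores: predictions for the teeth of one quadrant, sorted from
    the midline outward.
    """
    refined = list(labels)
    confident = [s >= min_score for s in scores]
    for i, ok in enumerate(confident):
        if ok:
            continue  # trust the image-based prediction
        lo, hi = 0, len(ORDER) - 1
        for j in range(i - 1, -1, -1):       # nearest confident tooth toward the midline
            if confident[j]:
                lo = RANK[labels[j]]
                break
        for j in range(i + 1, len(labels)):  # nearest confident tooth away from the midline
            if confident[j]:
                hi = RANK[labels[j]]
                break
        if lo > hi:
            continue  # neighbours contradict each other; leave the case to the expert
        refined[i] = ORDER[min(max(RANK[labels[i]], lo), hi)]
    return refined

print(refine_labels(
    ["incisor", "cuspid", "3rd molar", "1st molar", "2nd molar"],
    [0.9, 0.9, 0.3, 0.8, 0.9],
))
# ['incisor', 'cuspid', '1st molar', '1st molar', '2nd molar']
```

Only the low-confidence third tooth is relabeled, because a 3rd molar cannot sit between a cuspid and a 1st molar under the assumed order.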

3.3 Visualization for Tooth Segmentation

Because the above AutoML segmentation approach is not always accurate, in this section we design the visualization interface to present the segmentation results from the model and to support more detailed feature exploration. The teaser figure displays the visual interface of our system, which comprises a control panel and five main views.

3.3.1 Segmentation Explorer Component

The radiographic feature exploration component contains two sub-views: a panoramic view and a glyph view.

As shown in the teaser figure (B1), the panoramic view visualizes the tooth segmentation outcomes generated by the system. It facilitates a direct comparative analysis for experts to assess the congruence between the segmented contours and ground truth. Experts can adjust the initial segmentation mask by dragging contour points, ensuring a closer alignment with the actual targets.

The glyph view represents the detailed features of segmentation outputs. In this paper, we propose three essential metrics of tooth segmentation including shape, coordinates and center-line angle. Subsequently, we employ a visual prompting approach to guide experts in making more nuanced judgments and corrections to these results.

Firstly, we use Hu moments[46, 47] as the shape feature of the segmentation mask to characterize individual teeth. They are built on the image moments, which are calculated as follows:

$$m_{p,q}=\sum_{x}\sum_{y}x^{p}y^{q}f(x,y),\quad p,q=0,1,2,\ldots \tag{1}$$

where f(x,y) is the pixel intensity value at coordinate (x, y).
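To make Eq. (1) concrete, the following sketch evaluates the raw moments of a binary mask with NumPy and derives the first two of the seven Hu invariants from the normalized central moments. The function names are ours, and restricting the example to two invariants is only for brevity; the source does not state which implementation it uses.

```python
import numpy as np

def raw_moment(mask, p, q):
    """m_{p,q} = sum_x sum_y x^p y^q f(x, y), as in Eq. (1)."""
    f = np.asarray(mask, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]  # row index is y, column index is x
    return float((x**p * y**q * f).sum())

def hu_first_two(mask):
    """First two Hu invariants of a mask (translation and rotation invariant)."""
    f = np.asarray(mask, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    m00 = raw_moment(f, 0, 0)
    xc = raw_moment(f, 1, 0) / m00  # centroid of the mask
    yc = raw_moment(f, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalized by m00^(1 + (p + q) / 2) for scale invariance.
        mu = ((x - xc) ** p * (y - yc) ** q * f).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return float(phi1), float(phi2)

mask = np.zeros((20, 20))
mask[2:8, 3:6] = 1           # a 6 x 3 rectangle standing in for a tooth mask
print(raw_moment(mask, 0, 0))  # 18.0, the mask area
print(hu_first_two(mask))
```

Shifting or rotating the rectangle by 90 degrees leaves both invariants unchanged, which is the property that makes them usable as a shape descriptor independent of where the tooth sits in the radiograph.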

Given the symmetrical arrangement of tooth sequences, the positional attribute is defined as the absolute values of the differences between the two-dimensional coordinates of the segmentation mask’s center point and those of the overall panoramic radiograph’s center point. The centerline angle of the segmentation mask is determined by calculating the angle between the midline of the mask and the vertical direction.
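A possible reading of these two attributes is sketched below. Two assumptions are ours rather than the source's: the mask's center point is taken to be its centroid, and the "midline" is approximated by the principal axis obtained from the second-order central moments.

```python
import numpy as np

def position_and_angle(mask):
    """Absolute offset of the mask centroid from the image center, plus the
    unsigned angle (degrees, 0 to 90) between the principal axis and vertical."""
    f = np.asarray(mask, dtype=float)
    h, w = f.shape
    y, x = np.mgrid[:h, :w]
    area = f.sum()
    xc, yc = (x * f).sum() / area, (y * f).sum() / area
    # Symmetric teeth on the left and right map to the same position value.
    position = (float(abs(xc - (w - 1) / 2)), float(abs(yc - (h - 1) / 2)))
    mu20 = ((x - xc) ** 2 * f).sum()
    mu02 = ((y - yc) ** 2 * f).sum()
    mu11 = ((x - xc) * (y - yc) * f).sum()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # principal axis vs. x-axis
    angle = abs(90.0 - abs(np.degrees(theta)))        # deviation from vertical
    return position, float(angle)

upright = np.zeros((21, 21))
upright[3:18, 10] = 1  # a vertical bar through the image center
print(position_and_angle(upright))
```

For the upright bar the position offset is zero in both coordinates and the angle is zero up to floating-point error; a horizontal bar gives an angle of 90 degrees.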

We design the glyph to visualize the multi-dimensional features of the segmentation mask (Figure 3(B)). The values of the Hu moments are encoded with a radial bar chart (Figure 3(B-b)). Within the glyph, we use the metaphor of a dashboard to encode the tooth’s two-dimensional coordinates (Figure 3(B-c,B-d)) and centerline angle (Figure 3(B-e)). To visually demonstrate the differences in features between the segmentation results and conventional training samples, we calculate the average value of each feature over the training samples. Then we encode features close to the average value in gray, features significantly above the average value in blue, and features significantly below the average value in red (Figure 3(B-a)). The dental legend at the center of the glyph is filled with distinct colors according to the identified categories, facilitating a clearer observation of the tooth categorization (Figure 3(B-f)). Experts can modify the assigned category labels by clicking on the dental legend.
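The color rule for a single feature might look like the sketch below. The source does not define what "significantly" means, so the threshold of `k` standard deviations around the training average is purely an assumption, as are the function name and the sample values.

```python
from statistics import mean, pstdev

def glyph_color(value, reference_values, k=2.0):
    """Gray near the training average, blue well above it, red well below it.

    The cut-off of k standard deviations is an assumed definition of
    "significantly"; the paper does not give one.
    """
    mu = mean(reference_values)
    sigma = pstdev(reference_values)
    if abs(value - mu) <= k * sigma:
        return "gray"
    return "blue" if value > mu else "red"

reference_angles = [10, 12, 11, 9, 13]  # hypothetical training-set angles
print([glyph_color(v, reference_angles) for v in (11.5, 20, 5)])
# ['gray', 'blue', 'red']
```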

3.3.2 Feature Explorer Component

Because automatic segmentation algorithms rely on matching prior features against image characteristics, satisfactory segmentation results may not be achieved when there is prominent variation. In this section, we employ dimensionality reduction and mapping to obtain the standard range and distribution of multi-dimensional features for each category of teeth. By contrasting newly generated segmentation masks with the sample set distribution, we identify segmentation results that deviate from conventional patterns as likely errors.
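In ViSTooth the deviation is judged visually by the expert in the scatterplot. An automatic counterpart, given here only as a hedged sketch, could flag a new mask whose features lie far outside the per-category statistics of the labeled samples. The z-score criterion, the threshold of 3 and the function names are assumptions.

```python
import numpy as np

def fit_class_stats(features, labels):
    """Per-category mean and standard deviation of the labeled feature vectors."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in set(labels.tolist()):
        rows = features[labels == c]
        stats[c] = (rows.mean(axis=0), rows.std(axis=0) + 1e-9)  # avoid division by zero
    return stats

def is_outlier(sample, label, stats, threshold=3.0):
    """True if any feature is more than `threshold` std devs from its category mean."""
    mu, sigma = stats[label]
    z = np.abs((np.asarray(sample, dtype=float) - mu) / sigma)
    return bool(z.max() > threshold)

labeled = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
stats = fit_class_stats(labeled, ["2nd molar"] * 4)
print(is_outlier([1.05, 1.0], "2nd molar", stats))  # False
print(is_outlier([5.0, 1.0], "2nd molar", stats))   # True
```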

Each point in the scatterplot view represents a tooth sample, with distinct colors indicating different categories. The manually annotated training and test sets are represented by points with higher transparency, while newly loaded tooth samples are differentiated by larger radii and lower transparency. To lay out the points in the scatterplot according to the feature similarities of the samples, we first employ the Hu moments to extract shape features from individual tooth slices, and combine them with the positional information within the original panoramic radiograph and the centerline angle of the tooth to form a set of high-dimensional feature vectors. Then, we employ LDA[48] to project the vectors onto a two-dimensional plane, generating a scatterplot in which samples that share similar features lie closer together. In general use, training samples are shown as solid circles, newly loaded samples are shown as circles with black outlines, and expert-specified samples are shown as crosses.
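The projection step can be sketched as follows, assuming that LDA here refers to (Fisher) linear discriminant analysis, which suits labeled categories. The NumPy implementation below is a compact illustration, not the authors' code; the function name and the small regularization constant are ours.

```python
import numpy as np

def lda_project(X, y, n_components=2):
    """Project labeled feature vectors onto the top discriminant directions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten the within-class scatter, then diagonalize the between-class scatter.
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs / np.sqrt(np.maximum(evals, 1e-9))  # scales each eigenvector column
    vals, vecs = np.linalg.eigh(W.T @ Sb @ W)
    top = np.argsort(vals)[::-1][:n_components]
    return (X - mean_all) @ (W @ vecs[:, top])
```

Each row of the returned array gives the 2-D scatterplot coordinates of one tooth sample. With five tooth categories, at most four discriminant directions carry information, so two components fit comfortably within that limit.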

The similarity view(Figure 3(D)) is designed to show historical labeled data with high similarity, providing an essential reference for whether masks are successfully identified or not. When the expert clicks on a tooth in the scatter plot, we calculate the historical labeled data adjacent to its projection position. Subsequently, the panorama slice map and the glyph will be presented in pairs in the similarity list and arranged in order of distance.
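The lookup behind the similarity list can be sketched as a nearest-neighbor query in the projection plane. Euclidean distance, the value of `k` and the sample ids are assumptions made for illustration.

```python
import math

def most_similar(selected_xy, labeled_points, k=3):
    """Return the k labeled samples closest to the selected point, nearest first.

    labeled_points: list of (sample_id, (x, y)) positions in the 2-D projection.
    """
    ranked = sorted(labeled_points, key=lambda item: math.dist(selected_xy, item[1]))
    return [(sample_id, math.dist(selected_xy, xy)) for sample_id, xy in ranked[:k]]

history = [("t1", (3, 4)), ("t2", (1, 0)), ("t3", (0, 2)), ("t4", (10, 10))]
print(most_similar((0, 0), history))
# [('t2', 1.0), ('t3', 2.0), ('t1', 5.0)]
```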

3.4 Model Optimization

The process begins with loading panoramic radiograph data for tooth segmentation. Firstly, the model outputs are projected in the scatterplot view, enabling experts to quickly discover abnormal segmentation masks. The zoomed view and the reference view show different levels of detail, helping experts make precise manual corrections. These expert corrections, functioning as high-quality labeled data, capture expert input on the accurate delineation of tooth boundaries. Once the necessary corrections are made, the projection view updates to show the new overview of the corrected results. The expert can then choose several high-quality tooth samples and click ‘train’ in the control panel to feed the corrected high-quality labeled data back into the segmentation model. This step helps the model learn from the corrected data and improve its performance over time. The evaluation view provides a graphical representation of the optimization process, offering an intuitive insight into the model’s performance throughout training. The feedback loop continues as experts repeatedly load data, correct model outputs, and contribute to the ongoing refinement of the segmentation model.

4 SYSTEM INTERFACE

We develop a set of interactions to integrate the intelligent model and expert knowledge into the process of tooth segmentation. Initially, an expert can obtain an automatic tooth segmentation by loading the panoramic radiograph in the control panel. The scatterplot view provides a compact overview of the standard range and distribution of multi-dimensional features for each tooth category. For more detailed features, the expert can observe the glyph in the zoomed view and similar samples in the similarity view by clicking the corresponding point in the scatterplot. By comparing the dissimilarity, experts can assess the consistency of the feature distribution in the segmentation results with real structures. When the results of automatic segmentation deviate significantly from the normal range, further expert judgment and correction are necessary. Experts can correct the delineation of tooth boundaries by clicking the anchor points on the tooth. When the corrected segmentation is satisfactory, the scatterplot view updates to show an overview of the corrected results. Experts can then select high-quality labeled data, which is treated as expert feedback and contributes to improving the accuracy of the segmentation results. This iterative feedback loop helps improve the model’s performance over time as it adapts to the corrections made by experts.

5 EVALUATION

We conducted two case studies and an expert study to demonstrate the effectiveness and usability of ViSTooth in tooth segmentation.

5.1 Case Study

5.1.1 Case 1. Interactive Correction Insights

We invited E1 to utilize ViSTooth for detailed human-machine collaborative segmentation of teeth on 10 panoramic X-ray images, and asked him to follow the system’s visual cues during the process. During the segmentation task, all corrections and feedback were recorded. Initially, the segmentation model achieved an accuracy of 74.91%, which was unsatisfactory. E1 therefore used the system to inspect and refine the model outputs. He immediately identified some abnormal outliers in the scatterplot view (Figure 4). He first clicked on an outlier to locate the tooth it represented; concurrently, the similarity view was updated to display the detailed glyph and to show samples highly similar to the selected sample. Upon observation, E1 found that one kind of outlier might be attributed to incomplete segmentation, leading to a significant separation between the mask and the regular distribution of that category. Figure 4(A) illustrates how E1 corrected teeth S1 and S2 by examining the morphology in the similarity view. Typically, the second molars have two roots, but due to the proximity of the pixel values between the root and the gingival tissue in the S1 and S2 regions, the model struggles to accurately differentiate tooth structure from other tissues. The glyph in the similarity view suggests that, although the mask features deviate from those of the second molar and resemble those of the first molar, the coordinate and angular features closely match those of the 2nd molars as predicted by the model. Subsequently, E1 adjusted the contrast of the panoramic radiograph using the toolbar to enhance the differentiation between the target teeth and other structures, then manually adjusted the contour points to restore precise positioning. Furthermore, E1 found that some cases characterized by individual differences could also lead to outliers, as depicted in Figure 4(B). Here, the patient exhibited incomplete tooth structures, significantly deviating from the training samples. Such situations bear the potential for erroneous segmentation, requiring manual assessment. E1 highly praised the glyph design: “Utilizing feature indicators as visual cues to help us detect segmentation anomalies for further manual correction is beneficial in the absence of ground truth for the newly loaded panoramic image.”

5.1.2 Case 2. Iterative Retraining Optimization

In the second case, we introduced more panoramic images. E2 was invited to perform batch panoramic segmentation and select samples for feedback to the model for retraining. Figure 5 shows the projection changes after manual correction by experts. The results indicate that cluster A exhibited a mixed distribution pattern during initial segmentation, and even after manual correction, clear differentiation was not achieved. E2 explained to us that the distinction between individual teeth is not particularly clear during actual reading, and there may be confusion between the cuspid and 1st molar labels (yellow and red) for the model. Therefore, E2 marked the mixed regions between these two patterns and added them to the training samples in the hope of strengthening the model’s learning. Cluster B, on the other hand, consistently differentiated into five major distribution patterns. Figure 5 illustrates examples of S3 segmentation verification through Reference View examination. From this, we can see that S3 has a double-root structure similar to the reference view, but the significant crown loss deviates its shape features from the normal cluster. As the number of annotations increased, we observed the gradual aggregation of similar residual tooth clusters along the edge of S3. This feature is distinct from the training sample set and, therefore, E2 was eager to label it as a new sample to improve the model’s recognition rate for residual teeth during segmentation. During the labeling process, E2 commented, ”Using labeled samples to further enhance the model is very innovative. The improvement in initial segmentation accuracy means we can reduce manual correction.” He also praised the visual attractiveness design of the projection view, noting that this distribution view effectively conveys the distribution pattern of segmentation masks and facilitates batch sample selection. 
Figure 5 shows the evaluation results of three retraining sessions, in each of which experts added 100 tooth slices with corrected labels to the training set. The line graph illustrates the change in segmentation results before and after each retraining, demonstrating the effectiveness of our system in high-quality tooth segmentation. Initially, the IoU score was 75.14%. After three rounds of retraining, the IoU score reached 80.11%, a gain of 4.97 percentage points.
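For readers unfamiliar with the metric, the IoU (Intersection over Union) between a predicted mask and a reference mask can be sketched as follows. This is a minimal illustration assuming binary masks stored as NumPy arrays; the function name `mask_iou` and the toy masks are hypothetical, and the text above does not state whether the reported scores are averaged per tooth or per image.

```python
import numpy as np


def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union of two binary masks of identical shape."""
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention chosen for this sketch, not something the paper states.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


# Toy example: two 2x4 bands that overlap in one row of a 4x4 image.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, :] = 1      # rows 0-1, 8 pixels
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, :] = 1    # rows 1-2, 8 pixels

# Intersection is row 1 (4 pixels); union is rows 0-2 (12 pixels).
print(f"{mask_iou(pred, target):.3f}")  # prints 0.333
```

Under this definition, a score of 80.11% means that, on average, the overlap between predicted and reference masks covers about four fifths of their combined area.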

5.2 Expert Study

ViSTooth was designed to be an expressive and task-efficient tool. To further evaluate the effectiveness of our system, we conducted an expert study involving two experts in dental examination and 10 graduate students (5 males and 5 females) majoring in Medicine. They were all trained to use our system until they were familiar with the workflow and proficient in using the system. Thereafter, they were tasked with the segmentation of 60 panoramic radiographs. During the process, we recorded their comments and interactions. Further, we formulated a set of questions closely related to the analytical tasks outlined in Section 3. The questionnaire is displayed in Table 2, and participants’ responses are shown in Figure 6. The key findings from the analysis are as follows:

System Performance. The majority of participants expressed satisfaction with the accuracy and speed of the preliminary segmentation performed by the AutoML model. E1 commented, “The proposed model can effectively support preliminary segmentation, which alleviates laborious and time-consuming manual detection.” Statistical analysis showed that 75% of participants rated the accuracy as satisfactory, while 83% were satisfied with the speed. In addition, 80% of participants agreed that the proposed metrics effectively reflect the quality of the segmentation results.

Visual Design. Over 90% of participants found the interface design to be intuitive and easy to understand, highlighting the effectiveness of the visual design in facilitating user interaction. An overwhelming majority (over 95%) of participants agreed that the color choices and graphical elements in the system contributed to detecting segmentation anomalies, underscoring the importance of visual cues in the analysis process. E2 remarked, “The visual design of ViSTooth greatly facilitates the interpretation of segmentation results, making it easier to identify abnormalities.”

Interactivity. The interactive features designed for digging deeper and gaining more insights into the segmentation results were well-received, with 83% of participants expressing satisfaction with this aspect. Similarly, the interactive feature design for adjusting segmentation results garnered positive feedback, with 75% of participants reporting satisfaction. One graduate student noted, “The interactive features provide flexibility and control, allowing for fine-tuning of segmentation results according to individual preferences.”

Overall Satisfaction. A significant portion of participants found ViSTooth to be easy to use, indicating high overall satisfaction with the system’s usability. Impressively, 75% of participants expressed willingness to continue using ViSTooth in their future clinical practice, reflecting a strong endorsement of the system’s utility and effectiveness. However, a minority of participants expressed concerns about mastering ViSTooth’s advanced features, suggesting the need for additional training resources or user guides.

These statistical findings provide robust evidence supporting the positive reception of ViSTooth among users, affirming its effectiveness as an expressive and task-efficient tool for panoramic radiograph segmentation.

Table 2: The questionnaire consists of four parts: the system performance (Q1-3), the visual design (Q4-6), the interactivity (Q7-8), and the overall satisfaction (Q9-10).

Q1	I am satisfied with the accuracy of preliminary tooth segmentation by the AutoML model.
Q2	I am satisfied with the speed of tooth segmentation by the AutoML model.
Q3	The metrics proposed can reflect the quality of the tooth segmentation results.
Q4	The interface design is intuitive and easy to understand.
Q5	The color choices and graphical elements in the system contribute to detecting segmentation anomalies.
Q6	The layout of the system’s interface contributes to my ease of understanding and using its features.
Q7	I am satisfied with the interactive feature design for digging deeper and gaining more insights into the tooth segmentation results.
Q8	I am satisfied with the provided tools and controls for adjusting the tooth segmentation results.
Q9	ViSTooth is easy to use.
Q10	I am willing to continue using this system in clinical practice.

6 Discussion

Model performance. Automation of tooth segmentation is considered the first and foundational step in the development of AI systems for adjuvant therapy in dentistry. Therefore, this first step should be as accurate as possible. We note the revolutionary impact of large pre-trained models, such as the Large Language Model (LLM) ChatGPT [49] and the Segment Anything Model (SAM) [50], which has permeated various industries. We believe that the advanced language understanding, contextual interpretation and more nuanced feature recognition abilities of such models can enhance the segmentation process.

Feature indicators. Automatic evaluation is crucial in efficiently guiding experts to improve segmentation quality. Starting from the common characteristics of teeth, this paper extracts tooth angles, positions, and shapes to screen out results with higher error probabilities. However, personalized differences among teeth, such as the proximity between adjacent teeth, treatment marks, and developmental stages, can affect this assessment. Therefore, in future work, we plan to explore more extensively how to utilize richer features to characterize the quality of segmentation results, such as internal density distribution, texture features, and edge features of teeth.
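As a rough illustration of how position, angle, and shape indicators can be derived from a single segmentation mask, the sketch below uses standard image moments. It is not the definition of the ViSTooth metrics, which are specified earlier in the paper; the function name `mask_descriptors`, the choice of centroid, principal-axis angle, and elongation, and the toy mask are all assumptions made for illustration.

```python
import numpy as np


def mask_descriptors(mask: np.ndarray) -> dict:
    """Generic moment-based descriptors of a 2D binary mask (illustrative only)."""
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    if xs.size == 0:
        raise ValueError("mask contains no foreground pixels")

    # Position: centroid of the foreground pixels.
    cx, cy = xs.mean(), ys.mean()

    # Second-order central moments (normalized by area).
    dx, dy = xs - cx, ys - cy
    mu20, mu02, mu11 = (dx * dx).mean(), (dy * dy).mean(), (dx * dy).mean()

    # Angle: orientation of the principal axis relative to the image x-axis.
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

    # Shape: ratio of the spreads along the principal and minor axes.
    spread = np.sqrt(4.0 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_max = 0.5 * (mu20 + mu02 + spread)
    lam_min = 0.5 * (mu20 + mu02 - spread)
    elongation = float(np.sqrt(lam_max / lam_min)) if lam_min > 0 else float("inf")

    return {
        "centroid": (float(cx), float(cy)),
        "angle_deg": float(np.degrees(theta)),
        "area": int(xs.size),
        "elongation": elongation,
    }


# Toy example: an upright 8x2 bar, loosely standing in for a vertical tooth.
bar = np.zeros((10, 6), dtype=np.uint8)
bar[1:9, 2:4] = 1
descriptors = mask_descriptors(bar)
print(descriptors)
```

Descriptors of this kind can be compared against the typical range for a tooth category to flag masks that are unusually tilted, displaced, or misshapen; the thresholds for doing so would need to be chosen with domain experts.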

Automated diagnosis. Tooth segmentation is the most widely used processing technique for analyzing panoramic radiographs. With precisely segmented tooth structures, further applications can be developed in computer-aided dentistry, such as disease diagnosis, tooth alignment assessment, and orthodontic optimization. This work forms the basis of our further development of AI-driven tools for precise and automated diagnosis of various dental diseases [51, 52, 53]. By leveraging these developments, we hope to foster efficiency and accuracy in dental healthcare delivery.

7 Conclusion

In this paper, we present ViSTooth for accurate tooth segmentation through human-machine collaboration. Based on domain expertise, the model in ViSTooth automatically performs preliminary tooth segmentation. The visual interface then provides various supporting information to help experts understand the segmentation results and detect anomalies. Rich human-computer interactions are integrated to enable higher-quality corrected data and iterative optimization of the segmentation model. Two case studies and an expert study highlight the effectiveness of our tool in streamlining the tooth segmentation process and minimizing the manual effort required for accurate results. In future work, we hope to improve the performance of automatic segmentation to further reduce the effort of manual correction, leverage richer features for automatic evaluation, and integrate tooth segmentation into disease diagnosis and treatment applications.

References

[1] Vanessa Machado, Luís Proença, Mariana Morgado, José João Mendes, and João Botelho. Accuracy of panoramic radiograph for diagnosing periodontitis comparing to clinical examination. Journal of Clinical Medicine, 9(7), 2020.
[2] Shaofeng Wang, Shuang Liang, Qiao Chang, Li Zhang, Beiwen Gong, Yuxing Bai, Feifei Zuo, Yajie Wang, Xianju Xie, and Yu Gu. Stsn-net: Simultaneous tooth segmentation and numbering method in crowded environments with deep learning. Diagnostics, 14(5), 2024.
[3] Jie Yang, Yuchen Xie, Lin Liu, Bin Xia, Zhanqiang Cao, and Chuanbin Guo. Automated dental image analysis by deep learning on small dataset. pages 492–497, 2018.
[4] A comprehensive review of recent advances in artificial intelligence for dentistry e-health. 2023.
[5] Developing deep learning methods for classification of teeth in dental panoramic radiography. 2023.
[6] Rarasmaya Indraswari, Agus Zainal Arifin, Dini Adni Navastara, and Naser Jawas. Teeth segmentation on dental panoramic radiographs using decimation-free directional filter bank thresholding and multistage adaptive thresholding. In 2015 International Conference on Information & Communication Technology and Systems (ICTS), pages 49–54, 2015.
[7] Muhamad Rizal Mohamed Razali, Nazatul Sabariah Ahmad, Zulkifly Mohd Zaki, and Waidah Ismail. Region of adaptive threshold segmentation between mean, median and otsu threshold for dental age assessment. pages 353–356, 2014.
[8] Muhamad Rizal Mohamed Razali, Nazatul Sabariah Ahmad, Rozita Hassan, Zulkifly Mohd Zaki, and Waidah Ismail. Sobel and canny edges segmentations for the dental age assessment. In 2014 International Conference on Computer Assisted System in Health, pages 62–66. IEEE, 2014.
[9] N Senthilkumaran. Fuzzy logic approach to edge detection for dental x-ray image segmentation. International Journal of Computer Science and Information Technologies, 3(5):5236–5238, 2012.
[10] Pengcheng Li, Yang Liu, Zhiming Cui, Feng Yang, Yue Zhao, Chunfeng Lian, and Chenqiang Gao. Semantic graph attention with explicit anatomical association modeling for tooth segmentation from cbct images. IEEE Transactions on Medical Imaging, 41(11):3116–3127, 2022.
[11] Gil Silva, Luciano Oliveira, and Matheus Pithon. Automatic segmenting teeth in x-ray images: Trends, a novel data set, benchmarking and future perspectives. Expert Systems with Applications, 107:15–31, 2018.
[12] Senbao Hou, Tao Zhou, Yuncan Liu, Pei Dang, Huiling Lu, and Hongbin Shi. Teeth u-net: A segmentation model of dental panoramic x-ray images for context semantics and contrast enhancement. Computers in Biology and Medicine, 152:106296, 2023.
[13] Thorbjørn Louring Koch, Mathias Perslev, Christian Igel, and Sami Sebastian Brandt. Accurate segmentation of dental panoramic radiographs with u-nets. pages 15–19, 2019.
[14] Hu Chen, Kailai Zhang, Peijun Lyu, Hong Li, Ludan Zhang, Ji Wu, and Chin-Hui Lee. A deep learning approach to automatic teeth detection and numbering based on object detection in dental periapical films. Scientific reports, 9(1):3840, 2019.
[15] Changgyun Kim, Donghyun Kim, HoGul Jeong, Suk-Ja Yoon, and Sekyoung Youm. Automatic tooth detection and numbering using a combination of a cnn and heuristic algorithm. Applied Sciences, 10(16):5624, 2020.
[16] Bernardo Silva, Laís Pinheiro, Luciano Oliveira, and Matheus Pithon. A study on tooth segmentation and numbering using end-to-end deep neural networks. pages 164–171, 2020.
[17] Schwendicke F, Samek W, and Krois J. Artificial intelligence in dentistry: Chances and challenges. Journal of Dental Research, pages 769–774, 2020.
[18] André Ferreira Leite, Adriaan Van Gerven, Holger Willems, Thomas Beznik, Pierre Lahoud, Hugo Gaêta-Araujo, Myrthel Vranckx, and Reinhilde Jacobs. Artificial intelligence-driven novel tool for tooth detection and segmentation on panoramic radiographs. Clinical oral investigations, 25:2257–2267, 2021.
[19] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. pages 2980–2988, 2017.
[20] Inkyu Shin, Dong-Jin Kim, Jae-Won Cho, Sanghyun Woo, KwanYong Park, and In So Kweon. Labor: Labeling only if required for domain adaptive semantic segmentation. CoRR, abs/2108.05570, 2021.
[21] Chintan K. Modi and Nirav P. Desai. A simple and novel algorithm for automatic selection of roi for dental radiograph segmentation. In 2011 24th Canadian Conference on Electrical and Computer Engineering(CCECE), pages 000504–000507, 2011.
[22] Mutasem K Alsmadi. A hybrid fuzzy c-means and neutrosophic for jaw lesions segmentation. Ain Shams Engineering Journal, 9(4):697–706, 2018.
[23] Mosaddik Hasan, Waidah Binti Ismail, Rozita Hassan, and Atsuo Yoshitaka. Automatic segmentation of jaw from panoramic dental x-ray images using gvf snakes. 2016 World Automation Congress (WAC), pages 1–6, 2016.
[24] Hui Li, Guoxia Sun, Huiqiang Sun, and W. Liu. Watershed algorithm based on morphology for dental x-ray images segmentation. 2012 IEEE 11th International Conference on Signal Processing, 2:877–880, 2012.
[25] Arna Fariza, Agus Zainal Arifin, Eha Renwi Astuti, and Takio Kurita. Segmenting tooth components in dental x-ray images using gaussian kernel-based conditional spatial fuzzy c-means clustering algorithm. International Journal of Intelligent Engineering and Systems, 2019.
[26] Gil Jader, Jefferson Fontineli, Marco Ruiz, Kalyf Abdalla, Matheus Pithon, and Luciano Oliveira. Deep instance segmentation of teeth in panoramic x-ray images. pages 400–407, 2018.
[27] A. Almalki and L. Latecki. Self-supervised learning with masked image modeling for teeth numbering, detection of dental restorations, and instance segmentation in dental panoramic radiographs. pages 5583–5592, jan 2023.
[28] Kailai Zhang, Ji Wu, Hu Chen, and Peijun Lyu. An effective teeth recognition method using label tree with cascade network structure. Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society, 68:61–70, 2018.
[29] Serdar Helli and Andaç Hamamcı. Tooth instance segmentation on panoramic dental radiographs using u-nets and morphological processing. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 10(1):39–50, 2022.
[30] Dmitry V Tuzoff, Lyudmila N Tuzova, Michael M Bornstein, Alexey S Krasnov, Max A Kharchenko, Sergey I Nikolenko, Mikhail M Sveshnikov, and Georgiy B Bednenko. Tooth detection and numbering in panoramic radiographs using convolutional neural networks. Dentomaxillofacial Radiology, 48(4):20180051, 2019.
[31] Suvarna Bhat, Gajanan K Birajdar, and Mukesh D Patil. A comprehensive survey of deep learning algorithms and applications in dental radiograph analysis. Healthcare Analytics, page 100282, 2023.
[32] Xumeng Wang, Ziliang Wu, Wenqi Huang, Wei Yating, Zhaosong Huang, Mingliang Xu, and Wei Chen. Vis+ai: integrating visualization with artificial intelligence for efficient data analysis. Frontiers of Computer Science, 17, 06 2023.
[33] Wenjing Dai, Meng Wang, Zhibin Niu, and Jiawan Zhang. Chart decoder: Generating textual and numeric information from chart images automatically. Journal of Visual Languages & Computing, 48:101–109, 2018.
[34] Changjian Chen, Jun Yuan, Yafeng Lu, Yang Liu, Hang Su, Songtao Yuan, and Shixia Liu. Oodanalyzer: Interactive analysis of out-of-distribution samples. IEEE Transactions on Visualization and Computer Graphics, 27(7):3335–3349, jul 2021.
[35] Sean Kandel, Ravi Parikh, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. pages 547–554, 2012.
[36] Mengchen Liu, Jiaxin Shi, Kelei Cao, Jun Zhu, and Shixia Liu. Analyzing the training processes of deep generative models. IEEE transactions on visualization and computer graphics, 24(1):77–87, 2017.
[37] Kelei Cao, Mengchen Liu, Hang Su, Jing Wu, Jun Zhu, and Shixia Liu. Analyzing the noise robustness of deep neural networks. IEEE Transactions on Visualization and Computer Graphics, 27(7):3289–3304, 2020.
[38] Zijie J. Wang, Robert Turko, Omar Shaikh, Haekyu Park, Nilaksh Das, Fred Hohman, Minsuk Kahng, and Duen Horng Polo Chau. Cnn explainer: Learning convolutional neural networks with interactive visualization. IEEE Transactions on Visualization and Computer Graphics, 27(2):1396–1406, 2021.
[39] Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. pages 5188–5196, 2015.
[40] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. pages 618–626, 2017.
[41] Changjian Chen, Yukai Guo, Fengyuan Tian, Shilong Liu, Weikai Yang, Zhaowei Wang, Jing Wu, Hang Su, Hanspeter Pfister, and Shixia Liu. A unified interactive model evaluation for classification, object detection, and instance segmentation in computer vision. IEEE Transactions on Visualization and Computer Graphics, 2023.
[42] Yongsu Ahn and Yu-Ru Lin. Fairsight: Visual analytics for fairness in decision making. IEEE transactions on visualization and computer graphics, 26(1):1086–1095, 2019.
[43] Maaz Ansari, Surendra Bhosale, and Archana Choudhary. Semantic segmentation using convolutional neural networks. 10:31–34, 06 2023.
[44] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. pages 770–778, 2016.
[45] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[46] Ming-Kuei Hu. Visual pattern recognition by moment invariants. IRE transactions on information theory, 8(2):179–187, 1962.
[47] Frederik J.S. Doerr and Alastair J. Florence. A micro-xrt image analysis and machine learning methodology for the characterisation of multi-particulate capsule formulations. International Journal of Pharmaceutics: X, 2:100041, 2020.
[48] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022, 2003.
[49] OpenAI. Introducing chatgpt.
[50] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. pages 4015–4026, 2023.
[51] André Ferreira Leite, Karla de Faria Vasconcelos, Holger Willems, and Reinhilde Jacobs. Radiomics and machine learning in oral healthcare. PROTEOMICS–Clinical Applications, 14(3):1900040, 2020.
[52] Burak Dayı, Hüseyin Üzen, İpek Balıkçı Çiçek, and Şuayip Burak Duman. A novel deep learning-based approach for segmentation of different type caries lesions on panoramic radiographs. Diagnostics, 13(2):202, 2023.
[53] Esra Sivari, Guler Burcu Senirkentli, Erkan Bostanci, Mehmet Serdar Guzel, Koray Acici, and Tunc Asuroglu. Deep learning in diagnosis of dental anomalies and diseases: A systematic review. Diagnostics, 13(15):2512, 2023.