1. Vanderbilt University, Nashville TN 37215, USA,
2. Vanderbilt University Medical Center, Nashville TN 37232, USA,

Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results

Jialin Yue 1, Tianyuan Yao 1, Ruining Deng 1, Quan Liu 1, Juming Xiong 1, Haichun Yang 2, Yuankai Huo 1

Abstract

Recently, circle representation has emerged as a method to improve the identification of spherical objects (such as glomeruli, cells, and nuclei) in medical imaging studies. In traditional bounding box-based object detection, combining results from multiple models improves accuracy, especially when real-time processing isn’t crucial. Unfortunately, this widely adopted strategy is not readily available for combining circle representations. In this paper, we propose Weighted Circle Fusion (WCF), a simple approach for merging predictions from various circle detection models. Our method leverages the confidence score associated with each proposed bounding circle to generate averaged circles. We evaluate the method on a proprietary dataset for glomerular detection in whole slide imaging (WSI). The findings reveal a performance gain of 5% compared to existing ensemble methods. Furthermore, the Weighted Circle Fusion technique not only improves the precision of object detection in medical images but also notably decreases false detections, presenting a promising direction for future research and application in pathological image analysis.

Keywords: Medical Imaging · Ensemble Methods · Circle Representation · Weighted Circle Fusion

1 Introduction

Figure 1: Comparison of Box Fusion and Circle Fusion Methods for Object Detection. This figure delineates the differences between the ensemble results of box representation and circle representation. Box fusion alters the dimensions of the box, thereby changing its shape, while circle fusion only modifies the radius of the circle, preserving its shape. For the detection of ball-shaped medical objects, circle representation can achieve better performance.

Object detection plays an essential role in medical imaging [7, 12, 11], offering a wide range of applications that are enhanced by machine learning technologies. Traditional object detection models, such as Faster R-CNN [4], YOLO [17], and SSD [13], have been widely adopted across various domains for their efficiency and accuracy [9, 10, 3, 6]. In medical object detection tasks, detecting glomeruli is essential for effective diagnosis and quantitative assessments in renal pathology. For these tasks, CircleNet [22] stands out in the medical field for its unique approach to detection tasks. Unlike conventional detection networks that rely on bounding boxes, CircleNet offers a rotation-consistent circle representation with fewer parameters for ball-shaped objects, such as glomeruli in kidney pathology (Fig. 1). Despite CircleNet’s advantages, relying on a single CircleNet-trained model for detection tasks presents considerable challenges, including missed and false detections [14, 5, 21, 20, 1, 8].

To enhance the robustness of object detection, ensemble learning algorithms, such as Non-Maximum Suppression (NMS) [15], Soft-NMS [2], and Weighted Box Fusion (WBF) [19], have been proposed to fuse the detection results from multiple models (Fig. 1). NMS and Soft-NMS work by eliminating lower confidence detections based on an Intersection Over Union (IOU) threshold [18], with Soft-NMS adjusting detection scores rather than removing detections outright. WBF further refines this approach by merging overlapping detections, allowing those with higher confidence scores to improve the merged result. Unfortunately, such methods were optimized for the traditional bounding box-based representation of natural images.

In this paper, we propose a simple ensemble method, called Weighted Circle Fusion (WCF), designed specifically for circle representation in medical imaging detections. This method merges overlapping detections, with the fusion result’s position decided by the confidence of the contributing detections. Importantly, it counts the number of overlapped circles merged for each object, while computing the average score for false positive elimination. In experiments, we assessed the detection results of glomeruli on whole slide images (WSIs) using five-fold cross-validation. Additionally, to validate the method’s consistency across rotations, we tested it on images rotated by 90 degrees. The results demonstrate the method’s decent rotation consistency. To summarize, the contribution of this paper is threefold:

$\bullet$ The WCF method is proposed for fusing detection results from circle representation, enhancing the precision and reliability of instance detection on ball-shaped medical objects.

$\bullet$ The false positive detection outcomes are further eliminated through our dual thresholds strategy. It not only considers the individual confidence score but also counts the overlap across hard decisions.

$\bullet$ Our method achieved a substantial performance gain ($>$5%) compared to any individual model.

2 Methods

In this section, we introduce an innovative method for fusing predictions: Weighted Circle Fusion (Fig. 2). This technique is designed to enhance the accuracy of object detection, particularly focusing on circular objects commonly encountered in medical imaging, such as cells, glomeruli, or other spherically shaped features. Our approach involves pairwise fusion of the detection results from five models, where the results from the first model are fused with the second, then the combined results are fused with the third model, and so on until the fifth model is included.

The WCF process begins with aggregating predictions from multiple models, resulting in several sets of detection outcomes. Initially, the detection results from the first model are stored in a list, referred to as $R$. Subsequent detections from other models are compared against the entries in list $R$ based on their cIOU [22]. The definition of cIOU can be found in the corresponding reference. If the cIOU between any two detections exceeds a predetermined threshold, indicating agreement between models on the presence and location of an object, these detections are considered for fusion.
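For readers who want a concrete handle on the overlap test, the following is a minimal sketch of a circle IoU computed from the standard circle-circle intersection (lens) area. The function name `circle_iou` and the `(x, y, r)` tuple layout are assumptions made for illustration; the authoritative definition of cIOU remains the one in [22].

```python
import math


def circle_iou(c1, c2):
    """IoU of two circles given as (x, y, r) tuples (illustrative layout)."""
    x1, y1, r1 = c1
    x2, y2, r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)  # distance between centers
    area1 = math.pi * r1 ** 2
    area2 = math.pi * r2 ** 2

    if d >= r1 + r2:
        # Circles are disjoint (or touch in a single point).
        inter = 0.0
    elif d <= abs(r1 - r2):
        # One circle lies entirely inside the other.
        inter = min(area1, area2)
    else:
        # Partial overlap: sum of two circular segments (lens area).
        cos1 = (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)
        cos2 = (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)
        a1 = math.acos(max(-1.0, min(1.0, cos1)))  # clamp against rounding
        a2 = math.acos(max(-1.0, min(1.0, cos2)))
        kite = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
        inter = r1 * r1 * a1 + r2 * r2 * a2 - 0.5 * math.sqrt(max(0.0, kite))

    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0
```

Two detections would then be considered for fusion when `circle_iou(a, b)` exceeds the chosen threshold.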

Upon fusion of the two results, it is necessary to recalculate the coordinates and confidence score of the new, combined result. Given that our detection results are represented as circles, we utilize the circles’ center coordinates and radii for computation. Suppose the center coordinates and radius of a result from the first set are ( $x_{1}$ , $y_{1}$ ) and $r_{1}$ with a confidence score $s_{1}$ ; and similarly, ( $x_{2}$ , $y_{2}$ ) and $r_{2}$ with score $s_{2}$ for a result from the second set. The formulas for calculating the weighted average coordinates and radius are as follows:

For center coordinates:

$$x_{\text{fuse}}=\frac{x_{1}\cdot s_{1}+x_{2}\cdot s_{2}}{s_{1}+s_{2}} \tag{1}$$

$$y_{\text{fuse}}=\frac{y_{1}\cdot s_{1}+y_{2}\cdot s_{2}}{s_{1}+s_{2}} \tag{2}$$

For radius:

$$r_{\text{fuse}}=\frac{r_{1}\cdot s_{1}+r_{2}\cdot s_{2}}{s_{1}+s_{2}} \tag{3}$$

After calculating the fused coordinates, we compute the average of the scores of the merged results and keep track of how many detections have been merged to form this new result.
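The pairwise fusion step can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `Circle` dataclass, its `count` field, and the function name `fuse_pair` are hypothetical, and the score of an already fused circle is assumed to be the running mean of the scores merged so far. Scores are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    x: float
    y: float
    r: float
    score: float    # confidence (mean confidence once fused)
    count: int = 1  # number of detections merged into this circle


def fuse_pair(a: Circle, b: Circle) -> Circle:
    """Confidence-weighted fusion of two circles (weighted averages of x, y, r)."""
    w = a.score + b.score  # assumed > 0
    x = (a.x * a.score + b.x * b.score) / w
    y = (a.y * a.score + b.y * b.score) / w
    r = (a.r * a.score + b.r * b.score) / w
    # Running mean of all merged scores, plus the merge counter.
    n = a.count + b.count
    score = (a.score * a.count + b.score * b.count) / n
    return Circle(x, y, r, score, n)


fused = fuse_pair(Circle(100, 100, 20, 0.9), Circle(104, 98, 22, 0.6))
# Center is pulled toward the more confident detection:
# x ≈ 101.6, y ≈ 99.2, r ≈ 20.8, mean score ≈ 0.75, count = 2
print(round(fused.x, 3), round(fused.y, 3), round(fused.r, 3),
      round(fused.score, 3), fused.count)
```

Because the weights are the confidence scores, the fused circle sits closer to the detection with score 0.9 than to the one with score 0.6.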

If a result from the second set cannot fuse with any result in list $R$, it is directly added to $R$. This process is repeated for each set of predictions until all $M$ sets have been processed.

Upon completing the fusion of all model predictions, the confidence score $S$ for the fused result is calculated as follows:

$$S=\frac{\sum_{m=1}^{M}S_{m}}{M} \tag{4}$$

where $S_{m}$ is the confidence score of each individual model’s prediction. We also apply a “T score” as the threshold for the average confidence score to decide which circles to exclude. In our experiments, we set this value to 0.9.

Additionally, we apply a “count score” $C$ to quantify how many model predictions have been fused into a single detection. The maximum value of $C$ depends on how many models we use in our ensemble method. We also apply a “T count” as the threshold for the count score to determine which circles should be dropped. In our experiments, this value was set to 2.

To further refine the detection outcomes, two thresholds are set: one for the confidence score and another for the count score. Detections that fall below these thresholds are considered unreliable and are therefore excluded from the final results. This strategic approach to fusion enhances the precision of detection, making WCF particularly effective for instances where erroneous detections are common.
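Putting the steps of this section together, the whole procedure might look like the sketch below. It is a sketch under stated assumptions, not the authors' code: the function names, the `(x, y, r, score)` input layout, the choice to fuse with the best-overlapping entry of $R$, the restriction to entries that existed before the current model was processed, and the requirement that a circle pass both thresholds to be kept are all assumptions. The defaults mirror the values reported in this paper (cIOU threshold 0.5, T score 0.9, T count 2).

```python
import math


def circle_iou(a, b):
    """Circle IoU; only the first three fields (x, y, r) are used."""
    (x1, y1, r1), (x2, y2, r2) = a[:3], b[:3]
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        inter = math.pi * min(r1, r2) ** 2
    else:
        c1 = (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)
        c2 = (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)
        a1 = math.acos(max(-1.0, min(1.0, c1)))
        a2 = math.acos(max(-1.0, min(1.0, c2)))
        kite = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
        inter = r1 * r1 * a1 + r2 * r2 * a2 - 0.5 * math.sqrt(max(0.0, kite))
    return inter / (math.pi * (r1 * r1 + r2 * r2) - inter)


def weighted_circle_fusion(model_outputs, iou_thr=0.5, t_score=0.9, t_count=2):
    """model_outputs: one list of (x, y, r, score) per model."""
    fused = []  # the list R; each entry is [x, y, r, mean_score, count]
    for detections in model_outputs:
        n_existing = len(fused)  # assumption: match only against earlier models
        for x, y, r, s in detections:
            best, best_iou = None, iou_thr
            for entry in fused[:n_existing]:
                iou = circle_iou(entry, (x, y, r))
                if iou > best_iou:  # must exceed the threshold
                    best, best_iou = entry, iou
            if best is None:
                fused.append([x, y, r, s, 1])  # no match: add to R as-is
                continue
            fx, fy, fr, fs, n = best
            w = fs + s
            best[:] = [(fx * fs + x * s) / w, (fy * fs + y * s) / w,
                       (fr * fs + r * s) / w, (fs * n + s) / (n + 1), n + 1]
    # Dual thresholds: keep circles that pass both the score and count tests.
    return [tuple(e) for e in fused if e[3] >= t_score and e[4] >= t_count]


models = [
    [(100, 100, 20, 0.95), (300, 300, 15, 0.92)],
    [(102, 101, 21, 0.93), (500, 500, 10, 0.99)],
    [(99, 100, 20, 0.91), (301, 299, 15, 0.70)],
]
result = weighted_circle_fusion(models)
print(len(result))  # 1
```

In this toy input, the circle near (300, 300) is dropped because its mean score falls below 0.9, and the circle at (500, 500) is dropped because only one model proposed it, so only the circle supported by all three models survives.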

Through its sophisticated fusion mechanism, WCF not only improves the overall accuracy of medical object detection but also demonstrates robustness against variations in object presentations, including changes in orientation and size. By aggregating and intelligently weighting the detection outcomes from multiple models, WCF addresses the critical challenge of reducing false positives and negatives, ensuring that only the most credible detections are considered. This method’s emphasis on consensus among models, coupled with its innovative use of weighted averages for circle properties, sets a new standard for precision in the detection of circular objects in medical imaging.

3 Data and Experiments

3.1 Dataset

For our training dataset, we utilized three distinct sources of data. Firstly, we collected 15,190 patches from whole slide images derived from renal biopsies. These kidney tissue samples underwent routine processing, were embedded in paraffin, and sectioned to a thickness of 3 $\mu$ m before being stained with hematoxylin and eosin (H&E), Periodic Acid–Schiff (PAS), or Jones stains. The samples were anonymized, and the study received approval from the Institutional Review Board (IRB).

In addition, we included 8,151 glomerular images from the OmniSeg dataset. These images were extracted from 459 WSIs, originating from 125 patients diagnosed with Minimal Change Disease (MCD). Manual segmentation was performed on these images to identify six structurally normal pathological primitives, utilizing digital renal biopsies from the NEPTUNE study. Each image, with a resolution of 3000 × 3000 pixels at 40× magnification (0.25 $\mu$m per pixel), included features such as TUFT, CAP, PT, DT, PTC, and VES, across H&E, PAS, Silver (SIL), and Trichrome (TRI) stains. Both manual and automatic segmentation techniques were employed on the multi-channel images generated from these different stains.

Furthermore, our dataset incorporated 9,260 patches from PAS-stained WSIs of murine kidneys. This dataset was divided into training, validation, and testing sets with a ratio of 7:1:2 for each of the five models.

All patches in our training dataset were either cropped or resized to dimensions of 512 × 512 pixels. Each patch contained at least one glomerulus.

For the testing dataset, we included 15 PAS-stained WSIs, encompassing 2051 mouse glomeruli.

3.2 Experiments

This study embarked on a comprehensive evaluation of object detection models, leveraging the CircleNet architecture with dla-34 [23] as the fundamental backbone for each model. To enrich the diversity of learning and ensure a comprehensive understanding, the training dataset for each of the five models was meticulously curated with slight variations. This strategic diversity is aimed at encompassing a broad spectrum of scenarios and enhancing the robustness of the models. Upon completion of training, which extended across 30 epochs for each model, the outputs generated by these models were refined through the application of the Non-Maximum Suppression algorithm.

In the experiment, our WCF method was configured with specific parameters: a circle Intersection Over Union threshold set at 0.5, a “T count” threshold of 2, and a “T score” threshold of 0.9. The initial phase of the experiments involved applying the WCF algorithm to the outputs previously refined by the NMS algorithm. This fusion process is aimed at amalgamating the strengths of individual detections into a singular, more accurate detection result. The effectiveness of the WCF-fused results was meticulously evaluated, comparing it against the performances of the individual models as well as against outcomes obtained through traditional NMS and the more nuanced Soft NMS techniques, with cIOU thresholds set at 0.5 and 0.3, respectively.

In the subsequent phase of the study, we assessed the rotational consistency of our fusion method. This was achieved by extracting patches from WSIs and rotating them by 90 degrees prior to the detection process. The results from these rotated patches were then subjected to the same fusion process to ascertain the method’s stability and accuracy in the face of rotational variations. This assessment aimed at evaluating the fusion method’s adaptability and consistency, ensuring that the detection accuracy remains uncompromised regardless of the orientation of the input images.
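To illustrate why a circle is a convenient representation under this test, the sketch below maps a circle annotation through a 90-degree clockwise rotation of its patch: only the center moves, and the radius is unchanged. The helper names and the integer pixel-index convention (x as column, y as row, origin at the top-left) are assumptions made for illustration; they are not taken from the experimental code.

```python
def rotate_circle_cw90(circle, height):
    """Map (x, y, r) through a 90-degree clockwise rotation of a patch
    with the given height (pixel-index convention, origin at top-left)."""
    x, y, r = circle
    return (height - 1 - y, x, r)  # the radius is unaffected by rotation


def rotate_image_cw90(img):
    """Rotate a row-major nested list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Mark the circle center in a small 4 x 6 patch and check that the mapped
# center lands on the same pixel after rotating the patch itself.
H, W = 4, 6
patch = [[0] * W for _ in range(H)]
cx, cy, radius = 4, 1, 3.0
patch[cy][cx] = 1

rotated = rotate_image_cw90(patch)
nx, ny, nr = rotate_circle_cw90((cx, cy, radius), H)
print(rotated[ny][nx] == 1, nr == radius)  # True True
```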

This dual-faceted experimental setup provided a holistic view of the fusion method’s effectiveness, both in terms of its capability to integrate multiple detection outcomes into a unified, highly accurate result and its resilience in maintaining performance consistency across varied rotational perspectives. The study’s methodological rigor and the strategic parameter settings for the WCF, NMS, and Soft NMS algorithms underscore the comprehensive effort to optimize object detection accuracy, paving the way for advancements in automated detection systems.

In the second part, patches extracted from WSIs were rotated 90 degrees prior to detection. The results from these rotated patches were then fused to assess the rotation consistency of our fusion method.

Model	mAP(0.5:0.95)	mAP(@0.5IOU)	mAP(@0.75IOU)	Average Recall(0.5:0.95)
model1	0.726	0.880	0.812	0.699
model2	0.639	0.738	0.692	0.724
model3	0.532	0.658	0.590	0.658
model4	0.687	0.838	0.785	0.622
model5	0.697	0.831	0.777	0.667
NMS [15]	0.463	0.566	0.516	0.745
Soft-NMS [2]	0.319	0.402	0.357	0.722
WCF(Ours)	0.776	0.912	0.883	0.697
	(+5%)	(+3.2%)	(+7.1%)	(-2.7%)

Table 1: Performance of the individual models and the fusion methods, including WCF, in terms of mean average precision and average recall at different cIOU thresholds. The values in parentheses give the difference between WCF and the best individual model in each column. WCF achieves the highest mAP in all three settings, while its average recall is 2.7% below that of the best individual model.

3.3 Evaluation

The models were evaluated using the mean average precision (mAP) at IoU thresholds of 0.5 and 0.75. In addition, mAP was averaged over a range of IoU thresholds, from 0.5 to 0.95 in steps of 0.05. Alongside precision, the average recall across the same IoU thresholds was also measured, providing a rounded evaluation of model performance.
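Written out, the threshold-averaged metric takes the following form, where $\mathrm{AP}_{t}$ denotes the average precision at IoU threshold $t$ (this symbol is notation introduced here for illustration):

$$\mathrm{mAP}_{0.5:0.95}=\frac{1}{10}\sum_{t\in\{0.50,\,0.55,\,\ldots,\,0.95\}}\mathrm{AP}_{t}$$

The range from 0.5 to 0.95 in steps of 0.05 contains ten thresholds, hence the factor $1/10$.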

The IoU metric, a ratio reflecting the overlap between two objects versus their combined area, is traditionally calculated for bounding box representations. However, given that this study’s predictions utilize circle representations, we adopted the circle IoU (cIoU) [16] metric as our evaluation standard. The cIoU offers a more fitting measure for our circular detection outputs, aligning with the unique geometry of the objects being detected.

4 Results

4.1 Performance on glomerular detection

Fig. 3 and Table 1 showcase the performance of our fusion method, which integrates the outputs from five models on murine glomerular WSIs. The results demonstrate that our approach achieves the highest mAP values, surpassing every individual model as well as NMS and Soft-NMS. While the average recall did not reach the highest observed value, it remained competitive. Importantly, compared to the model with the highest average recall, our fusion method achieved a substantially greater mAP, underscoring the effectiveness of our approach in balancing precision and recall in detection tasks.

Model	mAP(0.5:0.95)	mAP(@0.5IOU)	mAP(@0.75IOU)	Average Recall(0.5:0.95)
model1	0.778	0.889	0.879	0.778
model2	0.688	0.809	0.779	0.688
model3	0.704	0.835	0.802	0.711
model4	0.711	0.854	0.814	0.713
model5	0.759	0.873	0.856	0.744
NMS [15]	0.641	0.776	0.730	0.636
Soft-NMS [2]	0.570	0.661	0.635	0.565
WCF(Ours)	0.823	0.924	0.913	0.817
	(+4.5%)	(+3.5%)	(+3.4%)	(+3.9%)

Table 2: Performance on rotation invariance. The table reports the results of the individual models and fusion methods on rotated patches. WCF improves both mean average precision and average recall, indicating better rotation consistency.

4.2 Rotation consistency

Table 2 reports the rotation consistency of our method. Each patch was rotated 90 degrees clockwise before detection, and the results were compared with those obtained from the non-rotated patches to evaluate how the approach responds to changes in orientation. The results show that the WCF method remains consistent under rotation, highlighting its robustness against orientation changes.

5 Conclusion and Discussion

This work is the first to implement the task of ensembling detection results for circle representation. We presented a novel ensemble method named Weighted Circle Fusion, aimed at refining the ensemble learning process for predictions derived from multiple deep learning models. This method has demonstrated its efficacy through superior precision metrics, outperforming conventional benchmarks, especially in contexts plagued by a remarkable rate of detection inaccuracies. Our empirical findings underscore the potential of WCF in mitigating errors associated with circle representation, thereby establishing it as an advantageous strategy for the nuanced demands of medical image analysis through optimized deep learning approaches.

Limitation. Despite the commendable progress WCF heralds in precision enhancement, the exploration of its capabilities reveals certain limitations that warrant attention. Notably, the improvement in recall facilitated by our method does not manifest as significantly as anticipated. In certain evaluations, the recall rate exhibited a decrement, casting light on an essential avenue for future development. This phenomenon suggests that while WCF excels in reducing false positives, leading to higher precision, its strategy may inadvertently lead to an increase in false negatives, thereby affecting the recall negatively.

References

[1] Alzubaidi, L., Al-Amidie, M., Al-Asadi, A., Humaidi, A.J., Al-Shamma, O., Fadhel, M.A., Zhang, J., Santamaría, J., Duan, Y.: Novel transfer learning approach for medical imaging with limited labeled data. Cancers 13(7), 1590 (2021)
[2] Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-nms–improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision. pp. 5561–5569 (2017)
[3] Del Signore, A., Hendriks, A.J., Lenders, H.R., Leuven, R.S., Breure, A.: Development and application of the ssd approach in scientific case studies for ecological risk assessment. Environmental Toxicology and Chemistry 35(9), 2149–2161 (2016)
[4] Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision. pp. 1440–1448 (2015)
[5] Gomathisankaran, M., Yuan, X., Kamongi, P.: Ensure privacy and security in the process of medical image analysis. In: 2013 IEEE international conference on granular computing (GrC). pp. 120–125. IEEE (2013)
[6] Huang, R., Pedoeem, J., Chen, C.: Yolo-lite: a real-time object detection algorithm optimized for non-gpu computers. In: 2018 IEEE international conference on big data (big data). pp. 2503–2510. IEEE (2018)
[7] Jaeger, P.F., Kohl, S.A., Bickelhaupt, S., Isensee, F., Kuder, T.A., Schlemmer, H.P., Maier-Hein, K.H.: Retina u-net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. In: Machine Learning for Health Workshop. pp. 171–183. PMLR (2020)
[8] Jia, X., Tian, W., Li, C., Yang, X., Luo, Z., Wang, H.: A dynamic active safe semi-supervised learning framework for fault identification in labeled expensive chemical processes. Processes 8(1), 105 (2020)
[9] Jiang, H., Learned-Miller, E.: Face detection with the faster r-cnn. In: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017). pp. 650–657. IEEE (2017)
[10] Jiang, P., Ergu, D., Liu, F., Cai, Y., Ma, B.: A review of yolo algorithm developments. Procedia Computer Science 199, 1066–1073 (2022)
[11] Kaur, A., Singh, Y., Neeru, N., Kaur, L., Singh, A.: A survey on deep learning approaches to medical images and a systematic look up into real-time object detection. Archives of Computational Methods in Engineering pp. 1–41 (2022)
[12] Li, Z., Dong, M., Wen, S., Hu, X., Zhou, P., Zeng, Z.: Clu-cnns: Object detection for medical images. Neurocomputing 350, 53–59 (2019)
[13] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. pp. 21–37. Springer (2016)
[14] Natu, P., Natu, S., Agrawal, U.: Privacy issues in medical image analysis. In: Data protection and privacy in healthcare, pp. 51–64. CRC Press (2021)
[15] Neubeck, A., Van Gool, L.: Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06). vol. 3, pp. 850–855 (2006). https://doi.org/10.1109/ICPR.2006.479
[16] Nguyen, E.H., Yang, H., Deng, R., Lu, Y., Zhu, Z., Roland, J.T., Lu, L., Landman, B.A., Fogo, A.B., Huo, Y.: Circle representation for medical object detection. IEEE transactions on medical imaging 41(3), 746–754 (2021)
[17] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016)
[18] Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 658–666 (2019)
[19] Solovyev, R., Wang, W., Gabruseva, T.: Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing 107, 104117 (2021)
[20] Tajbakhsh, N., Hu, Y., Cao, J., Yan, X., Xiao, Y., Lu, Y., Liang, J., Terzopoulos, D., Ding, X.: Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019). pp. 1251–1255. IEEE (2019)
[21] Vincent, J., Pan, W., Coatrieux, G.: Privacy protection and security in ehealth cloud platform for medical image sharing. In: 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). pp. 93–96. IEEE (2016)
[22] Yang, H., Deng, R., Lu, Y., Zhu, Z., Chen, Y., Roland, J.T., Lu, L., Landman, B.A., Fogo, A.B., Huo, Y.: Circlenet: Anchor-free glomerulus detection with circle representation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part IV 23. pp. 35–44. Springer (2020)
[23] Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2403–2412 (2018)