
MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging

Chaojie Zhang, Department of Radiology, New York University Langone Health, New York, NY, 10016, USA
Shengjia Chen, Department of Radiology, New York University Langone Health, New York, NY, 10016, USA
Ozkan Cigdem, Department of Radiology, New York University Langone Health, New York, NY, 10016, USA
Haresh Rengaraj Rajamohan, Center for Data Science, New York University, New York, NY, 10011, USA
Kyunghyun Cho, Center for Data Science, New York University, New York, NY, 10011, USA
Richard Kijowski, Department of Radiology, New York University Langone Health, New York, NY, 10016, USA
Cem M. Deniz, Department of Radiology, New York University Langone Health, New York, NY, 10016, USA

Summary

A transformer-based deep learning model, MR-Transformer, was developed to predict total knee replacement using magnetic resonance imaging. The model exhibited state-of-the-art performance on the Osteoarthritis Initiative and Multicenter Osteoarthritis Study databases.

Key Points

• MR-Transformer incorporates ImageNet pre-training and captures three-dimensional spatial correlation to predict total knee replacement using MRI.
• The model achieved areas under the receiver operating characteristic curve (AUC) of 0.89 for coronal intermediate-weighted turbo spin-echo and 0.91 for sagittal intermediate-weighted turbo spin-echo with fat suppression from the Osteoarthritis Initiative database, as well as 0.82 for coronal short-tau inversion recovery and 0.82 for sagittal proton density fat-saturated from the Multicenter Osteoarthritis Study database.
• Using coronal short-tau inversion recovery knee MRI scans, MR-Transformer achieved a significantly higher AUC (0.82) than other deep learning models: TSE (0.76, P < .001), 3DMeT (0.74, P = .005), and MRNet (0.78, P = .010).

List of Abbreviations

2D = two-dimensional, 3D = three-dimensional, OAI = Osteoarthritis Initiative, MOST = Multicenter Osteoarthritis Study, TKR = total knee replacement, AUC = area under the receiver operating characteristic curve, COR IW TSE = coronal intermediate-weighted turbo spin-echo, SAG IW TSE FS = sagittal intermediate-weighted turbo spin-echo with fat suppression, COR STIR = coronal short-tau inversion recovery, SAG PD FAT SAT = sagittal proton density fat-saturated

Abstract

Purpose: To develop a transformer-based deep learning model that incorporates ImageNet pre-training and captures three-dimensional (3D) spatial correlation to predict total knee replacement (TKR) using MRI.

Materials and Methods: A transformer-based deep learning model, MR-Transformer, was developed for TKR prediction. The model, adapted from the ImageNet pre-trained vision transformer DeiT-Ti, takes 3D MR images directly as input. The study used 353 case–control matched pairs of coronal intermediate-weighted turbo spin-echo (COR IW TSE) and sagittal intermediate-weighted turbo spin-echo with fat suppression (SAG IW TSE FS) knee MRI scans from the Osteoarthritis Initiative (OAI) database and 270 case–control pairs of coronal short-tau inversion recovery (COR STIR) and sagittal proton density fat-saturated (SAG PD FAT SAT) knee MRI scans from the Multicenter Osteoarthritis Study (MOST) database. The performance of the proposed model was compared with that of existing state-of-the-art deep learning models for knee MRI analysis, using the area under the receiver operating characteristic curve (AUC) as the evaluation metric.

Results: The proposed MR-Transformer model consistently achieved AUCs higher than or comparable to those of the TSE, 3DMeT, and MRNet models for both coronal and sagittal knee MR scans from the OAI and MOST databases. Specifically, for COR STIR, MR-Transformer achieved a significantly higher AUC (0.82) than the other deep learning models: TSE (0.76, P < .001), 3DMeT (0.74, P = .005), and MRNet (0.78, P = .010). For COR IW TSE, SAG IW TSE FS, and SAG PD FAT SAT, MR-Transformer achieved AUCs of 0.89, 0.91, and 0.82, respectively, which were significantly higher than those of TSE (0.86, P = .009; 0.82, P < .001; 0.75, P = .006) and 3DMeT (0.82, P = .004; 0.78, P < .001; 0.63, P < .001). Compared with MRNet (0.89, 0.90, and 0.81 for COR IW TSE, SAG IW TSE FS, and SAG PD FAT SAT, respectively), the proposed model achieved similar or slightly higher AUCs, but the differences were not statistically significant (P = .44, .070, and .20, respectively).

Conclusion: MR-Transformer exhibited state-of-the-art performance on TKR prediction using MRI compared to currently available deep learning models.

1 Introduction

Knee osteoarthritis, the most prevalent type of arthritis, is a degenerative joint disease that significantly diminishes the quality of life of millions of people worldwide, causing pain, limited mobility, and disability [1, 2]. Although knee osteoarthritis has no cure, total knee replacement (TKR) is a viable treatment option for advanced disease [3]. Early identification of patients at risk of progressing to TKR is crucial for developing and implementing potential disease-modifying therapies [3]. Knee osteoarthritis is also a structural disease that alters the tissues within the joint, and its assessment requires comprehensive review of imaging studies such as MRI by an experienced radiologist. Recently, deep learning methods have shown potential to assist in TKR prediction [4, 5, 6].

Several challenges arise when using deep learning to predict TKR from MRI data: the three-dimensional (3D) nature of the data, the limited size of medical datasets, and the need to capture global structural information relevant to knee osteoarthritis. Previous studies implemented 3D convolutional neural networks to capture 3D spatial information in MR images for TKR prediction [4, 5]. However, as a data-driven method, deep learning often requires large amounts of data to achieve optimal performance, and models trained without any pre-training usually perform suboptimally on small medical datasets [7]. To address this problem, MRNet used an ImageNet pre-trained two-dimensional (2D) convolutional neural network to encode each MR slice and aggregated the encoded features to diagnose knee injuries from knee MRI scans [8]. This approach leverages large-scale pre-training but does not capture the 3D spatial correlation of MR images.

In this study, we propose MR-Transformer, a novel deep learning model adapted from the ImageNet pre-trained vision transformer DeiT-Ti [9], for TKR prediction using MRI. The model inherits the ImageNet [10] pre-trained weights while capturing the 3D spatial correlation of MR images. In addition, because the transformer architecture models long-range dependencies, the model can better capture global structural information from MR images. We evaluated the model on TKR prediction using MR images from the Osteoarthritis Initiative (OAI) [11] and Multicenter Osteoarthritis Study (MOST) [12] databases, where MR-Transformer achieved state-of-the-art performance.

2 Materials and Methods

2.1 Data

In this study, we used MR images from two publicly available databases established as prospective cohort studies: OAI [11] and MOST [12]. The OAI database contains MRI examinations with different tissue contrasts from 4796 subjects with or at risk for knee osteoarthritis, evaluated at baseline and at 12-, 24-, 36-, 48-, 60-, 72-, 84-, and 108-month follow-up. The MOST database contains MRI examinations from 3026 subjects with or at risk for knee osteoarthritis, evaluated at baseline and at 15-, 30-, 60-, and 84-month follow-up. The OAI and MOST studies were approved by the Institutional Review Boards at the University of California, San Francisco, Boston University Medical Center, and each individual clinical recruitment site, and were performed in compliance with the Declaration of Helsinki. All subjects provided written informed consent. In both databases, balanced case–control cohorts were selected by matching case and control subjects on baseline demographic variables associated with knee osteoarthritis progression, including age, sex, ethnicity, and body mass index. Case subjects were individuals who underwent TKR after the baseline enrollment date, and control subjects were individuals who did not.

In the OAI database, a total of 353 case–control pairs were identified and separated into a training group of 302 pairs and a test group of 51 pairs. The deep learning models were trained on coronal intermediate-weighted turbo spin-echo (COR IW TSE) and sagittal intermediate-weighted turbo spin-echo with fat suppression (SAG IW TSE FS) knee MRI scans of the training group using six-fold cross-validation.

In the MOST database, a total of 270 case–control pairs were identified and separated into the training group of 231 pairs and the test group of 39 pairs. The deep learning models were trained on coronal short-tau inversion recovery (COR STIR) and sagittal proton density fat-saturated (SAG PD FAT SAT) MRI knee scans of the training group using six-fold cross-validation.
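The paper does not state how the six cross-validation folds were formed. A common precaution with matched designs is to assign whole case–control pairs, rather than individual subjects, to folds so that a case and its matched control never land on opposite sides of a split. The Python sketch below illustrates that idea only; `pair_level_folds`, `split_for_fold`, and the fixed seed are hypothetical and are not taken from the study's code.

```python
import random


def pair_level_folds(n_pairs, n_folds=6, seed=0):
    """Assign whole case-control pairs (ids 0..n_pairs-1) to folds round-robin
    after a seeded shuffle, so both members of a pair share a fold.
    Hypothetical helper: the study's actual fold assignment is not described."""
    ids = list(range(n_pairs))
    random.Random(seed).shuffle(ids)
    folds = [[] for _ in range(n_folds)]
    for position, pair_id in enumerate(ids):
        folds[position % n_folds].append(pair_id)
    return folds


def split_for_fold(folds, k):
    """Pairs in fold k form the validation set; all other folds form the training set."""
    val = sorted(folds[k])
    train = sorted(p for i, fold in enumerate(folds) if i != k for p in fold)
    return train, val
```

With the 302 OAI training pairs this yields folds of 51, 51, 50, 50, 50, and 50 pairs; each pair id would then be expanded into its case (label 1) and control (label 0) scans.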

2.2 Model Development

The proposed model, MR-Transformer, was adapted from the ImageNet pre-trained tiny vision transformer DeiT-Ti[9]. The main idea of the adaptation is to modify the 2D vision transformer to accommodate 3D input while retaining the pre-trained model weights.

Unlike traditional convolutional neural networks, vision transformers split images into small 2D patches and encode the vectorized patches using self-attention. This patch-based processing makes vision transformers flexible in handling images of different dimensions, which makes adapting them from 2D to 3D inputs possible. In our adaptation, we split the 3D MR images into 2D patches, retain the pre-trained model weights, and modify the position embeddings of the model.

The architecture of MR-Transformer is shown in Figure 1. First, the single grayscale channel of the MR image is replicated to generate a three-channel input matrix. Each slice of the 3D input matrix is then divided into 16 × 16 patches, and each patch is flattened into a 768-dimensional vector (16 × 16 pixels × 3 channels). The linear projection layer inherited from DeiT-Ti projects the flattened patch vectors into 192-dimensional patch embeddings. The adapted learnable position embeddings are added to the corresponding patch embeddings to retain the positional information of each 2D patch. The patch embeddings, together with an inherited learnable "class" embedding, are processed by the pre-trained transformer encoder from DeiT-Ti. Finally, a linear layer maps the encoded "class" embedding to the TKR prediction.
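To make the tokenization step concrete, the following standard-library Python sketch replicates a grayscale slice into three channels, cuts each slice into non-overlapping 16 × 16 patches, and flattens each patch into a 16 × 16 × 3 = 768-element vector. It assumes a volume stored as nested lists indexed [slice][row][column] with in-plane dimensions divisible by 16; the function names and the channel-major flattening order are illustrative assumptions, and the learned 768 → 192 linear projection is omitted. In PyTorch ViT implementations such as timm, patch extraction and projection are usually fused into one convolution whose kernel size and stride equal the patch size.

```python
PATCH = 16  # DeiT-Ti patch size (16 x 16 pixels)


def to_three_channels(slice_2d):
    """Replicate one grayscale slice into three identical 'color' channels."""
    return [slice_2d, slice_2d, slice_2d]


def slice_to_patch_vectors(slice_2d, patch=PATCH):
    """Split one H x W slice into non-overlapping patch x patch tiles and flatten
    each 3-channel tile into a vector of length 3 * patch * patch (768 for 16 x 16)."""
    h, w = len(slice_2d), len(slice_2d[0])
    if h % patch or w % patch:
        raise ValueError("slice height and width must be multiples of the patch size")
    channels = to_three_channels(slice_2d)
    vectors = []
    for top in range(0, h, patch):          # row-major order over the patch grid
        for left in range(0, w, patch):
            vec = []
            for ch in channels:             # channel-major flattening (assumed order)
                for r in range(top, top + patch):
                    vec.extend(ch[r][left:left + patch])
            vectors.append(vec)
    return vectors


def volume_to_patch_vectors(volume, patch=PATCH):
    """Tokenize every slice of a [slice][row][col] volume and concatenate the
    patch vectors, so one token sequence covers the whole 3D volume."""
    tokens = []
    for slice_2d in volume:
        tokens.extend(slice_to_patch_vectors(slice_2d, patch))
    return tokens
```

For a toy volume of 2 slices of 32 × 48 pixels, this yields 2 × (2 × 3) = 12 patch vectors of length 768; because the three channels are copies, each vector holds the same 256 pixel values three times.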

MR-Transformer was designed to capture the 3D spatial correlation of MR images while leveraging the advantages of large-scale 2D pre-training. Moreover, its transformer architecture models long-range dependencies, facilitating the capture of global structural information from MR images.

Figure 1: Model architecture of MR-Transformer. The model inherited the model weights of the ImageNet pre-trained vision transformer and processed knee MR images by separating them into small 2D patches.

2.3 Position Embeddings

All key components of MR-Transformer can be inherited directly from the ImageNet pre-trained DeiT-Ti except the position embeddings. Position embeddings are learnable vectors with the same dimension as the patch embeddings that retain the positional information of the patches. To adapt the 2D pre-trained position embeddings to 3D MR images, we replicated them for each MR slice. When an MR slice was larger than the pre-training images, 2D interpolation was applied to the position embeddings. Because the position embeddings remain learnable, they can be trained to encode the 3D spatial position information of MR images.
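The replication-and-interpolation step can be sketched as follows. DeiT-Ti is pre-trained on 224 × 224 images with 16 × 16 patches, so its patch position embeddings form a 14 × 14 grid of 192-dimensional vectors; a 384 × 384 slice, for example, would need a 24 × 24 grid. The paper does not state which 2D interpolation was used, so this sketch assumes bilinear interpolation with corner alignment. `interpolate_grid` and `adapt_position_embeddings` are hypothetical helpers, and the class-token position embedding, which would be kept separately, is not shown. Because every slice receives an identical copy, replicating before or after interpolating gives the same result.

```python
def interpolate_grid(grid, new_h, new_w):
    """Bilinearly resize a grid[row][col] of embedding vectors to new_h x new_w.
    Uses an align-corners mapping (an assumption): corner embeddings are kept exactly."""
    old_h, old_w, dim = len(grid), len(grid[0]), len(grid[0][0])

    def source(i, new, old):
        # map output index i onto a fractional index in the original grid
        return 0.0 if new == 1 else i * (old - 1) / (new - 1)

    out = []
    for i in range(new_h):
        y = source(i, new_h, old_h)
        y0 = min(int(y), old_h - 1)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = source(j, new_w, old_w)
            x0 = min(int(x), old_w - 1)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        out.append(row)
    return out


def adapt_position_embeddings(pretrained_grid, n_slices, new_h, new_w):
    """Interpolate the pre-trained 2D patch position embeddings if the slice grid
    differs in size, then give every slice an identical copy. Returns a flat list
    of n_slices * new_h * new_w vectors in slice-major, then row-major order."""
    grid = pretrained_grid
    if (new_h, new_w) != (len(grid), len(grid[0])):
        grid = interpolate_grid(grid, new_h, new_w)
    per_slice = [vec for row in grid for vec in row]
    return [list(vec) for _ in range(n_slices) for vec in per_slice]
```

Any slice-to-slice differences must then be learned during fine-tuning, which is consistent with the position embeddings remaining trainable.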

2.4 Comparison with Other Models

The proposed MR-Transformer model was compared with three deep learning models developed for knee MRI analysis. MRNet is a convolutional neural network-based model developed for detecting general abnormalities and specific diagnoses on knee MRI exams; it employs the ImageNet pre-trained AlexNet [13] to encode each MR slice and uses max pooling to aggregate the encoded features [8]. TSE is a 3D convolutional neural network model developed for TKR prediction that captures 3D spatial correlation from knee MR images [4]. 3DMeT is a 3D transformer-based model with a convolutional neural network teacher mechanism, developed for knee cartilage defect assessment from knee MRI [14].
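The MRNet-style aggregation step can be illustrated in a few lines of Python. Given one feature vector per slice (toy numbers here, standing in for AlexNet features), element-wise max pooling collapses them into a single exam-level vector. One consequence worth noting is that the result does not depend on slice order, which is one way to see why this aggregation alone does not model the 3D spatial correlation between neighboring slices. The function name is hypothetical.

```python
def max_pool_over_slices(slice_features):
    """Element-wise maximum over per-slice feature vectors -> one exam-level vector.
    The output is unchanged if the slices are reordered."""
    if not slice_features:
        raise ValueError("at least one slice is required")
    return [max(values) for values in zip(*slice_features)]


# Toy example: 3 slices, 3 features each (illustrative values only)
exam = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.4], [0.0, 0.6, 0.8]]
pooled = max_pool_over_slices(exam)
```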

2.5 Statistical Analysis

Receiver operating characteristic analysis was used to compute the area under the curve (AUC) and its 95% confidence interval for each model. Sensitivity was assessed at a fixed specificity of 80%, and specificity was assessed at a fixed sensitivity of 80%. Statistical analyses were performed using R version 4.2.1 (R Project for Statistical Computing) [15]. One-sided paired t-tests were used to evaluate differences in performance between models. Statistical significance was defined as P < .05.

3 Results

3.1 Participant Characteristics

A total of 353 case–control pairs were derived from the 4796 subjects in the OAI database, and 270 case–control pairs were derived from the 3026 subjects in the MOST database. Subjects were excluded if they had TKR at baseline, received a partial knee replacement during follow-up, were missing baseline or 108-month follow-up information, or could not be matched with a case or control subject. Flowcharts of the case–control pair selection from the OAI and MOST databases are provided in Figure 2. Of the 353 OAI case–control pairs, 138 were male (age range, 45–78 years; mean ± standard deviation, 64 ± 8 years) and 215 were female (age range, 45–79 years; 63 ± 8 years). Of the 270 MOST case–control pairs, 67 were male (age range, 50–78 years; 65 ± 7 years) and 203 were female (age range, 50–79 years; 65 ± 7 years). The baseline demographics of the study cohorts are provided in Table 1.

Table 1: Summary Statistics of Demographic Variables for Matched Case-Control Cohorts

Dataset	Parameter		Men: Cases	Men: Controls	P Value	Women: Cases	Women: Controls	P Value
OAI	No. of patients		138	138		215	215
	Age range (y)		45-78	45-78		45-79	45-79
	Mean age (y)		64 ± 8	64 ± 8	>0.99	63 ± 8	63 ± 8	>0.99
	Mean height (m)		1.76 ± 0.06	1.76 ± 0.06	0.78	1.62 ± 0.06	1.62 ± 0.06	0.56
	Mean weight (kg)		93.4 ± 14.1	92.0 ± 12.7	0.38	78.5 ± 14.8	77.0 ± 13.6	0.28
	Mean BMI (kg/m²)		29.9 ± 3.8	29.4 ± 3.4	0.31	29.9 ± 5.3	29.5 ± 4.7	0.41
	Ethnicity*	White	126	126		177	177
		African American	10	10		33	33
		Asian	0	0		2	2
		Other nonwhite	2	2		3	3
MOST	No. of patients		67	67		203	203
	Age range (y)		50-78	50-78		50-79	50-79
	Mean age (y)		65 ± 7	65 ± 7	0.99	65 ± 7	65 ± 7	0.99
	Mean height (m)		1.79 ± 0.06	1.78 ± 0.06	0.46	1.62 ± 0.06	1.63 ± 0.06	0.41
	Mean weight (kg)		94.8 ± 13.0	96.7 ± 18.2	0.61	81.6 ± 13.6	81.6 ± 12.6	0.99
	Mean BMI (kg/m²)		30.1 ± 4.2	30.0 ± 4.1	0.98	31.1 ± 4.9	30.9 ± 4.7	0.71
	Ethnicity*	White	67	67		188	188
		African American	0	0		15	15
		Asian	0	0		0	0
		Other nonwhite	0	0		0	0

Note: Data are presented as mean ± standard deviation. P values compare the means of the case and control groups for each variable. BMI = body mass index, MOST = Multicenter Osteoarthritis Study, OAI = Osteoarthritis Initiative. *Numbers of patients are shown.

3.2 Comparison of Models

The comparison of models for TKR prediction is presented in Table 2. Across all four MRI contrasts from the OAI and MOST databases, MR-Transformer attained significantly higher AUCs (COR IW TSE: 0.89, SAG IW TSE FS: 0.91, COR STIR: 0.82, SAG PD FAT SAT: 0.82) than TSE (0.86, P = .009; 0.82, P < .001; 0.76, P < .001; 0.75, P = .006, respectively) and 3DMeT (0.82, P = .004; 0.78, P < .001; 0.74, P = .005; 0.63, P < .001, respectively). Using COR STIR MR images from the MOST database, MR-Transformer also attained a significantly higher AUC than MRNet (0.78, P = .010). For the other contrasts, the AUCs of MR-Transformer and MRNet did not differ significantly (COR IW TSE: 0.89, P = .44; SAG IW TSE FS: 0.90, P = .070; SAG PD FAT SAT: 0.81, P = .20).

Using COR IW TSE (OAI) MR images, MR-Transformer achieved a significantly higher sensitivity at 80% specificity (0.84) than TSE (0.75, P = .013) and 3DMeT (0.68, P = .014). Its sensitivity was also higher than that of MRNet (0.80), but this difference was not statistically significant (P = .10). Similarly, MR-Transformer's specificity at 80% sensitivity (0.80) was significantly higher than that of TSE (0.74, P = .012) and 3DMeT (0.65, P = .010) but not significantly different from that of MRNet (0.78, P = .25). For SAG IW TSE FS (OAI), MR-Transformer achieved a sensitivity of 0.83, significantly higher than that of TSE (0.68, P < .001) and 3DMeT (0.64, P < .001). Although its sensitivity was lower than that of MRNet (0.84), the difference was not statistically significant (P = .74). MR-Transformer's specificity (0.84) was significantly higher than that of TSE (0.69, P = .001) and 3DMeT (0.55, P < .001) but not significantly different from that of MRNet (0.83, P = .33).

Using COR STIR (MOST) MR images, MR-Transformer achieved a significantly higher sensitivity (0.62) than 3DMeT (0.53, P = .037). Its sensitivity was also higher than that of TSE (0.59) and MRNet (0.57), but these differences were not statistically significant (P = .17 and P = .075, respectively). The specificity of MR-Transformer (0.66) was significantly higher than that of TSE (0.47, P = .001), 3DMeT (0.49, P < .001), and MRNet (0.59, P = .032). For SAG PD FAT SAT (MOST), MR-Transformer achieved a sensitivity of 0.64, significantly higher than that of TSE (0.52, P = .030) and 3DMeT (0.38, P = .002). Its sensitivity was also higher than that of MRNet (0.62), but the difference was not statistically significant (P = .16). MR-Transformer's specificity (0.58) was significantly higher than that of 3DMeT (0.30, P < .001). Its specificity was higher than that of TSE (0.49) and lower than that of MRNet (0.60), but neither difference was statistically significant (P = .053 and P = .68, respectively).

Table 2: Comparison of Deep Learning Models for Total Knee Replacement Prediction.

Model	AUC	P Value	Sensitivity (at 80% Specificity)	P Value	Specificity (at 80% Sensitivity)	P Value
COR IW TSE (OAI)
TSE	0.86 ± 0.02	.009	0.75 ± 0.07	.013	0.74 ± 0.04	.012
3DMeT	0.82 ± 0.03	.004	0.68 ± 0.07	.014	0.65 ± 0.08	.010
MRNet	0.89 ± 0.01	.44	0.80 ± 0.01	.10	0.78 ± 0.03	.25
MR-Transformer	0.89 ± 0.01	*	0.84 ± 0.04	*	0.80 ± 0.03	*
SAG IW TSE FS (OAI)
TSE	0.82 ± 0.01	<.001	0.68 ± 0.03	<.001	0.69 ± 0.02	.001
3DMeT	0.78 ± 0.02	<.001	0.64 ± 0.03	<.001	0.55 ± 0.08	<.001
MRNet	0.90 ± 0.00	.070	0.84 ± 0.02	.74	0.83 ± 0.02	.33
MR-Transformer	0.91 ± 0.01	*	0.83 ± 0.02	*	0.84 ± 0.03	*
COR STIR (MOST)
TSE	0.76 ± 0.01	<.001	0.59 ± 0.04	.17	0.47 ± 0.06	.001
3DMeT	0.74 ± 0.03	.005	0.53 ± 0.09	.037	0.49 ± 0.06	<.001
MRNet	0.78 ± 0.01	.010	0.57 ± 0.07	.075	0.59 ± 0.03	.032
MR-Transformer	0.82 ± 0.01	*	0.62 ± 0.04	*	0.66 ± 0.03	*
SAG PD FAT SAT (MOST)
TSE	0.75 ± 0.02	.006	0.52 ± 0.08	.030	0.49 ± 0.05	.053
3DMeT	0.63 ± 0.03	<.001	0.38 ± 0.07	.002	0.30 ± 0.05	<.001
MRNet	0.81 ± 0.01	.20	0.62 ± 0.03	.16	0.60 ± 0.02	.68
MR-Transformer	0.82 ± 0.02	*	0.64 ± 0.04	*	0.58 ± 0.07	*

Note: The MR-Transformer model was compared with three baseline deep learning models based on AUC, sensitivity at 80% specificity, and specificity at 80% sensitivity. Metrics are presented as mean ± 95% confidence interval, calculated from the six cross-validation folds. AUC = area under the receiver operating characteristic curve, COR IW TSE = coronal intermediate-weighted turbo spin-echo, SAG IW TSE FS = sagittal intermediate-weighted turbo spin-echo with fat suppression, COR STIR = coronal short-tau inversion recovery, SAG PD FAT SAT = sagittal proton density fat-saturated, * = reference model for the one-sided paired t-test.
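The comparison logic can be sketched in standard-library Python as a one-sided paired t-test over per-fold metrics. The per-fold differences (reference model minus comparison model) give t = mean / (sd / √n), and the one-sided P value is the upper tail of Student's t distribution with n − 1 degrees of freedom, evaluated here by integrating the density numerically with Simpson's rule. In R, the equivalent call is `t.test(x, y, paired = TRUE, alternative = "greater")`. Pairing by fold is an assumption based on the table note, and the fold values in the test are made-up numbers for illustration, not the study's data.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=20000):
    """P(T > t) for Student's t, using Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps                      # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sided_paired_t(reference, other):
    """One-sided paired t-test of H1: mean(reference - other) > 0.
    Returns (t statistic, P value) with len(reference) - 1 degrees of freedom."""
    diffs = [r - o for r, o in zip(reference, other)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean / (sd / math.sqrt(n))
    return t, t_upper_tail(t, n - 1)
```

With six folds the test has 5 degrees of freedom, so a one-sided result reaches P < .05 only when t exceeds about 2.015.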

3.3 Model Interpretation

Attention maps generated with Attention Rollout [16] can highlight important regions within MR images. Figure 3 presents an MR image from a subject who underwent TKR within nine years; the highlighted area in the joint region marks the parts of the image that were informative for TKR prediction.
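Attention Rollout can be sketched as follows: at each layer the attention matrices are averaged over heads, the identity matrix is added to account for residual connections, the rows are re-normalized, and the per-layer matrices are multiplied from the first layer to the last. For a ViT-style model whose class token sits at index 0, row 0 of the result scores every patch token; for a 3D input, those scores could be reshaped to slices × rows × columns to form a volumetric attention map. This is a plain-Python illustration on toy matrices; the helper names are hypothetical and are not taken from the MR-Transformer repository.

```python
def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(attentions):
    """attentions: list over layers (first to last); each entry is a list over
    heads of N x N row-stochastic attention matrices. Returns the N x N rollout."""
    n = len(attentions[0][0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for heads in attentions:
        # average the heads of this layer
        avg = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)] for i in range(n)]
        # add the identity for the residual path, then re-normalize each row to sum to 1
        aug = [[avg[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
        aug = [[v / sum(row) for v in row] for row in aug]
        # compose with the layers below: rollout_l = A_l @ rollout_(l-1)
        rollout = matmul(aug, rollout)
    return rollout


def class_token_scores(rollout):
    """Attention flowing from the class token (index 0) to each patch token."""
    return rollout[0][1:]
```

Because each re-normalized layer matrix is row-stochastic, every row of the rollout also sums to 1, so the class-token scores form a distribution over tokens.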

4 Discussion

In this study, we introduced MR-Transformer, a transformer-based deep learning model for predicting TKR using sagittal and coronal knee MRI scans from the OAI and MOST databases. The proposed model has several advantages, including leveraging large-scale ImageNet pre-training, capturing 3D spatial correlation, and modeling long-range dependencies in MR images. For TKR prediction using COR STIR knee MRI scans from the MOST database, MR-Transformer achieved a significantly higher AUC (0.82) than the other deep learning models: TSE (0.76, P < .001), 3DMeT (0.74, P = .005), and MRNet (0.78, P = .010). MR-Transformer also achieved significantly higher AUCs than TSE and 3DMeT for the other MRI contrasts: COR IW TSE (0.89 vs 0.86, P = .009, and 0.82, P = .004), SAG IW TSE FS (0.91 vs 0.82, P < .001, and 0.78, P < .001), and SAG PD FAT SAT (0.82 vs 0.75, P = .006, and 0.63, P < .001), whereas its differences from MRNet were not statistically significant for these contrasts.

Unlike TSE and 3DMeT, which were trained from scratch, MR-Transformer leveraged ImageNet pre-trained weights and achieved better TKR prediction performance. These results suggest the importance of large-scale pre-training for small medical datasets. Because MRNet also leveraged ImageNet pre-training, its performance was close to that of MR-Transformer. However, MR-Transformer still achieved a significantly higher AUC than MRNet when using COR STIR MR images, which could be attributed to its ability to capture 3D spatial correlation and model long-range dependencies in MR images.

Although MR-Transformer exhibited leading performance in TKR prediction using MRI, some limitations must be noted. The computational burden of MR-Transformer is heavy, especially for large MR images. Because the self-attention mechanism of transformer models has computational complexity quadratic in the number of input patches, the large number of patches extracted from a 3D MR image imposes a substantial computational load (a 36 × 512 × 512 MRI matrix would be separated into 36,864 patches). We therefore recommend using smaller MR images when computational resources are limited. In future work, MR-Transformer could be extended to other MRI classification tasks to investigate its generalizability and potential adaptations for diagnosing other diseases.
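The scale of this cost can be checked with a short calculation. The Python sketch below counts the 16 × 16 patches in a volume, adds one class token, and estimates the memory needed to materialize one layer's attention weights in 32-bit floats, assuming DeiT-Ti's 3 attention heads per layer. It ignores activations, gradients, and memory-efficient attention kernels that avoid storing the full matrix, so it is an order-of-magnitude illustration only; the function name is hypothetical.

```python
def attention_cost(n_slices, height, width, patch=16, heads=3, bytes_per_value=4):
    """Return (patches, tokens, attention entries per head, bytes for one layer's
    attention weights) for a volume split into non-overlapping 2D patches.
    Assumes one extra class token and a naive, fully materialized attention matrix."""
    patches = n_slices * (height // patch) * (width // patch)
    tokens = patches + 1                   # + class token
    entries_per_head = tokens * tokens     # quadratic in sequence length
    bytes_one_layer = entries_per_head * heads * bytes_per_value
    return patches, tokens, entries_per_head, bytes_one_layer
```

For the 36 × 512 × 512 example this gives 36,864 patches (36 × 32 × 32), about 1.36 × 10⁹ attention entries per head, and roughly 16.3 GB for a single layer's attention weights. Halving the in-plane size to 256 × 256 reduces the patch count fourfold and the attention cost roughly sixteenfold, which is why smaller images help when resources are limited.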

The application of deep learning models to disease diagnosis signals the potential of artificial intelligence in medicine. The proposed MR-Transformer, which incorporates ImageNet pre-training and captures 3D spatial correlation, predicts TKR using MRI. It achieved higher AUCs than TSE and 3DMeT, and AUCs similar to or higher than those of MRNet, across coronal and sagittal knee MR scans from the OAI and MOST databases, demonstrating its potential for TKR prediction. The open-source implementation of our method is available at https://github.com/denizlab/MR-Transformer.

References

[1] J. H. Kellgren, J. Lawrence, et al., “Radiological assessment of osteo-arthrosis,” Ann Rheum Dis 16(4), 494–502 (1957).
[2] R. Altman, E. Asch, D. Bloch, et al., “Development of criteria for the classification and reporting of osteoarthritis: classification of osteoarthritis of the knee,” Arthritis & Rheumatism: Official Journal of the American College of Rheumatology 29(8), 1039–1049 (1986).
[3] D. T. Felson and Y. Zhang, “An update on the epidemiology of knee and hip osteoarthritis with a view to prevention,” Arthritis & Rheumatism: Official Journal of the American College of Rheumatology 41(8), 1343–1355 (1998).
[4] H. R. Rajamohan, T. Wang, K. Leung, et al., “Prediction of total knee replacement using deep learning analysis of knee MRI,” Scientific Reports 13(1), 6922 (2023).
[5] A. A. Tolpadi, J. J. Lee, V. Pedoia, et al., “Deep learning predicts total knee replacement from magnetic resonance images,” Scientific Reports 10(1), 6371 (2020).
[6] K. Leung, B. Zhang, J. Tan, et al., “Prediction of total knee replacement and diagnosis of osteoarthritis by using deep learning on knee radiographs: data from the Osteoarthritis Initiative,” Radiology 296(3), 584–593 (2020).
[7] C. Matsoukas, J. F. Haslum, M. Söderberg, et al., “Is it time to replace CNNs with transformers for medical images?,” arXiv preprint arXiv:2108.09038 (2021).
[8] N. Bien, P. Rajpurkar, R. L. Ball, et al., “Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet,” PLoS Medicine 15(11), e1002699 (2018).
[9] H. Touvron, M. Cord, M. Douze, et al., “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning, 10347–10357, PMLR (2021).
[10] J. Deng, W. Dong, R. Socher, et al., “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255, IEEE (2009).
[11] C. G. Peterfy, E. Schneider, and M. Nevitt, “The Osteoarthritis Initiative: report on the design rationale for the magnetic resonance imaging protocol for the knee,” Osteoarthritis and Cartilage 16(12), 1433–1441 (2008).
[12] N. A. Segal, M. C. Nevitt, K. D. Gross, et al., “The Multicenter Osteoarthritis Study (MOST): opportunities for rehabilitation research,” PM & R: The Journal of Injury, Function, and Rehabilitation 5(8) (2013).
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems 25 (2012).
[14] S. Wang, Z. Zhuang, K. Xuan, et al., “3DMeT: 3D medical image transformer for knee cartilage defect assessment,” in Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12, 347–355, Springer (2021).
[15] R Core Team, “R: A language and environment for statistical computing,” (2013).
[16] S. Abnar and W. Zuidema, “Quantifying attention flow in transformers,” arXiv preprint arXiv:2005.00928 (2020).