Quality assessment of machine learning models for diagnostic imaging in orthopaedics: A systematic review

Amanda Lans, Robertus J.B. Pierik, John R. Bales, Mitchell S. Fourman, David Shin, Laura N. Kanbier, Jack Rifkin, William H. DiGiovanni, Rohan R. Chopra, Rana Moeinzad, Jorrit Jan Verlaan, Joseph H. Schwab

Research output: Contribution to journal › Review article › peer-review

3 Scopus citations


Background: Machine learning (ML) models are emerging at a rapid pace in orthopaedic imaging due to their ability to facilitate timely diagnostic and treatment decision making. However, despite a considerable increase in model development and ML-related publications, there has been little evaluation of the quality of these studies. To move forward successfully with the implementation of ML models for diagnostic imaging in orthopaedics, it is imperative that models are held to a high standard and provide applicable, reliable and accurate results. Multiple reporting guidelines have been developed to help authors and reviewers of ML models, such as the Checklist for AI in Medical Imaging (CLAIM) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Previous investigations of prognostic orthopaedic ML models have reported concerns about the rate of transparent reporting. Therefore, an assessment of whether ML models for diagnostic imaging in orthopaedics adequately and clearly report essential facets of their model development is warranted.

Purposes: To evaluate (1) the completeness of reporting according to the CLAIM checklist and (2) the risk of bias according to the QUADAS-2 tool for ML-based orthopaedic diagnostic imaging models. This study sought to identify ML details that researchers commonly fail to report and to provide recommendations to improve reporting standards for diagnostic imaging ML models.

Methods: A systematic review was performed to identify ML-based diagnostic imaging models in orthopaedic surgery. Articles published within the last 5 years were included. Two reviewers independently extracted data using the CLAIM checklist and QUADAS-2 tool, and discrepancies were resolved by discussion with at least two additional reviewers.

Results: After screening 7507 articles, 91 met the study criteria. The mean completeness of CLAIM items was 63 % (SD ± 28 %). Among the worst reported CLAIM items were item 28 (metrics of model performance), item 13 (the handling of missing data) and item 9 (data preprocessing steps), with only 2 % (2/91), 8 % (7/91) and 13 % (12/91) of studies correctly reporting these items, respectively. The QUADAS-2 tool revealed that the patient selection domain was at the highest risk of bias: 18 % (16/91) of studies were at high risk of bias and 32 % (29/91) had an unknown risk of bias.

Conclusions: This review demonstrates that diagnostic ML studies in orthopaedic imaging report relevant information, such as the handling of missing data and data preprocessing steps, only to a limited extent. Additionally, a substantial number of studies were at high risk of bias. Future studies describing ML-based models for diagnostic imaging should adhere to acknowledged methodological standards to maximize the quality and applicability of their models.
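As a quick arithmetic check, the percentages quoted in the Results can be recomputed from the stated counts out of 91 included studies. The sketch below is illustrative only: the variable and function names are hypothetical, and rounding to the nearest whole percent is an assumption about how the figures were derived, not something the abstract states.

```python
# Counts as quoted in the abstract; each is out of 91 included studies.
TOTAL_STUDIES = 91

reported_counts = {
    "CLAIM item 28 (metrics of model performance)": 2,
    "CLAIM item 13 (handling of missing data)": 7,
    "CLAIM item 9 (data preprocessing steps)": 12,
    "QUADAS-2 patient selection, high risk of bias": 16,
    "QUADAS-2 patient selection, unknown risk of bias": 29,
}


def percent(count, total=TOTAL_STUDIES):
    """Return count/total as a percentage rounded to the nearest integer.

    Rounding to whole percent is an assumption made for this sketch.
    """
    return round(100 * count / total)


# Recompute every quoted percentage from its count.
percentages = {label: percent(n) for label, n in reported_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]}/{TOTAL_STUDIES} = {pct} %")
```

Under that rounding assumption, the recomputed values agree with the percentages given in the abstract (2 %, 8 %, 13 %, 18 % and 32 %).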

Original language: English (US)
Article number: 102396
Journal: Artificial Intelligence in Medicine
State: Published - Oct 2022
Externally published: Yes


Keywords

  • Artificial intelligence
  • Machine learning
  • Medical imaging
  • Orthopaedics

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Artificial Intelligence

