Imaging · 4 min read

Deep learning measures Cobb angle to within 3 degrees of expert reads, with segmentation methods outperforming landmark methods

Source: Spine Deformity · Published: January 2025

Authors: Zhu Y, Yin X, Chen Z, Zhang H, Xu K, Zhang J, Wu N · DOI: 10.1007/s43390-024-00954-4 · Open Access

Key figure: Figure 4. Forest plot of the 17 studies contributing to the meta-analysis, showing the overall pooled CMAE of 2.99° and the subgroup advantage of segmentation-based models (2.40°) over landmark-based models (3.31°).

Bottom line: In a meta-analysis of 17 studies, deep learning measured the Cobb angle to within about 3 degrees of expert radiologist reads (CMAE 2.99°, 95% CI 2.61 to 3.38). Segmentation models outperformed landmark models by roughly 1 degree (2.40° versus 3.31°). The case for clinical use still rests largely on single-center retrospective cohorts.

What the study did

The authors searched six databases through September 2023 for studies developing or evaluating deep learning algorithms to estimate the Cobb angle on spinal radiographs. Fifty studies were included in the systematic review, and seventeen contributed data to the meta-analysis. The primary outcome was circular mean absolute error (CMAE) relative to expert radiologist ground truth. A pre-specified subgroup analysis compared segmentation-based methods (pixel-wise spine segmentation followed by angle calculation) against landmark-based methods (identification of vertebral corners). Risk of bias was assessed with QUADAS-2, and the protocol was registered in PROSPERO (CRD42023403057).
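
To make the landmark route and the error metric concrete, here is a minimal Python sketch. It is not the method of any included study. It assumes each endplate is described by two corner points ordered left to right, takes the Cobb angle as the difference in tilt between the two end-vertebra endplates, and scores predictions with an absolute angular difference wrapped at 360 degrees. The paper's exact CMAE formula is not given in this summary, so the wrapped error is a stand-in, and all function names and coordinates are hypothetical.

```python
import math

def endplate_tilt_deg(left, right):
    """Tilt of the line through two (x, y) corner points, in degrees.

    Assumes `left` has the smaller x coordinate, so the result lies in (-90, 90).
    """
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))

def cobb_angle_deg(upper_endplate, lower_endplate):
    """Angle between the two end-vertebra endplate lines, in degrees."""
    return abs(endplate_tilt_deg(*upper_endplate) - endplate_tilt_deg(*lower_endplate))

def wrapped_abs_error_deg(predicted, reference):
    """Absolute angular difference, wrapped so it never exceeds 180 degrees."""
    diff = abs(predicted - reference) % 360.0
    return min(diff, 360.0 - diff)

def mean_wrapped_abs_error_deg(predicted, reference):
    """Mean of the wrapped absolute errors over paired angle lists."""
    errors = [wrapped_abs_error_deg(p, r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)

# Hypothetical corner coordinates in pixels (image y axis points down).
upper = ((100.0, 200.0), (160.0, 188.0))
lower = ((105.0, 420.0), (165.0, 436.0))
model_angle = cobb_angle_deg(upper, lower)

# Hypothetical model outputs and expert reads for three curves, in degrees.
error = mean_wrapped_abs_error_deg([model_angle, 12.0, 41.5], [24.0, 10.0, 43.0])
print(f"Cobb angle: {model_angle:.1f} deg, mean error: {error:.2f} deg")
```

A segmentation-based pipeline would reach the same final step by a different path: it would first derive the endplate lines from a pixel mask rather than from detected corner points.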

What they found

The pooled CMAE across 17 studies was 2.99° (95% CI 2.61 to 3.38), with high between-study heterogeneity (I² = 94%, p < 0.01). Segmentation-based models reached a CMAE of 2.40° (95% CI 1.85 to 2.95), significantly lower than the 3.31° (95% CI 2.89 to 3.72) achieved by landmark-based models (p < 0.01). Individual study CMAE ranged from 1.07° to 17.13°. Most included studies relied on convolutional architectures, with U-Net, ResNet, DeepLab V3+, and HRNet the most common.
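
As a rough plausibility check on the subgroup comparison, the reported confidence intervals can be converted back into standard errors and compared with a simple z-test. This is a sketch under stated assumptions, not the authors' analysis: it treats the two subgroup estimates as independent and normally distributed, and the helper name `se_from_ci` is hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z_crit)

# Subgroup estimates as reported in this summary (degrees).
seg_mean, seg_se = 2.40, se_from_ci(1.85, 2.95)
lmk_mean, lmk_se = 3.31, se_from_ci(2.89, 3.72)

diff = lmk_mean - seg_mean
diff_se = math.sqrt(seg_se**2 + lmk_se**2)        # assumes independent subgroups
z = diff / diff_se
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation

print(f"difference = {diff:.2f} deg, z = {z:.2f}, p = {p_two_sided:.4f}")
```

Under these assumptions the difference of 0.91° gives a z statistic of roughly 2.6 and a two-sided p-value just under 0.01, in line with the reported p < 0.01.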

Why it matters for orthopedic practice

The clinically relevant threshold for Cobb angle measurement error is generally considered to be 5 degrees, the range within which two expert human readers typically agree. The pooled deep learning error of 2.99° now sits inside that window, although individual studies ranged as high as 17.13°. For adolescent idiopathic scoliosis screening, the published accuracy puts within reach a workflow in which an algorithm produces an initial Cobb measurement and the treating physician confirms or adjusts it. Segmentation architectures appear to be the stronger starting point for new clinical deployments.

Limitations

Only three of the 50 reviewed studies were prospective, and only one was multicenter. Most models were trained and evaluated on open challenge datasets or single-institution cohorts, leaving external validity unclear. Between-study heterogeneity was high, and publication bias could not be fully excluded. The error metric captured only agreement with a human read, not clinical outcomes such as treatment threshold decisions or surgical planning accuracy. Pediatric cohorts and severe curves above 50 degrees remain underrepresented in the literature.

Citation

Zhu Y, Yin X, Chen Z, Zhang H, Xu K, Zhang J, Wu N. Deep learning in Cobb angle automated measurement on X-rays: a systematic review and meta-analysis. Spine Deform. 2025;13(1):19-27. doi:10.1007/s43390-024-00954-4

OSCRSJ News items are editorial summaries for educational purposes. They are not clinical recommendations, endorsements, or substitutes for the primary literature. Always consult the source paper and applicable specialty-society guidelines before changing practice.