RSNA Benchmarks is a community-driven initiative to establish standardized, reproducible evaluation frameworks for frontier radiology AI models.
As vision-language models rapidly advance, the field needs rigorous, multi-center benchmarks that reflect real-world clinical complexity. RSNA Benchmarks exists to fill this gap.
Our benchmarks are designed to be multi-site and multi-diagnosis, drawing data and expertise from institutions worldwide. Each benchmark targets a specific clinical domain with carefully curated cases, consensus ground truth, and transparent evaluation metrics. Critically, our datasets are assembled to be representative of real-world clinical populations, capturing the diversity of pathology, patient demographics, and imaging conditions that practitioners encounter in daily practice.
By providing an open, community-governed resource grounded in clinical realism, we aim to accelerate responsible development and deployment of AI in radiology.
Every benchmark we create follows a consistent set of principles to ensure rigor, fairness, and clinical relevance.
Cases are sourced from multiple institutions to ensure diversity in imaging protocols, patient populations, and disease presentations.
Ground truth labels are established through multi-reader consensus with clearly documented adjudication protocols.
Evaluation criteria and scoring methods are fully documented, reproducible, and aligned with clinical significance.
Benchmark specifications, evaluation code, and aggregate results are openly available to the research community.
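To make the consensus and scoring principles above concrete, here is a minimal illustrative sketch in Python. It is not the actual RSNA evaluation code; the function names (`consensus_label`, `sensitivity_specificity`) and the majority-vote-with-adjudication rule are assumptions chosen for illustration, standing in for whatever adjudication protocol a given benchmark documents.

```python
from collections import Counter

def consensus_label(reader_labels):
    """Majority vote across readers.

    Returns the winning label, or None when the vote is tied,
    signalling that the case should go to adjudication
    (hypothetical rule, for illustration only).
    """
    counts = Counter(reader_labels)
    _, top_n = counts.most_common(1)[0]
    if sum(1 for n in counts.values() if n == top_n) > 1:
        return None  # tie -> route to adjudication
    return counts.most_common(1)[0][0]

def sensitivity_specificity(preds, truths):
    """Clinically interpretable binary metrics from paired
    predictions and ground-truth labels (1 = positive)."""
    tp = sum(1 for p, t in zip(preds, truths) if p and t)
    tn = sum(1 for p, t in zip(preds, truths) if not p and not t)
    fp = sum(1 for p, t in zip(preds, truths) if p and not t)
    fn = sum(1 for p, t in zip(preds, truths) if not p and t)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Keeping the scoring logic this simple and dependency-free is one way to make results reproducible across sites: any group can re-run the published evaluation code against the frozen benchmark specification and obtain identical aggregate numbers.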
RSNA Benchmarks is guided by a steering committee of radiologists, AI researchers, and informatics experts.
Steering committee members to be announced.