Our Mission

As vision language models rapidly advance, the field needs rigorous, multi-center benchmarks that reflect real-world clinical complexity. RSNA Benchmarks exists to fill this gap.

Our benchmarks are designed to be multi-site and multi-diagnosis, drawing data and expertise from institutions worldwide. Each benchmark targets a specific clinical domain with carefully curated cases, consensus ground truth, and transparent evaluation metrics. Critically, our datasets are assembled to be representative of real-world clinical populations, capturing the diversity of pathology, patient demographics, and imaging conditions that practitioners encounter in daily practice.

By providing an open, community-governed resource grounded in clinical realism, we aim to accelerate responsible development and deployment of AI in radiology.

Multi-Center Collaboration
Open Community Resource
VLM: Vision Language Models
RSNA: Society Partnership

How We Build Benchmarks

Every benchmark we create follows a consistent set of principles to ensure rigor, fairness, and clinical relevance.

01. Multi-Center by Design

Cases are sourced from multiple institutions to ensure diversity in imaging protocols, patient populations, and disease presentations.

02. Consensus Ground Truth

Ground truth labels are established through multi-reader consensus with clearly documented adjudication protocols.
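To illustrate what a documented adjudication protocol can look like, here is a minimal majority-vote sketch. The function name, the agreement threshold, and the adjudication flag are all hypothetical; actual RSNA benchmark protocols are defined per benchmark and may differ substantially.

```python
from collections import Counter

def consensus_label(reads, min_agreement=2):
    """Majority-vote consensus over independent reader labels.

    Returns (label, needs_adjudication). If no label reaches
    min_agreement votes, the case is flagged for expert
    adjudication instead of being assigned a label.
    Illustrative sketch only; real protocols vary by benchmark.
    """
    counts = Counter(reads)
    label, votes = counts.most_common(1)[0]
    if votes >= min_agreement:
        return label, False
    return None, True

# Two of three readers agree, so consensus is reached:
consensus_label(["pneumonia", "pneumonia", "normal"])  # -> ("pneumonia", False)
```

The key design point the sketch captures is that disagreement is handled explicitly and reproducibly: a case either meets the documented agreement threshold or is routed to adjudication, rather than being resolved ad hoc.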

03. Transparent Metrics

Evaluation criteria and scoring methods are fully documented, reproducible, and aligned with clinical significance.
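As a sketch of what a fully documented, reproducible scoring method means in practice, the snippet below computes sensitivity and specificity from binary labels. The function and its conventions (1 = positive) are illustrative assumptions; each benchmark defines its own task-specific metrics.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary labels (1 = positive).

    Every quantity is computed from explicitly defined counts, so
    the score is reproducible from the published predictions alone.
    Illustrative sketch; actual benchmark metrics are task-specific.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Publishing the scoring code alongside the metric definition, as above, is what makes an evaluation auditable: any group can re-run the computation on the same predictions and obtain the same numbers.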

04. Open &amp; Reproducible

Benchmark specifications, evaluation code, and aggregate results are openly available to the research community.

Steering Committee

RSNA Benchmarks is guided by a steering committee of radiologists, AI researchers, and informatics experts.

Steering committee members to be announced.