Community-Driven Initiative

RSNA Benchmarks

Multi-center, multi-diagnosis benchmarks for evaluating frontier radiology AI models and vision-language models.

Rigorous evaluation for the next generation of radiology AI

RSNA Benchmarks is a community-driven initiative to establish standardized, reproducible evaluation frameworks for frontier radiology AI models. As vision-language models rapidly advance, the field needs rigorous, multi-center benchmarks that reflect real-world clinical complexity.

Our benchmarks are designed to be multi-center and multi-diagnosis, drawing data and expertise from institutions worldwide. Each benchmark targets a specific clinical domain with carefully curated cases, consensus ground truth, and transparent evaluation metrics. Critically, our datasets are assembled to be representative of real-world clinical populations, capturing the diversity of pathology, patient demographics, and imaging conditions that practitioners encounter in daily practice.

By providing an open, community-governed resource grounded in clinical realism, we aim to accelerate responsible development and deployment of AI in radiology.

Multi-Center Collaboration
Open Community Resource
Vision-Language Models (VLMs)
RSNA Society Partnership

Active & Upcoming Projects

Each benchmark is a structured evaluation covering specific clinical domains, modalities, and diagnostic tasks.

Planned

Chest X-Ray Benchmark

Multi-center evaluation of AI performance on frontal and lateral chest radiographs across a spectrum of thoracic pathology.

Radiograph · Chest
Upcoming

Brain MRI Benchmark

Structured assessment of AI interpretation across neuro MRI sequences for common and critical neurological diagnoses.

MRI · Neuro

Built by the community, for the community

RSNA Benchmarks is an open initiative. We welcome contributions from radiologists, AI researchers, radiology AI vendors, regulators, and institutions worldwide.

Contribute Data

Share anonymized cases from your institution to strengthen benchmark diversity and clinical representativeness.

Learn More

Join Development

Help design evaluation frameworks, define ground truth protocols, and build the technical infrastructure.

Get Started

Evaluate Models

Run your models against our benchmarks and contribute results to the growing body of evaluation data.

Coming Soon