What Is the Map

The Diagnosis Map is a full-screen, interactive visualization of the radiology diagnostic landscape. It renders every diagnosis that RSNA Benchmarks tracks — currently 848 conditions across 10 body systems — as a field of colored squares, each one representing a single diagnosis.

The map exists to answer a simple question: how much of radiology has AI actually been tested on? The answer, visually, is striking. Of the hundreds of conditions radiologists encounter, only a small fraction have rigorous, multi-center benchmark data. The filled squares at the center of each cluster represent the diagnoses where we have real evaluation data. The vast ocean of dim squares surrounding them represents everything else — the unmapped territory.

How to Read It

Each square on the map represents one radiology diagnosis. The squares are organized into angular sectors — one wedge per body system — radiating outward from the center of the screen. Within each wedge, the most important diagnoses (those with active benchmarks) sit closest to the center.
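The wedge layout described above can be sketched as a simple polar placement: each body system gets an equal angular sector, and a diagnosis's importance rank sets its distance from the center. The function and parameter names below are illustrative assumptions, not the site's actual code.

```typescript
// Sketch of a wedge layout: one angular sector per body system,
// with the most important diagnoses placed closest to the center.
// `polarPosition`, `rank`, and `ringSpacing` are hypothetical names.

const SYSTEM_COUNT = 10; // the 10 RSNA body systems

interface PlacedSquare {
  x: number;
  y: number;
}

// systemIndex: 0..9 selects the wedge; rank 0 = most important
// (closest to the center), higher ranks move outward.
function polarPosition(
  systemIndex: number,
  rank: number,
  ringSpacing = 12
): PlacedSquare {
  const wedgeAngle = (2 * Math.PI) / SYSTEM_COUNT;
  // Aim at the middle of the wedge (a real layout would also spread
  // squares across the wedge's angular width).
  const angle = systemIndex * wedgeAngle + wedgeAngle / 2;
  const radius = ringSpacing * (1 + rank); // rank 0 sits nearest the center
  return {
    x: radius * Math.cos(angle),
    y: radius * Math.sin(angle),
  };
}
```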

There are three types of squares:

  • Solid filled squares — diagnoses with active, live benchmark data. These are conditions where AI models have been formally evaluated.
  • Outlined squares (with a breathing glow animation) — diagnoses that are planned for benchmarking. Data collection is underway or imminent.
  • Dim squares — the "ocean" of unmapped diagnoses. Real conditions that radiologists diagnose, but where no standardized AI evaluation exists yet.
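The three square types above map naturally onto a small set of render styles. This is a minimal sketch; the status values and style fields are assumed names, and the exact opacity used for dim squares is a placeholder.

```typescript
// The three square types, mapped to render styles.
// Field names and the 0.25 "dim" opacity are illustrative assumptions.

type DiagnosisStatus = "active" | "planned" | "ocean";

interface SquareStyle {
  fill: boolean;    // solid interior
  outline: boolean; // visible border only
  glow: boolean;    // breathing glow animation
  opacity: number;  // dim vs. full brightness
}

function styleFor(status: DiagnosisStatus): SquareStyle {
  switch (status) {
    case "active": // live benchmark data: solid filled square
      return { fill: true, outline: false, glow: false, opacity: 1.0 };
    case "planned": // benchmarking underway or imminent: outlined, glowing
      return { fill: false, outline: true, glow: true, opacity: 1.0 };
    case "ocean": // unmapped diagnosis: dim square
      return { fill: true, outline: false, glow: false, opacity: 0.25 };
  }
}
```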

Scroll up to zoom into the map: squares expand, spacing increases, and a text label showing the diagnosis name appears inside each square. The zoom centers on your cursor position. Scroll down to zoom back out. You can also click and drag to pan around while zoomed in.
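Cursor-centered zoom follows a standard camera transform: adjust the offset so that the world point under the cursor stays at the same screen position as the scale changes. The sketch below shows that transform under an assumed `screen = world * scale + offset` camera model; it is not necessarily the site's exact implementation.

```typescript
// Cursor-centered zoom: the world point under the cursor stays fixed
// while the scale changes. Assumes screen = world * scale + offset.

interface Camera {
  scale: number;   // zoom level
  offsetX: number; // screen position of the world origin
  offsetY: number;
}

function zoomAt(
  cam: Camera,
  cursorX: number,
  cursorY: number,
  factor: number // e.g. 1.1 to zoom in, 1 / 1.1 to zoom out
): Camera {
  // Keeping screen = world * scale + offset invariant at the cursor
  // gives: offset' = cursor - (cursor - offset) * factor
  return {
    scale: cam.scale * factor,
    offsetX: cursorX - (cursorX - cam.offsetX) * factor,
    offsetY: cursorY - (cursorY - cam.offsetY) * factor,
  };
}
```

Panning is the simpler case: dragging just adds the pointer delta to `offsetX`/`offsetY` without touching `scale`.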

Body Systems

The map uses the 10 official RSNA body system designations. Each system is assigned a distinct color and occupies its own angular sector on the map. Hovering over a diagnosis shows its name, body system, and any relevant score for the active view mode.

NR — Neuro / Brain
CH — Chest / Lung
CV — Cardiovascular
GI — Abdomen / GI
MK — Musculoskeletal
GU — Genitourinary
HN — Head & Neck
BR — Breast
OB — Obstetrics
PD — Pediatric

Some diagnoses span two body systems. For example, a pediatric cardiac condition like Tetralogy of Fallot carries both a primary system (PD) and a secondary system (CV). The map positions it within its primary system's wedge.
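The system codes above, plus the primary/secondary rule, suggest a small data model. The `Diagnosis` shape and function names below are assumptions for illustration; only the ten codes and the "position by primary system" rule come from the text.

```typescript
// The 10 RSNA body-system codes from the list above.
const BODY_SYSTEMS: Record<string, string> = {
  NR: "Neuro / Brain",
  CH: "Chest / Lung",
  CV: "Cardiovascular",
  GI: "Abdomen / GI",
  MK: "Musculoskeletal",
  GU: "Genitourinary",
  HN: "Head & Neck",
  BR: "Breast",
  OB: "Obstetrics",
  PD: "Pediatric",
};

// Illustrative diagnosis shape: a secondary system is optional.
interface Diagnosis {
  name: string;
  primarySystem: string;
  secondarySystem?: string;
}

// The map always positions a diagnosis in its primary system's wedge.
function wedgeFor(d: Diagnosis): string {
  return d.primarySystem;
}

// The example from the text: a pediatric cardiac condition.
const tof: Diagnosis = {
  name: "Tetralogy of Fallot",
  primarySystem: "PD",
  secondarySystem: "CV",
};
```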

View Modes

The toggle in the upper left switches between three view modes, each revealing a different dimension of the data:

  • System — the default view. Each square is colored by its body system. Active diagnoses are solid, planned are outlined, and ocean diagnoses are dim. This view emphasizes the sheer scope of the diagnostic landscape and how little of it has been formally benchmarked.
  • Performance — benchmarked diagnoses are colored on a gradient representing estimated AI diagnostic accuracy, from low (dark) to high (bright). Unbenchmarked diagnoses appear as faint outlines. A legend in the bottom-right maps color to score.
  • Commonality — every diagnosis is colored on a gradient representing how common or recognizable the condition is to a radiology trainee. The layout also changes: the most common diagnoses move to the center, and rare "zebra" diagnoses drift to the periphery.
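The color logic for the three modes can be sketched as a single dispatch. The gradient helper and field names here are assumptions (a plain gray ramp stands in for the site's actual gradients), but the branching mirrors the behavior described above.

```typescript
// How the three view modes could pick a square's color.
// `ramp` is a stand-in gradient; real colors are per-system palettes.

type ViewMode = "system" | "performance" | "commonality";

interface MapDiagnosis {
  systemColor: string;  // color assigned to its body system
  performance?: number; // 0..1, present only for benchmarked diagnoses
  commonality: number;  // 0..1 familiarity score
}

// Map a 0..1 score onto a dark-to-bright ramp (placeholder gradient).
function ramp(score: number): string {
  const v = Math.round(40 + 215 * score);
  return `rgb(${v},${v},${v})`;
}

function colorFor(d: MapDiagnosis, mode: ViewMode): string {
  switch (mode) {
    case "system":
      return d.systemColor; // dimming of ocean squares handled elsewhere
    case "performance":
      // Unbenchmarked diagnoses render as faint outlines, not a fill.
      return d.performance === undefined ? "none" : ramp(d.performance);
    case "commonality":
      return ramp(d.commonality);
  }
}
```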

Commonality

The commonality score is an abstract measure of how familiar a diagnosis would feel to a medical student or radiology trainee. It is not a strict measure of epidemiological prevalence; rather, it reflects teaching emphasis, clinical frequency, and cultural familiarity within medical education.

  • 1.0 — universally known. Every medical student learns this. Examples: Pneumonia, Appendicitis, Myocardial Infarction.
  • 0.7–0.9 — very common, taught early in training. Examples: Subdural Hematoma, Pleural Effusion, ACL Tear.
  • 0.4–0.6 — solid knowledge, not everyday. Examples: Budd-Chiari Syndrome, Sarcoidosis, Cholesteatoma.
  • 0.1–0.3 — specialized, encountered later in training. Examples: Moyamoya Disease, Chordoma, Epiploic Appendagitis.
  • <0.1 — true zebras. Fellowship-level rarities. Examples: Joubert Syndrome, Scimitar Syndrome, Adamantinoma, Mondor Disease.
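The tiers above translate directly into a threshold lookup. One assumption here: the list leaves gaps between its bands (e.g. 0.9–1.0, 0.3–0.4), and this sketch assigns those gaps to the band below the next cutoff.

```typescript
// The commonality tiers, as a threshold lookup. Gap scores between the
// listed bands (e.g. 0.35) fall into the nearest lower band here --
// that assignment is an assumption, not stated in the text.

function commonalityTier(score: number): string {
  if (score >= 1.0) return "universally known"; // Pneumonia, Appendicitis
  if (score >= 0.7) return "very common";       // Pleural Effusion, ACL Tear
  if (score >= 0.4) return "solid knowledge";   // Sarcoidosis, Cholesteatoma
  if (score >= 0.1) return "specialized";       // Moyamoya Disease, Chordoma
  return "zebra";                               // Joubert Syndrome, Adamantinoma
}
```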

In Commonality mode, the spatial layout shifts so that universally known diagnoses cluster at the center of the map and rare conditions scatter to the edges — giving a visceral sense of how medical knowledge is distributed.
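The center-outward drift in Commonality mode amounts to making radius decrease with familiarity. A linear mapping is the simplest version; the actual layout may ease or cluster differently, so treat this as a sketch.

```typescript
// Commonality-driven radius: score 1.0 (universally known) sits at the
// center, score 0 (deep zebra territory) at the far edge. The linear
// mapping and maxRadius value are illustrative assumptions.

function commonalityRadius(score: number, maxRadius = 400): number {
  const clamped = Math.min(Math.max(score, 0), 1);
  return (1 - clamped) * maxRadius;
}
```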

AI Performance

The performance score (0.00 to 1.00) represents estimated AI diagnostic accuracy for benchmarked diagnoses. These scores reflect factors like how distinctive the imaging findings are, how much training data exists, and how well-defined the diagnostic criteria are on imaging.

Performance scores are currently estimated based on domain knowledge about imaging difficulty. As real benchmark evaluations are completed, these estimates will be replaced with measured values from multi-center model evaluations. Diagnoses without benchmark data (the ocean) have no performance score and appear as empty outlines in Performance mode.

The color gradient maps low performance (dark) to high performance (bright), making it easy to spot which conditions AI handles well and where it struggles.
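A dark-to-bright gradient like the one described can be built by linearly interpolating between two endpoint colors. The endpoint RGB values below are placeholders, not the map's actual palette.

```typescript
// Performance legend sketch: linear interpolation between a dark
// low-performance endpoint and a bright high-performance endpoint.
// Both endpoint colors are placeholder assumptions.

type RGB = [number, number, number];

function lerpColor(low: RGB, high: RGB, t: number): RGB {
  const c = Math.min(Math.max(t, 0), 1);
  const mix = (a: number, b: number) => Math.round(a + (b - a) * c);
  return [mix(low[0], high[0]), mix(low[1], high[1]), mix(low[2], high[2])];
}

const LOW: RGB = [20, 30, 60];     // dark: low estimated accuracy
const HIGH: RGB = [120, 220, 255]; // bright: high estimated accuracy

function performanceColor(score: number): RGB {
  return lerpColor(LOW, HIGH, score);
}
```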

What's Next

The Diagnosis Map is a living document. As new benchmarks are developed and evaluated, diagnoses will transition from ocean (unmapped) to planned (collecting data) to active (benchmarked). The goal is to systematically fill in the map — converting dim squares to bright ones — until the radiology community has a comprehensive, evidence-based picture of where AI works, where it doesn't, and where we simply don't know yet.

If you are interested in contributing data, participating in benchmark development, or evaluating your AI system against these benchmarks, please get in touch.