RSNA Benchmarks
Publications

Research & Documentation

Peer-reviewed publications, preprints, and technical reports from the RSNA Benchmarks initiative.

RSNA Benchmark Dataset

RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR)

Wei Y, Flanders AE, Colak E, Mongan J, Prevedello LM, Chen PH, Lee HMH, Szarf G, Shoji H, Sho J, Andriole K, Cook T, Adams LC, Chu LC, Chung M, Brusca-Augello G, Deva DP, Singh N, Sanchez Tijmes F, Alpert JB, Nguyen ET, Torigian DA, Hanneman K, Groner LK, Phan A, Islam A, Callejas MF, Borges da Silva Teles G, Jamal F, Vazirabad M, Tejani A, Trivedi H, Kuriki P, Bhayana R, Benishay ET, Lin Y, Peng Y, Shih G

arXiv:2601.15129 · January 2026

A curated benchmark of 200 chest radiograph studies (100 public, 100 holdout) with 12 cardiothoracic labels, derived from 13,735 deidentified MIDRC radiographs. Labels were extracted with GPT-4o and mapped by a locally hosted LLM, then verified by 17 radiologists, with at least three independent radiologist confirmations per study. The benchmark is publicly available via the RSNA platform.

Chest Radiograph · LLM · Benchmark · MIDRC · Multi-reader
An open community initiative for rigorous evaluation of frontier radiology AI.

© 2026 RSNA Benchmarks. A community-driven initiative.