Publications — RSNA Benchmarks

RSNA Benchmark Dataset

RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR)

Wei Y, Flanders AE, Colak E, Mongan J, Prevedello LM, Chen PH, Lee HMH, Szarf G, Shoji H, Sho J, Andriole K, Cook T, Adams LC, Chu LC, Chung M, Brusca-Augello G, Deva DP, Singh N, Sanchez Tijmes F, Alpert JB, Nguyen ET, Torigian DA, Hanneman K, Groner LK, Phan A, Islam A, Callejas MF, Borges da Silva Teles G, Jamal F, Vazirabad M, Tejani A, Trivedi H, Kuriki P, Bhayana R, Benishay ET, Lin Y, Peng Y, Shih G

arXiv:2601.15129 · January 2026

A curated benchmark of 200 chest radiograph studies (100 public, 100 holdout) annotated with 12 cardiothoracic labels and derived from 13,735 deidentified MIDRC radiographs. Labels were extracted with GPT-4o, mapped by a locally hosted LLM, and then verified by 17 radiologists, with at least three independent confirmations per study. The benchmark is publicly available via the RSNA platform.

Chest Radiograph · LLM Benchmark · MIDRC · Multi-reader

View on arXiv →
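
The "at least three independent confirmations" rule in the description above can be illustrated with a short sketch. This is not the authors' pipeline: the record layout `(study_id, reader_id, label)`, the function name `confirmed_labels`, and the choice to count confirmations per study and label pair are all assumptions made for illustration, and the label names are placeholders rather than the benchmark's actual 12 labels.

```python
from collections import defaultdict

# Threshold taken from the description: at least three confirmations.
MIN_CONFIRMATIONS = 3


def confirmed_labels(reviews, min_confirmations=MIN_CONFIRMATIONS):
    """Return {study_id: sorted labels confirmed by enough distinct readers}.

    `reviews` is an iterable of (study_id, reader_id, label) tuples.
    This layout is hypothetical, chosen only for this illustration.
    """
    # Collect the distinct readers who confirmed each (study, label) pair,
    # so a reader who submits the same confirmation twice counts once.
    readers = defaultdict(set)
    for study_id, reader_id, label in reviews:
        readers[(study_id, label)].add(reader_id)

    # Keep only the labels that reach the confirmation threshold.
    result = defaultdict(list)
    for (study_id, label), who in readers.items():
        if len(who) >= min_confirmations:
            result[study_id].append(label)
    return {study: sorted(labels) for study, labels in result.items()}


if __name__ == "__main__":
    # Placeholder data: three readers agree on label_a, one reports label_b.
    example = [
        ("study_1", "reader_1", "label_a"),
        ("study_1", "reader_2", "label_a"),
        ("study_1", "reader_3", "label_a"),
        ("study_1", "reader_1", "label_b"),
    ]
    print(confirmed_labels(example))
```

Counting distinct readers rather than raw rows is one way to keep the confirmations independent; whether the published protocol counts them per label or per study is not stated in this listing.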
