RSNA Benchmark Dataset
RSNA Large Language Model Benchmark Dataset for Chest Radiographs of Cardiothoracic Disease: Radiologist Evaluation and Validation Enhanced by AI Labels (REVEAL-CXR)
arXiv:2601.15129 · January 2026
A curated benchmark of 200 chest radiograph studies (100 public, 100 holdout) annotated with 12 cardiothoracic labels and drawn from 13,735 deidentified MIDRC radiographs. Labels were extracted with GPT-4o and mapped by a locally hosted LLM, then verified by 17 radiologists, with at least three independent confirmations per study. The dataset is publicly available via the RSNA platform.
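As a rough illustration of the "at least three independent confirmations" criterion, the sketch below counts distinct readers per study and label and keeps only labels that reach the threshold. It is not the authors' pipeline: the `(study_id, reader_id, label)` record layout, the function name `confirmed_labels`, the sample label names, and the choice to count confirmations per label are all assumptions made for this example.

```python
from collections import defaultdict

# Threshold taken from the description: at least three independent confirmations.
MIN_CONFIRMATIONS = 3


def confirmed_labels(reviews, min_confirmations=MIN_CONFIRMATIONS):
    """Return {study_id: sorted labels confirmed by enough distinct readers}.

    `reviews` is an iterable of (study_id, reader_id, label) tuples. This
    record layout is hypothetical and chosen for illustration only.
    """
    readers = defaultdict(set)  # (study_id, label) -> set of reader ids
    for study_id, reader_id, label in reviews:
        # Using a set means repeat votes from one reader count once.
        readers[(study_id, label)].add(reader_id)

    result = defaultdict(list)
    for (study_id, label), ids in readers.items():
        if len(ids) >= min_confirmations:
            result[study_id].append(label)
    return {study: sorted(labels) for study, labels in result.items()}


# Hypothetical review records; the label names are placeholders.
reviews = [
    ("s1", "r1", "cardiomegaly"),
    ("s1", "r2", "cardiomegaly"),
    ("s1", "r3", "cardiomegaly"),
    ("s1", "r1", "pleural effusion"),
    ("s1", "r1", "pleural effusion"),  # duplicate vote from the same reader
    ("s1", "r2", "pleural effusion"),
]

print(confirmed_labels(reviews))  # {'s1': ['cardiomegaly']}
```

Studies with no label reaching the threshold are simply absent from the result in this sketch; a real workflow would more likely flag them for further review.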