What is CheXplanation?

CheXplanation is a radiologist-annotated segmentation dataset of chest X-rays and a competition for automated pathology localization.

Read the Paper (Saporta, Gui & Agrawal et al.)

Why CheXplanation?

While deep learning has enabled automated medical imaging interpretation at a level shown to surpass that of practicing experts, the “black box” nature of neural networks represents a barrier to physicians’ trust and model adoption in the clinical setting. Therefore, to encourage the development and validation of more “interpretable” models for chest X-ray interpretation, we present a new radiologist-annotated segmentation dataset and a competition for pathology localization.

Leaderboard (coming soon)

Will your model perform as well as radiologists in localizing different pathologies in chest X-rays?

How can I participate?

CheXplanation uses a hidden test set with official evaluation of models. Teams submit binary segmentation masks for each pathology, which are compared against the reference segmentations using mIoU (mean intersection over union). Teams may also submit per-pixel confidence scores for pointing game evaluation.
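To make the two metrics concrete, here is a minimal sketch in Python with NumPy. It is not the official evaluation code. The function names, the convention that two empty masks score an IoU of 1.0, and the simple averaging over mask pairs are all assumptions made for illustration. The pointing game check follows the usual definition, in which a prediction counts as a hit when its highest-confidence pixel falls inside the reference segmentation.

```python
import numpy as np


def iou(pred_mask, ref_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(ref_mask, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, ref).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, ref).sum() / union)


def mean_iou(pred_masks, ref_masks):
    """Average IoU over paired masks (assumed averaging scheme)."""
    return float(np.mean([iou(p, r) for p, r in zip(pred_masks, ref_masks)]))


def pointing_game_hit(confidence, ref_mask):
    """True if the highest-confidence pixel lies inside the reference mask."""
    conf = np.asarray(confidence, dtype=float)
    ref = np.asarray(ref_mask, dtype=bool)
    if conf.shape != ref.shape:
        raise ValueError("confidence map and mask must have the same shape")
    row, col = np.unravel_index(np.argmax(conf), conf.shape)
    return bool(ref[row, col])
```

For example, `iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])` compares two masks that share one pixel out of three marked in either mask. The official leaderboard may handle empty masks and averaging differently.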

The submission format and tutorial are in progress.

How did we produce the CheXplanation dataset?

The dataset’s chest X-rays came from CheXpert, a large public dataset for chest X-ray interpretation. We obtained pixel-level reference segmentations on the CheXpert validation and test sets from two board-certified radiologists. For each chest X-ray, the radiologists were asked to segment any of the following 10 conditions that were present in that chest X-ray, as determined by CheXpert’s ground-truth labels: Airspace Opacity, Atelectasis, Cardiomegaly, Consolidation, Edema, Enlarged Cardiomediastinum, Lung Lesion, Pleural Effusion, Pneumothorax, and Support Devices. We also established a human benchmark by collecting segmentations from three additional radiologists. These radiologists were also asked to segment the 10 conditions of interest present on each chest X-ray, as determined by CheXpert’s ground-truth labels.

What is our baseline model?

Our baseline model used a saliency method, a weakly supervised technique, to generate segmentations. For each chest X-ray, we used Grad-CAM to generate a heat map for each of the ten pathologies. We then applied a threshold to the heat maps to produce binary segmentations and evaluated their overlap with the radiologist reference segmentations. Full details are in the linked paper.

* Note: Only positive pathologies are shown in our example figure.
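The thresholding step of the baseline can be sketched as follows. This is an illustration only, not the baseline's actual code. The helper name `heatmap_to_mask`, the min-max normalization, and the default threshold of 0.5 are assumptions; the linked paper describes how the threshold was actually chosen.

```python
import numpy as np


def heatmap_to_mask(heatmap, threshold=0.5):
    """Turn a saliency heat map into a binary segmentation mask.

    The heat map is min-max normalized to [0, 1] (an assumption), then
    every pixel at or above `threshold` is marked as part of the mask.
    """
    heat = np.asarray(heatmap, dtype=float)
    lo, hi = heat.min(), heat.max()
    if hi == lo:
        # A constant heat map carries no localization signal.
        return np.zeros(heat.shape, dtype=np.uint8)
    normalized = (heat - lo) / (hi - lo)
    return (normalized >= threshold).astype(np.uint8)


# Toy 2x2 heat map standing in for a Grad-CAM output.
mask = heatmap_to_mask([[0.0, 2.0], [4.0, 1.0]])
```

The resulting mask has the same shape as the heat map and can be compared directly against a reference segmentation with an overlap metric such as IoU.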

Validation and Test Sets

The CheXpert validation set consists of 234 chest X-rays from 200 patients randomly sampled from the full dataset and was labeled according to the consensus of three board-certified radiologists. The test set consists of 668 chest X-rays from 500 patients not included in the training or validation sets and was labeled according to the consensus of five board-certified radiologists. 

We provide the original X-ray images and reference segmentations of the validation set. The test set will be used as a held-out set for leaderboard evaluation.

Downloading the Dataset (v1.0)

Please read the Stanford University School of Medicine CheXplanation Dataset Research Use Agreement. Once you register to download the CheXplanation dataset, you will receive a link to the download over email. Note that you may not share the link to download the dataset with others.

Stanford University School of Medicine CheXplanation Dataset Research Use Agreement

By registering for downloads from the CheXplanation Dataset, you are agreeing to this Research Use Agreement, as well as to the Terms of Use of the Stanford University School of Medicine website as posted and updated periodically at http://www.stanford.edu/site/terms/.

1. Permission is granted to view and use the CheXplanation Dataset without charge for personal, non-commercial research purposes only. Any commercial use, sale, or other monetization is prohibited.

2. Other than the rights granted herein, the Stanford University School of Medicine (“School of Medicine”) retains all rights, title, and interest in the CheXplanation Dataset.

3. You may make a verbatim copy of the CheXplanation Dataset for personal, non-commercial research use as permitted in this Research Use Agreement. If another user within your organization wishes to use the CheXplanation Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

4. YOU MAY NOT DISTRIBUTE, PUBLISH, OR REPRODUCE A COPY of any portion or all of the CheXplanation Dataset to others without specific prior written permission from the School of Medicine.

5. YOU MAY NOT SHARE THE DOWNLOAD LINK to the CheXplanation Dataset with others. If another user within your organization wishes to use the CheXplanation Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

6. You must not modify, reverse engineer, decompile, or create derivative works from the CheXplanation Dataset. You must not remove or alter any copyright or other proprietary notices in the CheXplanation Dataset.

7. The CheXplanation Dataset has not been reviewed or approved by the Food and Drug Administration, and is for non-clinical, Research Use Only. In no event shall data or images generated through the use of the CheXplanation Dataset be used or relied upon in the diagnosis or provision of patient care.


9. You will not make any attempt to re-identify any of the individual data subjects. Re-identification of individuals is strictly prohibited. Any re-identification of any individual data subject shall be immediately reported to the School of Medicine.

10. Any violation of this Research Use Agreement or other impermissible use shall be grounds for immediate termination of use of this CheXplanation Dataset. In the event that the School of Medicine determines that the recipient has violated this Research Use Agreement or other impermissible use has been made, the School of Medicine may direct that the undersigned data recipient immediately return all copies of the CheXplanation Dataset and retain no copies thereof even if you did not cause the violation or impermissible use.

In consideration for your agreement to the terms and conditions contained here, Stanford grants you permission to view and use the CheXplanation Dataset for personal, non-commercial research. You may not otherwise copy, reproduce, retransmit, distribute, publish, commercially exploit or otherwise transfer any material.

Limitation of Use

You may use the CheXplanation Dataset for legal purposes only.

You agree to indemnify and hold Stanford harmless from any claims, losses or damages, including legal fees, arising out of or resulting from your use of the CheXplanation Dataset or your violation or role in violation of these Terms. You agree to fully cooperate in Stanford’s defense against any such claims. These Terms shall be governed by and interpreted in accordance with the laws of California.



Deep learning saliency maps do not accurately highlight diagnostically relevant regions for medical image interpretation

Adriel Saporta *, Xiaotong Gui *, Ashwin Agrawal *, Anuj Pareek, Steven QH Truong, Chanh DT Nguyen, Van-Doan Ngo, Jayne Seekins, Francis G Blankenberg, Andrew Ng, Matthew P Lungren, Pranav Rajpurkar

If you have questions about our work, contact us through our Google group.
