What is MURA?

MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal.

Musculoskeletal conditions affect more than 1.7 billion people worldwide and are the most common cause of severe, long-term pain and disability, accounting for more than 30 million emergency department visits annually, a number that continues to rise. We hope that our dataset can lead to significant advances in medical imaging technologies that can diagnose at the level of experts, toward improving healthcare access in parts of the world where access to skilled radiologists is limited.

MURA is one of the largest public radiographic image datasets. We're making this dataset available to the community and hosting a competition to see if your models can perform as well as radiologists on the task.

How can I participate?

MURA uses a hidden test set for official evaluation of models. Teams submit their executable code on Codalab, which is then run on a test set that is not publicly readable. Such a setup preserves the integrity of the test results.

Here's a tutorial walking you through official evaluation of your model. Once your model has been evaluated officially, your scores will be added to the leaderboard.

Leaderboard

Will your model perform as well as radiologists in detecting abnormalities in musculoskeletal X-rays?

| Rank | Date | Model | Kappa |
|------|------|-------|-------|
| - | - | Best Radiologist Performance, Stanford University (Rajpurkar & Irvin et al., '17) | 0.778 |
| 1 | Jul 24, 2018 | he_j | 0.775 |
| 2 | Aug 19, 2018 | ianpan | 0.774 |
| 3 | Jun 17, 2018 | gcm (ensemble), Peking University | 0.773 |
| 4 | Jul 14, 2018 | Trs (single model), SCU_MILAB | 0.763 |
| 5 | Aug 21, 2018 | ellonde | 0.763 |
| 6 | Jul 16, 2018 | null | 0.755 |
| 7 | Jul 25, 2018 | DenseNet001 (single model), zhou | 0.747 |
| 8 | Aug 21, 2018 | base-AllParts-sq-tv (single), Avail | 0.747 |
| 9 | Jul 14, 2018 | type_resnet (single model), CCLab | 0.746 |
| 10 | Jun 18, 2018 | VGG19 (single model), ZHAW | 0.744 |
| 11 | Jul 02, 2018 | ImageXrModel-001 (single model), Ruslan Baikulov | 0.737 |
| 12 | Aug 12, 2018 | base169-AllParts-diffParts-tv (ensemble) | 0.727 |
| 13 | Aug 21, 2018 | base-AllParts-tv (single), Avail | 0.717 |
| 14 | Jul 18, 2018 | he_j | 0.712 |
| 15 | Jul 24, 2018 | AIAPlus (ensemble), Taiwan AI Academy (http://aiacademy.tw) | 0.707 |
| 15 | Jul 30, 2018 | dn169-baseline (single model), PKU | 0.707 |
| 16 | May 23, 2018 | Stanford Baseline (ensemble), Stanford University (https://arxiv.org/abs/1712.06957) | 0.705 |
| 17 | Jul 11, 2018 | he_j | 0.701 |
| 17 | Jul 16, 2018 | type_inception2 (single model), CCLab | 0.701 |
| 18 | Jun 24, 2018 | single-densenet169 (single model) | 0.699 |
| 19 | Aug 09, 2018 | base169-AllParts-diffParts (ensemble) | 0.698 |
| 19 | Aug 17, 2018 | base (ensemble) | 0.698 |
| 19 | Aug 21, 2018 | baseAllPartsDiffParts-sq (ensemble) | 0.698 |
| 19 | Jul 02, 2018 | Baseline169 (single model), Personal | 0.698 |
| 20 | Jul 22, 2018 | Densenet (single), DI-MT | 0.690 |
| 21 | Jul 13, 2018 | dense169 (ensemble), Availink | 0.686 |
| 22 | Jun 10, 2018 | ResNet (single model), UCSC CE graduate student huimin yan | 0.675 |
| 22 | Aug 18, 2018 | {monica_v1} (single model), Zzmonica | 0.675 |
| 23 | Jul 09, 2018 | null | 0.664 |
| 24 | Jun 30, 2018 | Baseline169-v0.2 (single), Personal | 0.659 |
| 25 | Jul 08, 2018 | madcarrot | 0.653 |
| 26 | Jun 30, 2018 | zhy | 0.638 |
| 27 | Jun 30, 2018 | Densenet121 (single model), Personal | 0.629 |
| 28 | Jul 26, 2018 | Bhaukali_v1.0 (single model), IIT BHU, Varanasi | 0.581 |
| 29 | Jul 21, 2018 | inceptionv3-pci (single model), PCI | 0.578 |
| 30 | Jul 12, 2018 | DN169 (single) | 0.574 |
| 31 | Jul 31, 2018 | Densenet169-lite (single model), Tang | 0.560 |
| 32 | Jul 06, 2018 | DenseNet (single model), Zhou | 0.518 |

How did we collect MURA?

MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. Each study was manually labeled as normal or abnormal by board-certified radiologists from the Stanford Hospital at the time of clinical radiographic interpretation in the diagnostic radiology environment between 2001 and 2012.

Test Set Collection

To evaluate models and get a robust estimate of radiologist performance, we collected additional labels from six board-certified Stanford radiologists on the test set, consisting of 207 musculoskeletal studies. The radiologists individually and retrospectively reviewed each study in the test set (as DICOM files) and labeled it as normal or abnormal in the clinical reading room environment using the PACS system. The radiologists had an average of 8.83 years of experience, ranging from 2 to 25 years. We randomly chose 3 of these radiologists to create a gold standard, defined as the majority vote of their labels.
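The majority vote over an odd number of raters can be sketched in a few lines (the `gold_standard` helper and the example labels below are illustrative, not code from the paper):

```python
def gold_standard(labels):
    """Majority vote over an odd number of radiologist labels (1 = abnormal)."""
    return int(sum(labels) > len(labels) / 2)

# Hypothetical labels from the three randomly chosen radiologists:
gold_standard([1, 0, 1])  # abnormal wins 2-1, so the gold label is 1
```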

What is our baseline?

Our baseline uses a 169-layer convolutional neural network to detect and localize abnormalities. The model takes as input one or more views for a study of an upper extremity. On each view, our 169-layer convolutional neural network predicts the probability of abnormality. We compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model makes the binary prediction of abnormal if the probability of abnormality for the study is greater than 0.5.
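The study-level aggregation described above can be sketched as follows (a minimal example; `predict_study` and the per-view probabilities are hypothetical):

```python
import numpy as np

def predict_study(view_probs, threshold=0.5):
    """Aggregate per-view abnormality probabilities into a study-level
    prediction: arithmetic mean over views, then a 0.5 threshold."""
    study_prob = float(np.mean(view_probs))
    return study_prob, int(study_prob > threshold)

# A hypothetical 3-view study:
prob, label = predict_study([0.62, 0.48, 0.71])  # mean ≈ 0.603 → abnormal
```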

The network uses a Dense Convolutional Network architecture, which connects each layer to every other layer in a feed-forward fashion to make the optimization of deep networks tractable. We replace the final fully connected layer with one that has a single output, after which we apply a sigmoid nonlinearity. We use Class Activation Maps to visualize the parts of the radiograph which contribute most to the model's prediction of abnormality.
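Because the final layer is a single-output linear layer, a Class Activation Map reduces to a weighted sum of the last convolutional feature maps. A minimal sketch (the shapes assume DenseNet-169's 1664 final feature channels and a 7x7 spatial grid; the random inputs stand in for real activations and learned weights):

```python
import numpy as np

def class_activation_map(features, weights):
    """Compute a Class Activation Map from final conv feature maps.

    features: (C, H, W) activations from the last dense block
    weights:  (C,) weights of the single-output linear layer
    Returns an (H, W) map, min-max normalized to [0, 1].
    """
    cam = np.tensordot(weights, features, axes=1)  # weighted sum over channels
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(0)
cam = class_activation_map(rng.standard_normal((1664, 7, 7)),
                           rng.standard_normal(1664))
```

In practice the (H, W) map is upsampled to the radiograph's resolution and overlaid as a heatmap.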

How does our baseline do?

We evaluated our baseline using the Cohen's kappa statistic, which expresses the agreement of the model with the gold standard. Baseline performance is comparable to radiologist performance in detecting abnormalities on finger studies and equivalent on wrist studies. However, baseline performance is lower than best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies, as well as overall, indicating that the task remains a good challenge for future research.
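For two binary label sequences, Cohen's kappa corrects the observed agreement for the agreement expected by chance. A from-scratch sketch (`cohens_kappa` is our own illustrative helper, not code from the paper):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary label sequences of equal length."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # Chance agreement from each rater's marginal label frequencies.
    pa1, pb1 = sum(a) / n, sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_o - p_e) / (1 - p_e)

cohens_kappa([1, 0, 1, 0], [1, 0, 1, 0])  # perfect agreement → 1.0
```

Note that kappa is undefined when chance agreement is 1 (both raters always give the same single label); real implementations handle that edge case explicitly.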

Downloading the Dataset (v1.1)

Please read the Stanford University School of Medicine MURA Dataset Research Use Agreement. Once you register to download the MURA dataset, you will receive a link to the download over email. Note that you may not share the link to download the dataset with others.

Stanford University School of Medicine MURA Dataset Research Use Agreement

By registering for downloads from the MURA Dataset, you are agreeing to this Research Use Agreement, as well as to the Terms of Use of the Stanford University School of Medicine website as posted and updated periodically at http://www.stanford.edu/site/terms/.

1. Permission is granted to view and use the MURA Dataset without charge for personal, non-commercial research purposes only. Any commercial use, sale, or other monetization is prohibited.

2. Other than the rights granted herein, the Stanford University School of Medicine (“School of Medicine”) retains all rights, title, and interest in the MURA Dataset.

3. You may make a verbatim copy of the MURA Dataset for personal, non-commercial research use as permitted in this Research Use Agreement. If another user within your organization wishes to use the MURA Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

4. YOU MAY NOT DISTRIBUTE, PUBLISH, OR REPRODUCE A COPY of any portion or all of the MURA Dataset to others without specific prior written permission from the School of Medicine.

5. YOU MAY NOT SHARE THE DOWNLOAD LINK to the MURA dataset to others. If another user within your organization wishes to use the MURA Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

6. You must not modify, reverse engineer, decompile, or create derivative works from the MURA Dataset. You must not remove or alter any copyright or other proprietary notices in the MURA Dataset.

7. The MURA Dataset has not been reviewed or approved by the Food and Drug Administration, and is for non-clinical, Research Use Only. In no event shall data or images generated through the use of the MURA Dataset be used or relied upon in the diagnosis or provision of patient care.

8. THE MURA DATASET IS PROVIDED "AS IS," AND STANFORD UNIVERSITY AND ITS COLLABORATORS DO NOT MAKE ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, NOR DO THEY ASSUME ANY LIABILITY OR RESPONSIBILITY FOR THE USE OF THIS MURA DATASET.

9. You will not make any attempt to re-identify any of the individual data subjects. Re-identification of individuals is strictly prohibited. Any re-identification of any individual data subject shall be immediately reported to the School of Medicine.

10. Any violation of this Research Use Agreement or other impermissible use shall be grounds for immediate termination of use of this MURA Dataset. In the event that the School of Medicine determines that the recipient has violated this Research Use Agreement or other impermissible use has been made, the School of Medicine may direct that the undersigned data recipient immediately return all copies of the MURA Dataset and retain no copies thereof even if you did not cause the violation or impermissible use.

In consideration for your agreement to the terms and conditions contained here, Stanford grants you permission to view and use the MURA Dataset for personal, non-commercial research. You may not otherwise copy, reproduce, retransmit, distribute, publish, commercially exploit or otherwise transfer any material.

Limitation of Use

You may use the MURA Dataset for legal purposes only.

You agree to indemnify and hold Stanford harmless from any claims, losses or damages, including legal fees, arising out of or resulting from your use of the MURA Dataset or your violation or role in violation of these Terms. You agree to fully cooperate in Stanford’s defense against any such claims. These Terms shall be governed by and interpreted in accordance with the laws of California.

MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs.

Pranav Rajpurkar*, Jeremy Irvin*, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

If you have questions about our work, contact us via our Google group.

Read the Paper