What is MURA?

MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal.

Musculoskeletal conditions affect more than 1.7 billion people worldwide and are the most common cause of severe, long-term pain and disability, accounting for 30 million emergency department visits annually, a number that continues to rise. We hope that our dataset can lead to significant advances in medical imaging technologies that can diagnose at the level of experts, improving healthcare access in parts of the world where access to skilled radiologists is limited.

MURA is one of the largest public radiographic image datasets. We're making this dataset available to the community and hosting a competition to see if your models can perform as well as radiologists on the task.

How can I participate?

Update: This competition is now closed.

MURA uses a hidden test set for official evaluation of models. Teams submit their executable code on Codalab, which is then run on a test set that is not publicly readable. Such a setup preserves the integrity of the test results.

Here's a tutorial walking you through official evaluation of your model. Once your model has been evaluated officially, your scores will be added to the leaderboard.

Leaderboard

Will your model perform as well as radiologists in detecting abnormalities in musculoskeletal X-rays?

Rank | Date | Model | Kappa
- | - | Best Radiologist Performance Stanford University Rajpurkar & Irvin et al., '17 | 0.778
1 | Nov 30, 2018 | base-comb2-xuan-v3(ensemble) jzhang Availink | 0.843
2 | Nov 06, 2018 | base-comb2-xuan(ensemble) jtzhang Availink | 0.834
3 | Oct 06, 2018 | muti_type (ensemble model) SCU_MILAB | 0.833
4 | Oct 02, 2018 | base-comb4(ensemble) jtzhang Availink | 0.824
5 | Nov 08, 2018 | base-comb2-jun2(ensemble) | 0.814
5 | Nov 07, 2018 | base-comb2-ping(ensemble) | 0.814
6 | Aug 22, 2018 | base-comb3(ensemble) | 0.805
7 | Sep 14, 2018 | double_res(ensemble model) SCU_MILAB | 0.804
8 | Aug 24, 2018 | double-dense-Axy-Axyf512 ensemble | 0.795
9 | Jul 24, 2018 | densenet169_v2/single model | 0.775
10 | Aug 19, 2018 | ianpan (ensemble) RIH 3D Lab | 0.774
11 | Jul 24, 2018 | inceptionv3/single model | 0.774
12 | Jun 17, 2018 | gcm (ensemble) Peking University | 0.773
12 | Sep 10, 2018 | ty101 single model | 0.773
13 | Aug 31, 2018 | he_j | 0.764
13 | Aug 31, 2018 | AIAPlus (ensemble) Taiwan AI Academy http://aiacademy.tw | 0.764
14 | Sep 04, 2018 | SER_Net_Baseline (single model) SJTU | 0.764
15 | Jul 14, 2018 | Trs (single model) SCU_MILAB | 0.763
16 | Sep 12, 2018 | null | 0.763
16 | Aug 21, 2018 | densenet single model unknown | 0.763
17 | Jul 16, 2018 | null | 0.755
17 | Aug 24, 2018 | dense-sep-xyz ensemble | 0.755
18 | Nov 16, 2018 | VGG19 single model | 0.754
19 | Jul 25, 2018 | DenseNet001 (single model) zhou | 0.747
20 | Aug 21, 2018 | dn169-Aftrva(single) AliHealth | 0.747
21 | Jul 14, 2018 | type_resnet (single model) CCLab | 0.746
22 | Dec 06, 2018 | res101_da_sqtv(single) | 0.746
23 | Jun 18, 2018 | VGG19 (single model) ZHAW | 0.744
24 | Jul 02, 2018 | ImageXrModel-001 (single model) Ruslan Baikulov | 0.737
25 | Oct 04, 2018 | ExtremityModel ensemble | 0.734
26 | Jan 19, 2019 | DenseAttention (single model) BIT | 0.727
26 | Aug 12, 2018 | base169-AllParts-diffParts-tv(ensemble) MSFT-research | 0.727
27 | Sep 27, 2018 | aiinside | 0.725
27 | Mar 14, 2019 | Resology14 (ensemble) Rology http://www.rology.net | 0.725
28 | Dec 06, 2018 | inc3_sqtv(single) MIT AI | 0.724
29 | Aug 21, 2018 | base-model-Atv(single) Avail-AI | 0.717
30 | Dec 11, 2018 | incev3_xy(single) UCB | 0.716
31 | Jul 18, 2018 | nasnet_mobile/single model | 0.712
32 | Mar 14, 2019 | kmle-second (ensemble) kmle | 0.707
33 | Jul 30, 2018 | dn169-baseline (single model) PKU | 0.707
34 | May 23, 2018 | Stanford Baseline (ensemble) Stanford University https://arxiv.org/abs/1712.06957 | 0.705
35 | Mar 10, 2019 | asa_model_nasnetmo (single) toyohashi | 0.702
36 | Jul 11, 2018 | mobilenet/single model | 0.701
36 | Jul 16, 2018 | type_inception2(single model) CCLab | 0.701
37 | Dec 06, 2018 | term2-model0sqtv(single) | 0.700
38 | Jun 24, 2018 | single-densenet169 single model | 0.699
39 | Oct 26, 2018 | Joint-tv single | 0.698
39 | Aug 17, 2018 | base ensemble | 0.698
39 | Aug 21, 2018 | baseAllPartsDiffParts-sq ensemble | 0.698
39 | Aug 09, 2018 | base169-AllParts-diffParts(ensemble) MSFT-reseach | 0.698
39 | Jul 02, 2018 | Baseline169 (single model) Personal | 0.698
40 | Jan 18, 2019 | first-attempt-kmle (ensemble) kmle | 0.696
41 | Dec 31, 2018 | DenseNet_144 single model http://www.rology.net/ | 0.694
42 | Jul 22, 2018 | Densenet DI-MT Single | 0.690
43 | Jul 13, 2018 | dense169(ensemble) mitAI | 0.686
44 | Oct 27, 2018 | xception(single model) bimal | 0.686
45 | Dec 23, 2018 | {EnglebertDGC} (single model) UCLouvain | 0.684
46 | Dec 06, 2018 | res_daxy(single) CMU ml | 0.680
47 | Jan 16, 2019 | GoGoing (ensemble) Inner Mongolia University | 0.678
48 | Dec 11, 2018 | inceptionresnetv2_tv(single) baidu AI | 0.676
49 | Jun 10, 2018 | ResNet (single model) UCSC CE graduate student huimin yan | 0.675
49 | Aug 18, 2018 | {monica_v1}(single model) Zzmonica | 0.675
50 | Jul 09, 2018 | null | 0.664
51 | Jan 11, 2019 | PFResNet (single model) USTC_Math_1222 | 0.664
52 | Nov 29, 2018 | {DenseNet_169} (single model) Rology http://www.rology.net | 0.662
53 | Jan 31, 2019 | DenseNet_v2 single model http://www.rology.net | 0.661
54 | Sep 03, 2018 | DenseNet002 (single model) zhou | 0.660
55 | Nov 05, 2018 | DenseNet (single model) Rology http://www.rology.net/ | 0.659
56 | Jun 30, 2018 | Baseline169-v0.2 (single) Personal | 0.659
57 | Jul 08, 2018 | madcarrot | 0.653
58 | Dec 06, 2018 | base_largexy(single) Tsinghua Deep Learning | 0.652
59 | Jun 30, 2018 | zhy | 0.638
60 | Jun 30, 2018 | Densenet121 (single model) Personal | 0.629
61 | Oct 26, 2018 | baseJoint-tvsq(single) ali | 0.624
62 | Dec 31, 2018 | ConvNet single model http://www.rology.net/ | 0.599
63 | Jan 31, 2019 | Ensemble_V0 ensemble model http://www.rology.net/ | 0.599
64 | Aug 29, 2018 | Inception-ResNet-v2 (single model) Royal Holloway | 0.597
64 | Aug 28, 2018 | Inception-ResNet-v2 (single model) Royal Holloway | 0.597
65 | Jul 26, 2018 | Bhaukali_v1.0 (single model) IIT BHU, Varanasi | 0.581
66 | Jul 21, 2018 | inceptionv3-pci (single model) PCI | 0.578
67 | Jul 12, 2018 | DN169 single | 0.574
68 | Jul 31, 2018 | Densenet169-lite(single model) Tang | 0.560
69 | Aug 29, 2018 | ensemble1 ensemble | 0.534
70 | Jul 06, 2018 | DenseNet (single model) Zhou | 0.518

How did we collect MURA?

MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. Each study was manually labeled as normal or abnormal by board-certified radiologists from Stanford Hospital at the time of clinical radiographic interpretation in the diagnostic radiology environment between 2001 and 2012.

Test Set Collection

To evaluate models and get a robust estimate of radiologist performance, we collected additional labels from six board-certified Stanford radiologists on the test set, which consists of 207 musculoskeletal studies. Each radiologist individually and retrospectively reviewed every study in the test set as DICOM files and labeled it as normal or abnormal in the clinical reading room environment using the PACS system. The radiologists have 8.83 years of experience on average, ranging from 2 to 25 years. We randomly chose three of these radiologists to create a gold standard, defined as the majority vote of their labels.
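The majority-vote rule behind the gold standard can be illustrated with a minimal Python sketch. The function name, study identifiers, and votes below are hypothetical and only show the rule; they are not taken from the MURA test set.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most raters (1 = abnormal, 0 = normal).

    An odd number of raters is required so that a binary vote cannot tie.
    """
    if len(labels) % 2 == 0:
        raise ValueError("use an odd number of raters to avoid ties")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical votes from three radiologists on three studies.
votes_per_study = {
    "study_a": [1, 1, 0],
    "study_b": [0, 0, 0],
    "study_c": [0, 1, 1],
}

gold_standard = {study: majority_vote(votes) for study, votes in votes_per_study.items()}
print(gold_standard)
```

With three raters and two possible labels, at least two raters always agree, so the gold-standard label is well defined for every study.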

What is our baseline?

Our baseline uses a 169-layer convolutional neural network to detect and localize abnormalities. The model takes as input one or more views for a study of an upper extremity. On each view, our 169-layer convolutional neural network predicts the probability of abnormality. We compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model makes the binary prediction of abnormal if the probability of abnormality for the study is greater than 0.5.
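The study-level decision rule described above (arithmetic mean over views, then a 0.5 threshold) can be sketched in a few lines of Python. The function name and the example probabilities are illustrative assumptions; the per-view probabilities would come from the network in practice.

```python
from statistics import mean


def predict_study(view_probabilities, threshold=0.5):
    """Combine per-view abnormality probabilities into one study-level prediction.

    Returns (study_probability, is_abnormal), where the study probability is the
    arithmetic mean over views and the study is called abnormal only if that
    mean is strictly greater than the threshold.
    """
    if not view_probabilities:
        raise ValueError("a study needs at least one view")
    study_probability = mean(view_probabilities)
    return study_probability, study_probability > threshold


# Hypothetical network outputs for a study with three views.
probability, is_abnormal = predict_study([0.9, 0.4, 0.35])
print(f"study probability = {probability:.2f}, abnormal = {is_abnormal}")
```

Note that the comparison is strict: a study whose mean probability is exactly 0.5 is predicted normal under this rule.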

The network uses a Dense Convolutional Network architecture, which connects each layer to every other layer in a feed-forward fashion to make the optimization of deep networks tractable. We replace the final fully connected layer with one that has a single output, after which we apply a sigmoid nonlinearity. We use Class Activation Maps to visualize the parts of the radiograph which contribute most to the model's prediction of abnormality.
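As a rough illustration of Class Activation Maps, the sketch below uses the standard formulation for a network whose final layer follows global average pooling: the map at each spatial location is the sum of the last convolutional feature maps, each weighted by the corresponding weight of the single-output layer. That this matches the baseline's exact procedure is an assumption, and the tiny feature maps and weights are made up for the example.

```python
def class_activation_map(feature_maps, weights):
    """Compute a class activation map as a weighted sum of feature maps.

    feature_maps: list of K grids, each a list of rows (H x W numbers).
    weights: list of K weights from the final single-output layer.
    """
    if len(feature_maps) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two hypothetical 2x2 feature maps and their final-layer weights.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 2.0], [0.0, 2.0]],
]
weights = [1.0, -0.5]

cam = class_activation_map(feature_maps, weights)
print(cam)
```

In practice the resulting low-resolution map is usually upscaled to the input image size and overlaid on the radiograph, so that high values highlight the regions that push the prediction toward abnormal.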

How does our baseline do?

We evaluated our baseline using Cohen’s kappa statistic, which expresses the agreement of the model with the gold standard. Baseline performance is comparable to radiologist performance in detecting abnormalities on finger studies and equivalent on wrist studies. However, baseline performance is lower than best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies, and overall, indicating that the task is a good challenge for future research.
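For readers unfamiliar with the metric, Cohen's kappa compares the observed agreement between two sets of labels with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following Python sketch computes it for study-level labels; the function name and the example labels are hypothetical, and this is not the official evaluation script.

```python
def cohens_kappa(predictions, gold):
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(gold)
    # Observed agreement: fraction of studies where both labels match.
    observed = sum(p == g for p, g in zip(predictions, gold)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    labels = set(predictions) | set(gold)
    expected = sum(
        (list(predictions).count(label) / n) * (list(gold).count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical model predictions and gold-standard labels (1 = abnormal, 0 = normal).
model_predictions = [1, 1, 0, 0, 1, 0, 1, 0]
gold_standard = [1, 1, 0, 0, 0, 0, 1, 1]

kappa = cohens_kappa(model_predictions, gold_standard)
print(f"kappa = {kappa:.3f}")
```

In this example the model agrees with the gold standard on 6 of 8 studies (0.75), chance agreement is 0.5, and so kappa is 0.5. A kappa of 1 means perfect agreement and 0 means agreement no better than chance.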

Downloading the Dataset (v1.0)

Find the dataset on the Stanford AIMI website.

MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs.

Pranav Rajpurkar*, Jeremy Irvin*, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

If you have questions about our work, contact us at our Google group.

Read the Paper