We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding that of practicing radiologists.

Chest X-rays are currently the best available method for diagnosing pneumonia, playing a crucial role in clinical care and epidemiological studies. Pneumonia is responsible for more than 1 million hospitalizations and 50,000 deaths per year in the US alone.

Read our paper

Our model, CheXNet, is a 121-layer convolutional neural network that inputs a chest X-ray image and outputs the probability of pneumonia along with a heatmap localizing the areas of the image most indicative of pneumonia.
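As a rough illustration of how a single network can yield both a probability and a localization heatmap, here is a minimal sketch of class activation mapping. This page does not say how CheXNet computes its heatmap, so the technique, the function name `predict_with_heatmap`, and the toy numbers are all assumptions. The sketch assumes the final convolutional feature maps are globally average-pooled and fed to one linear unit followed by a sigmoid.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_heatmap(feature_maps, weights, bias):
    """Return (probability, heatmap) from K feature maps of size H x W.

    Assumption: global average pooling followed by one linear unit, so the
    logit equals the spatial mean of the weighted sum of maps, plus the bias.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Heatmap: at each location, weight every feature map by its class weight.
    heatmap = [
        [sum(w * fmap[r][c] for w, fmap in zip(weights, feature_maps))
         for c in range(width)]
        for r in range(height)
    ]
    # Averaging the heatmap is equivalent to pooling each map, then weighting.
    logit = sum(sum(row) for row in heatmap) / (height * width) + bias
    return sigmoid(logit), heatmap


# Hypothetical 2 x 2 feature maps and weights, for illustration only.
feature_maps = [
    [[0.0, 2.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
probability, heatmap = predict_with_heatmap(feature_maps, [1.0, -0.5], 0.0)
```

In this toy case the top-right location gets the largest heatmap value, because the first feature map is active only there.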

We train CheXNet on the recently released ChestX-ray14 dataset, which contains 112,120 frontal-view chest X-ray images individually labeled with up to 14 different thoracic diseases, including pneumonia. We use dense connections and batch normalization to make the optimization of such a deep network tractable.

We train on ChestX-ray14, the largest publicly available chest X-ray dataset.

The dataset, released by the NIH, contains 112,120 frontal-view X-ray images of 30,805 unique patients, annotated with up to 14 different thoracic pathology labels using NLP methods on radiology reports. We label images that have pneumonia as one of the annotated pathologies as positive examples and label all other images as negative examples for the pneumonia detection task.
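The labeling rule above can be sketched as follows. The sketch assumes each image's annotations arrive as one pipe-separated string of pathology names; that format, the file names, and the helper `pneumonia_label` are illustrative assumptions rather than details given on this page.

```python
def pneumonia_label(finding_labels):
    """Return 1 if pneumonia is among the annotated pathologies, else 0."""
    # Assumed format: pathology names joined by "|", e.g. "Pneumonia|Effusion".
    findings = {name.strip() for name in finding_labels.split("|")}
    return 1 if "Pneumonia" in findings else 0


# Hypothetical rows: (image file name, annotated findings).
rows = [
    ("img_001.png", "Pneumonia|Infiltration"),
    ("img_002.png", "No Finding"),
    ("img_003.png", "Effusion|Atelectasis"),
]
labels = {name: pneumonia_label(findings) for name, findings in rows}
```

Images with other pathologies but no pneumonia, such as the third row, count as negative examples, just like images with no finding.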

We collected a test set of 420 frontal chest X-rays. Annotations were obtained independently from four practicing radiologists at Stanford University, who were asked to label all 14 pathologies, even though . We then evaluate the performance of an individual radiologist by using the majority vote of the other 3 radiologists as ground truth. Similarly, we evaluate CheXNet using the majority vote of 3 of 4 radiologists, repeated four times to cover all groups of 3.
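The evaluation protocol can be sketched with a few lines of code. The radiologist names and the binary annotations below are made up for illustration; only the leave-one-out majority-vote scheme comes from the description above.

```python
from itertools import combinations


def majority_vote(votes):
    """Majority label among an odd number of binary votes."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def reference_labels(annotations, raters):
    """Per-image majority vote over the given group of raters."""
    n_images = len(annotations[raters[0]])
    return [majority_vote([annotations[r][i] for r in raters])
            for i in range(n_images)]


# Hypothetical pneumonia annotations: four radiologists, five images.
annotations = {
    "rad_a": [1, 0, 1, 0, 0],
    "rad_b": [1, 0, 0, 0, 1],
    "rad_c": [1, 1, 1, 0, 0],
    "rad_d": [0, 0, 1, 0, 0],
}
raters = sorted(annotations)

# Each radiologist is scored against the majority vote of the other three.
radiologist_refs = {
    held_out: reference_labels(annotations, [r for r in raters if r != held_out])
    for held_out in raters
}

# The model is scored against every group of three, giving four reference sets.
model_refs = [reference_labels(annotations, list(group))
              for group in combinations(raters, 3)]
```

The four groups of three are exactly the four leave-one-out groups, so the model and each radiologist are compared against the same kinds of reference labels.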

We find that the model exceeds the average radiologist performance at the pneumonia detection task on both sensitivity and specificity.

CheXNet is tested against radiologists on sensitivity (the proportion of positives that are correctly identified as such) and specificity (the proportion of negatives that are correctly identified as such). A single radiologist’s performance is represented by an orange marker, while the average is represented by a green marker. CheXNet outputs the probability of pneumonia in a chest X-ray, and the blue curve is generated by varying the threshold used for the classification boundary. The sensitivity-specificity points for each radiologist and for the average lie below the blue curve, signifying that CheXNet is able to detect pneumonia at a level matching or exceeding radiologists.
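To make the two metrics and the threshold sweep concrete, here is a small sketch. The labels, scores, and thresholds are invented for illustration, and the function names are hypothetical; the code assumes both classes are present so that neither denominator is zero.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Assumes at least one positive and one negative example.
    return tp / (tp + fn), tn / (tn + fp)


def operating_points(y_true, scores, thresholds):
    """One (threshold, sensitivity, specificity) triple per threshold."""
    points = []
    for threshold in thresholds:
        # Predict positive when the probability reaches the threshold.
        y_pred = [1 if s >= threshold else 0 for s in scores]
        points.append((threshold, *sensitivity_specificity(y_true, y_pred)))
    return points


# Hypothetical ground truth and model probabilities for eight images.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]
points = operating_points(y_true, scores, [0.25, 0.5, 0.75])
```

Raising the threshold trades sensitivity for specificity, which is what traces out a curve for the model. A radiologist gives a single yes/no reading per image and therefore contributes one point rather than a curve.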

With approximately 2 billion procedures per year, chest X-rays are the most common imaging examination tool used in practice, critical for screening, diagnosis, and management of diseases including pneumonia. However, an estimated two thirds of the global population lacks access to radiology diagnostics. With automation at the level of experts, we hope that this technology can improve healthcare delivery and increase access to medical imaging expertise in parts of the world where access to skilled radiologists is limited.


If you have questions about our work, contact us at:

pranavsr@cs.stanford.edu and jirvin16@cs.stanford.edu