Predictive Uncertainty Estimation in the real world.

Estimating the uncertainty in the predictions of a machine learning model is crucial for production deployments in the real world. Not only do we want our models to make accurate predictions, but we also want a correct estimate of uncertainty along with each prediction. When model predictions are part of an automated decision-making workflow or production line, predictive uncertainty estimates are important for deciding when to fall back to manual alternatives or to flag a prediction for human inspection and intervention.

Probabilistic prediction (or probabilistic forecasting), in which the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties.

Compare point predictions with probabilistic predictions in the following examples.

Question: What will be the temperature at noon tomorrow?
    Point prediction (no uncertainty estimate): 73.4 °F
    Probabilistic prediction (uncertainty is implicit): a full probability distribution over possible temperatures

Question: How long will this patient live?
    Point prediction (no uncertainty estimate): 11.3 months
    Probabilistic prediction (uncertainty is implicit): a full probability distribution over survival times

NGBoost brings predictive uncertainty estimation to Gradient Boosting.

Gradient Boosting methods have generally been among the top performers in predictive accuracy on structured or tabular input data.

NGBoost enables predictive uncertainty estimation with Gradient Boosting through probabilistic predictions (including for real-valued outputs). By using natural gradients, NGBoost overcomes technical challenges that make generic probabilistic prediction hard with gradient boosting.

We have released an open-source Python package implementing NGBoost on GitHub.

Read our paper · GitHub · Documentation
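The package can be installed from PyPI. Below is a minimal usage sketch following the documented scikit-learn-style API (NGBRegressor, predict, pred_dist); exact defaults and parameter names may vary between package versions.

```python
# Install with: pip install ngboost
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Standard sklearn-style workflow: fit on training data, predict on held-out data.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ngb = NGBRegressor().fit(X_train, y_train)

point_preds = ngb.predict(X_test)    # point predictions
pred_dists = ngb.pred_dist(X_test)   # full predictive distributions

# For the default Normal distribution, each prediction has a location and a scale.
print(pred_dists.params["loc"][:3])
print(pred_dists.params["scale"][:3])
```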

Simple and modular approach.

The NGBoost algorithm is simple to use. It is built from three modular components, each chosen as a configuration option:

Base Learner

The most common choice is Decision Trees, which tend to work well on structured inputs.

Probability Distribution

The distribution needs to be compatible with the output type: for example, a Normal distribution for real-valued outputs, or a Bernoulli distribution for binary outputs.

Scoring Rule

Maximum Likelihood Estimation (the log score) is an obvious choice. More robust rules such as the Continuous Ranked Probability Score (CRPS) are also suitable.

The above choices can be mixed and matched to fit the specific prediction problem at hand, as in the sketch below.
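For illustration, here is a sketch of how these components can be configured in the released package. The import paths and names shown (ngboost.distns.Normal, ngboost.distns.Bernoulli, ngboost.scores.LogScore, ngboost.scores.CRPScore) follow the current documentation and may differ between versions.

```python
from sklearn.tree import DecisionTreeRegressor

from ngboost import NGBClassifier, NGBRegressor
from ngboost.distns import Bernoulli, Normal
from ngboost.scores import CRPScore, LogScore

# Real-valued outputs: Normal distribution trained with the log score (MLE),
# using shallow decision trees as the base learner.
regressor = NGBRegressor(
    Dist=Normal,
    Score=LogScore,
    Base=DecisionTreeRegressor(criterion="friedman_mse", max_depth=3),
    natural_gradient=True,
    n_estimators=500,
    learning_rate=0.01,
)

# The same regression problem, but trained with the more robust CRPS.
robust_regressor = NGBRegressor(Dist=Normal, Score=CRPScore)

# Binary outputs: Bernoulli distribution instead of Normal.
classifier = NGBClassifier(Dist=Bernoulli, Score=LogScore)
```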

The natural gradient makes learning efficient and effective.

Our key innovation is to employ the natural gradient to perform gradient boosting, casting boosting as the problem of determining the parameters of a probability distribution.

Ordinary gradients can be highly unsuitable for learning multi-parameter probability distributions (such as the Normal distribution). With natural gradients, the training dynamics tend to be much more stable and result in a better fit, as seen in the probabilistic regression example above.
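To make this concrete, here is a small numerical sketch (not part of the ngboost package) of the natural gradient for a Normal distribution parameterized by (mu, log sigma): the ordinary gradient of the negative log-likelihood is preconditioned by the inverse Fisher information, which removes the 1/sigma² sensitivity of the mean update to the current scale.

```python
import numpy as np

def ordinary_and_natural_gradient(y, mu, log_sigma):
    """Gradients of the negative log-likelihood of y under Normal(mu, sigma),
    with the distribution parameterized as (mu, log_sigma)."""
    sigma2 = np.exp(2 * log_sigma)

    # Ordinary gradient of the negative log-likelihood.
    grad = np.array([
        (mu - y) / sigma2,               # d NLL / d mu
        1.0 - (y - mu) ** 2 / sigma2,    # d NLL / d log_sigma
    ])

    # Fisher information matrix in the (mu, log_sigma) parameterization.
    fisher = np.diag([1.0 / sigma2, 2.0])

    # Natural gradient: ordinary gradient preconditioned by the inverse Fisher.
    natural_grad = np.linalg.solve(fisher, grad)
    return grad, natural_grad

# As sigma shrinks, the ordinary gradient w.r.t. mu blows up like 1/sigma^2,
# while the natural gradient w.r.t. mu stays at (mu - y).
for log_sigma in [0.0, -2.0, -4.0]:
    g, ng = ordinary_and_natural_gradient(y=1.0, mu=0.0, log_sigma=log_sigma)
    print(f"log_sigma={log_sigma:+.1f}  ordinary={g.round(2)}  natural={ng.round(2)}")
```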

Competitive performance in both uncertainty estimates and traditional metrics.

NGBoost requires far less expertise to use than competing methods, and performs as well on common benchmarks. It is particularly strong on smaller data sets.

Read our paper · GitHub

If you have questions about our work, contact us at:

avati@cs.stanford.edu