Tuning
from ngboost import NGBClassifier, NGBRegressor
from ngboost.distns import k_categorical, Normal
from ngboost.scores import LogScore
from sklearn.datasets import load_breast_cancer, load_boston
from sklearn.model_selection import train_test_split
X, Y = load_boston(return_X_y=True)  # note: load_boston was removed in scikit-learn 1.2; use an older version or another regression dataset there
X_reg_train, X_reg_test, Y_reg_train, Y_reg_test = train_test_split(X, Y, test_size=0.2)
X, Y = load_breast_cancer(return_X_y=True)
Y[0:15] = 2  # artificially make this a 3-class problem instead of a 2-class problem
X_cls_train, X_cls_test, Y_cls_train, Y_cls_test = train_test_split(X, Y, test_size=0.2)
All fitted NGBoost objects support staged prediction.
ngb_cls = NGBClassifier(Dist=k_categorical(3), Score=LogScore, n_estimators=500, verbose=False).fit(X_cls_train, Y_cls_train)
For instance, to get the point predictions and predictive distributions on the first 5 examples after fitting 415 base learners, use:
preds = ngb_cls.staged_predict(X_cls_test)
preds[415][0:5]
pred_dists = ngb_cls.staged_pred_dist(X_cls_test)
pred_dists[415][0:5].params
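Because staged_predict returns one prediction per boosting iteration, you can also score every stage at once to see how test error evolves; a minimal sketch using sklearn's accuracy_score:
from sklearn.metrics import accuracy_score
staged_acc = [accuracy_score(Y_cls_test, p) for p in preds]  # accuracy of the ensemble truncated at each stage
print(staged_acc[415])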
This is useful in conjunction with tracking errors on a validation set, which you can do by passing the X_val and Y_val arguments to fit() and then inspecting the .best_val_loss_itr instance attribute.
ngb = NGBRegressor()
ngb.fit(X_reg_train, Y_reg_train, X_val=X_reg_test, Y_val=Y_reg_test) # use a validation set instead of test set here in your own work
print(ngb.best_val_loss_itr)
best_preds = ngb.predict(X_reg_test, max_iter=ngb.best_val_loss_itr)
NGBoost also supports early stopping. If an integer early_stopping_rounds and a validation set (X_val, Y_val) are passed to fit(), the algorithm will stop running after the validation loss has increased for early_stopping_rounds consecutive iterations.
_ = NGBRegressor().fit(X_reg_train, Y_reg_train, X_val=X_reg_test, Y_val=Y_reg_test, early_stopping_rounds=2)
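An early-stopped model records the iteration with the best validation loss in the same best_val_loss_itr attribute, so the two features compose naturally; a minimal sketch:
ngb_es = NGBRegressor().fit(X_reg_train, Y_reg_train, X_val=X_reg_test, Y_val=Y_reg_test, early_stopping_rounds=2)
preds_es = ngb_es.predict(X_reg_test, max_iter=ngb_es.best_val_loss_itr)  # use only the iterations up to the best validation loss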
Validation set sample weights can be passed using the val_sample_weight argument to fit().
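For example, a minimal sketch with uniform weights (the weights here are purely illustrative; any nonnegative per-example weights can be used):
import numpy as np
val_weights = np.ones(len(Y_reg_test))  # illustrative uniform weights
_ = NGBRegressor().fit(X_reg_train, Y_reg_train, X_val=X_reg_test, Y_val=Y_reg_test, val_sample_weight=val_weights)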
sklearn model selection methods such as GridSearchCV are compatible with NGBoost.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor
b1 = DecisionTreeRegressor(criterion='friedman_mse', max_depth=2)
b2 = DecisionTreeRegressor(criterion='friedman_mse', max_depth=4)
param_grid = {
    'minibatch_frac': [1.0, 0.5],
    'Base': [b1, b2],
}
ngb = NGBRegressor(Dist=Normal, verbose=False)
grid_search = GridSearchCV(ngb, param_grid=param_grid, cv=5)
grid_search.fit(X_reg_train, Y_reg_train)
print(grid_search.best_params_)
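Because GridSearchCV refits on the full training set by default (refit=True), the tuned model is immediately available as best_estimator_:
best_ngb = grid_search.best_estimator_
print(best_ngb.predict(X_reg_test)[0:5])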