
Boosting

Classic, old-school boosting in the style of Freund and Schapire's AdaBoost [1].

For classification, SAMME (multi-class AdaBoost) and SAMME.R, both from Zhu et al. [2], are implemented.
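As a rough illustration of the discrete SAMME step described in [2], the sketch below computes a base learner's weight as ln((1 - err) / err) + ln(K - 1) and then reweights the training instances. The object and method names (`SammeSketch`, `learnerWeight`, `updateWeights`) are hypothetical and are not part of this library's API; the code follows the paper's formulas and is not taken from the implementation.

```scala
// Minimal sketch of one discrete SAMME round, following [2].
// All names here are illustrative assumptions, not library code.
object SammeSketch {

  /** Weight of a base learner given its weighted error rate and the number of classes K. */
  def learnerWeight(weightedError: Double, numClasses: Int): Double = {
    require(numClasses >= 2, "need at least two classes")
    require(weightedError > 0.0 && weightedError < 1.0, "error must lie strictly between 0 and 1")
    // The ln(K - 1) term keeps the weight positive as long as the learner
    // beats random guessing, i.e. weightedError < 1 - 1/K.
    math.log((1.0 - weightedError) / weightedError) + math.log(numClasses - 1.0)
  }

  /** Multiply the weights of misclassified instances by exp(alpha), then renormalize to sum to 1. */
  def updateWeights(
      weights: Seq[Double],
      misclassified: Seq[Boolean],
      alpha: Double): Seq[Double] = {
    require(weights.length == misclassified.length, "one flag per instance")
    val raw = weights.zip(misclassified).map { case (w, miss) =>
      if (miss) w * math.exp(alpha) else w
    }
    val total = raw.sum
    raw.map(_ / total)
  }
}
```

With two classes the extra term vanishes and the weight reduces to the binary AdaBoost weight; with three classes a learner with 50% error still receives a positive weight.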

For regression, AdaBoost.R2, from H. Drucker's "Improving Regressors using Boosting Techniques" [3], is implemented.
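To make the role of the loss function concrete, here is a small sketch of how [3] turns absolute prediction errors into per-instance losses in [0, 1] and then into the confidence factor beta. The names `R2Sketch`, `normalizedLoss` and `beta` are hypothetical, and of the three loss strings only "squared" appears on this page; "linear" and "exponential" are assumed names for the other two losses defined in the paper.

```scala
// Minimal sketch of the loss step in AdaBoost.R2, following [3].
// All names here are illustrative assumptions, not library code.
object R2Sketch {

  /** Scale absolute errors by the largest one, then apply the chosen loss shape. */
  def normalizedLoss(absErrors: Seq[Double], loss: String): Seq[Double] = {
    require(absErrors.nonEmpty, "need at least one error")
    val maxError = absErrors.max
    if (maxError == 0.0) absErrors.map(_ => 0.0) // perfect fit: all losses are zero
    else {
      val scaled = absErrors.map(_ / maxError)
      loss match {
        case "linear"      => scaled
        case "squared"     => scaled.map(e => e * e)
        case "exponential" => scaled.map(e => 1.0 - math.exp(-e))
        case other         => throw new IllegalArgumentException(s"unknown loss: $other")
      }
    }
  }

  /** beta = avgLoss / (1 - avgLoss), where avgLoss is the weighted average loss. */
  def beta(losses: Seq[Double], weights: Seq[Double]): Double = {
    require(losses.length == weights.length, "one weight per loss")
    val total = weights.sum
    val avgLoss = losses.zip(weights).map { case (l, w) => l * w / total }.sum
    avgLoss / (1.0 - avgLoss)
  }
}
```

In the paper, a small beta signals a confident learner: instance weights are multiplied by beta raised to (1 - loss), and the learner votes with weight ln(1 / beta).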

BoostingRegressionModel offers two voting strategies for aggregating the predictions of the base models: the weighted median described in [3], or the weighted mean.
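The difference between the two strategies can be sketched in plain Scala as follows. This is an illustration under assumptions, not the library's code: `VotingSketch`, `weightedMean` and `weightedMedian` are hypothetical names, and the median follows the definition in [3] (the smallest prediction whose cumulative weight reaches half of the total weight).

```scala
// Minimal sketch of the two aggregation strategies.
// All names here are illustrative assumptions, not library code.
object VotingSketch {

  /** Weighted mean: sum(w_i * p_i) / sum(w_i). */
  def weightedMean(preds: Seq[Double], weights: Seq[Double]): Double = {
    require(preds.nonEmpty && preds.length == weights.length, "one weight per prediction")
    preds.zip(weights).map { case (p, w) => p * w }.sum / weights.sum
  }

  /** Weighted median: smallest prediction whose cumulative weight is at least half the total. */
  def weightedMedian(preds: Seq[Double], weights: Seq[Double]): Double = {
    require(preds.nonEmpty && preds.length == weights.length, "one weight per prediction")
    val sorted = preds.zip(weights).sortBy(_._1)          // order by prediction value
    val half = weights.sum / 2.0
    val cumulative = sorted.scanLeft(0.0)(_ + _._2).tail  // running sum of weights
    sorted.zip(cumulative)
      .collectFirst { case ((p, _), c) if c >= half => p }
      .getOrElse(sorted.last._1)                          // guard against rounding at the tail
  }
}
```

For predictions 1.0, 2.0 and 10.0 with equal weights, the median returns 2.0 while the mean is pulled toward the outlier, which is why the median tends to be the more robust choice when a few base models are far off.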

Parameters

The parameters specific to Boosting control the loss function and the algorithm used to compute the weights.

import org.apache.spark.ml.classification.{BoostingClassifier, DecisionTreeClassifier}
import org.apache.spark.ml.regression.{BoostingRegressor, DecisionTreeRegressor}

new BoostingClassifier()
  .setBaseLearner(new DecisionTreeClassifier()) // Base learner used by the meta-estimator.
  .setNumBaseLearners(10) // Number of base learners.
  .setAlgorithm("real") // Weight algorithm: SAMME or SAMME.R.

new BoostingRegressor()
  .setBaseLearner(new DecisionTreeRegressor()) // Base learner used by the meta-estimator (a regressor).
  .setNumBaseLearners(10) // Number of base learners.
  .setLoss("squared") // Loss function.
  .setVotingStrategy("median") // Voting strategy: weighted median or weighted mean.

References

  • [1] Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
  • [2] Hastie, T., Rosset, S., Zhu, J., & Zou, H. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2(3), 349-360.
  • [3] Drucker, H. (1997). Improving regressors using boosting techniques. In ICML (Vol. 97, pp. 107-115).