- Ensemble methods
- Bagging: fitting many trees on bootstrap samples and averaging predictions
- Out-of-bag error estimation: an alternative to cross-validation
- Random forests: decorrelating trees by using a random subset of predictors at each split

EDS 232
Lesson 9
Bagging and Random Forests
In this lesson
A guiding dataset
22 ecological survey sites, each characterized by:
rainfall — annual precipitation (mm)temperature — mean summer temperature (°C)biomass — aboveground plant biomass (tonnes/ha) (response)
The variance problem in decision trees
The variance problem in decision trees
A single decision tree suffers from high variance: small changes in the training data can lead to very different trees and very different predictions. Deep trees are prone to overfitting.
One general strategy for reducing variance:
Core idea: The more uncorrelated the individual predictions are, the more variance reduction we get from averaging.
The challenge: in practice we only have a single training set. Bootstrapping is a way around this.
Bootsrapping & bagging
Bootsrap
Bootstrapping: generate a new training set by sampling with replacement from the original training set

Bagging — Bootstrap AGGregatING
Bagging is a method to generate an ensemble model by:
Bagging for regression
Bagging for regression
Bagging for regression
Bagging for regression
Bagging for regression
Bagging for regression
\[\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)\]
Check-in
How would you modify the bagging workflow for a classification problem? What changes? Create the corresponding diagram.
Bagging for classification
In the case of classification, the workflow is the same as the regression case, but the final aggregation step changes:
Out-of-bag error estimation
Out-of-bag observations
When fitting each bootstrapped tree, some of the training observations are left out of that bootstrap sample. These are called the out-of-bag (OOB) observations for that tree.

Suppose your model has three trees fit on Bootstrap samples 1, 2, and 3 shown here. For which trees would observation 4 be an OOB observation?
Out-of-bag observations
When fitting each bootstrapped tree, some of the training observations are left out of that bootstrap sample. These are called the out-of-bag (OOB) observations for that tree.

Suppose your model has three trees fit on Bootstrap samples 1, 2, and 3 shown here. For which trees would observation 4 be an OOB observation?
Observation 4 does not appear in Bootstrap samples 2 and 3, so it is an OOB observation for those two trees.
Out-of-bag error
OOB observations give us a way to estimate test error without cross-validation:
The OOB error is valid because each observation is predicted only by trees that were not trained on it.
We can use this error estimate to examne model parameters such as the number of trees.
Choosing the number of trees B
Unlike tree depth, \(B\) is not a regularization parameter. A very large \(B\) does not cause overfitting.
We pick a \(B\) large enough that the OOB error stabilizes. 100 to 500 are common defaults.

Variable importance
Variable importance
With \(B\) trees we lose the direct interpretability of a single tree diagram. But we can still extract a useful summary of which predictors matter most.
For each predictor, record the total reduction in RSS (regression) or Gini impurity (classification) achieved by splits on that predictor, averaged over all \(B\) trees.
Predictors with larger average reductions are more important to the model’s predictions.
Check-in

The plot shows variable importance for a model of bagged regression trees (B=200) on our example data.
Check-in

The plot shows variable importance for a model of bagged regression trees (B=200) on our example data.
Which predictor is most important for predicting biomass?
When creating the trees for the ensemble, which predictor would you expect to be used in the first split for most of them?
Rainfall: it has the largest average RSS reduction across all trees.
Rainfall: consistent with the decision tree lesson, where the root node always split on rainfall first.
Random forests
Random forests: decorrelating the trees
Problem with bagging:
Random forests fix this by introducing randomness at each split:
No single strong predictor can dominate every tree → trees are more diverse → averaging produces a larger reduction in variance.
Check-in
Suppose you are building one tree in a random forest with predictors {rainfall, temperature, elevation, soil moisture} (\(p = 4\), \(m = 2\)).
At the root node the algorithm randomly selects {rainfall, elevation} and splits on rainfall.
At the next node, which predictors are available to split on?
A new random sample of 2 predictors is drawn independently: it could be any pair, including ones used before. The previous split does not restrict future splits.
Random forest vs. bagging vs. single tree

Check-in
If \(m = p\) (we use all predictors at every split), how does a random forest relate to bagging?
When would you expect a random forest to outperform bagging the most?
When \(m = p\), random forests reduce to bagging exactly. There is no restriction on which predictors can be considered, so every tree is built the same way as a bagged tree.
When there are one or a few dominant predictors: bagged trees become too similar to each other, while random forests force diversity by sometimes excluding the dominant predictor from consideration.