Decision Trees 🌳 and the Random Forest 🌳🌲🌳🌲

CIS 241, Dr. Ladd


What are Decision Trees?

Let’s find out more from the sklearn documentation.

But we will focus on their more common use as classifiers!
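Here is a minimal sketch of a single classification tree in scikit-learn. It uses the Iris data bundled with scikit-learn as a stand-in (an assumption: it is not the class dataset), and it prints the learned if/else rules so you can see what a "tree" actually is.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris ships with scikit-learn, so no download is needed (stand-in dataset)
X, y = load_iris(return_X_y=True, as_frame=True)

# Hold out 25% of the rows to check the tree on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a single classification tree with default settings
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Predicted class labels for the held-out rows
y_pred = tree.predict(X_test)
print("test accuracy:", tree.score(X_test, y_test))

# The fitted tree is just a nested set of yes/no questions about the features
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

Each line of `rules` is one split (e.g. "petal width (cm) <= ..."), and each leaf ends in a predicted class.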

Decision trees can also tell us which predictors matter most. This is referred to as “variable (or feature) importance,” and it takes advantage of decision trees’ skill at finding patterns in the data.
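A short sketch of reading variable importance from a fitted tree, again on the bundled Iris data as an assumed stand-in. scikit-learn exposes it as the `feature_importances_` attribute: the normalized total impurity decrease (Gini by default) contributed by each feature. The `max_depth=3` value is an illustrative choice, not a recommendation.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# One importance score per column; the scores are normalized to sum to 1
importances = pd.Series(tree.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importance can favor features with many distinct values, so treat it as a rough guide rather than a definitive ranking.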

But individual trees are not very reliable on their own: a deep tree has low bias but high variance, so it tends to overfit the training data. We need to think about the bias-variance tradeoff!
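One way to see the overfitting problem is to compare training and test accuracy. This sketch assumes synthetic data from `make_classification`, where `flip_y=0.2` gives a random 20% of samples a randomly assigned class, so there is noise for a tree to memorize. It contrasts an unrestricted tree with one capped at `max_depth=3` (an arbitrary illustrative cap).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy binary classification data (assumed for illustration)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in [None, 3]:  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = results[depth]
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

The unrestricted tree scores (essentially) perfectly on the rows it was trained on, but noticeably worse on the held-out rows: that gap is the high variance.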

And what do you call a lot of trees? A forest!

You can see all the metaphors here: a forest, a musical ensemble, etc.

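The decision trees are combined using “bagging” (bootstrap aggregating): each tree is trained on a bootstrap sample of the rows, drawn with replacement, and the trees’ predictions are then aggregated into a single prediction.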
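To make bagging concrete, here is a hand-rolled sketch on assumed synthetic data: draw bootstrap samples, fit one tree per sample, and take a majority vote. This shows only the bagging idea. A real random forest also considers a random subset of features at each split, and scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

n_trees = 51  # odd number, so a binary vote can never tie
trees = []
for i in range(n_trees):
    # Bootstrap: draw as many rows as the training set, with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Aggregate: stack every tree's 0/1 predictions and take the majority vote
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("bagged accuracy:", (y_pred == y_test).mean())
```

scikit-learn also offers this directly as `sklearn.ensemble.BaggingClassifier`.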

Setting these hyperparameters can help you grow smaller trees and avoid spurious results!
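A sketch of passing size-limiting hyperparameters to `RandomForestClassifier`. Which hyperparameters to use, and their values, are assumptions for illustration (not tuned), and Iris again stands in for the class data. The last lines inspect the fitted trees in `estimators_` to confirm the depth cap took effect.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_depth=4,          # no tree may be deeper than 4 levels
    min_samples_leaf=5,   # every leaf must hold at least 5 training rows
    max_features="sqrt",  # features considered at each split
    random_state=0,
)
forest.fit(X, y)

# Each fitted tree is stored in estimators_; check how deep they actually grew
depths = [est.get_depth() for est in forest.estimators_]
print("deepest tree:", max(depths))
```

Smaller, shallower trees each have more bias but less variance, and averaging many of them keeps the forest's overall variance down.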

By now, you’re equipped to find out how to do this on your own, so let’s try an example.

Here’s a hint:

from sklearn.ensemble import RandomForestClassifier

Just as with the Decision Tree, you will predict the species of the penguins.
Use what you learned from the Decision Tree to determine your predictors and hyperparameters!
Fit a random forest classification model.
Do some out-of-sample validation of your model, using the usual metrics.
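If you get stuck on the validation step, here is a sketch of one possible workflow. It uses the bundled Iris data as a stand-in for the penguins (so you still need to choose your own predictors and hyperparameters), and the settings shown are illustrative assumptions. It reports accuracy, a confusion matrix, per-class precision/recall/F1, and a 5-fold cross-validated accuracy on the training set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)

# Keep a test set aside that the model never sees during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=42)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Out-of-sample metrics on the held-out rows
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred))

# Cross-validation on the training set gives a second estimate of performance
cv_scores = cross_val_score(forest, X_train, y_train, cv=5)
print("mean 5-fold CV accuracy:", cv_scores.mean())
```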