CIS 241, Dr. Ladd
Let's find out more from the sklearn documentation.
Using sklearn's plot_tree() function, let's fit a tree model predicting species and recreate this plot. Decision trees can also do regression, but we will focus on their more common use as classifiers!
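As a minimal sketch of fitting and plotting a tree (using sklearn's built-in iris dataset as a stand-in for species data; the max_depth of 3 is an arbitrary choice to keep the plot readable):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load a small species dataset (iris, standing in for any species data)
X, y = load_iris(return_X_y=True)

# Fit a shallow tree so the plot stays legible
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Draw the fitted tree, coloring nodes by majority class
plot_tree(tree, filled=True)
plt.show()
```

Swap in your own DataFrame's features and species column to recreate the plot from the slides.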
This is referred to as “variable (or feature) importance” and takes advantage of decision trees’ skill at finding patterns in the data.
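A fitted tree exposes these scores through its feature_importances_ attribute, one number per feature (again sketched with iris as a stand-in dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# One importance score per feature; the scores sum to 1
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```

Higher scores mean the tree relied on that feature more when choosing its splits.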
But individual trees are not very reliable on their own, and they often overfit. We need to think about the bias-variance tradeoff!
And what do you call a lot of trees? A forest!
You can see all the metaphors here: a forest, a musical ensemble, etc.
The decision trees are put together using “bagging”: bootstrap aggregating.
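Bagging can be sketched directly with sklearn's BaggingClassifier, whose default base learner is a decision tree (iris is used here as a placeholder dataset, and 100 trees is an arbitrary choice):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each of the 100 trees is trained on a bootstrap sample
# (drawn with replacement), and their predictions are aggregated by vote
bagged = BaggingClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy of the whole ensemble
scores = cross_val_score(bagged, X, y, cv=5)
print(scores.mean())
```

Averaging over many bootstrapped trees reduces the variance that makes a single tree unreliable.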
min_samples_leaf: the minimum number of records in a terminal node (leaf)
max_leaf_nodes: the maximum number of leaf nodes in the entire tree
splitter and criterion: how splits are chosen and how their quality is measured
Setting these can help you create smaller trees and avoid spurious results!
By now, you're equipped to find out how to do this on your own, so let's try an example. Here's a hint: use the penguins dataset and predict the species of the penguins. Good luck! 🌲🌳🌲🌳