CIS 241, Dr. Ladd
There are many different classifiers, and we’ll learn about four: Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, and Naive Bayes.
The only difference is that now you’re using them to predict a category instead of a number. You’ll use KNeighborsClassifier, DecisionTreeClassifier, and RandomForestClassifier instead of the Regressor versions. Using the wrong one will lead to an error!
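For instance, here’s a toy sketch of the classifier version of KNN (the data are made up for illustration; the same .fit()/.predict() pattern applies to the other classifiers):

```python
from sklearn.neighbors import KNeighborsClassifier  # not KNeighborsRegressor!

# Hypothetical toy data: two numeric predictors, a categorical target
X = [[1, 2], [2, 1], [8, 9], [9, 8]]
y = ["cat", "cat", "dog", "dog"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# .predict() now returns a category label instead of a number
print(knn.predict([[8, 8]]))  # e.g. ["dog"]
```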
Like in hypothesis testing, this works in the opposite direction from what we’d expect: it starts from the probability of the predictors given an outcome, not the probability of the outcome given the predictors!
Find all the records whose predictors exactly match the one you want to classify. What proportion of those records falls into each possible target class? The class with the highest proportion is your prediction!
This is impractical, because very few records are identical.
For each possible target class, find the individual conditional probabilities of every predictor. Multiply these probabilities by each other and by the number of records in that class. Divide this by the sum of these values across all the classes. That gives you the probability!
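In symbols, that computation looks like this (a sketch, writing $X_1, \dots, X_p$ for the predictors and $c$ for a candidate class; using class counts or class proportions gives the same answer once you divide by the sum):

$$
P(Y = c \mid X_1, \dots, X_p) \;=\; \frac{P(Y = c)\,\prod_{j=1}^{p} P(X_j \mid Y = c)}{\sum_{c'} P(Y = c')\,\prod_{j=1}^{p} P(X_j \mid Y = c')}
$$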
Multiplying the individual probabilities assumes the predictors are independent of one another (that’s the “naive” part). This isn’t always true, but naive Bayes classifiers can still be useful.
Numerical variables would need to be “binned” into categories first.
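For example, here’s a minimal sketch of binning with pandas (the ages, bin edges, and labels are all made up for illustration):

```python
import pandas as pd

# Hypothetical numeric variable to bin into categories
ages = pd.Series([4, 22, 35, 58, 71])

# Bin edges and labels are arbitrary choices for this example
age_groups = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                    labels=["child", "young adult", "middle age", "senior"])
print(age_groups)
```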
You’ll use scikit-learn’s MultinomialNB class. Let’s create a Naive Bayes model to predict survival in the titanic dataset.
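Here’s one minimal sketch of that model, assuming the seaborn copy of the titanic dataset and an arbitrary choice of categorical predictors (sex, pclass, and embarked):

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

titanic = sns.load_dataset("titanic")

# MultinomialNB expects non-negative, count-like features,
# so we one-hot encode a few categorical predictors
X = pd.get_dummies(titanic[["sex", "pclass", "embarked"]].astype(str))
y = titanic["survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = MultinomialNB()
nb.fit(X_train, y_train)
print(accuracy_score(y_test, nb.predict(X_test)))
```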
Consider how you’d do this for KNN and Random Forest, too!
All four of our classifiers—Logistic Regression, KNN, Random Forest, and Naive Bayes—use the same validation methods.
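As a sketch of that shared workflow (rebuilding the hypothetical X and y from the titanic example above), the same cross-validation call works for every model:

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

# Same hypothetical predictors and target as the titanic example
titanic = sns.load_dataset("titanic")
X = pd.get_dummies(titanic[["sex", "pclass", "embarked"]].astype(str))
y = titanic["survived"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f}")
```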