CIS 241, Dr. Ladd

As in hypothesis testing, this works in the opposite direction from what we'd expect: we use the probability of the data given each class to find the probability of each class given the data.

Find all records identical to the one you want to classify. For each possible target class, what proportion of those records belong to it? The class with the highest proportion is your prediction!

This is impractical, because very few records are identical.

For each possible target, find the individual conditional probabilities of every predictor. Multiply these probabilities by each other and by the number of records in the possible target class. Divide this by the sum of these values for all the classes. That gives you the probability!
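The steps above can be sketched in plain Python. The toy records below are made up for illustration, not from any real dataset:

```python
from collections import Counter

# Toy records: each tuple is (predictor1, predictor2, class)
records = [
    ("yes", "high", "A"), ("yes", "low", "A"), ("no", "high", "A"),
    ("no", "low", "B"), ("no", "high", "B"), ("yes", "low", "B"),
    ("no", "low", "B"),
]

def naive_bayes_score(records, new_record):
    """For each class: multiply the conditional probability of every
    predictor value, times the class size, then normalize over classes."""
    classes = Counter(r[-1] for r in records)
    scores = {}
    for cls, count in classes.items():
        in_class = [r for r in records if r[-1] == cls]
        prob = 1.0
        for i, value in enumerate(new_record):
            matches = sum(1 for r in in_class if r[i] == value)
            prob *= matches / count  # conditional probability of this predictor
        scores[cls] = prob * count  # multiply by the number of records in the class
    total = sum(scores.values())  # divide by the sum over all classes
    return {cls: s / total for cls, s in scores.items()}

probs = naive_bayes_score(records, ("yes", "high"))
print(probs)  # class "A" gets the higher probability
```

The class with the highest resulting probability is the prediction.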

The naive assumption, that the predictors are independent of one another, isn’t always true, but naive Bayes classifiers can still be useful.

Numerical variables would need to be “binned” into categories first.
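Binning can be done with pandas. The ages and bin edges below are hypothetical:

```python
import pandas as pd

# A hypothetical numeric column of ages
ages = pd.Series([5, 17, 23, 41, 68, 80])

# Bin the numbers into labeled categories so they can be
# treated as categorical predictors
age_groups = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                    labels=["child", "young adult", "adult", "senior"])
print(age_groups.tolist())
```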

We use scikit-learn’s `MultinomialNB` class.

Can we predict who survived based on some *categorical* data?

Load the data and create `predictors` and `target` variables.
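A minimal sketch of this step. The miniature DataFrame below is a made-up stand-in for the Titanic data so the example runs on its own; the column names are assumptions, not taken from the actual file:

```python
import pandas as pd

# A tiny stand-in for the Titanic data (columns are hypothetical)
titanic = pd.DataFrame({
    "Pclass": ["1st", "3rd", "2nd", "3rd", "1st"],
    "Sex": ["female", "male", "female", "male", "male"],
    "Embarked": ["S", "S", "C", "Q", "C"],
    "Survived": [1, 0, 1, 0, 0],
})

# Categorical predictor columns, and the target we want to predict
predictors = titanic[["Pclass", "Sex", "Embarked"]]
target = titanic["Survived"]
print(predictors.shape, target.shape)
```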

This time we use *one-hot encoding*. We don’t need to `drop_first`.
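One-hot encoding can be done with `pd.get_dummies`. The predictor values here are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical categorical predictors
predictors = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# One-hot encode: every category becomes its own 0/1 column.
# drop_first=False (the default) keeps all categories.
encoded = pd.get_dummies(predictors)
print(encoded.columns.tolist())
```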

`alpha` sets smoothing to prevent problems with zero counts. The default is 1, but it’s often better to set it smaller. `fit_prior` makes sure the model uses prior probabilities (`True` is the default).
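Putting these parameters together, a sketch of fitting the model. The one-hot data here is made up so the block is self-contained:

```python
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up one-hot-encoded predictors and a binary target
X = pd.get_dummies(pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Pclass": ["1st", "3rd", "2nd", "3rd", "1st", "3rd"],
}))
y = [1, 0, 1, 0, 1, 0]

# alpha smooths zero counts; fit_prior=True (the default) uses
# the class proportions as prior probabilities
nb_model = MultinomialNB(alpha=0.1, fit_prior=True)
nb_model.fit(X, y)
print(nb_model.predict(X))
```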

Import all the same metric functions as last time.

- Confusion Matrix
- Classification Report
- Cross-validation
- ROC Curve & AUC Score (binary classifier only)
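All four can be sketched together. The synthetic data from `make_classification` is a stand-in for the Titanic features (shifted to be non-negative, since `MultinomialNB` expects counts); the random seeds are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

# Synthetic binary classification data as a stand-in
X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=3, random_state=241)
X = X - X.min()  # shift so all feature values are non-negative

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=241)

nb_model = MultinomialNB(alpha=0.1).fit(X_train, y_train)
y_pred = nb_model.predict(X_test)

print(confusion_matrix(y_test, y_pred))           # confusion matrix
print(classification_report(y_test, y_pred))      # precision/recall/f1 per class
print(cross_val_score(nb_model, X, y, cv=5).mean())  # cross-validation
# AUC needs predicted probabilities, not hard labels (binary only):
print(roc_auc_score(y_test, nb_model.predict_proba(X_test)[:, 1]))
```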