CIS 241, Dr. Ladd
Like in hypothesis testing, this works in the opposite direction from what we’d expect: we start from the probability of the predictors given each class, rather than the probability of each class given the predictors!
Find all records that exactly match the one you care about. What proportion of those matches falls into each possible target class? The class with the highest proportion is your prediction!
This is impractical, because very few records are identical.
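A minimal pandas sketch of this exact-match idea (the file name, the predictor columns, and the Survived target are hypothetical stand-ins):

import pandas as pd

df = pd.read_csv("titanic.csv")  # hypothetical file

# A hypothetical record we want to classify
new_record = {"Pclass": 3, "Sex": "male", "Embarked": "S"}

# Keep only the rows that match the new record on every predictor
matches = df[(df[list(new_record)] == pd.Series(new_record)).all(axis=1)]

# Proportion of each target class among the exact matches;
# the class with the highest proportion is the prediction
print(matches["Survived"].value_counts(normalize=True))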
For each possible target class, find the conditional probability of each predictor value within that class. Multiply these probabilities together, then multiply by the number of records in that class. Divide this by the sum of the same quantity computed for every class. That gives you the probability!
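Written as a formula (a standard statement of naive Bayes, where the share of records in a class plays the role of the prior $P(c)$):

$$P(y = c \mid x_1, \ldots, x_p) = \frac{P(c)\prod_{i=1}^{p} P(x_i \mid c)}{\sum_{c'} P(c')\prod_{i=1}^{p} P(x_i \mid c')}$$

Dividing by the sum over all classes is what makes the probabilities add up to 1.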
This independence assumption isn’t always true, but naive Bayes classifiers can still be useful.
Numerical variables would need to be “binned” into categories first.
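For example, a sketch with pandas (the column name and bin edges are invented for illustration):

import pandas as pd

df = pd.read_csv("titanic.csv")  # hypothetical file

# Turn the numerical Age column into labeled categories
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 18, 35, 60, 120],
                        labels=["child", "young", "middle", "senior"])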
In scikit-learn, we use the MultinomialNB class. Can we predict who survived based on some categorical data?
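That class is imported from scikit-learn’s naive_bayes module:

from sklearn.naive_bayes import MultinomialNB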
Load the data and create predictors and target variables.
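A sketch of this step, assuming a titanic.csv file and typical Titanic column names (adjust to the actual dataset):

import pandas as pd

titanic = pd.read_csv("titanic.csv")  # hypothetical filename

# Categorical predictors (assumed column names) and the target
X = titanic[["Pclass", "Sex", "Embarked"]]
y = titanic["Survived"]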
This time we use one-hot encoding. We don’t need to drop_first.
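With pandas this is pd.get_dummies. Because naive Bayes just counts category occurrences, redundant columns don’t cause the collinearity problems they would in regression, so we keep every category:

X = pd.get_dummies(X)  # one-hot encode; note: no drop_first=True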
alpha sets smoothing to prevent problems with 0s. The default is 1, but it’s better to set it smaller. fit_prior makes sure to use prior probabilities (True is the default).
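A sketch of fitting and predicting, continuing from the X and y above (alpha=0.01 is an illustrative value, not a prescribed one):

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

nb = MultinomialNB(alpha=0.01, fit_prior=True)  # smaller-than-default smoothing
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)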
Import all the same metric functions as last time.
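Exactly which functions “last time” used isn’t shown here, but a typical classification set would be:

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(precision_score(y_test, predictions))
print(recall_score(y_test, predictions))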