Unsupervised Learning & Principal Component Analysis 🍕🍕🍕

CIS 241, Dr. Ladd

spacebar to go to the next slide, esc/menu to navigate

What is Principal Component Analysis?

PCA is a form of dimension reduction

It reduces the number of features (dimensions) in your data to a smaller, more manageable number of columns.

PCA is a simple application of Singular Value Decomposition (SVD)

SVD is a linear algebra method for factoring a matrix (your dataset) into smaller matrices by finding multiple lines of best fit.
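
For intuition, here's a minimal NumPy sketch (the matrix values are invented just for illustration):

```python
import numpy as np

# A tiny "dataset": 4 rows (observations) by 3 columns (features)
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0],
              [3.0, 2.0, 1.0]])

# SVD factors X into three matrices: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keeping only the k largest singular values gives a smaller
# representation that still approximates the original matrix
k = 2
X_reduced = U[:, :k] * S[:k]  # the data projected onto 2 "lines of best fit"
```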

It’s closely related to Factor Analysis.

Think of a pizza…

PCA can be used for…

  • Exploration and Visualization (usually uses 2 dimensions)
  • Modeling and Data Preparation (any number of dimensions)

Running PCA

Choose a number of components appropriate to your use case!

You don’t need to split your data, but you should definitely standardize it!
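
A minimal standardizing sketch with scikit-learn, assuming X holds your numeric features:

```python
from sklearn.preprocessing import StandardScaler

# Standardize so every feature has mean 0 and standard deviation 1;
# without this, large-scale features dominate the components
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```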

As always, pay attention to the hyperparameters (see the sketch after this list).

  • n_components: the number of components the model will produce
  • whiten: rescales the components to have unit variance, useful when you will feed them into another model
  • svd_solver: the exact SVD method you’ll use, usually can be left as ‘auto’
  • random_state: set this if you want reproducible results
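
Putting those together, one reasonable setup might look like this (the values here are just an example, not the "right" ones):

```python
from sklearn.decomposition import PCA

# Two components for visualization; turn on whiten if the
# output will feed another model
pca = PCA(n_components=2, whiten=False, svd_solver="auto", random_state=42)
```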

Use .fit_transform() to fit the model and get components in the same function.
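
Continuing the sketch, with the standardized data from above:

```python
# Fit the PCA model and get the transformed data in one step
components = pca.fit_transform(X_scaled)
```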

The resulting data will have the same number of rows but a new number of columns.

Put this data into a new dataframe to do something with it!
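
For example, with pandas (the column names are our own choice):

```python
import pandas as pd

# Same rows as the original data, one column per component
pca_df = pd.DataFrame(components, columns=["PC1", "PC2"])
```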

The .components_ and .explained_variance_ attributes can help you understand your results.
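
Continuing the same sketch, a quick way to inspect both attributes (plus .explained_variance_ratio_, which is often easier to read as a percentage):

```python
# Each row of components_ shows how much every original feature
# contributes to that component (the "loadings")
print(pca.components_)

# How much variance each component captures
print(pca.explained_variance_)
print(pca.explained_variance_ratio_)
```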

Try PCA for the penguins dataset.

  1. Select features and prepare data. Consider standardization as well as null values (see the starter sketch after this list).
  2. Run PCA to reduce to 2 dimensions and plot the results.
  3. Run PCA with 3 dimensions and use .components_ to assess results.
  4. Use the 3-dimension PCA results to re-run last week’s K-means clustering.
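
If you get stuck on step 1, here's one possible starting point, assuming seaborn's copy of the penguins data:

```python
import seaborn as sns
from sklearn.preprocessing import StandardScaler

# Load penguins and keep only the numeric measurement columns
penguins = sns.load_dataset("penguins")
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Drop rows with null values, then standardize
X = penguins[features].dropna()
X_scaled = StandardScaler().fit_transform(X)
```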

Good luck! 🐧🐧🐧