CIS 241, Dr. Ladd
These slides don’t contain everything you will need to know! Make sure to refer to the links above as well as the Pandas cheatsheet.
It’s the code you write.
somevariable = 5 + 6
Libraries store reusable functions and methods. They need to be imported at the top of your code.
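As a quick sketch of what importing looks like, here is one library from Python's standard library (math) next to Pandas. The variable names are made up for illustration.

```python
# Imports go at the top of the notebook or script.
import math            # part of Python's standard library
import pandas as pd    # "pd" is the conventional short name for Pandas

# Once imported, a library's functions are available through its name.
root = math.sqrt(16)
numbers = pd.Series([1, 2, 3])
total = numbers.sum()
```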
You can write Python in different places, but in this class we will write and run Python inside JupyterHub.
Jupyter notebooks are .ipynb files.
Files are organized in a hierarchy of directories, just like on your computer.
Python and Pandas both have syntax: the rules for how you write code. But the syntax for each is a little different!
# In Python, anything can be stored in a variable
myvar = 5
myvar #or print(myvar)
# In Pandas, whole spreadsheets can be "read"
# and then stored in a variable
import pandas as pd # Import the Pandas library first
taxis = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/taxis.csv')
taxis
Data in Pandas can be loaded from a filename or a URL.
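Loading from a filename works the same way as loading from a URL. This is only a sketch: the file name and the two rows of data are made up, and the file is written to a temporary folder so the example can run anywhere.

```python
import os
import tempfile
import pandas as pd

# Made-up data, written to a temporary file so the example runs anywhere.
csv_text = "species,flipper_length_mm\nAdelie,181\nGentoo,217\n"
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "tiny_penguins.csv")
with open(filename, "w") as f:
    f.write(csv_text)

# Loading from a filename looks the same as loading from a URL:
# pass the location as a string to pd.read_csv().
tiny_penguins = pd.read_csv(filename)
tiny_penguins
```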
Use descriptive variable names, and avoid spaces!
Create a variable called newVar that is equal to the value of five plus seven. Then display your variable to see what its value is.
Create a variable called penguins to hold the data available at https://jrladd.com/CIS241/data/penguins.csv. Then display the DataFrame in Jupyter.
Comments in Python begin with a # symbol.
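Here is a small sketch of what comments can look like in practice. The variable names and the source mentioned in the last comment are hypothetical.

```python
# This whole line is a comment: Python ignores it.
price = 20  # A comment can also follow code on the same line.

# Comments can also record where an idea came from.
# (Hypothetical source: adapted from a class example.)
total = price + 5
```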
You should also use comments for citations!
String ("five")
Integer (5)
Float (5.0)
Check the data type of newVar, which you created in the last exercise.
# A list is surrounded by brackets and can contain any kind of data.
mylist = [5,6,7]
secondlist = ["cat","dog","fish"]
# Access items in a list
mylist[0]
secondlist[1]
# A Series is the Pandas version of a list and works the same way.
# Every column of a DataFrame is its own Series.
myseries = pd.Series(mylist)
myseries[0]
mydictionary = {"pet_name": "Fido", "age": 5, "pet_type": "dog"}
# Access items in a dictionary
mydictionary["pet_name"]
mydictionary["age"]
# Every DataFrame is made up of multiple Series columns
# You access these like keys in a dictionary.
taxis["distance"] # Dictionary notation
taxis.distance # Dot notation
# Or get a whole set of columns in a new DataFrame
prices = taxis[['fare','tip','tolls','total']]
prices
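To try the column notations without downloading anything, here is a sketch using a small made-up DataFrame (the pets data is invented for illustration).

```python
import pandas as pd

# Made-up data standing in for a real dataset.
pets = pd.DataFrame({
    "pet_name": ["Fido", "Whiskers", "Bubbles"],
    "age": [5, 3, 1],
    "pet_type": ["dog", "cat", "fish"],
})

ages_bracket = pets["age"]   # dictionary notation
ages_dot = pets.age          # dot notation

# Both notations return the same Series.
same = ages_bracket.equals(ages_dot)

# Double brackets (a list of column names) return a new DataFrame.
name_and_type = pets[["pet_name", "pet_type"]]
```

Dot notation only works when the column name is a valid Python name (no spaces) and doesn't clash with an existing DataFrame method, so dictionary notation is the safer habit.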
Use the for keyword to iterate through a list.
Use if and else to set conditions.
Remember to save everything in variables.
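A minimal sketch of a loop with a condition inside it. The list of ages and the labels are made up for illustration.

```python
ages = [5, 3, 12, 1]
labels = []  # save results in a variable so they can be reused

for age in ages:
    if age >= 10:
        labels.append("senior")
    else:
        labels.append("young")

labels
```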
> greater than
>= greater than or equal to
< less than
<= less than or equal to
!= not equal
== equal (note the double equals sign!)
& “and”, | “or”, and ~ “not” (in plain Python, use the words and, or, and not)
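Here is a sketch of these operators in use, first on a single number and then on a whole Pandas Series. The values are made up. Note that Pandas uses ~ to flip True and False, while plain Python uses the word not.

```python
import pandas as pd

x = 7
x > 5     # True
x == 7    # True
x != 7    # False

# Plain Python combines conditions with the words and, or, not.
in_range = x > 5 and x <= 10

# Pandas compares a whole column at once and combines conditions
# with & and |, with each condition wrapped in parentheses.
ages = pd.Series([5, 3, 12, 1])
mask = (ages > 2) & (ages < 10)
selected = ages[mask]

# ~ flips every True to False and every False to True.
not_selected = ages[~mask]
```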
Use a for loop and a conditional statement to create a new list of only the even numbers.
Add items to the new list with the append() method.
Python has many built-in functions.
# Some functions give a number result
sum([5,6,7])
mylist = [5,6,7]
sum(mylist)
len(mylist)
# But functions can do anything!
type(mydictionary)
Functions can do just about anything: calculate values, create graphs, transform data, etc.
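A few more built-in functions, sketched on a made-up list of scores. All of these come with Python, so nothing needs to be imported.

```python
scores = [88, 92, 75]

total = sum(scores)          # add up the values
count = len(scores)          # how many items
average = total / count      # combine results from two functions
highest = max(scores)        # largest value
kind = type(average)         # type() reports the kind of data
ordered = sorted(scores)     # returns a new, sorted list
```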
The .sort_values() method lets you sort rows by value.
.rename() lets you rename columns.
Notice we kept the same variable name here!
.assign() lets you add new columns based on existing ones.
Sort the penguins dataframe by flipper length, with the shortest flippers at the top.
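The three methods above can be sketched on a small made-up DataFrame. The pets data and column names are invented for illustration.

```python
import pandas as pd

pets = pd.DataFrame({
    "pet_name": ["Fido", "Whiskers", "Bubbles"],
    "age": [5, 3, 1],
    "weight_kg": [20.0, 4.0, 0.5],
})

# Sort rows by a column; smallest values come first by default.
by_age = pets.sort_values("age")

# Rename a column, saving the result back to the same variable name.
pets = pets.rename(columns={"pet_name": "name"})

# Add a new column calculated from an existing one.
pets = pets.assign(weight_g=pets["weight_kg"] * 1000)
```

Each method returns a new DataFrame rather than changing the original, which is why the result is saved in a variable.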
Use .groupby() with summary statistics to make summary tables.
Summary tables are new dataframes that summarize our original data.
This paradigm is known as split-apply-combine, and it’s key to data analysis.
Stat functions to use: mean(), median(), min(), max(), std().
A .groupby() doesn’t look like anything on its own!
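Here is a sketch of split-apply-combine on a small made-up DataFrame. The pets data is invented for illustration.

```python
import pandas as pd

pets = pd.DataFrame({
    "pet_type": ["dog", "cat", "dog", "cat"],
    "age": [5, 3, 7, 1],
})

# Split: group the rows by pet_type. On its own this is just a
# GroupBy object, not a table.
groups = pets.groupby("pet_type")

# Apply and combine: a stat function turns the groups into a summary table
# with one row per pet_type.
summary = groups.mean()
summary
```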
Create a summary table showing the average bill depth of penguins for each species.