{
"cells": [
{
"cell_type": "markdown",
"id": "1977b653-a13b-489b-b522-07689ac8d0bb",
"metadata": {},
"source": [
"# Pandas\n",
"\n",
"Pandas is a library for storing and manipulating tabular data, or data stored in rows and columns like a spreadsheet. Pandas is a huge library with many different functions and methods, so what follows is a brief introduction to the most important functions for data management.\n",
"\n",
"```{seealso}\n",
"If you encounter any part of Pandas out in the wild that you don't see here, you can always refer to the [Pandas documentation](https://pandas.pydata.org/docs/user_guide/index.html#user-guide).\n",
"```\n",
"\n",
"## DataFrames and Series\n",
"\n",
"Instead of normal Python lists and dictionaries, Pandas stores data in its own specialized objects. The main one is a *DataFrame*, which is a lot like a spreadsheet with rows and columns.\n",
"\n",
"You can create a DataFrame directly with the `DataFrame()` class in Pandas, but it's more likely that you'll read in a DataFrame from a CSV or spreadsheet file. First you must import the library, and it's a good idea to import the `numpy` library as well.\n",
"\n",
"```{note}\n",
"Numpy is a Python library for efficiently handling arrays and matrices of numbers. Pandas uses it under the hood to run quickly. You usually won't need to use it directly, but it's good to have it installed to avoid any mysterious errors.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a2852e73-f952-41e1-a670-d7138f2ca43b",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "26dc3974-38dd-4979-8d86-125b7faff853",
"metadata": {},
"source": [
"Now you can use the `read_csv()` function to read in a comma-separated value (CSV) spreadsheet file. You must put the name of this file in quotes, and the file should be in the same directory as your Jupyter notebook (or else you should include a full path). The `read_csv()` function will also accept a URL that points to a CSV file online.\n",
"\n",
"For this example, we'll use the file `mpg.csv` which comes from R's [ggplot2 library](https://www.rdocumentation.org/packages/ggplot2/versions/3.4.1/topics/mpg)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fe0d31e0-8aee-46cb-ac17-3aafa9e9024c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 11 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"4 audi a4 2.8 1999 6 auto(l5) f 16 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p \n",
"\n",
" class \n",
"0 compact \n",
"1 compact \n",
"2 compact \n",
"3 compact \n",
"4 compact \n",
".. ... \n",
"229 midsize \n",
"230 midsize \n",
"231 midsize \n",
"232 midsize \n",
"233 midsize \n",
"\n",
"[234 rows x 11 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg = pd.read_csv(\"../data/mpg.csv\")\n",
"mpg"
]
},
{
"cell_type": "markdown",
"id": "afed98c5-f210-4cc9-b1d3-79e79ae77fb6",
"metadata": {},
"source": [
"```{note}\n",
"Jupyter nicely formats DataFrames as tables when you type the name of a variable containing a DataFrame. But if you use the `print()` function, it won't display as well.\n",
"```\n",
"\n",
"You can get basic information about your DataFrames columns using the `.info()` method."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e8fcaa3a-6616-40f1-8f3b-98c6056378a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 234 entries, 0 to 233\n",
"Data columns (total 11 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 manufacturer 234 non-null object \n",
" 1 model 234 non-null object \n",
" 2 displ 234 non-null float64\n",
" 3 year 234 non-null int64 \n",
" 4 cyl 234 non-null int64 \n",
" 5 trans 234 non-null object \n",
" 6 drv 234 non-null object \n",
" 7 cty 234 non-null int64 \n",
" 8 hwy 234 non-null int64 \n",
" 9 fl 234 non-null object \n",
" 10 class 234 non-null object \n",
"dtypes: float64(1), int64(4), object(6)\n",
"memory usage: 20.2+ KB\n"
]
}
],
"source": [
"mpg.info()"
]
},
{
"cell_type": "markdown",
"id": "f50d89b9-3697-4241-942f-0c26ac2d7d85",
"metadata": {},
"source": [
"A *Series* is a lot like a Python list, and each column of a DataFrame is a Series. You can access the columns of a Dataframe with dot notation."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3cf5b442-aed7-436d-a193-dd65a43420a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 a4\n",
"1 a4\n",
"2 a4\n",
"3 a4\n",
"4 a4\n",
" ... \n",
"229 passat\n",
"230 passat\n",
"231 passat\n",
"232 passat\n",
"233 passat\n",
"Name: model, Length: 234, dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.model"
]
},
{
"cell_type": "markdown",
"id": "630181ac-3ba5-4f6c-a9cb-11bebc7fee79",
"metadata": {},
"source": [
"You can also turn a list into a Series with the `Series()` class."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "56c8fe04-db20-4d8f-b9d2-0cd6fe6382c5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 5\n",
"1 6\n",
"2 7\n",
"3 8\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myseries = pd.Series([5, 6, 7, 8])\n",
"myseries"
]
},
{
"cell_type": "markdown",
"id": "aee31823-e644-455a-bb06-39a0d1d68d82",
"metadata": {},
"source": [
"## Selecting Rows and Columns\n",
"\n",
"Once you have a DataFrame, you'll typically want to filter and select different rows or columns.\n",
"\n",
"To filter specific rows, Pandas uses a bracket notation. It takes conditional statements that are similar to [Python conditions](/python.html#conditions)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6a5f1c32-c585-47e5-99b7-823a06f547ad",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 7 | \n",
" audi | \n",
" a4 quattro | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" 4 | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 226 | \n",
" volkswagen | \n",
" new beetle | \n",
" 2.5 | \n",
" 2008 | \n",
" 5 | \n",
" auto(s6) | \n",
" f | \n",
" 20 | \n",
" 29 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 227 | \n",
" volkswagen | \n",
" passat | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 228 | \n",
" volkswagen | \n",
" passat | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
"
\n",
"
85 rows × 11 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"7 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"226 volkswagen new beetle 2.5 2008 5 auto(s6) f 20 29 r \n",
"227 volkswagen passat 1.8 1999 4 manual(m5) f 21 29 p \n",
"228 volkswagen passat 1.8 1999 4 auto(l5) f 18 29 p \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"\n",
" class \n",
"0 compact \n",
"1 compact \n",
"2 compact \n",
"3 compact \n",
"7 compact \n",
".. ... \n",
"226 subcompact \n",
"227 midsize \n",
"228 midsize \n",
"229 midsize \n",
"230 midsize \n",
"\n",
"[85 rows x 11 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get cars with fewer than 6 cylinders\n",
"four_cylinders = mpg[mpg.cyl < 6]\n",
"four_cylinders"
]
},
{
"cell_type": "markdown",
"id": "7884d017-b829-46b0-919d-1c0b6d93543f",
"metadata": {},
"source": [
"You can also use the operators `&` (and), `|` (or), and `!` (not) to combine conditional filters."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5eaa9859-1130-4921-9531-ae3a400f6b4c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" 74 | \n",
" ford | \n",
" expedition 2wd | \n",
" 4.6 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" r | \n",
" 11 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 75 | \n",
" ford | \n",
" expedition 2wd | \n",
" 5.4 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" r | \n",
" 11 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 76 | \n",
" ford | \n",
" expedition 2wd | \n",
" 5.4 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l6) | \n",
" r | \n",
" 12 | \n",
" 18 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 77 | \n",
" ford | \n",
" explorer 4wd | \n",
" 4.0 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 78 | \n",
" ford | \n",
" explorer 4wd | \n",
" 4.0 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" 4 | \n",
" 15 | \n",
" 19 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 79 | \n",
" ford | \n",
" explorer 4wd | \n",
" 4.0 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 80 | \n",
" ford | \n",
" explorer 4wd | \n",
" 4.0 | \n",
" 2008 | \n",
" 6 | \n",
" auto(l5) | \n",
" 4 | \n",
" 13 | \n",
" 19 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 81 | \n",
" ford | \n",
" explorer 4wd | \n",
" 4.6 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l6) | \n",
" 4 | \n",
" 13 | \n",
" 19 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 82 | \n",
" ford | \n",
" explorer 4wd | \n",
" 5.0 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 83 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 4.2 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l4) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 84 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 4.2 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 85 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 4.6 | \n",
" 1999 | \n",
" 8 | \n",
" manual(m5) | \n",
" 4 | \n",
" 13 | \n",
" 16 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 86 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 4.6 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 16 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 87 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 4.6 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 88 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 5.4 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 11 | \n",
" 15 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 89 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 5.4 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 90 | \n",
" ford | \n",
" mustang | \n",
" 3.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" r | \n",
" 18 | \n",
" 26 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 91 | \n",
" ford | \n",
" mustang | \n",
" 3.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l4) | \n",
" r | \n",
" 18 | \n",
" 25 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 92 | \n",
" ford | \n",
" mustang | \n",
" 4.0 | \n",
" 2008 | \n",
" 6 | \n",
" manual(m5) | \n",
" r | \n",
" 17 | \n",
" 26 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 93 | \n",
" ford | \n",
" mustang | \n",
" 4.0 | \n",
" 2008 | \n",
" 6 | \n",
" auto(l5) | \n",
" r | \n",
" 16 | \n",
" 24 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 94 | \n",
" ford | \n",
" mustang | \n",
" 4.6 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" r | \n",
" 15 | \n",
" 21 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 95 | \n",
" ford | \n",
" mustang | \n",
" 4.6 | \n",
" 1999 | \n",
" 8 | \n",
" manual(m5) | \n",
" r | \n",
" 15 | \n",
" 22 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 96 | \n",
" ford | \n",
" mustang | \n",
" 4.6 | \n",
" 2008 | \n",
" 8 | \n",
" manual(m5) | \n",
" r | \n",
" 15 | \n",
" 23 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 97 | \n",
" ford | \n",
" mustang | \n",
" 4.6 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l5) | \n",
" r | \n",
" 15 | \n",
" 22 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 98 | \n",
" ford | \n",
" mustang | \n",
" 5.4 | \n",
" 2008 | \n",
" 8 | \n",
" manual(m6) | \n",
" r | \n",
" 14 | \n",
" 20 | \n",
" p | \n",
" subcompact | \n",
"
\n",
" \n",
" 207 | \n",
" volkswagen | \n",
" gti | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 208 | \n",
" volkswagen | \n",
" gti | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 19 | \n",
" 26 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 209 | \n",
" volkswagen | \n",
" gti | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 210 | \n",
" volkswagen | \n",
" gti | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 22 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 211 | \n",
" volkswagen | \n",
" gti | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 17 | \n",
" 24 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 212 | \n",
" volkswagen | \n",
" jetta | \n",
" 1.9 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 33 | \n",
" 44 | \n",
" d | \n",
" compact | \n",
"
\n",
" \n",
" 213 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 214 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 19 | \n",
" 26 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 215 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 22 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 216 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 217 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.5 | \n",
" 2008 | \n",
" 5 | \n",
" auto(s6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 218 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.5 | \n",
" 2008 | \n",
" 5 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 219 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l4) | \n",
" f | \n",
" 16 | \n",
" 23 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 220 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 17 | \n",
" 24 | \n",
" r | \n",
" compact | \n",
"
\n",
" \n",
" 221 | \n",
" volkswagen | \n",
" new beetle | \n",
" 1.9 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 35 | \n",
" 44 | \n",
" d | \n",
" subcompact | \n",
"
\n",
" \n",
" 222 | \n",
" volkswagen | \n",
" new beetle | \n",
" 1.9 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 29 | \n",
" 41 | \n",
" d | \n",
" subcompact | \n",
"
\n",
" \n",
" 223 | \n",
" volkswagen | \n",
" new beetle | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 224 | \n",
" volkswagen | \n",
" new beetle | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 19 | \n",
" 26 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 225 | \n",
" volkswagen | \n",
" new beetle | \n",
" 2.5 | \n",
" 2008 | \n",
" 5 | \n",
" manual(m5) | \n",
" f | \n",
" 20 | \n",
" 28 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 226 | \n",
" volkswagen | \n",
" new beetle | \n",
" 2.5 | \n",
" 2008 | \n",
" 5 | \n",
" auto(s6) | \n",
" f | \n",
" 20 | \n",
" 29 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 227 | \n",
" volkswagen | \n",
" passat | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 228 | \n",
" volkswagen | \n",
" passat | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy \\\n",
"74 ford expedition 2wd 4.6 1999 8 auto(l4) r 11 17 \n",
"75 ford expedition 2wd 5.4 1999 8 auto(l4) r 11 17 \n",
"76 ford expedition 2wd 5.4 2008 8 auto(l6) r 12 18 \n",
"77 ford explorer 4wd 4.0 1999 6 auto(l5) 4 14 17 \n",
"78 ford explorer 4wd 4.0 1999 6 manual(m5) 4 15 19 \n",
"79 ford explorer 4wd 4.0 1999 6 auto(l5) 4 14 17 \n",
"80 ford explorer 4wd 4.0 2008 6 auto(l5) 4 13 19 \n",
"81 ford explorer 4wd 4.6 2008 8 auto(l6) 4 13 19 \n",
"82 ford explorer 4wd 5.0 1999 8 auto(l4) 4 13 17 \n",
"83 ford f150 pickup 4wd 4.2 1999 6 auto(l4) 4 14 17 \n",
"84 ford f150 pickup 4wd 4.2 1999 6 manual(m5) 4 14 17 \n",
"85 ford f150 pickup 4wd 4.6 1999 8 manual(m5) 4 13 16 \n",
"86 ford f150 pickup 4wd 4.6 1999 8 auto(l4) 4 13 16 \n",
"87 ford f150 pickup 4wd 4.6 2008 8 auto(l4) 4 13 17 \n",
"88 ford f150 pickup 4wd 5.4 1999 8 auto(l4) 4 11 15 \n",
"89 ford f150 pickup 4wd 5.4 2008 8 auto(l4) 4 13 17 \n",
"90 ford mustang 3.8 1999 6 manual(m5) r 18 26 \n",
"91 ford mustang 3.8 1999 6 auto(l4) r 18 25 \n",
"92 ford mustang 4.0 2008 6 manual(m5) r 17 26 \n",
"93 ford mustang 4.0 2008 6 auto(l5) r 16 24 \n",
"94 ford mustang 4.6 1999 8 auto(l4) r 15 21 \n",
"95 ford mustang 4.6 1999 8 manual(m5) r 15 22 \n",
"96 ford mustang 4.6 2008 8 manual(m5) r 15 23 \n",
"97 ford mustang 4.6 2008 8 auto(l5) r 15 22 \n",
"98 ford mustang 5.4 2008 8 manual(m6) r 14 20 \n",
"207 volkswagen gti 2.0 1999 4 manual(m5) f 21 29 \n",
"208 volkswagen gti 2.0 1999 4 auto(l4) f 19 26 \n",
"209 volkswagen gti 2.0 2008 4 manual(m6) f 21 29 \n",
"210 volkswagen gti 2.0 2008 4 auto(s6) f 22 29 \n",
"211 volkswagen gti 2.8 1999 6 manual(m5) f 17 24 \n",
"212 volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 \n",
"213 volkswagen jetta 2.0 1999 4 manual(m5) f 21 29 \n",
"214 volkswagen jetta 2.0 1999 4 auto(l4) f 19 26 \n",
"215 volkswagen jetta 2.0 2008 4 auto(s6) f 22 29 \n",
"216 volkswagen jetta 2.0 2008 4 manual(m6) f 21 29 \n",
"217 volkswagen jetta 2.5 2008 5 auto(s6) f 21 29 \n",
"218 volkswagen jetta 2.5 2008 5 manual(m5) f 21 29 \n",
"219 volkswagen jetta 2.8 1999 6 auto(l4) f 16 23 \n",
"220 volkswagen jetta 2.8 1999 6 manual(m5) f 17 24 \n",
"221 volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 \n",
"222 volkswagen new beetle 1.9 1999 4 auto(l4) f 29 41 \n",
"223 volkswagen new beetle 2.0 1999 4 manual(m5) f 21 29 \n",
"224 volkswagen new beetle 2.0 1999 4 auto(l4) f 19 26 \n",
"225 volkswagen new beetle 2.5 2008 5 manual(m5) f 20 28 \n",
"226 volkswagen new beetle 2.5 2008 5 auto(s6) f 20 29 \n",
"227 volkswagen passat 1.8 1999 4 manual(m5) f 21 29 \n",
"228 volkswagen passat 1.8 1999 4 auto(l5) f 18 29 \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 \n",
"\n",
" fl class \n",
"74 r suv \n",
"75 r suv \n",
"76 r suv \n",
"77 r suv \n",
"78 r suv \n",
"79 r suv \n",
"80 r suv \n",
"81 r suv \n",
"82 r suv \n",
"83 r pickup \n",
"84 r pickup \n",
"85 r pickup \n",
"86 r pickup \n",
"87 r pickup \n",
"88 r pickup \n",
"89 r pickup \n",
"90 r subcompact \n",
"91 r subcompact \n",
"92 r subcompact \n",
"93 r subcompact \n",
"94 r subcompact \n",
"95 r subcompact \n",
"96 r subcompact \n",
"97 r subcompact \n",
"98 p subcompact \n",
"207 r compact \n",
"208 r compact \n",
"209 p compact \n",
"210 p compact \n",
"211 r compact \n",
"212 d compact \n",
"213 r compact \n",
"214 r compact \n",
"215 p compact \n",
"216 p compact \n",
"217 r compact \n",
"218 r compact \n",
"219 r compact \n",
"220 r compact \n",
"221 d subcompact \n",
"222 d subcompact \n",
"223 r subcompact \n",
"224 r subcompact \n",
"225 r subcompact \n",
"226 r subcompact \n",
"227 p midsize \n",
"228 p midsize \n",
"229 p midsize \n",
"230 p midsize \n",
"231 p midsize \n",
"232 p midsize \n",
"233 p midsize "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get Volkswagens and Fords\n",
"vw_ford = mpg[(mpg.manufacturer == 'volkswagen') | (mpg.manufacturer == 'ford')]\n",
"vw_ford"
]
},
{
"cell_type": "markdown",
"id": "948d7008-be09-4238-b223-ba44a00080f2",
"metadata": {},
"source": [
"You can use a double bracket notation to select a subset of columns.\n",
"\n",
"```{note}\n",
"Using single brackets or dot notation will get you a single column as a Series.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "fc9e81e4-f6d5-4264-be69-2d77405de3d1",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" class | \n",
" cty | \n",
" hwy | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" compact | \n",
" 18 | \n",
" 29 | \n",
"
\n",
" \n",
" 1 | \n",
" compact | \n",
" 21 | \n",
" 29 | \n",
"
\n",
" \n",
" 2 | \n",
" compact | \n",
" 20 | \n",
" 31 | \n",
"
\n",
" \n",
" 3 | \n",
" compact | \n",
" 21 | \n",
" 30 | \n",
"
\n",
" \n",
" 4 | \n",
" compact | \n",
" 16 | \n",
" 26 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" midsize | \n",
" 19 | \n",
" 28 | \n",
"
\n",
" \n",
" 230 | \n",
" midsize | \n",
" 21 | \n",
" 29 | \n",
"
\n",
" \n",
" 231 | \n",
" midsize | \n",
" 16 | \n",
" 26 | \n",
"
\n",
" \n",
" 232 | \n",
" midsize | \n",
" 18 | \n",
" 26 | \n",
"
\n",
" \n",
" 233 | \n",
" midsize | \n",
" 17 | \n",
" 26 | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 3 columns
\n",
"
"
],
"text/plain": [
" class cty hwy\n",
"0 compact 18 29\n",
"1 compact 21 29\n",
"2 compact 20 31\n",
"3 compact 21 30\n",
"4 compact 16 26\n",
".. ... ... ...\n",
"229 midsize 19 28\n",
"230 midsize 21 29\n",
"231 midsize 16 26\n",
"232 midsize 18 26\n",
"233 midsize 17 26\n",
"\n",
"[234 rows x 3 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class_cty_hwy = mpg[[\"class\", \"cty\", \"hwy\"]]\n",
"class_cty_hwy"
]
},
{
"cell_type": "markdown",
"id": "9875bbd3-cb4e-416d-82d1-1092210ac696",
"metadata": {},
"source": [
"## Data Wrangling\n",
"\n",
"In addtion to selecting rows and columns from DataFrames, you can also use Pandas to do a wide variety of data transformations.\n",
"\n",
"### Sorting"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "15054f8a-7700-4ffe-8f24-f7af5b8aad5d",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" 117 | \n",
" hyundai | \n",
" tiburon | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 20 | \n",
" 28 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 120 | \n",
" hyundai | \n",
" tiburon | \n",
" 2.7 | \n",
" 2008 | \n",
" 6 | \n",
" manual(m6) | \n",
" f | \n",
" 16 | \n",
" 24 | \n",
" r | \n",
" subcompact | \n",
"
\n",
" \n",
" 122 | \n",
" jeep | \n",
" grand cherokee 4wd | \n",
" 3.0 | \n",
" 2008 | \n",
" 6 | \n",
" auto(l5) | \n",
" 4 | \n",
" 17 | \n",
" 22 | \n",
" d | \n",
" suv | \n",
"
\n",
" \n",
" 123 | \n",
" jeep | \n",
" grand cherokee 4wd | \n",
" 3.7 | \n",
" 2008 | \n",
" 6 | \n",
" auto(l5) | \n",
" 4 | \n",
" 15 | \n",
" 19 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 126 | \n",
" jeep | \n",
" grand cherokee 4wd | \n",
" 4.7 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l5) | \n",
" 4 | \n",
" 9 | \n",
" 12 | \n",
" e | \n",
" suv | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 130 | \n",
" land rover | \n",
" range rover | \n",
" 4.0 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 11 | \n",
" 15 | \n",
" p | \n",
" suv | \n",
"
\n",
" \n",
" 50 | \n",
" dodge | \n",
" dakota pickup 4wd | \n",
" 3.9 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 51 | \n",
" dodge | \n",
" dakota pickup 4wd | \n",
" 3.9 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
"
\n",
" \n",
" 125 | \n",
" jeep | \n",
" grand cherokee 4wd | \n",
" 4.7 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 14 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
"
\n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 11 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty \\\n",
"117 hyundai tiburon 2.0 2008 4 manual(m5) f 20 \n",
"120 hyundai tiburon 2.7 2008 6 manual(m6) f 16 \n",
"122 jeep grand cherokee 4wd 3.0 2008 6 auto(l5) 4 17 \n",
"123 jeep grand cherokee 4wd 3.7 2008 6 auto(l5) 4 15 \n",
"126 jeep grand cherokee 4wd 4.7 2008 8 auto(l5) 4 9 \n",
".. ... ... ... ... ... ... .. ... \n",
"130 land rover range rover 4.0 1999 8 auto(l4) 4 11 \n",
"50 dodge dakota pickup 4wd 3.9 1999 6 auto(l4) 4 13 \n",
"51 dodge dakota pickup 4wd 3.9 1999 6 manual(m5) 4 14 \n",
"125 jeep grand cherokee 4wd 4.7 1999 8 auto(l4) 4 14 \n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 \n",
"\n",
" hwy fl class \n",
"117 28 r subcompact \n",
"120 24 r subcompact \n",
"122 22 d suv \n",
"123 19 r suv \n",
"126 12 e suv \n",
".. ... .. ... \n",
"130 15 p suv \n",
"50 17 r pickup \n",
"51 17 r pickup \n",
"125 17 r suv \n",
"0 29 p compact \n",
"\n",
"[234 rows x 11 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.sort_values(\"year\", ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "c2304da5-1edb-4ee5-beb0-80b9c241a17c",
"metadata": {},
"source": [
"### Counting"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f9cb36ba-bac5-4834-8c7a-c88820655808",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"manufacturer\n",
"dodge 37\n",
"toyota 34\n",
"volkswagen 27\n",
"ford 25\n",
"chevrolet 19\n",
"audi 18\n",
"hyundai 14\n",
"subaru 14\n",
"nissan 13\n",
"honda 9\n",
"jeep 8\n",
"pontiac 5\n",
"land rover 4\n",
"mercury 4\n",
"lincoln 3\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.value_counts(\"manufacturer\")"
]
},
{
"cell_type": "markdown",
"id": "cc5d90e4-0f1f-4f4a-a9a7-a69ab4feb0a0",
"metadata": {},
"source": [
"### Renaming Columns"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "650b6f1e-f66e-47d7-b209-a5d8113259e3",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 11 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"4 audi a4 2.8 1999 6 auto(l5) f 16 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p \n",
"\n",
" class \n",
"0 compact \n",
"1 compact \n",
"2 compact \n",
"3 compact \n",
"4 compact \n",
".. ... \n",
"229 midsize \n",
"230 midsize \n",
"231 midsize \n",
"232 midsize \n",
"233 midsize \n",
"\n",
"[234 rows x 11 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Note the use of a Python dictionary as this method's argument\n",
"mpg = mpg.rename({\"cty\":\"city\", \"hwy\": \"highway\"})\n",
"mpg"
]
},
{
"cell_type": "markdown",
"id": "aecb7749-7d10-48f1-a6f6-b40ea6a31236",
"metadata": {},
"source": [
"### Create new columns\n",
"\n",
"You can use `assign()` to create new columns based on existing ones."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "29d811a1-bab8-47a3-a071-144fb7a506d5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
" 0.466667 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.600000 | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 12 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"4 audi a4 2.8 1999 6 auto(l5) f 16 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p \n",
"\n",
" class displ_per_cyl \n",
"0 compact 0.450000 \n",
"1 compact 0.450000 \n",
"2 compact 0.500000 \n",
"3 compact 0.500000 \n",
"4 compact 0.466667 \n",
".. ... ... \n",
"229 midsize 0.500000 \n",
"230 midsize 0.500000 \n",
"231 midsize 0.466667 \n",
"232 midsize 0.466667 \n",
"233 midsize 0.600000 \n",
"\n",
"[234 rows x 12 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg = mpg.assign(displ_per_cyl = mpg.displ/mpg.cyl)\n",
"mpg"
]
},
{
"cell_type": "markdown",
"id": "922d3f76-0cf1-4d1b-83f0-936f45a43c3d",
"metadata": {},
"source": [
"### Grouping and Summarizing\n",
"\n",
"This combines a couple functions that exist within Pandas to create *summary tables*.\n",
"\n",
"Pandas has a wide range of summary statistics that you can apply to individual columns."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ab1c6e56-acb0-40bd-ac8c-70b935ed3795",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"16.858974358974358"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Average city fuel efficiency\n",
"mpg.cty.mean()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "78b0d4a6-dd76-4896-9d56-19710311d2ab",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"5.9546434411664455"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Standard deviation of highway fuel efficiency\n",
"mpg.hwy.std()"
]
},
{
"cell_type": "markdown",
"id": "76f33e50-2f45-4cdc-abc2-50123fad5c91",
"metadata": {},
"source": [
"Pandas also has a `.groupby()` method (which returns a generator) that groups categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "09769bc5-8fb9-4c3d-8d4c-032248dcb931",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.groupby(\"manufacturer\")"
]
},
{
"cell_type": "markdown",
"id": "7613376b-bcf9-4938-86fa-751a21f6ce0c",
"metadata": {},
"source": [
"By itself, `.groupby()` doesn't show anything. It needs to be combined with a summary statistic to create a summary table."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "047aebe0-9dcf-4161-8568-bb8564b2e62d",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" cty | \n",
" hwy | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" manufacturer | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" audi | \n",
" 2.544444 | \n",
" 2003.500000 | \n",
" 5.222222 | \n",
" 17.611111 | \n",
" 26.444444 | \n",
" 0.484722 | \n",
"
\n",
" \n",
" chevrolet | \n",
" 5.063158 | \n",
" 2004.684211 | \n",
" 7.263158 | \n",
" 15.000000 | \n",
" 21.894737 | \n",
" 0.686842 | \n",
"
\n",
" \n",
" dodge | \n",
" 4.378378 | \n",
" 2004.108108 | \n",
" 7.081081 | \n",
" 13.135135 | \n",
" 17.945946 | \n",
" 0.616216 | \n",
"
\n",
" \n",
" ford | \n",
" 4.536000 | \n",
" 2002.600000 | \n",
" 7.200000 | \n",
" 14.000000 | \n",
" 19.360000 | \n",
" 0.633667 | \n",
"
\n",
" \n",
" honda | \n",
" 1.711111 | \n",
" 2003.000000 | \n",
" 4.000000 | \n",
" 24.444444 | \n",
" 32.555556 | \n",
" 0.427778 | \n",
"
\n",
" \n",
" hyundai | \n",
" 2.428571 | \n",
" 2004.142857 | \n",
" 4.857143 | \n",
" 18.642857 | \n",
" 26.857143 | \n",
" 0.509524 | \n",
"
\n",
" \n",
" jeep | \n",
" 4.575000 | \n",
" 2005.750000 | \n",
" 7.250000 | \n",
" 13.500000 | \n",
" 17.625000 | \n",
" 0.627604 | \n",
"
\n",
" \n",
" land rover | \n",
" 4.300000 | \n",
" 2003.500000 | \n",
" 8.000000 | \n",
" 11.500000 | \n",
" 16.500000 | \n",
" 0.537500 | \n",
"
\n",
" \n",
" lincoln | \n",
" 5.400000 | \n",
" 2002.000000 | \n",
" 8.000000 | \n",
" 11.333333 | \n",
" 17.000000 | \n",
" 0.675000 | \n",
"
\n",
" \n",
" mercury | \n",
" 4.400000 | \n",
" 2003.500000 | \n",
" 7.000000 | \n",
" 13.250000 | \n",
" 18.000000 | \n",
" 0.633333 | \n",
"
\n",
" \n",
" nissan | \n",
" 3.269231 | \n",
" 2003.846154 | \n",
" 5.538462 | \n",
" 18.076923 | \n",
" 24.615385 | \n",
" 0.589744 | \n",
"
\n",
" \n",
" pontiac | \n",
" 3.960000 | \n",
" 2002.600000 | \n",
" 6.400000 | \n",
" 17.000000 | \n",
" 26.400000 | \n",
" 0.615833 | \n",
"
\n",
" \n",
" subaru | \n",
" 2.457143 | \n",
" 2004.142857 | \n",
" 4.000000 | \n",
" 19.285714 | \n",
" 25.571429 | \n",
" 0.614286 | \n",
"
\n",
" \n",
" toyota | \n",
" 2.952941 | \n",
" 2002.705882 | \n",
" 5.117647 | \n",
" 18.529412 | \n",
" 24.911765 | \n",
" 0.573897 | \n",
"
\n",
" \n",
" volkswagen | \n",
" 2.255556 | \n",
" 2002.666667 | \n",
" 4.592593 | \n",
" 20.925926 | \n",
" 29.222222 | \n",
" 0.491049 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" displ year cyl cty hwy \\\n",
"manufacturer \n",
"audi 2.544444 2003.500000 5.222222 17.611111 26.444444 \n",
"chevrolet 5.063158 2004.684211 7.263158 15.000000 21.894737 \n",
"dodge 4.378378 2004.108108 7.081081 13.135135 17.945946 \n",
"ford 4.536000 2002.600000 7.200000 14.000000 19.360000 \n",
"honda 1.711111 2003.000000 4.000000 24.444444 32.555556 \n",
"hyundai 2.428571 2004.142857 4.857143 18.642857 26.857143 \n",
"jeep 4.575000 2005.750000 7.250000 13.500000 17.625000 \n",
"land rover 4.300000 2003.500000 8.000000 11.500000 16.500000 \n",
"lincoln 5.400000 2002.000000 8.000000 11.333333 17.000000 \n",
"mercury 4.400000 2003.500000 7.000000 13.250000 18.000000 \n",
"nissan 3.269231 2003.846154 5.538462 18.076923 24.615385 \n",
"pontiac 3.960000 2002.600000 6.400000 17.000000 26.400000 \n",
"subaru 2.457143 2004.142857 4.000000 19.285714 25.571429 \n",
"toyota 2.952941 2002.705882 5.117647 18.529412 24.911765 \n",
"volkswagen 2.255556 2002.666667 4.592593 20.925926 29.222222 \n",
"\n",
" displ_per_cyl \n",
"manufacturer \n",
"audi 0.484722 \n",
"chevrolet 0.686842 \n",
"dodge 0.616216 \n",
"ford 0.633667 \n",
"honda 0.427778 \n",
"hyundai 0.509524 \n",
"jeep 0.627604 \n",
"land rover 0.537500 \n",
"lincoln 0.675000 \n",
"mercury 0.633333 \n",
"nissan 0.589744 \n",
"pontiac 0.615833 \n",
"subaru 0.614286 \n",
"toyota 0.573897 \n",
"volkswagen 0.491049 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Averages by manufacturer\n",
"# set `numeric_only=True` to avoid a warning\n",
"mpg.groupby(\"manufacturer\").mean(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"id": "0dd45340-a43d-4359-8a5d-6d035dbc8b7d",
"metadata": {},
"source": [
"### Dropping Null Values\n",
"\n",
"For many statistical modeling tasks, you need to drop rows that contain null values. Pandas lets you do this easily with `.dropna()`.\n",
"\n",
"```{note}\n",
"Pandas typically stores null values as NaN, which stands for \"not a number.\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4b3d2bc1-e7e1-4954-9ffa-77b11cfe4416",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
" 0.466667 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.600000 | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 12 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"4 audi a4 2.8 1999 6 auto(l5) f 16 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p \n",
"\n",
" class displ_per_cyl \n",
"0 compact 0.450000 \n",
"1 compact 0.450000 \n",
"2 compact 0.500000 \n",
"3 compact 0.500000 \n",
"4 compact 0.466667 \n",
".. ... ... \n",
"229 midsize 0.500000 \n",
"230 midsize 0.500000 \n",
"231 midsize 0.466667 \n",
"232 midsize 0.466667 \n",
"233 midsize 0.600000 \n",
"\n",
"[234 rows x 12 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop any row that contains a null value in any column\n",
"mpg = mpg.dropna()\n",
"mpg"
]
},
{
"cell_type": "markdown",
"id": "54115521-0c5b-471c-9ca2-7a790bfaf279",
"metadata": {},
"source": [
"```{warning}\n",
"The `mpg` dataset doesn't contain any null values, so right now this code isn't accomplishing anything. But it will work when null values are present!\n",
"```\n",
"\n",
"You can also drop null values from only a subset of columns."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "e7808ee0-1988-4703-aa49-52bf8175d10e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l5) | \n",
" f | \n",
" 18 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 1 | \n",
" audi | \n",
" a4 | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 2 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 20 | \n",
" 31 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 3 | \n",
" audi | \n",
" a4 | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(av) | \n",
" f | \n",
" 21 | \n",
" 30 | \n",
" p | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 4 | \n",
" audi | \n",
" a4 | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" compact | \n",
" 0.466667 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 229 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s6) | \n",
" f | \n",
" 19 | \n",
" 28 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 230 | \n",
" volkswagen | \n",
" passat | \n",
" 2.0 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m6) | \n",
" f | \n",
" 21 | \n",
" 29 | \n",
" p | \n",
" midsize | \n",
" 0.500000 | \n",
"
\n",
" \n",
" 231 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l5) | \n",
" f | \n",
" 16 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 233 | \n",
" volkswagen | \n",
" passat | \n",
" 3.6 | \n",
" 2008 | \n",
" 6 | \n",
" auto(s6) | \n",
" f | \n",
" 17 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.600000 | \n",
"
\n",
" \n",
"
\n",
"
234 rows × 12 columns
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty hwy fl \\\n",
"0 audi a4 1.8 1999 4 auto(l5) f 18 29 p \n",
"1 audi a4 1.8 1999 4 manual(m5) f 21 29 p \n",
"2 audi a4 2.0 2008 4 manual(m6) f 20 31 p \n",
"3 audi a4 2.0 2008 4 auto(av) f 21 30 p \n",
"4 audi a4 2.8 1999 6 auto(l5) f 16 26 p \n",
".. ... ... ... ... ... ... .. ... ... .. \n",
"229 volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p \n",
"230 volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p \n",
"231 volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p \n",
"233 volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p \n",
"\n",
" class displ_per_cyl \n",
"0 compact 0.450000 \n",
"1 compact 0.450000 \n",
"2 compact 0.500000 \n",
"3 compact 0.500000 \n",
"4 compact 0.466667 \n",
".. ... ... \n",
"229 midsize 0.500000 \n",
"230 midsize 0.500000 \n",
"231 midsize 0.466667 \n",
"232 midsize 0.466667 \n",
"233 midsize 0.600000 \n",
"\n",
"[234 rows x 12 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop any rows that contain null values in a subset of columns\n",
"mpg = mpg.dropna(subset=[\"model\", \"displ\"])\n",
"mpg"
]
},
{
"cell_type": "markdown",
"id": "537e88c3-3ce7-482b-88d3-67d137ac2b0e",
"metadata": {},
"source": [
"### Sampling\n",
"\n",
"Many statistical methods, especially hypothesis tests, require to take a random sample of your overall data. Again, Pandas provides an easy way to do this with the `.sample()` method.\n",
"\n",
"You can take a sample of rows from an entire dataframe."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "712f7ce7-c009-4e7e-ac55-cd6455016934",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 32 | \n",
" chevrolet | \n",
" malibu | \n",
" 2.4 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 19 | \n",
" 27 | \n",
" r | \n",
" midsize | \n",
" 0.600000 | \n",
"
\n",
" \n",
" 174 | \n",
" toyota | \n",
" 4runner 4wd | \n",
" 2.7 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" 4 | \n",
" 16 | \n",
" 20 | \n",
" r | \n",
" suv | \n",
" 0.675000 | \n",
"
\n",
" \n",
" 109 | \n",
" hyundai | \n",
" sonata | \n",
" 2.4 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 27 | \n",
" r | \n",
" midsize | \n",
" 0.600000 | \n",
"
\n",
" \n",
" 48 | \n",
" dodge | \n",
" dakota pickup 4wd | \n",
" 3.7 | \n",
" 2008 | \n",
" 6 | \n",
" manual(m6) | \n",
" 4 | \n",
" 15 | \n",
" 19 | \n",
" r | \n",
" pickup | \n",
" 0.616667 | \n",
"
\n",
" \n",
" 214 | \n",
" volkswagen | \n",
" jetta | \n",
" 2.0 | \n",
" 1999 | \n",
" 4 | \n",
" auto(l4) | \n",
" f | \n",
" 19 | \n",
" 26 | \n",
" r | \n",
" compact | \n",
" 0.500000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty \\\n",
"32 chevrolet malibu 2.4 1999 4 auto(l4) f 19 \n",
"174 toyota 4runner 4wd 2.7 1999 4 auto(l4) 4 16 \n",
"109 hyundai sonata 2.4 1999 4 manual(m5) f 18 \n",
"48 dodge dakota pickup 4wd 3.7 2008 6 manual(m6) 4 15 \n",
"214 volkswagen jetta 2.0 1999 4 auto(l4) f 19 \n",
"\n",
" hwy fl class displ_per_cyl \n",
"32 27 r midsize 0.600000 \n",
"174 20 r suv 0.675000 \n",
"109 27 r midsize 0.600000 \n",
"48 19 r pickup 0.616667 \n",
"214 26 r compact 0.500000 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get 5 random rows from mpg\n",
"mpg.sample(5)"
]
},
{
"cell_type": "markdown",
"id": "3a559981-f24a-4307-a5fe-00329f8fb7ce",
"metadata": {},
"source": [
"You can also get a sample of a specific column."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "c67abb42-9dc2-42e1-8860-f0321f85a9e3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"86 4.6\n",
"201 2.7\n",
"228 1.8\n",
"138 4.0\n",
"35 3.5\n",
"Name: displ, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get 5 sample engine displacement values, as a series\n",
"mpg.displ.sample(5)"
]
},
{
"cell_type": "markdown",
"id": "85e45ba0-9ee6-4ddd-9220-2d9162a2e143",
"metadata": {},
"source": [
"You can also sample with replacement. (This is also called \"bootstrap sampling.\") This makes it possible to have the same value in your sample twice."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "739621c8-f942-4004-a5f4-294adab50862",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"119 2.7\n",
"86 4.6\n",
"83 4.2\n",
"28 5.3\n",
"199 5.7\n",
"Name: displ, dtype: float64"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.displ.sample(5, replace=True)"
]
},
{
"cell_type": "markdown",
"id": "371bccfe-ec35-4c52-ae9a-801b0540e02d",
"metadata": {},
"source": [
"Pandas will also let you get a fraction of values instead of a set number in your sample."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "780bf156-f4be-4dae-854c-cded2d73d05e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" manufacturer | \n",
" model | \n",
" displ | \n",
" year | \n",
" cyl | \n",
" trans | \n",
" drv | \n",
" cty | \n",
" hwy | \n",
" fl | \n",
" class | \n",
" displ_per_cyl | \n",
"
\n",
" \n",
" \n",
" \n",
" 151 | \n",
" nissan | \n",
" pathfinder 4wd | \n",
" 3.3 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" 4 | \n",
" 15 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
" 0.550000 | \n",
"
\n",
" \n",
" 232 | \n",
" volkswagen | \n",
" passat | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 18 | \n",
" 26 | \n",
" p | \n",
" midsize | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 89 | \n",
" ford | \n",
" f150 pickup 4wd | \n",
" 5.4 | \n",
" 2008 | \n",
" 8 | \n",
" auto(l4) | \n",
" 4 | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
" 0.675000 | \n",
"
\n",
" \n",
" 189 | \n",
" toyota | \n",
" camry solara | \n",
" 2.4 | \n",
" 2008 | \n",
" 4 | \n",
" auto(s5) | \n",
" f | \n",
" 22 | \n",
" 31 | \n",
" r | \n",
" compact | \n",
" 0.600000 | \n",
"
\n",
" \n",
" 134 | \n",
" lincoln | \n",
" navigator 2wd | \n",
" 5.4 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" r | \n",
" 11 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
" 0.675000 | \n",
"
\n",
" \n",
" 196 | \n",
" toyota | \n",
" corolla | \n",
" 1.8 | \n",
" 2008 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 28 | \n",
" 37 | \n",
" r | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
" 21 | \n",
" chevrolet | \n",
" c1500 suburban 2wd | \n",
" 5.7 | \n",
" 1999 | \n",
" 8 | \n",
" auto(l4) | \n",
" r | \n",
" 13 | \n",
" 17 | \n",
" r | \n",
" suv | \n",
" 0.712500 | \n",
"
\n",
" \n",
" 211 | \n",
" volkswagen | \n",
" gti | \n",
" 2.8 | \n",
" 1999 | \n",
" 6 | \n",
" manual(m5) | \n",
" f | \n",
" 17 | \n",
" 24 | \n",
" r | \n",
" compact | \n",
" 0.466667 | \n",
"
\n",
" \n",
" 164 | \n",
" subaru | \n",
" forester awd | \n",
" 2.5 | \n",
" 2008 | \n",
" 4 | \n",
" auto(l4) | \n",
" 4 | \n",
" 18 | \n",
" 23 | \n",
" p | \n",
" suv | \n",
" 0.625000 | \n",
"
\n",
" \n",
" 55 | \n",
" dodge | \n",
" dakota pickup 4wd | \n",
" 5.2 | \n",
" 1999 | \n",
" 8 | \n",
" manual(m5) | \n",
" 4 | \n",
" 11 | \n",
" 17 | \n",
" r | \n",
" pickup | \n",
" 0.650000 | \n",
"
\n",
" \n",
" 40 | \n",
" dodge | \n",
" caravan 2wd | \n",
" 3.3 | \n",
" 1999 | \n",
" 6 | \n",
" auto(l4) | \n",
" f | \n",
" 16 | \n",
" 22 | \n",
" r | \n",
" minivan | \n",
" 0.550000 | \n",
"
\n",
" \n",
" 195 | \n",
" toyota | \n",
" corolla | \n",
" 1.8 | \n",
" 1999 | \n",
" 4 | \n",
" manual(m5) | \n",
" f | \n",
" 26 | \n",
" 35 | \n",
" r | \n",
" compact | \n",
" 0.450000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" manufacturer model displ year cyl trans drv cty \\\n",
"151 nissan pathfinder 4wd 3.3 1999 6 manual(m5) 4 15 \n",
"232 volkswagen passat 2.8 1999 6 manual(m5) f 18 \n",
"89 ford f150 pickup 4wd 5.4 2008 8 auto(l4) 4 13 \n",
"189 toyota camry solara 2.4 2008 4 auto(s5) f 22 \n",
"134 lincoln navigator 2wd 5.4 1999 8 auto(l4) r 11 \n",
"196 toyota corolla 1.8 2008 4 manual(m5) f 28 \n",
"21 chevrolet c1500 suburban 2wd 5.7 1999 8 auto(l4) r 13 \n",
"211 volkswagen gti 2.8 1999 6 manual(m5) f 17 \n",
"164 subaru forester awd 2.5 2008 4 auto(l4) 4 18 \n",
"55 dodge dakota pickup 4wd 5.2 1999 8 manual(m5) 4 11 \n",
"40 dodge caravan 2wd 3.3 1999 6 auto(l4) f 16 \n",
"195 toyota corolla 1.8 1999 4 manual(m5) f 26 \n",
"\n",
" hwy fl class displ_per_cyl \n",
"151 17 r suv 0.550000 \n",
"232 26 p midsize 0.466667 \n",
"89 17 r pickup 0.675000 \n",
"189 31 r compact 0.600000 \n",
"134 17 r suv 0.675000 \n",
"196 37 r compact 0.450000 \n",
"21 17 r suv 0.712500 \n",
"211 24 r compact 0.466667 \n",
"164 23 p suv 0.625000 \n",
"55 17 r pickup 0.650000 \n",
"40 22 r minivan 0.550000 \n",
"195 35 r compact 0.450000 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get a random sample of one twentieth the size of the dataset\n",
"mpg.sample(frac=.05)"
]
},
{
"cell_type": "markdown",
"id": "a8bcf350-1892-4031-b645-8fff6c1f4979",
"metadata": {},
"source": [
"There's one more trick you can do with sampling. Sometimes you don't need to get a smaller random sample: instead, you just want to reshuffle every row of the dataset. You can do this by setting `frac` to 1. In a way, you're taking a random sample that is 100% of the size of the dataset! (But make sure you do this *without* replacement.)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "04c4bd3f-253e-4494-b6cd-06f620cd9b64",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"51 3.9\n",
"205 4.0\n",
"83 4.2\n",
"73 5.9\n",
"122 3.0\n",
" ... \n",
"228 1.8\n",
"85 4.6\n",
"123 3.7\n",
"106 1.8\n",
"155 3.8\n",
"Name: displ, Length: 234, dtype: float64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.displ.sample(frac=1)"
]
},
{
"cell_type": "markdown",
"id": "23da4f36-0829-4513-ab15-a69bace3d963",
"metadata": {},
"source": [
"Pandas will remember the indices in your new Series, which means if you use this reordered sample it might put things back in order for you! To avoid this, you can reset the index and drop the old labels."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "29cd2eb1-7965-4b49-a627-a4c87e53de3e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 4.7\n",
"1 3.1\n",
"2 2.5\n",
"3 5.3\n",
"4 2.5\n",
" ... \n",
"229 2.4\n",
"230 4.0\n",
"231 4.6\n",
"232 2.8\n",
"233 2.0\n",
"Name: displ, Length: 234, dtype: float64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpg.displ.sample(frac=1).reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"id": "0d2e41b2-e3e9-477a-9a49-3ebfb982e5b9",
"metadata": {},
"source": [
"```{seealso}\n",
"Wes McKinney's excellent book *Python for Data Analysis* has lots more examples and many additional function. In particular, you can check out [Chapter 5](https://wesmckinney.com/book/pandas-basics.html#pandas_frame) and [Chapter 8](https://wesmckinney.com/book/data-wrangling.html).\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}