DA 101, Dr. Ladd
Week 6
Consider two sample groups, A and B. (Such as the male and female groups of your alcohol use data.)
In a t-test, the null hypothesis assumes that the means of A and B are equal: there is no real difference between them, and any difference we observe is the result of randomness.
We attempt to reject the null hypothesis by showing that the observed data would be unlikely if only randomness were at work.
If there’s a null hypothesis, there has to be an alternative hypothesis.
If the null hypothesis is that A and B are equal, then the alternative hypothesis would be that A and B are not equal (either smaller or bigger).
One-tailed: We only care about a non-equal result in one direction, i.e. if A > B but not if A < B.
Two-tailed: We care about differences in both directions, i.e. A != B but could be larger or smaller.
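As a rough illustration of how the choice of tails shows up in R, the sketch below runs the same comparison three ways using the alternative argument of t.test(). The groups a and b are simulated, hypothetical values, not course data.

```r
# Simulated groups (hypothetical values, for illustration only).
set.seed(42)
a <- rnorm(50, mean = 10.5, sd = 2)
b <- rnorm(50, mean = 10, sd = 2)

# Two-tailed: is mean(a) different from mean(b), in either direction?
two_sided <- t.test(a, b, alternative = "two.sided")

# One-tailed: is mean(a) greater than mean(b)?
greater <- t.test(a, b, alternative = "greater")

# One-tailed, other direction: is mean(a) less than mean(b)?
less <- t.test(a, b, alternative = "less")

# Compare the three p-values side by side.
c(two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)
```

The two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them, which is why the direction of the alternative hypothesis should be chosen before looking at the data.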
Different research questions lead to different alternative hypotheses.
Is the median house price in Granville larger than the median price in Newark?
Is the mean number of mountain lions per 100 km^2 equal in North and South America?
NHANES reports the average starting age of smoking is 19. Is this correct, or is the true mean lower than this?
Say you have two web pages, Page A and Page B, and you’ve measured the amount of time internet users spend on each page. You’re trying to decide whether to replace Page A with Page B.
The Null Hypothesis is that:
mean(A) = mean(B)
The Alternative Hypothesis is that:
mean(B) > mean(A)
(one-tailed)
We have two clear groups: the people who saw Page A and the ones who saw Page B. But we could reshuffle this data a thousand times in a thousand different configurations, where the session times are separated into equally sized but random groups.
In the end we’d have a distribution of how much the means differ among a thousand random groups.
We can then ask how often a random reshuffle produced a difference at least as large as the observed one, i.e., how often the values were to the right of the dotted line.
In this case, that was about 12% of the time. That’s a lot! And that means that this observed difference isn’t all that unusual.
In other words, we can measure the probability of obtaining results as unusual as the observed result.
This probability is called the p-value!
Given a chance model that embodies the null hypothesis, the p-value is the probability of obtaining results as unusual or extreme as the observed result.
In our example, our 12% was a p-value of .12!
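The reshuffling procedure and the resulting p-value can be sketched in a few lines of R. This is only a minimal sketch: the session times, group sizes, and variable names (page_a, page_b, perm_diffs) are made up for illustration, so the p-value it prints will not match the 12% from the slides.

```r
# Hypothetical session times in seconds (simulated, not real data).
set.seed(101)
page_a <- rexp(21, rate = 1 / 120)
page_b <- rexp(15, rate = 1 / 150)

# Observed difference in means: mean(B) - mean(A).
observed_diff <- mean(page_b) - mean(page_a)

# Pool all session times, then reshuffle them into two random groups
# of the original sizes and record the difference in means each time.
all_times <- c(page_a, page_b)
n_b <- length(page_b)

perm_diffs <- replicate(1000, {
  shuffled <- sample(all_times)
  mean(shuffled[1:n_b]) - mean(shuffled[-(1:n_b)])
})

# One-tailed p-value: the share of reshuffles whose difference is
# at least as large as the observed difference.
p_value <- mean(perm_diffs >= observed_diff)
p_value

# The histogram plays the role of the plot with the dotted line.
hist(perm_diffs)
abline(v = observed_diff, lty = 2)
```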
If the p-value is lower than .05 (5%), we can reject the null hypothesis.
If the p-value is higher than .05 (5%), we fail to reject the null hypothesis and our result could be random.
This is just a rule of thumb!
In our example of two groups in our data, we could test whether their difference in means is significant using a t-test.
Rather than reshuffling the data, a t-test approximates the reshuffled distribution using a series of assumptions about what that distribution should look like. It calculates a p-value based on that “t-distribution.”
Different statistical tests calculate p-values for other kinds of differences.
Consider results that are:
Type I Error (alpha-error) is rejecting the null hypothesis when it is true.
Type II Error (beta-error) is failing to reject the null hypothesis when it is false.
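One way to get a feel for these two error types is to simulate many experiments where we know the truth. The sketch below is an assumption-laden illustration (normal data, 30 observations per group, a true difference of 0.5 in the second scenario), not a result from the course data.

```r
set.seed(2024)

# Scenario 1: the null hypothesis is TRUE (both groups share one mean).
p_null <- replicate(2000, {
  a <- rnorm(30, mean = 0, sd = 1)
  b <- rnorm(30, mean = 0, sd = 1)
  t.test(a, b)$p.value
})

# Type I error rate: how often we wrongly reject a true null at .05.
type1_rate <- mean(p_null < 0.05)

# Scenario 2: the null hypothesis is FALSE (true difference of 0.5).
p_alt <- replicate(2000, {
  a <- rnorm(30, mean = 0, sd = 1)
  b <- rnorm(30, mean = 0.5, sd = 1)
  t.test(a, b)$p.value
})

# Type II error rate: how often we fail to reject a false null at .05.
type2_rate <- mean(p_alt >= 0.05)

c(type1 = type1_rate, type2 = type2_rate)
```

The Type I rate should land close to the .05 threshold itself, while the Type II rate depends on the sample size and on how large the true difference is.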
Misreading or overemphasizing the p-value can lead us to error!
The two-sample t-test: (This one is review!)
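As a minimal sketch of a two-sample t-test, assuming the built-in ToothGrowth dataset as a stand-in (it is not the course data), the formula interface compares the mean of len between the two supp groups:

```r
# ToothGrowth ships with base R; supp has two levels, "OJ" and "VC".
# len ~ supp reads as "compare len across the groups defined by supp".
result <- t.test(len ~ supp, data = ToothGrowth)

result            # full printout: t statistic, df, p-value, group means
result$p.value    # just the p-value
result$estimate   # the two group means
```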
The one-sample t-test:
library(ggplot2)  # provides the mpg dataset
# Run the test to see if the mean of hwy
# is greater than 0.
t.test(mpg$hwy, mu=0, alternative="greater")
We can set the hypothesized mean (mu) to any value. Remember that the variable must be normally distributed.
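For instance, a one-sample test against a nonzero value might look like the sketch below. The start_age vector is simulated and purely hypothetical; only the value 19 comes from the NHANES question earlier in these slides.

```r
# Hypothetical starting ages (simulated, not NHANES data).
set.seed(19)
start_age <- rnorm(100, mean = 17.5, sd = 3)

# Null hypothesis: the true mean is 19.
# Alternative hypothesis: the true mean is lower than 19 (one-tailed).
result <- t.test(start_age, mu = 19, alternative = "less")
result$p.value
```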
The Shapiro-Wilk test:
Important note: the null hypothesis of the Shapiro-Wilk test is that the variable is normally distributed, so the variable likely has a normal distribution if the p-value is higher than .05. Always compare with a histogram!
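A short sketch of the test in R, using shapiro.test() on two simulated variables (hypothetical values chosen so that one is normal and one is clearly skewed):

```r
set.seed(7)
normal_x <- rnorm(200, mean = 50, sd = 10)  # drawn from a normal distribution
skewed_x <- rexp(200, rate = 1)             # strongly right-skewed

normal_test <- shapiro.test(normal_x)
skewed_test <- shapiro.test(skewed_x)

# A p-value above .05 means we fail to reject normality;
# a p-value below .05 suggests the variable is not normal.
normal_test$p.value
skewed_test$p.value

# Always compare with a histogram.
hist(normal_x)
hist(skewed_x)
```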
Let’s try out rnorm(). It takes 3 parameters: the number of observations, the mean, and the standard deviation.
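A quick example of the call, with arbitrary values picked for illustration:

```r
set.seed(1)  # makes the random draw reproducible

# 100 observations, mean of 5, standard deviation of 2.
x <- rnorm(100, mean = 5, sd = 2)

length(x)  # number of observations
mean(x)    # close to 5, but not exactly 5
sd(x)      # close to 2, but not exactly 2
hist(x)
```

The sample mean and standard deviation will be near the requested values rather than equal to them, because the draw is random.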
Install and load the palmerpenguins dataset. Then create a filtered dataset of only the Adelie penguins.
Make sure you’ve got tidyverse imported, too.
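One possible way to do this setup is sketched below. The name adelie is just a suggestion, and the install line only needs to run once.

```r
# install.packages("palmerpenguins")  # run once if not yet installed
library(tidyverse)
library(palmerpenguins)

# Keep only the rows for Adelie penguins.
adelie <- penguins %>%
  filter(species == "Adelie")

# Quick check of the result.
glimpse(adelie)
```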
Ask yourself: what test or function would you run? How would you run it?
Are the flipper lengths of Adelie penguins normally distributed?
Are the flipper lengths of Adelie penguins significantly less than 190mm?
Is there a significant difference in the flipper length of Adelie penguins vs. Gentoo penguins?
Let’s create a normally distributed variable with roughly the same mean and standard deviation as the flipper length of Adelie penguins, but with twice the number of observations.