DA 101, Dr. Ladd
I can grab a sample of 5 observations. Then I can “resample” 5 more from the same data, then 5 more, and so on.
Sampling with replacement means each item is returned to the pool before the next draw (i.e., you could wind up with the same observation multiple times).
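Here's a minimal R sketch of resampling with replacement. The vector x holds 10 made-up values (an assumption for illustration, not course data). sample() with replace = TRUE does the drawing, and set.seed() just makes the random draws reproducible.

```r
# Hypothetical data: 10 made-up observations, for illustration only
x <- c(12, 15, 9, 20, 18, 11, 14, 16, 10, 13)

set.seed(101)  # make the random draws reproducible

# Draw 5 observations WITH replacement: each value goes back into the pool
# before the next draw, so the same value can show up more than once
resample_1 <- sample(x, size = 5, replace = TRUE)

# "Resample" 5 more, then 5 more, and so on
resample_2 <- sample(x, size = 5, replace = TRUE)
resample_3 <- sample(x, size = 5, replace = TRUE)

# With only 2 values in the pool, 5 draws with replacement must repeat something
dup_demo <- sample(c("a", "b"), size = 5, replace = TRUE)
print(dup_demo)
```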
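A permutation test doesn’t require the samples to be normally distributed or to have equal variances. It still has assumptions, but they are weaker than a t-test’s: the observations should be independent, and if there is no real difference between the groups, the group labels should be interchangeable.
In a permutation test, you randomly rearrange (shuffle) which observations belong to which group, recompute the difference between the groups each time, and collect those differences into a permutation distribution.
That distribution shows you what the differences would look like if any gap between the groups were the result of random variation alone.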
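To make the rearranging concrete, here is one random shuffle in R. The two groups are made-up numbers (not course data). Repeating this shuffle many times and keeping each difference is what builds the permutation distribution.

```r
# Hypothetical session times in seconds, made up for illustration
group_a <- c(21, 35, 18, 40, 27)
group_b <- c(45, 38, 52, 30)

# The difference we actually observed
observed_diff <- mean(group_b) - mean(group_a)

# Pool all observations, then randomly rearrange them
pooled <- c(group_a, group_b)
set.seed(42)
shuffled <- sample(pooled)  # a random permutation of the pooled values

# Reassign: the first 5 shuffled values become "group A", the rest "group B"
n_a <- length(group_a)
new_a <- shuffled[1:n_a]
new_b <- shuffled[-(1:n_a)]

# One draw from the permutation distribution
permuted_diff <- mean(new_b) - mean(new_a)
print(c(observed = observed_diff, permuted = permuted_diff))
```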
Let’s look at the steps of a permutation test that would replace a two-sample t-test…
First, name the function and define its inputs. Then do something with them and return a result!
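A tiny example of that pattern. The function name diff_in_means is hypothetical, chosen just for illustration:

```r
# Name the function and define its inputs...
diff_in_means <- function(a, b) {
  result <- mean(a) - mean(b)  # ...do something with the inputs...
  return(result)               # ...and return a result
}

diff_in_means(c(4, 6, 8), c(1, 2, 3))
# [1] 4
```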
Two functions we’ll lean on: replicate() and sample()
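A quick sketch of what each one does, using die rolls as a made-up example: sample() shuffles a vector or draws from it (replace = TRUE draws with replacement), and replicate(n, expr) evaluates expr n times and collects the results.

```r
set.seed(1)  # reproducible random draws

# sample(): shuffle a vector (no replacement by default)...
sample(1:6)
# ...or draw with replacement, like rolling a die 3 times
sample(1:6, size = 3, replace = TRUE)

# replicate(): evaluate an expression many times and collect the results.
# Here: the mean of 3 die rolls, repeated 1000 times
roll_means <- replicate(1000, mean(sample(1:6, size = 3, replace = TRUE)))
length(roll_means)
```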
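# Randomly split x into a group of size nA and "the rest",
# then return the difference in their means
permutation_func <- function(x, nA) {
  idx_a <- sample(seq_along(x), nA)             # Randomly pick nA indices (the size of group 1)
  idx_b <- setdiff(seq_along(x), idx_a)         # The remaining indices form group 2
  mean_diff <- mean(x[idx_a]) - mean(x[idx_b])  # Subtract the means
  return(mean_diff)
}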
You can reuse this!
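For example, replicate() can call the function many times to build a permutation distribution. This is a minimal sketch that assumes made-up data for two groups and 1,000 shuffles (both arbitrary choices). The function is repeated so the block runs on its own.

```r
# The permutation function, repeated so this sketch is self-contained
permutation_func <- function(x, nA) {
  idx_a <- sample(seq_along(x), nA)        # random indices for group 1
  idx_b <- setdiff(seq_along(x), idx_a)    # the rest go to group 2
  mean(x[idx_a]) - mean(x[idx_b])          # difference in means
}

# Hypothetical data, made up for illustration: 5 values in group A, 4 in group B
group_a <- c(21, 35, 18, 40, 27)
group_b <- c(45, 38, 52, 30)
pooled  <- c(group_a, group_b)

set.seed(2024)
# Reuse the same function 1,000 times: each call is one random rearrangement
perm_diffs <- replicate(1000, permutation_func(pooled, nA = length(group_a)))

# The spread of these differences is what "random variation alone" looks like
hist(perm_diffs, xlab = "Difference in means (A - B)",
     main = "Permutation distribution")
```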
Where does the 47 come from?
Where did .2 come from? Can we calculate it more accurately?
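If that value is a proportion read off a histogram of permuted differences, one way to get a more precise number is to count directly: the share of permuted differences at least as large as the observed one. This sketch assumes made-up data and a one-sided question (is B larger than A?). Comparing a vector to a number gives TRUE/FALSE values, and mean() treats TRUE as 1 and FALSE as 0.

```r
permutation_func <- function(x, nA) {
  idx_a <- sample(seq_along(x), nA)
  idx_b <- setdiff(seq_along(x), idx_a)
  mean(x[idx_a]) - mean(x[idx_b])
}

# Hypothetical data, made up for illustration
group_a <- c(21, 35, 18, 40, 27)
group_b <- c(45, 38, 52, 30)
pooled  <- c(group_a, group_b)

# Observed difference, B minus A
observed_diff <- mean(group_b) - mean(group_a)

set.seed(2024)
# permutation_func subtracts "the rest" from the first nA values, so passing
# the size of group B gives permuted differences on the same B - A scale
perm_diffs <- replicate(1000, permutation_func(pooled, nA = length(group_b)))

# perm_diffs >= observed_diff is a TRUE/FALSE vector; its mean is the
# proportion of shuffles at least as large as what we observed
p_value <- mean(perm_diffs >= observed_diff)
print(p_value)
```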
Why does this code work?
Is our result statistically significant? Is it practically significant?
Determine if users spend significantly more time on Page B than they do on Page A.