Comparing Means

Another common step in data analysis is to compare the means of different groups to see if they are significantly different. Which test we use depends on the number of groups we have.

Comparing Two Groups: T-test

The typical t-test is a statistical test used to compare the means of two groups to see if they are different from one another. This is commonly called a two sample t-test because you have two groups you’re comparing against each other. You could also compare your data to an expected value, or test whether its mean is different from zero, and this would be a one sample t-test. The other option is a paired t-test, where you’re comparing the same subjects at two different time points, often before and after a treatment condition. The gist of all of them is that you have two sets of numerical data (or one set and a reference value), and only two, and you want to see if they differ from each other.
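The one sample and paired variants can be sketched as follows. This is only an illustration: it assumes the data is the penguins data frame from the palmerpenguins package with a body_mass_g column, the reference value of 4000 g is hypothetical, and the before/after numbers are made up.

```r
library(palmerpenguins)  # assumed data source; provides the penguins data frame

# One sample t-test: is mean body mass different from a reference value?
# 4000 g is a hypothetical reference value chosen only for illustration.
one_sample <- t.test(penguins$body_mass_g, mu = 4000)
one_sample

# Paired t-test: the same subjects measured twice.
# These before/after values are made up for illustration.
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.3)
after  <- c(12.9, 11.8, 13.4, 13.5, 12.0, 12.8)
paired <- t.test(before, after, paired = TRUE)
paired
```

The mu argument sets the value the mean is compared against (it defaults to 0), and paired = TRUE tells t.test() that the two vectors are matched element by element.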

Let’s say we want to compare penguin body weights based on sex. In R, a t-test is performed with the function t.test(), which uses the same ~ notation as lm(). So, we would specify the y variable that we think might differ, followed by the x variable that defines the groups. The code is this:
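A minimal sketch of that call, assuming the data is the penguins data frame from the palmerpenguins package and that body weight is stored in a column named body_mass_g (both names are assumptions about the dataset used here):

```r
library(palmerpenguins)  # assumed data source; provides the penguins data frame

# Two sample t-test: does body mass (y) differ by sex (x)?
# body_mass_g and sex are the assumed column names.
sex_test <- t.test(body_mass_g ~ sex, data = penguins)
sex_test
```

By default t.test() runs Welch's version of the test, which does not assume the two groups have equal variances.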

You might be thinking “wait, weren’t there NAs in that data?” NAs don’t count as group levels, so we still just have ‘male’ and ‘female’ to compare. Any rows with a missing value are removed before the test is run.

This will output a t-value, the degrees of freedom, and a p-value. It will also show the mean of each group and the 95% confidence interval around the calculated difference in means.
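Each of those pieces can also be pulled out of the result individually, which helps when you want to report or reuse a single number. A short sketch, again assuming the penguins data frame from palmerpenguins with a body_mass_g column:

```r
library(palmerpenguins)  # assumed data source; provides the penguins data frame

sex_test <- t.test(body_mass_g ~ sex, data = penguins)

sex_test$statistic  # the t-value
sex_test$parameter  # the degrees of freedom
sex_test$p.value    # the p-value
sex_test$estimate   # the mean of each group
sex_test$conf.int   # 95% confidence interval for the difference in means
```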

Comparing More Than Two Groups: ANOVA

ANOVA stands for analysis of variance, and it is a test used to see if there are differences in means among three or more groups. So, if you have two groups, run a t-test, and if you have more than two, run an ANOVA.

To run an ANOVA in R, you fit the model with the function aov() and then use the function summary() to understand the results. The model uses an F-statistic, which is the ratio of the variance between groups to the variance within groups, and gives a p-value to tell if the differences among group means are significant.

We can run our ANOVA to see if body weight varies by species. The code is similar to the t.test() call, but we want to store the model so we can run summary() on it as we did with lm().
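A minimal sketch of that, assuming the penguins data frame from the palmerpenguins package with body_mass_g and species columns (the object name species_aov is just an illustrative choice):

```r
library(palmerpenguins)  # assumed data source; provides the penguins data frame

# Fit the ANOVA and store the model: body mass (y) by species (x)
species_aov <- aov(body_mass_g ~ species, data = penguins)

# Print the ANOVA table with the F-statistic and p-value
summary(species_aov)
```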

Tukey Test After an ANOVA

One caveat about the ANOVA is that all it says is that there is a difference somewhere among your groups. It does not say which groups differ. If you want to see the differences between each pair of groups, you can run a Tukey test (Tukey's honest significant difference test, TukeyHSD() in R), which is referred to as a post-hoc test because you run it after running the ANOVA. The code for that is:
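A minimal sketch, assuming the same penguins data frame from palmerpenguins and an ANOVA model of body_mass_g by species (the object names are illustrative):

```r
library(palmerpenguins)  # assumed data source; provides the penguins data frame

# Fit the ANOVA first, since TukeyHSD() takes the fitted aov model as input
species_aov <- aov(body_mass_g ~ species, data = penguins)

# Pairwise comparisons between every pair of species
species_tukey <- TukeyHSD(species_aov)
species_tukey
```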

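The output here shows us each pairwise comparison between the penguin species: the difference in means, the lower and upper bounds of its confidence interval, and a p-value adjusted for multiple comparisons.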

Practice

Write code to determine if there is a difference in flipper lengths of Adelie penguins between the islands.