Adding Significance Values
Often the whole point of making graphs is to look for meaningful differences within our data, and you’re probably aware that for those differences to be meaningful they must be statistically significant. So, let’s work on adding significance to our graphs using the packages {ggsignif} and {ggpubr}.
We’ll work with boxplots of the penguin data for this, so a typical starting graph would be something like this:
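A minimal sketch of such a starting graph. It assumes the data come from the {palmerpenguins} package and that the plotted variable is body mass by species; neither is stated in the text, so swap in your own column as needed.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data

# Assumption: body mass (g) is the response; drop rows where it is missing
peng <- subset(penguins, !is.na(body_mass_g))

# One box per species
p_base <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot()
p_base
```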
Adding Significance with {ggsignif}
To add significance symbols to our graph, we have to specify which groups we want compared and how, meaning which statistical test R should run. We do this within geom_signif(). Below, we list every possible pairwise comparison among our three species in the comparisons argument and set test to t.test. Try this code and see how it works out:
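A sketch of what that call might look like, again assuming {palmerpenguins} and body mass as the response. The comparisons argument takes a list of length-2 character vectors naming the groups to compare.

```r
library(ggplot2)
library(ggsignif)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

# All three pairwise comparisons between species
all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# No y_position yet, so the three brackets are drawn on top of each other
p_overlap <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  geom_signif(comparisons = all_pairs, test = "t.test")
p_overlap
```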
You should see something that looks like it’s working but not graphing well. This is because all our bracket comparisons are overlapping. If we want them staggered to different heights, we need to specify the y-axis value that we want them to appear at. Since we have three comparisons, we need to state three values.
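For instance, the staggering could be done as follows. The three heights are illustrative values chosen to sit above the tallest box for the assumed body mass variable, not the values from the original graph.

```r
library(ggplot2)
library(ggsignif)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# One y-axis height per comparison, in the same order as `all_pairs`
p_staggered <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  geom_signif(comparisons = all_pairs, test = "t.test",
              y_position = c(6500, 6900, 7300))
p_staggered
```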
You can change those numbers around to get the graph looking exactly how you’d like, but the gist is that you have a bracket showing which groups are being compared and above that you have the p-value for that comparison.
Often in papers, you will see asterisks instead of the actual numbers, and we can reproduce that here by adding another argument to geom_signif(). By default, it prints the actual p-values, but you can switch to asterisks with map_signif_level = TRUE.
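A sketch of the asterisk version, under the same assumptions about the data and with the same illustrative bracket heights.

```r
library(ggplot2)
library(ggsignif)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# map_signif_level = TRUE replaces each p-value with a significance symbol
p_stars <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  geom_signif(comparisons = all_pairs, test = "t.test",
              y_position = c(6500, 6900, 7300),
              map_signif_level = TRUE)
p_stars
```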
This graph currently shows all comparisons, but usually we only want to show the significant ones. With {ggsignif}, the way to do this is to go back and remove the non-significant comparison from the list. If you want something that will handle this for you, use {ggpubr}.
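Deleting the non-significant pair by hand is the simplest route. As an alternative sketch, the list can be filtered before it reaches geom_signif(). The helper pair_p() and the 0.05 cutoff are illustrative choices, not part of {ggsignif}, and the data assumptions are the same as before.

```r
library(ggplot2)
library(ggsignif)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

# Every pairwise combination of species, as length-2 character vectors
all_pairs <- combn(levels(peng$species), 2, simplify = FALSE)

# Hypothetical helper: Welch t-test p-value for one pair of species
pair_p <- function(pair) {
  t.test(peng$body_mass_g[peng$species == pair[1]],
         peng$body_mass_g[peng$species == pair[2]])$p.value
}

# Keep only the pairs below the conventional 0.05 threshold
sig_pairs <- Filter(function(pair) pair_p(pair) < 0.05, all_pairs)

# Stagger one bracket per remaining pair
p_sig <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  geom_signif(comparisons = sig_pairs, test = "t.test",
              y_position = 6500 + 400 * (seq_along(sig_pairs) - 1),
              map_signif_level = TRUE)
p_sig
```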
Adding Significance with {ggpubr}
Unlike with the error bars, this time the {ggpubr} function we want can be added directly to our ggplot code. The syntax is similar to what we used with {ggsignif}, with a few differences (for example, the test is given with method rather than test). Here’s the code using stat_compare_means():
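A sketch of that call, assuming the same {palmerpenguins} data and body mass variable. It sets label = "p.signif" explicitly, since that is the setting that produces asterisks.

```r
library(ggplot2)
library(ggpubr)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# `method` names the test; label = "p.signif" draws significance symbols
p_pubr <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  stat_compare_means(comparisons = all_pairs, method = "t.test",
                     label = "p.signif")
p_pubr
```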
You’ll see this shows the asterisks rather than the p-values themselves. Let’s say you actually want the p-values. Use the documentation for stat_compare_means to change the code to show the real values.
Run ?stat_compare_means and scroll down until you see all the possible options for the label argument.
Change the code to read label = "p.format".
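With that change applied, the call might look like this (same assumed data and variable as in the earlier sketches).

```r
library(ggplot2)
library(ggpubr)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# label = "p.format" prints formatted p-values instead of symbols
p_values <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  stat_compare_means(comparisons = all_pairs, method = "t.test",
                     label = "p.format")
p_values
```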
Now, if we want to show only the significant comparisons, there’s an option for that: add hide.ns = TRUE to the code above.
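A sketch with hide.ns switched on, under the same data assumptions. One caveat worth checking on your own plot: depending on the {ggpubr} version, when brackets are drawn through comparisons, hide.ns may only blank the "ns" label and leave the bracket itself in place.

```r
library(ggplot2)
library(ggpubr)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

# hide.ns = TRUE suppresses the "ns" marker for non-significant comparisons
p_hide <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  stat_compare_means(comparisons = all_pairs, method = "t.test",
                     label = "p.signif", hide.ns = TRUE)
p_hide
```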
Another nice option within {ggpubr} is the ability to set a reference group if you have a single group that you want everything compared to. This is useful if you have one control and many experimental conditions, because you don’t have to write out each comparison. Here’s the same graph as before, but this time using Adelie as the reference group. (This obviously isn’t the most useful for our penguin example, but let’s just pretend Chinstrap and Gentoo are experimental penguins.)
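A sketch of the reference-group version, with the same assumed data and variable. No comparisons list is needed; each remaining species is tested against the group named in ref.group, and the result is printed above that species rather than on a bracket.

```r
library(ggplot2)
library(ggpubr)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

# Compare Chinstrap and Gentoo against Adelie
p_ref <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  stat_compare_means(method = "t.test", ref.group = "Adelie",
                     label = "p.signif")
p_ref
```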
You should note that within both geom_signif() and stat_compare_means() there are ways to adjust what the brackets look like (line length, tip length) and to have legends or other labels. Also, I’ve only shown t-tests here, but you can enter a wide variety of statistical tests.
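As one illustration of those options, the sketch below swaps in a Wilcoxon rank-sum test and shortens the bracket tips in both packages. Note the different spellings: tip_length in geom_signif() and tip.length in stat_compare_means(). The data, variable, and numeric values are assumptions chosen for the example, and the Wilcoxon test may warn about ties in the data.

```r
library(ggplot2)
library(ggsignif)
library(ggpubr)
library(palmerpenguins)  # assumption: source of the `penguins` data

peng <- subset(penguins, !is.na(body_mass_g))  # assumed response variable

all_pairs <- list(c("Adelie", "Chinstrap"),
                  c("Adelie", "Gentoo"),
                  c("Chinstrap", "Gentoo"))

base_plot <- ggplot(peng, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

# {ggsignif}: Wilcoxon test, shorter bracket tips
p_wilcox_signif <- base_plot +
  geom_signif(comparisons = all_pairs, test = "wilcox.test",
              y_position = c(6500, 6900, 7300), tip_length = 0.01)

# {ggpubr}: same test, same tip length, heights set through label.y
p_wilcox_pubr <- base_plot +
  stat_compare_means(comparisons = all_pairs, method = "wilcox.test",
                     label.y = c(6500, 6900, 7300), tip.length = 0.01)

p_wilcox_signif
p_wilcox_pubr
```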