Adding Error Bars

One of the main things you might want to add to a graph to make it more meaningful is error bars. There are a few different ways to do that, and we’ll go through three.

Error Bars with `geom_errorbar()`

geom_errorbar() is a function from the core {ggplot2} package. It relies on you calculating your own means and standard errors, so you’ll first need to summarize your data. Note that here I’m calculating standard error (the standard deviation divided by the square root of the sample size), which is different from standard deviation or a confidence interval. With geom_errorbar() you manually calculate whatever value you want added to and subtracted from the mean, so you can use whatever statistic you like.

This code takes the penguin data and creates a new summary table that includes columns for mean and standard error for penguin mass by species.
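A minimal sketch of that summary step, assuming the penguin data is the `penguins` data frame from {palmerpenguins} with a `body_mass_g` column, and using {dplyr} for the summarizing. The names `penguin_summary`, `mean_mass`, and `se_mass` are placeholders chosen for illustration:

```r
library(dplyr)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

penguin_summary <- penguins %>%
  filter(!is.na(body_mass_g)) %>%            # drop penguins with no recorded mass
  group_by(species) %>%
  summarize(
    mean_mass = mean(body_mass_g),           # mean mass per species
    se_mass   = sd(body_mass_g) / sqrt(n())  # standard error = sd / sqrt(n)
  )

penguin_summary
```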

Using this table we can now plug our values into geom_errorbar():
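One way this could look, as a sketch under the same assumptions as before ({palmerpenguins} data, placeholder column names `mean_mass` and `se_mass`). The summary table is rebuilt here so the example runs on its own:

```r
library(dplyr)
library(ggplot2)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

penguin_summary <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(species) %>%
  summarize(
    mean_mass = mean(body_mass_g),
    se_mass   = sd(body_mass_g) / sqrt(n())
  )

p <- ggplot(penguin_summary, aes(x = species, y = mean_mass)) +
  geom_col() +
  # ymin and ymax are the mean minus and plus one standard error
  geom_errorbar(
    aes(ymin = mean_mass - se_mass, ymax = mean_mass + se_mass),
    width = 0.2  # width of the error bar caps, relative to the column width
  )

p
```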

Go back and change width to be a higher number to see what changes (but keep it between 0 and 1).

This isn’t the most informative graph in the world, but you get the idea. Let’s try to separate out the columns by sex to see if it gets more informative.

First we’ll add a sex group to our summarizing code and we’ll filter out NA sexes.
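A sketch of the grouped summary, again assuming the {palmerpenguins} data and using `penguin_summary_sex` as a placeholder name:

```r
library(dplyr)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

penguin_summary_sex <- penguins %>%
  filter(!is.na(sex), !is.na(body_mass_g)) %>%  # drop rows with unknown sex or mass
  group_by(species, sex) %>%
  summarize(
    mean_mass = mean(body_mass_g),
    se_mass   = sd(body_mass_g) / sqrt(n()),
    .groups   = "drop"                          # return an ungrouped table
  )

penguin_summary_sex
```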

Now we’ll need to alter our graph code to use position_dodge() to separate out our columns. Here’s the code without error bars; see if you can alter it to include them.
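A possible version of the dodged column plot without error bars, as a sketch assuming the {palmerpenguins} data and the placeholder summary table `penguin_summary_sex`:

```r
library(dplyr)
library(ggplot2)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

penguin_summary_sex <- penguins %>%
  filter(!is.na(sex), !is.na(body_mass_g)) %>%
  group_by(species, sex) %>%
  summarize(
    mean_mass = mean(body_mass_g),
    se_mass   = sd(body_mass_g) / sqrt(n()),
    .groups   = "drop"
  )

p_dodge <- ggplot(penguin_summary_sex, aes(x = species, y = mean_mass, fill = sex)) +
  # position_dodge() places the female and male columns side by side
  geom_col(position = position_dodge(width = 0.9))

p_dodge
```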

Answer
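One possible answer, sketched under the same assumptions ({palmerpenguins} data, placeholder names). The error bar layer uses the same dodge width as the columns so that each bar sits over its own column:

```r
library(dplyr)
library(ggplot2)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

penguin_summary_sex <- penguins %>%
  filter(!is.na(sex), !is.na(body_mass_g)) %>%
  group_by(species, sex) %>%
  summarize(
    mean_mass = mean(body_mass_g),
    se_mass   = sd(body_mass_g) / sqrt(n()),
    .groups   = "drop"
  )

p_answer <- ggplot(penguin_summary_sex, aes(x = species, y = mean_mass, fill = sex)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(ymin = mean_mass - se_mass, ymax = mean_mass + se_mass),
    position = position_dodge(width = 0.9),  # same dodge width as the columns
    width = 0.2                              # width of the error bar caps
  )

p_answer
```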

You can change around the width measurements to see how that changes bar placement. If you want the error bars centered on the columns, the dodge width for the error bars needs to match the dodge width for the columns, but you can see what happens if they don’t match.

Error Bars with `stat_summary()`

Within {ggplot2} there is an option to have the statistics summarized for you within the graph code. It’s not quite as easy as you’d like it to be, but we’ll go through how to do it for our bar plot.

The main thing is that, since it calculates the mean and error from scratch, it uses the original dataset rather than a summary table. This also means that you need to tell it every aspect of what you want on the graph, so rather than using geom_col() to create columns, we need to use stat_summary().

The code goes as follows:

The first stat_summary() line creates our bars for the mean. The second creates the error bars. fun takes a function that returns a single value per group, while fun.data takes a function that returns several values per group (ymin, y, and ymax), so we need that to get an upper and lower bound for our error bars.
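A minimal sketch of those two stat_summary() layers, assuming the {palmerpenguins} data and using mean_se() from {ggplot2} (mean plus and minus one standard error) as the fun.data function. The name `penguins_complete` is a placeholder:

```r
library(ggplot2)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- palmerpenguins::penguins

# Drop rows with unknown sex or mass so they do not form their own group
penguins_complete <- subset(penguins, !is.na(sex) & !is.na(body_mass_g))

p_stat <- ggplot(penguins_complete, aes(x = species, y = body_mass_g, fill = sex)) +
  # fun returns one value per group: the mean, drawn as a bar
  stat_summary(
    fun = mean, geom = "bar",
    position = position_dodge(width = 0.9)
  ) +
  # fun.data returns ymin, y, and ymax per group: mean +/- 1 standard error
  stat_summary(
    fun.data = mean_se, geom = "errorbar",
    position = position_dodge(width = 0.9),
    width = 0.2
  )

p_stat
```

Swapping mean_se for another function that returns ymin, y, and ymax changes which statistic the error bars show.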

Error Bars with `{ggpubr}`

Another way to add error bars is with the package {ggpubr}. This package has a lot of functions for making figures publication ready. It has a little bit of a different syntax than ggplot2 though. You can go to the {ggpubr} documentation to see all of the functions this package has.

This time let’s use a line graph and add confidence intervals to our graph. We’ll plot penguin weight over time for each species. The nice thing about {ggpubr} is that it will calculate our stats for us, so we just need to use the main dataset and not a summarized one.
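A sketch of what this could look like with ggline(), assuming the {palmerpenguins} data, with its `year` column standing in for "time" and `body_mass_g` for weight. The `add = "mean_ci"` option asks {ggpubr} to draw the mean with a confidence interval:

```r
library(ggpubr)

# Assumption: the "penguin data" is palmerpenguins::penguins
penguins <- as.data.frame(palmerpenguins::penguins)

# Drop penguins with no recorded mass before plotting
penguins_mass <- penguins[!is.na(penguins$body_mass_g), ]

p_line <- ggline(
  penguins_mass,
  x = "year",          # columns are given as strings in {ggpubr}
  y = "body_mass_g",
  color = "species",   # one line per species
  add = "mean_ci"      # plot the mean with confidence-interval error bars
)

p_line
```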

The downside to using {ggpubr} is that you’ll need to use its syntax if you now want to change the axes or labels on this plot. Most people learn graphing in R by using {ggplot2}, so even though {ggpubr} has a fairly simple syntax, it’s still a little annoying to have to learn, and it doesn’t have quite as much customization as {ggplot2}. Why didn’t the creators just extend the {ggplot2} syntax instead of wrapping it in a new one? I’m sure they’ll tell you the answer is “simplicity”, so take that as you will.

By running ?ggline you can see the possible options available for changing the graph.

Why is there no good way of using basic ggplot code and having the error bars calculated for you? Who knows. Maybe you could write a package some day. I would like it.
