Separate Graphs by Additional Variables
Our penguin graphs aren’t too complex, so showing everything in one panel works pretty well. But sometimes you have more information than fits neatly on one graph, or you just want to split the data by a variable so the groups are easier to compare.
We’ll do this for our penguin boxplots of body mass by species, split by sex. First, I’m going to filter the data to remove rows missing either mass or sex. Otherwise NA will show up as its own graphable sex, and I don’t want it to.
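Here is a minimal sketch of that filtering step. It assumes the data is the penguins data frame from the palmerpenguins package (where mass is the body_mass_g column) and uses dplyr. The name penguins_clean is a placeholder, not necessarily what the workshop used.

```r
library(dplyr)
library(palmerpenguins)

# Keep only rows where both body mass and sex are recorded,
# so NA doesn't appear as its own category on the graphs
penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))
```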
Separating by One Variable
Now we’ll make our boxplots separating species and sex:
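One way to build this plot is sketched below. It assumes species goes on the x-axis, body_mass_g on the y-axis, and sex is mapped to fill; the workshop’s exact aesthetics may differ. The filtering step is repeated so the block runs on its own.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Same filtering as above: drop rows missing mass or sex
penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

# One box per species/sex combination, all drawn in a single panel
p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()
p
```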
This looks fine, but we can have R automatically make separate plots for either variable. We’ll separate by sex using the function facet_wrap(). With this function, you specify which variable to facet by, and facet_wrap() draws a separate panel for each value of that variable, wrapping the panels into rows and columns as needed. Here’s the code:
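A sketch of the faceted version, under the same assumptions as before (palmerpenguins data, species on x, mass on y, sex as fill). The only new piece is facet_wrap(~ sex) added to the plot.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

# Split into one panel per sex (female, male)
p_wrap <- p + facet_wrap(~ sex)
p_wrap
```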
The ~ in this code comes from R’s formula notation for models, y ~ x, which can be read as “y by x” or “y as a function of x.” Here only the right-hand side is filled in, so we’re taking the plot from above and splitting it by sex.
You’ll notice that the two graphs are placed side by side, but they don’t have to be. You can set the number of columns or rows with the ncol or nrow argument. Here, asking for one column (ncol = 1) or two rows (nrow = 2) would give the same stacked layout.
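Both layouts are sketched below, using the same assumed plot as before. With only two panels, ncol = 1 and nrow = 2 produce the same stacked arrangement.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

# Stack the two sex panels in a single column...
p_col <- p + facet_wrap(~ sex, ncol = 1)
p_col

# ...or, equivalently here, ask for two rows
p_row <- p + facet_wrap(~ sex, nrow = 2)
p_row
```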
By default, every panel uses the same y-axis (and x-axis) scale. Our penguin masses aren’t radically different between the sexes, so this doesn’t leave much empty space on either graph. But if your values differ a lot from one panel to another, you might want each axis to span only the range of the data in that panel. We can do this with the scales argument. It’s set to “fixed” by default, but let’s change it to “free”. (If you only want one axis to vary, “free_x” and “free_y” free just that axis.)
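A sketch of the free-scales version, again assuming the palmerpenguins data and the same aesthetics as above. Only the scales argument changes.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

# Let each panel's axes cover only the data in that panel
p_free <- p + facet_wrap(~ sex, scales = "free")
p_free
```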
You’ll see now that the y-axis for the female penguins has shrunk, since none of them have masses above 6000. This feature is sometimes useful, but it can mislead readers who don’t check the axis values closely, so be deliberate if you’re going to use it.
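To see why the female panel’s axis shrinks, you can check the mass range for each sex directly. This is a sketch assuming the palmerpenguins data and dplyr. mass_ranges is a hypothetical name, not something from the workshop.

```r
library(dplyr)
library(palmerpenguins)

# Smallest and largest body mass within each sex
mass_ranges <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex)) %>%
  group_by(sex) %>%
  summarise(min_mass = min(body_mass_g), max_mass = max(body_mass_g))
mass_ranges
```

With scales = "free", each panel’s y-axis is fitted to roughly these limits instead of the shared overall range.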
Separating by Multiple Variables
facet_wrap() works best when you have one variable to separate by. If you want more than one, use facet_grid(). Here, instead of the ~ notation, you just specify which variable should define the rows and which should define the columns.
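A sketch with facet_grid(), assuming we put island on the rows and sex on the columns (the workshop’s choice of variables may differ). The rows and cols arguments take variables wrapped in vars().

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

# One row of panels per island, one column per sex (3 x 2 = 6 panels)
p_grid <- p + facet_grid(rows = vars(island), cols = vars(sex))
p_grid
```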
Note: This graph might look a little off at first, but that’s because not every penguin species lives on every island.
With both facet_wrap() and facet_grid(), you can change the panel labels (the strips at the top or side of each panel) using the labeller argument. Inside labeller(), you give each faceting variable a lookup in the format old_name = new_name:
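A sketch of relabeling the sex panels, assuming the palmerpenguins sex values are female and male. The new label text (“Female penguins”, “Male penguins”) is just an illustrative choice.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

# A named vector acts as a lookup table: old_name = "new name"
p_labelled <- p +
  facet_wrap(
    ~ sex,
    labeller = labeller(sex = c(female = "Female penguins", male = "Male penguins"))
  )
p_labelled
```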
Note that you have to both specify labeller as an argument and use the function labeller(), so you’re writing “labeller” twice. Yes, that’s a little annoying.
If you don’t want the legend, you can drop it using the same syntax we went over in the Graphing Basics workshop:
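A sketch of dropping the legend, assuming the Graphing Basics syntax was theme(legend.position = "none"). That is a guess, since the earlier workshop isn’t shown here. The rest of the plot is the same assumed faceted boxplot as before.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

penguins_clean <- penguins %>%
  filter(!is.na(body_mass_g), !is.na(sex))

p_no_legend <- ggplot(penguins_clean, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_wrap(~ sex) +
  # Hide the fill legend; the panel strips already show sex
  theme(legend.position = "none")
p_no_legend
```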