Graph Customization

R is great for graph customization. You can make a graph look almost any way you want.

Labeling Axes

So far all our axis labels have been the names of our variables, which often does not look very professional. Fortunately, they are easy to change. We can add a labs() command and specify the axis labels within it. If we have a color or fill variable, we can also use labs() to change the legend title.

Let’s start a new graph and create a histogram. A histogram always plots the frequency of a variable, so there is no need to specify the y-axis; it will be the frequency. Here is code to create a histogram of penguin weights:
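A minimal sketch of such a histogram. It assumes the penguins data come from the palmerpenguins package and that the weight column is body_mass_g; swap in your own data frame and column if yours differ.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Only x is mapped; geom_histogram() counts the observations in each bin itself
p_hist <- ggplot(data = penguins, aes(x = body_mass_g)) +
  geom_histogram()

p_hist
```

R will print a message about the default number of bins and a warning about rows with missing weights; both are safe to ignore here.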

You can make it more complex by adding an argument for fill. Add code to make separate histograms by species.
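One possible way to do this, again assuming the palmerpenguins data with the columns body_mass_g and species:

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Mapping fill to species inside aes() colors the bars by species;
# by default the bars for the different species are stacked on top of each other
p_species <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram()

p_species
```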

To change the axis labels, add a + to the end of the last line and then add the function labs(). Within labs(), specify the labels you’d like to use.
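As a sketch, here is the species histogram with relabeled axes and legend. The label text is only an example, and the data set and column names are assumed to be those of the palmerpenguins package.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_labeled <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  # x and y set the axis labels; fill sets the title of the fill legend
  labs(x = "Body mass (g)", y = "Frequency", fill = "Species")

p_labeled
```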

There are many options for adding titles, changing the legend, and adding captions. Running ?labs will help you figure out some of them, or you can see the graphing workshops.
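For instance, labs() also accepts title, subtitle, and caption arguments. A short sketch, with placeholder text and the palmerpenguins data assumed:

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_titled <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  labs(
    title = "Penguin body mass by species",    # placeholder title
    caption = "Data: palmerpenguins package",  # placeholder caption
    x = "Body mass (g)",
    fill = "Species"
  )

p_titled
```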

Themes

We can also add a line of code to change the hideous gray background. We do this by changing the theme. I like either theme_bw() or theme_classic(), but you can see all the background options here. Try out different ones in place of theme_classic().
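A minimal sketch of adding a theme to the histogram, assuming the palmerpenguins data as before:

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_classic <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  # Replace theme_classic() with theme_bw(), theme_minimal(), etc. to compare
  theme_classic()

p_classic
```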

You can also change whether or not grid lines show by adding either or both of these lines of code:

# remove minor grid lines
theme(panel.grid.minor = element_blank())

# remove major grid lines
theme(panel.grid.major = element_blank())
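In a full plot this might look like the sketch below (palmerpenguins data assumed). The theme() line goes after a complete theme such as theme_bw(), because a complete theme added later would replace the grid line setting.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_grid <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  theme_bw() +
  # remove minor grid lines, keep the major ones
  theme(panel.grid.minor = element_blank())

p_grid
```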

Adding Trend Lines

If you make a scatterplot, you might want to add a trend line to it. Here’s the code for our scatterplot again.
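A minimal sketch of such a scatterplot. The choice of variables is our assumption: bill depth against bill length from the palmerpenguins data. Substitute whichever two numeric variables you plotted earlier.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Assumed variables: bill_length_mm on the x-axis, bill_depth_mm on the y-axis
p_scatter <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

p_scatter
```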

To add a trend line, we just add another line of code using the command geom_smooth(). By default, geom_smooth() draws a flexible smoothed curve, so you should specify what type of line you want. Often we just want a straight line that best fits the data. This line is technically a linear model of our data, so we tell geom_smooth() to use method = "lm". (Other methods include "loess" for a flexible local curve or "glm" for generalized linear models, such as a Poisson curve.)
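A sketch of a scatterplot with a straight trend line, assuming the bill measurements from the palmerpenguins data (the variables are our assumption):

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_trend <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  # method = "lm" fits a straight line (a linear model of y on x)
  geom_smooth(method = "lm")

p_trend
```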

You’ll notice that the blue trend line also has a gray zone around it. This is the 95% confidence interval around the line. It is on by default, but you can turn it off by adding se = FALSE inside geom_smooth(), where se stands for standard error.
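The same sketch with the confidence band turned off (variables and data set assumed as before):

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

p_no_se <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  # se = FALSE hides the gray confidence band around the line
  geom_smooth(method = "lm", se = FALSE)

p_no_se
```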

If you map a variable to color inside aes() in the ggplot() call, geom_smooth() will automatically fit a separate trend line for each group.
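A sketch of per-group trend lines, assuming the grouping variable is species and the plotted variables are the bill measurements from palmerpenguins:

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# color is mapped in ggplot(), so both the points and the trend lines inherit it
p_by_species <- ggplot(
  data = penguins,
  aes(x = bill_length_mm, y = bill_depth_mm, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_by_species
```

If color were mapped only inside geom_point(), the points would be colored but geom_smooth() would still draw a single line for all the data.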

Remember, use color for lines and points and fill for shapes you want filled in.

Note that we now see the opposite trend from when the data were analyzed all together (a pattern known as Simpson’s paradox). This shows the importance of including meaningful separations in your data.
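To check a reversal like this numerically, you can compare the slope of one overall linear model with the slopes fitted within each group. The sketch below assumes the plot shows bill depth against bill length from the palmerpenguins data; the variable choice is our assumption.

```r
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Slope of bill depth on bill length using all penguins together
overall_slope <- coef(
  lm(bill_depth_mm ~ bill_length_mm, data = penguins)
)[["bill_length_mm"]]

# The same slope, fitted separately within each species
species_slopes <- sapply(
  split(penguins, penguins$species),
  function(d) coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
)

overall_slope   # negative: pooled together, longer bills go with shallower bills
species_slopes  # positive for every species
```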