Boxplot
Boxplots are good for showing distributions and for comparing them across groups. The line inside the box marks the median, and the box itself spans the interquartile range (IQR): the middle 50% of the data, from the 25th to the 75th percentile. The lines extending from the box are called whiskers; in ggplot2 they reach the most extreme data points that lie within 1.5 times the IQR of the box. Points beyond the whiskers are drawn individually as dots and are treated as outliers.
If you need any help interpreting this boxplot, this site has a good boxplot explanation.
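To connect the picture to the numbers, the sketch below computes the pieces of one box by hand for Adelie body mass. It assumes the palmerpenguins package is installed. It mirrors ggplot2's default rule (whiskers at 1.5 times the IQR), and the variable names are just illustrative.

```r
library(palmerpenguins)

# Adelie body masses, dropping the missing value
adelie_mass <- penguins$body_mass_g[penguins$species == "Adelie" & !is.na(penguins$body_mass_g)]

# The box: 25th percentile, median, 75th percentile
quartiles <- quantile(adelie_mass, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(adelie_mass)

# Whiskers can reach at most 1.5 * IQR beyond the edges of the box
lower_limit <- quartiles[[1]] - 1.5 * iqr
upper_limit <- quartiles[[3]] + 1.5 * iqr

# Anything outside those limits would be drawn as an outlier dot
outliers <- adelie_mass[adelie_mass < lower_limit | adelie_mass > upper_limit]

quartiles
iqr
outliers
```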
Again we start with the same basic structure, but now we use geom_boxplot() as the graph type. Let's plot body mass (body_mass_g) for each penguin species. See if you can add our variables:
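Here is one possible answer to the exercise. It is a minimal sketch that assumes the ggplot2 and palmerpenguins packages are installed. Species goes on the x-axis and body mass on the y-axis, giving one box per species.

```r
library(ggplot2)
library(palmerpenguins)

# One box per species, summarizing body mass in grams
p <- ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()
p
```

ggplot2 will probably warn that it removed 2 rows. Those are penguins with no recorded body mass, which cannot be placed on the y-axis.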
Adding Groups
Right now each boxplot summarizes all the data for one penguin species. However, there may be sexual dimorphism (physical differences between males and females) in these penguins. To display this, we should split each species' boxplot into two boxplots, one for each sex.
We can do this by adding another mapping inside aes() that tells R to split each species into subgroups that are summarized separately. The easiest way is to color the boxplots by sex: adding “fill = sex” fills the boxplots with a different color for each sex, and the boxes for each species are placed side by side.
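A minimal sketch of the plot described above, assuming the same packages as before. The only change from the basic boxplot is the extra fill = sex mapping inside aes().

```r
library(ggplot2)
library(palmerpenguins)

# fill = sex splits each species into one box per sex, each filled with its own color
p_fill <- ggplot(data = penguins, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()
p_fill
```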
You should see three boxes for Adelie and Gentoo. These species include penguins whose sex was not recorded, and R plots those missing values (NA) as a third group.
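To confirm where the extra boxes come from, you can count the penguins in each species and sex combination, keeping missing values as their own category. This sketch uses only base R functions and assumes palmerpenguins is installed.

```r
library(palmerpenguins)

# Cross-tabulate species by sex; useNA = "ifany" adds a column for missing sex
sex_counts <- table(penguins$species, penguins$sex, useNA = "ifany")
sex_counts

# Number of penguins with no recorded sex in each species
missing_sex <- tapply(is.na(penguins$sex), penguins$species, sum)
missing_sex
```

Chinstrap should show no missing values, which is why it gets only two boxes.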
We could also have used “color = sex” to split the boxplots. Run that code to see how R treats fill and color differently.
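Here is the color version for comparison. It is a sketch under the same package assumptions and is identical to the fill version except for the aesthetic name.

```r
library(ggplot2)
library(palmerpenguins)

# color = sex changes the outlines, whiskers, and outlier points instead of the box interiors
p_color <- ggplot(data = penguins, aes(x = species, y = body_mass_g, color = sex)) +
  geom_boxplot()
p_color
```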
The gist is that R uses fill to color the inside of shapes and color to color lines and points. On a boxplot, fill sets the inside of the box, while color sets the outline, whiskers, and outlier dots. As a rule of thumb, use fill with geom_bar() or geom_col() and color with geom_point() and geom_line(). R chooses the colors automatically, but you can change them to whatever you like.
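One way to pick your own colors is ggplot2's scale_fill_manual() (scale_color_manual() is the counterpart for color). In this sketch the specific colors are arbitrary choices. The named vector matches the sex levels, and na.value sets the color of the missing-sex boxes.

```r
library(ggplot2)
library(palmerpenguins)

p_custom <- ggplot(data = penguins, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  # Map each sex level to a chosen color; NA boxes get grey
  scale_fill_manual(
    values = c(female = "darkorange", male = "purple"),
    na.value = "grey70"
  )
p_custom
```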