mutate()

The mutate() function lets us add or change columns. We can add a new column simply by giving it a name and a definition.

Let’s say that we wanted to find out the ratio between penguin bill length and bill depth. We can create a new column called bill_ratio and do some column math for its value:
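A minimal sketch of that step, assuming the data is the penguins tibble from the palmerpenguins package and that the bill columns are named bill_length_mm and bill_depth_mm (these names are assumptions, since they are not spelled out in the text):

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# Add a new column, bill_ratio, computed row by row from two existing columns.
# The result is only printed here, not saved.
penguins %>%
  mutate(bill_ratio = bill_length_mm / bill_depth_mm)
```

Rows with a missing bill measurement simply get NA for bill_ratio.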

You should now see that instead of a tibble with 344 rows and 8 columns, we have 344 rows and 9 columns, with the new column bill_ratio appearing at the end. If you wanted to keep that bill measurement next to the other bill measurements, you could use select() to change the order of the columns.
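One way the reordering might look, again assuming the palmerpenguins column names. Here everything() is a selection helper that picks up all the columns not already listed:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# Create bill_ratio, then list the columns in the order we want them.
# everything() fills in the remaining columns after the ones named explicitly.
penguins %>%
  mutate(bill_ratio = bill_length_mm / bill_depth_mm) %>%
  select(species, island, bill_length_mm, bill_depth_mm, bill_ratio, everything())
```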

We can also overwrite a column if there’s something in it we need to change. Let’s say that after this study the researchers found out that their scale for measuring penguin body mass was reading 20 g too high, and now we need to subtract that amount from each weight. What we can do is just set the body_mass_g column equal to the new formula for weight:
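A sketch of the overwrite, assuming the penguins tibble from the palmerpenguins package:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# Reusing an existing column name on the left-hand side replaces that column
# in the output. The penguins object itself is left unchanged.
penguins %>%
  mutate(body_mass_g = body_mass_g - 20)
```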

Since body_mass_g is a numeric column, we can subtract 20 from every value in the column at once. Because the result is given the existing name body_mass_g, the new weights overwrite the old weights.

Keep in mind, I am not assigning this change to a variable, so the new weights aren’t stored. If I wanted to store them, I would have to use the assignment operator (<-) as below:
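For example, the corrected data could be stored like this. The name penguins_corrected is a hypothetical one chosen for illustration; assigning back to penguins would also work, at the cost of losing the original weights:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# Save the modified tibble under a new (hypothetical) name so the
# corrected weights are kept for later steps.
penguins_corrected <- penguins %>%
  mutate(body_mass_g = body_mass_g - 20)
```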

If for some reason you wanted to rename a column but still keep the original, you could use mutate to duplicate it by setting the new column name equal to the existing column:
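A sketch of the duplication, assuming the penguins tibble from the palmerpenguins package:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# The new name goes on the left, the existing column on the right.
# weight is added as a copy; body_mass_g stays in place.
penguins %>%
  mutate(weight = body_mass_g)
```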

Now I would have an extra column called weight but I would also keep the body_mass_g column.

For practice, let’s say that instead of recording the year as 2007, 2008, and 2009 we wanted to just have them labeled years 1, 2, and 3. Write two separate code statements: one that changes year to the preferred system and one that creates a new column called rep that stores the new numbering system.
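If you want to check your answer, here is one possible solution, not the only one. It assumes the penguins tibble from the palmerpenguins package and relies on the years being consecutive, so subtracting 2006 maps 2007, 2008, and 2009 to 1, 2, and 3:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins tibble

# Statement 1: overwrite year with the 1, 2, 3 numbering.
penguins %>%
  mutate(year = year - 2006)

# Statement 2: keep year as it is and store the new numbering in rep.
penguins %>%
  mutate(rep = year - 2006)
```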