More Pivoting Wider
We can add a few more arguments to pivot_wider() for special use cases.
Combining Columns
Sometimes in long form you will have multiple variables in one column when you actually want them separated into their own columns. Here’s a built-in dataset of rent and income estimates for each US state, with margins of error:
If we’re analyzing this, it might be easier to have rent and income in different columns, but we still need to pair them with the correct state. We can rearrange the table with pivot_wider(): we give the column we want broken up to the names_from argument and the values we want to go with it to the values_from argument.
Now each state has its own row, and rent and income each have their own columns.
We can perform the opposite operation if we want two columns to be combined into a single set of column names. Here we have a column for city and a column for state, but we might want every column in our table to represent a unique location. To do that, we pass multiple columns to the names_from argument and leave a single column in values_from.
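As a minimal sketch of the call described above: this assumes the built-in dataset is tidyr's us_rent_income (columns GEOID, NAME, variable, estimate, moe) and that both the estimate and the margin of error are spread out. The name rent_income_wide is just an illustrative choice.

```r
library(tidyr)

# us_rent_income ships with tidyr: one row per state per variable
# (assumed here to be the dataset the text refers to)
rent_income_wide <- pivot_wider(
  us_rent_income,
  names_from = variable,          # "income" and "rent" become parts of the new column names
  values_from = c(estimate, moe)  # each of these value columns is split by variable
)

# New columns: estimate_income, estimate_rent, moe_income, moe_rent
rent_income_wide
```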
See if you can complete the pivot_wider() code:
Answer
By default, pivot_wider() uses an underscore (_) to join the pieces of a combined column name. If you’d like to change this, you can add a names_sep argument to the code.
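One way this could look, using a small made-up table: the scores data frame and all of its values other than the Austin ones mentioned later in the text are hypothetical and only there for illustration.

```r
library(tidyr)

# Hypothetical long-format data; the values are made up for illustration
scores <- tibble::tibble(
  city  = c("Austin", "Austin", "Denver", "Denver"),
  state = c("Texas", "Texas", "Colorado", "Colorado"),
  year  = c(2019, 2020, 2019, 2020),
  value = c(73, 90, 65, 81)
)

# Two columns in names_from: city and state are joined with "_" by default
wide <- pivot_wider(scores, names_from = c(city, state), values_from = value)
names(wide)
# "year" "Austin_Texas" "Denver_Colorado"

# names_sep changes the separator between the joined pieces
wide_sep <- pivot_wider(
  scores,
  names_from = c(city, state),
  values_from = value,
  names_sep = ", "
)
names(wide_sep)
# "year" "Austin, Texas" "Denver, Colorado"
```

Names that contain spaces or commas, like the second set, need backticks when you refer to them later, which is one reason the underscore default is often the more convenient choice.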
Summarizing Data
We can also use pivot_wider() to summarize data in long format as we move it to wide format. This is only useful when several values share the same combination of identifying columns. In our example data above, every value matched a unique combination of city, state, and year. The combo (Austin, Texas, 2020) only matches the value 90. But if we remove the year column and keep just the values, as seen below, then each city, state combination has multiple values. Now the combo (Austin, Texas) matches both 90 and 73.
If each value matches a unique combination of the other columns, you can’t* summarize the data, but if several values share a combination then you can apply operations like mean(). Here we add the argument values_fn to take the mean of all the values that share a combination.
*Technically, you still can, but the mean of a single number is just that number, so it’s pointless.
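A sketch of the summarizing step, again with hypothetical data: only the Austin values 90 and 73 come from the text, while the Denver rows and the name scores_no_year are made up. Passing a bare function to values_fn assumes tidyr 1.1.0 or later.

```r
library(tidyr)

# Hypothetical data with the year column removed, so each
# city/state combination now has two values
scores_no_year <- tibble::tibble(
  city  = c("Austin", "Austin", "Denver", "Denver"),
  state = c("Texas", "Texas", "Colorado", "Colorado"),
  value = c(90, 73, 65, 81)
)

# values_fn is applied to every group of values that lands in the same cell
mean_wide <- pivot_wider(
  scores_no_year,
  names_from = c(city, state),
  values_from = value,
  values_fn = mean
)

mean_wide
# One row: Austin_Texas = 81.5, Denver_Colorado = 73
```

Without values_fn, pivot_wider() keeps the duplicates as list-columns and warns that the values are not uniquely identified, so supplying a summary function is what collapses each cell to a single number.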