Using pivot_longer() with Patterns

Now let’s start adding on to these basic functions. There are a lot of built in arguments or add-on functions we can use to help us make our pivoting code simpler.

starts_with()

We’ll start with a function that helps us pick out columns based on their starting characters. In our billboard example, we wanted to pivot around all of the week columns. All of these had similar names starting with “wk” and then a number. We can change our code from explicitly naming each of these columns to just picking every column that starts with “wk”. This will be especially useful for the full billboard dataset because it actually includes 76 weeks. (We used a truncated version before but now we’ll use the full thing.)

Rather than type out every “wk#” column, we can use the same format as before, but now add starts_with() to our cols argument:

You should see that, as before, the columns are rearranged so the individual weeks are all listed under week.

names_prefix

There’s another argument from the pivot_longer() function that’ll help us out with this even more. Right now all of our week data is in a string or character format, but we might want it to be numerical. Meaning, instead of “wk1”, “wk2” we just have “1”, “2” as our column values. We can remove the “wk” with the help of the names_prefix argument.

Note: It can sometimes be confusing what is an argument and what is a function. Arguments are options that exist within functions, while functions are commands themselves and can be used by themselves, within other functions, or within an argument. The way to know is that functions will always have () while arguments won’t.

The names_prefix argument will remove whatever you set it equal to. Since you’re looking to match an exact pattern, put it in quotes.

Remember: you can always learn more about a function by running ?function_name. ?pivot_longer will let you see what names_prefix does and show you other add-on options.

names_sep

Now let’s say we have a dataset of experimental data we recorded. Here’s a short example of data with measurements for control and experimental variables for three groups with two repetitions.

Here we want to break up our data so that we have one column for experimental or control and one column for repetition and then one for the measurement value. So far, we have only wanted to make one column name into one new column, but we can split column names and make them into two columns using either names_sep.

First we’ll use the ! operator to identify that we want to pivot around every column except group name. Then we’ll use names_sep to break up the column name by the _. We’ll create two columns in names_to for each part to go into: condition and repetition.

Remember: the ! symbol is equivalent to saying “not”, so here we’re saying not the group column but everything else.

With names_sep you can have anything within the quotes be the separator. Let’s try that again, but this time let’s remove the underscore and the word “rep” so that we’re left with only condition and the numeric value for the rep. Fill in the value for names_sep:

names_pattern

We can achieve similar results with more complicated names by using the names_pattern argument.

Let’s make our data a little more complicated by adding an additional “drug” variable to our column names.

Now we want to split each column name into three columns: condition, drug, and rep. We can do this by specifying the pattern that we want the names_pattern argument to look for. We want to keep exp/cont, X/Y, 1/2 and delete everything else.

The way names_pattern works is it will use what are called “regular expressions” to look for a pattern and then remove anything not contained inside (). We can use the regular expression .* to mean any characters in that position. So this will keep anything that comes before _drug, anything between _drug and _rep and anything after _rep. Try it and see:

See if you can write the code to pivot the following data into long format with columns for section, group, and grade.

Add your code here: