select()

Today we’ll use the {palmerpenguins} package that was introduced in Workshop 3, and we’ll keep using it going forward. Head back to that workshop if you want more information about it.

If you’re running this code in your own version of R outside this document (either on the server or the desktop), you’ll need to load the Palmer penguins package with the command library(palmerpenguins).

Selecting and Deleting Columns

If you want to keep only certain columns, or remove columns, you will use the select() function.

The select() function subsets the data by column name. Inside the parentheses, list the names of the columns you want to see. They will appear in the order you list them, so select() is also helpful if you want to rearrange the columns.

First let’s use the function glimpse() to see what is contained in the penguins dataset:
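A minimal sketch of that call, assuming the {dplyr} package (which provides glimpse() and select()) is installed alongside {palmerpenguins}:

```r
# Load the packages used in this section.
# dplyr is an assumption here: it supplies glimpse() and select().
library(dplyr)
library(palmerpenguins)

# Print one line per column: its name, its type, and the first few values
glimpse(penguins)
```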

We can see that there are 8 columns of various data types. Let’s say we just wanted to look at the species and sex columns. We would write the following:
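One way to write this, assuming {dplyr} is loaded and passing the data frame as the first argument (a pipe would work equally well):

```r
library(dplyr)           # provides select(); assumed to be installed
library(palmerpenguins)  # provides the penguins data

# Keep only the species and sex columns, in that order
select(penguins, species, sex)
```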

Your output should be a two-column dataset containing only species and sex. The columns are in the same order as in the original dataset, but you can change that by changing the order in which you list them in select():
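For example, listing sex first puts it in the first column. This is a sketch under the same assumption that {dplyr} is loaded:

```r
library(dplyr)           # provides select(); assumed to be installed
library(palmerpenguins)  # provides the penguins data

# Same two columns, but sex now comes before species
select(penguins, sex, species)
```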

Now let’s say we want the whole dataset except for the island and year columns. You can put a minus sign (-) in front of a column name to remove that column from the output.
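A short sketch of dropping those two columns, again assuming {dplyr} is loaded:

```r
library(dplyr)           # provides select(); assumed to be installed
library(palmerpenguins)  # provides the penguins data

# A minus sign in front of a column name drops that column;
# every other column is kept in its original order
select(penguins, -island, -year)
```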

So far we have just printed the output, but if you want to save your changes you can create a new data object. Remember that this is done with the assignment operator <-. The following code creates a new data object called just_mass that contains a subset of the original columns:
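A possible version of that code is sketched here. The exact columns kept in just_mass are an assumption made for illustration (species, sex, and body_mass_g), as is the use of {dplyr}:

```r
library(dplyr)           # provides select(); assumed to be installed
library(palmerpenguins)  # provides the penguins data

# Store the subset in a new object instead of printing it.
# The choice of columns is illustrative only.
just_mass <- select(penguins, species, sex, body_mass_g)
```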

Remember, when you run code like the above that creates a new data object, the object will show up in your environment, but it won’t be printed to the screen unless you directly tell R to print it. So, while the above has no console output, the following will print the new dataset you made.
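As a sketch, typing the object's name on its own line prints it. The block recreates just_mass first so it runs on its own; the columns chosen are an illustrative assumption:

```r
library(dplyr)           # provides select(); assumed to be installed
library(palmerpenguins)  # provides the penguins data

# Assignment alone produces no console output
just_mass <- select(penguins, species, sex, body_mass_g)

# Naming the object prints it to the console
just_mass
```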