filter()

Filtering is a way to subset a dataset by rows and keep only the rows that meet a certain condition. To do this, we use the filter() function, which will remove rows that don’t satisfy the condition in the parentheses.

An example would be if we want to find all the penguins over a certain weight. We would put our weight cutoff as the condition to be met. Here we’ll find all penguins that weigh over 3500 grams.
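A minimal sketch of that filter, assuming the penguins data comes from the palmerpenguins package and that dplyr is installed (the object name heavy_penguins is only illustrative):

```r
library(dplyr)
library(palmerpenguins)  # assumption: this package supplies the `penguins` tibble

# Keep only the rows where body mass is strictly greater than 3500 g
heavy_penguins <- penguins %>%
  filter(body_mass_g > 3500)

# Print the filtered tibble to check its dimensions
heavy_penguins
```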

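You should see that the resulting tibble is only 264 x 8, down from the 344 rows in the original dataset. This means that 80 rows were removed. Note that filter() also drops rows where the condition evaluates to NA, so any penguins with a missing body mass are among the rows removed.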

You may be wondering, what if a penguin weighs exactly 3500g? That penguin would not be included in our result, since we asked for strictly more than 3500. To include it, we would write body_mass_g >= 3500, which means greater than or equal to 3500g.
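The difference between the two conditions can be sketched as follows, again assuming the palmerpenguins data; the object names over_3500 and at_least_3500 are made up for this illustration:

```r
library(dplyr)
library(palmerpenguins)  # assumption: this package supplies the `penguins` tibble

# Strictly greater than: a penguin at exactly 3500 g is excluded
over_3500 <- penguins %>%
  filter(body_mass_g > 3500)

# Greater than or equal to: a penguin at exactly 3500 g is kept
at_least_3500 <- penguins %>%
  filter(body_mass_g >= 3500)

# Any difference in row counts comes from penguins weighing exactly 3500 g
nrow(at_least_3500) - nrow(over_3500)
```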


You can also filter on strings or characters. To do this we’ll use the first of what are called “logical operators” (R’s own documentation calls operators such as > and == comparison operators). Logical operators are symbols that evaluate a condition and return TRUE or FALSE. The example above assigned TRUE or FALSE to each penguin based on whether it weighed more than 3500g, and then returned only the rows where the answer was TRUE.

EQUALS

The first one we’ll learn is the EQUALS operator, written ==. It does what it sounds like: it finds values that exactly match the given value. We use a double equals sign to show that we want to test for a match, not assign a value, which is what a single = does. If I filtered by species == "Adelie", the result would be all the Adelie penguins.
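As a short sketch, assuming the palmerpenguins data and using an illustrative object name:

```r
library(dplyr)
library(palmerpenguins)  # assumption: this package supplies the `penguins` tibble

# Keep only the rows whose species exactly matches "Adelie"
adelie_penguins <- penguins %>%
  filter(species == "Adelie")

# Every remaining row should be an Adelie penguin
adelie_penguins
```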

Importantly, notice that Adelie is in quotes and that it is capitalized. We are matching a species by its name, which we supply as text (a character string), so it needs quotes. R is also always case sensitive, so "Adelie" and "adelie" are different values.

One thing to take note of is that a filter can run without any error and still return no rows. In the following code, we are trying to make a subset of just the Adelie penguins. Even though the name is in the wrong case, a subset will still be created.
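A sketch of what that code might look like, assuming the palmerpenguins data; the lowercase "adelie" is the deliberate mistake:

```r
library(dplyr)
library(palmerpenguins)  # assumption: this package supplies the `penguins` tibble

# Wrong case on purpose: the species is stored as "Adelie", not "adelie".
# This runs without an error or warning and still creates `adelie`.
adelie <- penguins %>%
  filter(species == "adelie")
```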

However, when you actually look at what is in adelie, you will find that it is empty: it has all of the original columns but zero rows.
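One way to check this, sketched under the same assumptions (palmerpenguins data, with the mis-cased filter repeated so the snippet runs on its own):

```r
library(dplyr)
library(palmerpenguins)  # assumption: this package supplies the `penguins` tibble

# Same mis-cased filter as before
adelie <- penguins %>%
  filter(species == "adelie")

# Printing shows the column names but no data rows
adelie

# Counting rows confirms that nothing matched
nrow(adelie)  # 0
```

Checking nrow() after a filter is a quick habit for catching silent mismatches like this one.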

Try changing the above code to be the correct case and see what the result is.

Now we’ll learn a few more logical operators.