Joining Tables

Author

Josie Griffin ’09

Agenda

In this workshop, we’ll explore how to combine data from multiple tables using joins. A join merges two tables based on identifying information (a key) that both of them contain. Joining is a very common step you’ll need to perform before you can analyze and visualize more complex relationships. We’ll focus on the most common types of joins and practice using them effectively. We will primarily be using the {dplyr} package, part of the {tidyverse}.
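Here is a minimal sketch of the idea. The two tables are made up for illustration and are not part of the workshop data. left_join() from {dplyr} matches the rows of two tables on a shared key column, here state:

```r
library(dplyr)

# Two small illustrative tables that share a `state` column (the key)
abbreviations <- tibble(
  state  = c("Oregon", "Washington", "Idaho"),
  abbrev = c("OR", "WA", "ID")
)
capitals <- tibble(
  state   = c("Idaho", "Oregon", "Washington"),  # row order differs on purpose
  capital = c("Boise", "Salem", "Olympia")
)

# left_join() keeps every row of `abbreviations` and adds the `capital`
# whose `state` value matches; the row order of `capitals` does not matter
joined <- left_join(abbreviations, capitals, by = "state")
joined
```

Because rows are matched by the value of state and not by their position, each capital lands next to the right abbreviation even though the two tables list the states in different orders.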

After today you should be able to:

  • bind tables by rows or columns
  • join tables by key
  • join tables with different column names for the same variable
  • filter tables with inner_join() and anti_join()
  • understand “many-to-many” joins

Packages to load:

To use functions from a package, you must first load it with library(package_name). If you have not installed the package, you will first need to run install.packages("package_name"). You only need to install a package once, but you need to load it each time you start a new R session, so put your library() calls at the top of each script.

  • {tidyverse} or {dplyr}
  • {usdata}
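A minimal sketch of the setup step described above, assuming you are working in your own R installation rather than in this web document. Uncomment the install line only if the packages are not installed yet:

```r
# Install once (uncomment if the packages are not installed yet)
# install.packages(c("tidyverse", "usdata"))

# Load the packages at the start of every new R session
library(tidyverse)  # also loads {dplyr}; library(dplyr) alone is enough for joins
library(usdata)
```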

All of the code in this workshop can be run inside this web document, but you are encouraged to copy it to either the Reed R Server or a desktop version of R so that you can save it as a file of your own. Running code in this web-based document is made possible by the package {webr}. The code runs in the browser the same way it would in RStudio, so you can save variables, see error messages, view graphs, and so on. To run the code, either press the “Run Code” button or highlight a section and press Ctrl+Enter (Windows) or Cmd+Enter (Mac), as you would in RStudio.

(The “Run Code” button and the green play button may not appear right away; a yellow circle may show while the interface loads. Wait until you see the green button before doing anything.)

Just as in RStudio, if you run code that depends on objects that haven’t been created yet, you will get an error. Run all code chunks in order so that everything executes properly.

If you’re running the code in your own R installation, you will also need to load every package mentioned with library(). (You may first need to install the packages with install.packages().)


This workshop is part of a series. To see previous workshops or the topics of future ones, please check the workshop schedule.

If you have questions beyond what is covered in the workshops, please feel free to contact Josie at either griffinj@reed.edu or data@reed.edu.

You can also drop in during open hours at the DataLab for help with R or any other quantitative subject!