Agenda
Today we’ll be learning some steps for cleaning and wrangling data. “Cleaning” data generally refers to fixing incorrect data, removing missing data, and standardizing names. “Wrangling” data typically refers to transforming data so that it is ready for analysis. Together, these steps often end up being as much as 90% of the work on a project. As you’ll see later in this series, once your data is in the right format, you can run complex analyses with a single line of code. However, getting it into that format can take some time and patience.
Today we’ll use three functions from the {dplyr} package within {tidyverse} to help us start on cleaning and wrangling.
After today you should be able to:
rearrange columns in a spreadsheet
remove columns or add new ones
filter your data based on a condition
understand basic logical operators for NOT, AND, and OR
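As a rough preview of these goals, here is a minimal sketch. It assumes the three functions are `select()`, `mutate()`, and `filter()` from {dplyr}, which the list above does not name explicitly, and it uses a small made-up data frame called `students` that is purely hypothetical.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
students <- data.frame(
  name  = c("Ana", "Ben", "Cam", "Dee"),
  year  = c(1, 2, 2, 4),
  score = c(88, 92, 75, 60)
)

# Rearrange columns: put score first, then everything else
reordered <- select(students, score, everything())

# Remove a column by putting a minus sign in front of its name
no_year <- select(students, -year)

# Add a new column computed from an existing one
with_prop <- mutate(students, score_prop = score / 100)

# Filter rows on a condition, using ! (NOT), & (AND), and | (OR)
not_second_years  <- filter(students, !(year == 2))
high_second_years <- filter(students, year == 2 & score > 80)
first_or_fourth   <- filter(students, year == 1 | year == 4)
```

Each call returns a new data frame and leaves `students` unchanged, which is why every result is saved under its own name here.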
This workshop contains code that can be run entirely inside this web document, but you are encouraged to copy the code to either the Reed R Server or your desktop version of R so that you can save it as a file of your own. The ability to run code in this web-based document is made possible through the package {webr}. The code will run in the browser the same way it would in RStudio, so you can save variables, see error messages, see graphs, etc. To run the code, you can either press the “Run Code” button or highlight sections and run them by pressing Control+Enter (or Command+Enter on a Mac), as you would in RStudio.
Here’s an example:
(“Run Code” and the green play button may not appear right away; there may be a yellow circle while the interface loads. Just wait until you see the green button before doing anything.)
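A minimal sketch of the kind of code you might run in a cell: it saves a variable and then prints a result computed from it. The vector `heights_cm` and its values are made up for illustration.

```r
# Hypothetical values, invented for illustration only
heights_cm <- c(150, 162, 171, 158, 180)

# Save a result in a variable so it can be reused in later lines
avg_height <- mean(heights_cm)

# Typing a variable's name on its own line prints its value
avg_height
```

Because the variable is saved, a later line in the same session can use `avg_height` again without recomputing it.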
This workshop is part of a series. If you would like to see previous workshops or see the topics of future workshops, please check the workshop schedule. All workshops are held on Mondays at 5:30 in ETC 208.
If you have questions beyond what is covered in the workshops, please feel free to contact Josie at either griffinj@reed.edu or data@reed.edu.
You can also drop in to the open hours at the DataLab for help with R or any other quantitative subject!