Practice

The following questions use datasets from the {usdata} package. If you’re working in your own installation of R, load that package first with library(usdata) so the code will run. If you’re working in the browser, you’re good to go.

1. Joining County Data

Add code to join the population statistics from the dataset county to the demographic statistics from county_2019. I am adding as_tibble() here so the output prints a short preview instead of hundreds of rows.

Note: Because the output shows only a small preview, you may need to compare the number of rows and columns in each input table with those of the resulting table to check that the join worked.

Answer

You should have a resulting table of 3,142 rows and 108 columns: 15 columns in county plus 95 columns in county_2019, minus 2 because the key columns state and name appear only once in the result.
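The column arithmetic above can be checked on a small scale. The sketch below assumes dplyr is the join library in use and works on two made-up tibbles (pop_tbl and demo_tbl are hypothetical stand-ins, not the usdata tables). The same dim() checks can be run on county, county_2019, and their joined result.

```r
library(dplyr)

# Hypothetical stand-ins for county and county_2019 (made-up values)
pop_tbl <- tibble(
  state   = c("A", "A", "B"),
  name    = c("North County", "South County", "North County"),
  pop2010 = c(100, 200, 300)
)
demo_tbl <- tibble(
  state      = c("A", "A", "B"),
  name       = c("North County", "South County", "North County"),
  median_age = c(35, 41, 38),
  hh_size    = c(2.4, 2.6, 2.5)
)

# Join on both key columns so a county only matches within its own state
joined <- left_join(pop_tbl, demo_tbl, by = c("state", "name"))

dim(pop_tbl)   # rows and columns of the first input
dim(demo_tbl)  # rows and columns of the second input
dim(joined)    # same row count; columns = 3 + 4 - 2 shared key columns
```

Joining on name alone would match "North County" in state A with "North County" in state B, which is why both key columns are listed.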

2. Pulling Data for Select States

Add data from the urban_rural_pop dataset to this (fake) income dataset.

The income data:

Add your code here (but remember to run the above first).

Answer

Now print out a table of urban/rural data for all of the states not in the table above.

Answer
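One way to find rows that have no match in another table is a filtering join. The sketch below assumes dplyr and uses made-up tables (urban_tbl and income_tbl, and all their values, are hypothetical). anti_join() keeps the rows of the first table whose key does not appear in the second, without adding any columns.

```r
library(dplyr)

# Hypothetical stand-ins for urban_rural_pop and the income table
urban_tbl <- tibble(
  state     = c("A", "B", "C", "D"),
  urban_pct = c(80, 65, 90, 55)
)
income_tbl <- tibble(
  state  = c("A", "C"),
  income = c(50000, 62000)
)

# Rows of urban_tbl whose state does not appear in income_tbl
not_in_income <- anti_join(urban_tbl, income_tbl, by = "state")
not_in_income
```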

3. Aggregating Population Data

Combine the data from two datasets of population data.

Datasets:

Add your code here:

Answer

What is not ideal about the resulting table, and why does this happen? How could you prevent it?

Answer

The issue is that the columns year and cohort_pop have the same names in both datasets but are not part of the join key, so combining the tables would leave two columns with the same name in a single table. To keep them distinct, the join adds .x to the first dataset’s column names and .y to the second’s.

To avoid this, we could rename the columns to something unique before combining, for example year_2010 and year_2015 instead of year in both.
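The suffix behavior and two possible fixes can be sketched as follows. This assumes dplyr and a join key named state, which is a guess for illustration. The tables pop_a and pop_b are hypothetical and only their column names mirror the exercise.

```r
library(dplyr)

# Hypothetical tables; the key column state is an assumption
pop_a <- tibble(state = c("A", "B"), year = c(2010, 2010), cohort_pop = c(10, 20))
pop_b <- tibble(state = c("A", "B"), year = c(2015, 2015), cohort_pop = c(12, 23))

# Joining on state alone leaves two non-key columns with clashing names,
# so the join appends .x and .y to tell them apart
clash <- full_join(pop_a, pop_b, by = "state")
names(clash)

# Option 1: rename the columns before joining
pop_a2 <- rename(pop_a, year_2010 = year, cohort_pop_2010 = cohort_pop)
pop_b2 <- rename(pop_b, year_2015 = year, cohort_pop_2015 = cohort_pop)
renamed <- full_join(pop_a2, pop_b2, by = "state")
names(renamed)

# Option 2: ask the join for more descriptive suffixes
suffixed <- full_join(pop_a, pop_b, by = "state", suffix = c("_2010", "_2015"))
names(suffixed)
```

Both options give the same column names here. Renaming first is more explicit, while the suffix argument is shorter when many columns clash.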

4. Multiple Tables

Combine these three datasets into one and preserve all data.

Answer

Notice that even though all three share the key state, a single join call combines only two tables at a time. You join the first two, then join that result to the third.
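The two-step pattern can be sketched like this, assuming dplyr and three made-up tables (tbl_pop, tbl_bird, and tbl_flower are hypothetical names and values). full_join() is used because it keeps unmatched rows from both sides, which is what preserving all data requires.

```r
library(dplyr)

# Hypothetical tables that share the key column state
tbl_pop    <- tibble(state = c("A", "B"), pop = c(1, 2))
tbl_bird   <- tibble(state = c("B", "C"), bird = c("robin", "wren"))
tbl_flower <- tibble(state = c("A", "C", "D"), flower = c("rose", "iris", "lily"))

# Step 1: join the first two tables, keeping every state from either one
first_two <- full_join(tbl_pop, tbl_bird, by = "state")

# Step 2: join that result to the third table
all_three <- full_join(first_two, tbl_flower, by = "state")
all_three
```

Every state from any of the three tables appears once in all_three, with NA wherever a table had no row for that state.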

How would I combine the state_pop and state_bird datasets so that the result has no NA values?

Answer
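One possible approach, assuming the NA values would come only from states that appear in one table but not the other, is a join that keeps matched rows only. The sketch below uses dplyr with made-up data (pop_tbl and bird_tbl are hypothetical stand-ins for state_pop and state_bird).

```r
library(dplyr)

# Hypothetical stand-ins; each table has one state the other lacks
pop_tbl  <- tibble(state = c("A", "B", "C"), pop = c(1, 2, 3))
bird_tbl <- tibble(state = c("B", "C", "D"), bird = c("robin", "wren", "lark"))

# full_join keeps unmatched states and fills the missing values with NA
with_gaps <- full_join(pop_tbl, bird_tbl, by = "state")
with_gaps

# inner_join keeps only the states present in both tables
matched <- inner_join(pop_tbl, bird_tbl, by = "state")
matched
anyNA(matched)
```

If either input already contains NA values of its own, inner_join() will carry them through, so this only removes the gaps created by unmatched keys.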