Regression Questions
This question comes from a previous semester's problem set on regression. It is included here as an example of the sort of thing I might ask about regression.
Question 2 (50 points)
Load in the file acs, which contains demographic information on every county. This dataset also includes the column cases.per.1000, which gives the total number of COVID cases per 1000 residents recorded in that county on April 30, 2020.
library(rio)
acs <- import("https://github.com/marctrussler/IIS-Data/raw/main/ACSCountyCOVIDData.Rds")
#> Warning: Missing `trust` will be set to FALSE by default
#> for RDS in 2.0.0.
(a) Plot the bivariate relationship between percent.transit.commute on the x-axis and cases.per.1000 on the y-axis. Describe what you find.
(b) Estimate the bivariate regression: \(\widehat{Cases.per.1000}_i = \hat{\alpha} + \hat{\beta}_{t} Perc.Transit_i\). Evaluate and explain the intercept and \(\hat{\beta}_{t}\). Interpret the hypothesis test being performed for \(\hat{\beta}_{t}\).
(c) In words: why might this relationship between transit commuting and COVID cases be misleading?
(d) Look in the data set for a variable that might be an omitted variable for the regression in (b) (there is not one answer here). Why does this variable meet the criteria? Use correlation to show it meets the minimum criteria to be an omitted variable.
(e) Now re-estimate the regression from (b) and add in your proposed omitted variable. Interpret the intercept and the two slope coefficients. How did \(\hat{\beta}_{t}\) change?
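The workflow this question asks for (scatterplot, bivariate regression, correlation check, regression with the added variable) can be sketched on simulated data. This is only an illustration, not a solution: the data frame sim and the columns cases, transit, and confounder are made up for the example and do not come from the acs file, and the coefficients used to generate the data are arbitrary assumptions.

```r
# Simulated data only: nothing here is read from the acs file.
set.seed(42)
n <- 1000
confounder <- rnorm(n)                     # hypothetical omitted variable
transit    <- 0.8 * confounder + rnorm(n)  # predictor, correlated with the confounder
# Outcome: the true slope on transit is set to 0.5 by assumption
cases      <- 1 + 0.5 * transit + 2 * confounder + rnorm(n)
sim <- data.frame(cases, transit, confounder)

# Scatterplot of the outcome against the predictor
plot(sim$transit, sim$cases,
     xlab = "transit (simulated)", ylab = "cases (simulated)")

# Bivariate regression; summary() reports the t-test of the null
# hypothesis that the slope equals zero
short <- lm(cases ~ transit, data = sim)
summary(short)

# Minimum criteria for an omitted variable: the candidate is
# correlated with both the predictor and the outcome
cor(sim$confounder, sim$transit)
cor(sim$confounder, sim$cases)

# Regression with the candidate added; compare the transit slopes
long <- lm(cases ~ transit + confounder, data = sim)
coef(short)["transit"]
coef(long)["transit"]
```

In this simulation the slope on transit in the bivariate regression should land well above the assumed true value of 0.5, and adding the confounder should pull it back toward 0.5. With real data, note that cor() returns NA when either column contains missing values unless use = "complete.obs" is supplied.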