Midterm Exam Answers
Opens: 12:00pm on Wednesday March 4th
Due: 1:00pm on Wednesday March 4th.
18.5 Instructions:
This is an in-class open-note test. You have 1 hour to complete the exam.
I do not accept late work. Exams handed in after 60 minutes will not be graded.
The test is open notes. You can use any course material you wish, including this textbook, your problem sets, the problem set answer keys, and your own notes. You cannot google things, and all use of AI is prohibited.
As with the problem sets, you will be graded on your comprehension of the material in this specific course.
You may not use ChatGPT or any other AI tool to answer the questions. Violating this policy will result in a score of 0 for the midterm and an immediate referral to the Center for Community Standards & Accountability to decide further appropriate disciplinary action.
Your exam submission will be identical to how you’ve submitted problem sets. You will submit a .rmd file and a knitted html.
The exam will be graded anonymously. Please put only your student number on the exam.
Good luck!
We are going to work with panel data from the Cooperative Election Study (CES). The CES is a large national survey that asks Americans about their political opinions and behavior. In this dataset each respondent was surveyed twice: once in 2010 and once in 2014.
Along with some stable basic demographic information, we have a measure of respondents’ fiscal policy preferences, deficit.fix, in each interview year. This is a 0 to 100 scale asking how the federal budget deficit should be reduced, where 0 means “all from tax increases” (the liberal option) and 100 means “all from spending cuts” (the conservative option).
There is no missing data in this dataset.
You can load the data in here:
dat <- rio::import("https://github.com/marctrussler/IDS-Data/raw/refs/heads/main/PSCI1800Midterm2026DataLong.Rds", trust=T)
1. How many respondents are in these data? How many rows are in these data? Based on that information, and by looking at the dataset, what is the unit of analysis of these data?
head(dat)
#> respondent.id year deficit.fix gender state county
#> 1 1 2010 92 Male TX Gregg
#> 2 1 2014 83 Male TX Gregg
#> 3 24 2010 48 Female IL McLean
#> 4 24 2014 26 Female IL McLean
#> 5 48 2010 99 Male TX Hockley
#> 6 48 2014 99 Male TX Hockley
#> county_fips birth.yr income
#> 1 48183 1934 $100k or more
#> 2 48183 1934 $100k or more
#> 3 17113 1971 $100k or more
#> 4 17113 1971 $100k or more
#> 5 48219 1965 $100k or more
#> 6 48219 1965 $100k or more
nrow(dat)
#> [1] 15458
length(unique(dat$respondent.id))
#> [1] 7729
There are 15458 rows in the data and 7729 unique respondents. This is because each respondent is in the data twice, once for each survey year. As such, the unit of analysis of these data is “respondent-year”.
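One way to confirm that a dataset really is at the respondent-year level is to check that no respondent-year pair appears more than once. The sketch below uses a small made-up data frame (`toy`, which is hypothetical and not the exam data) with the same `respondent.id` and `year` columns.

```r
# Hypothetical toy data in the same long format as the exam data:
# three respondents, each interviewed in 2010 and 2014
toy <- data.frame(respondent.id = c(1, 1, 24, 24, 48, 48),
                  year          = c(2010, 2014, 2010, 2014, 2010, 2014),
                  deficit.fix   = c(92, 83, 48, 26, 99, 99))

n.rows <- nrow(toy)
n.respondents <- length(unique(toy$respondent.id))

# Twice as many rows as respondents suggests two rows per person
rows.per.respondent <- n.rows / n.respondents

# TRUE if no respondent-year combination is repeated,
# i.e. each row is exactly one respondent in one year
unique.pairs <- !any(duplicated(toy[, c("respondent.id", "year")]))
```

If `unique.pairs` were FALSE, some respondent would appear more than once in the same year, and “respondent-year” would not be the right description of a row.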
2. Un-comment and edit the code below to reshape the data so that the unit of analysis is the individual respondent. Two new variables will be created in the process. (You do not need to edit the names_prefix option, which I’ve added so that we get the same sensible variable names to work with going forward.)
library(tidyr)
dat <- pivot_wider(dat,
names_from = "year",
values_from = "deficit.fix",
names_prefix = "deficit.fix.")
head(dat)
#> # A tibble: 6 × 9
#> respondent.id gender state county county_fips birth.yr
#> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 1 Male TX Gregg 48183 1934
#> 2 24 Female IL McLean 17113 1971
#> 3 48 Male TX Hockley 48219 1965
#> 4 56 Male MA Essex 25009 1947
#> 5 66 Female WI Dane 55025 1961
#> 6 71 Male PA Berks 42011 1948
#> # ℹ 3 more variables: income <chr>, deficit.fix.2010 <dbl>,
#> # deficit.fix.2014 <dbl>
nrow(dat) == length(unique(dat$respondent.id))
#> [1] TRUE
If you are not able to successfully complete this step, use this code to load in a re-shaped version of the data so you can continue with the exam:
dat <- rio::import("https://github.com/marctrussler/IDS-Data/raw/refs/heads/main/PSCI1800Midterm2026DataWide.Rds", trust=T)
3. What is the correlation between people’s birth year and deficit.fix.2010? What is the correlation between people’s birth year and deficit.fix.2014? Create a new variable deficit.change which finds the difference between a person’s opinion on this question in 2010 and in 2014. This variable should be calculated such that positive values indicate that someone has moved in a conservative direction (i.e. towards preferring spending reductions). What is the correlation between people’s birth year and this new variable? What do you conclude from these three correlations?
#Correlation of birth year and 2010 deficit opinion
cor(dat$birth.yr, dat$deficit.fix.2010)
#> [1] -0.0452572
#Correlation of birth year and 2014 deficit opinion
cor(dat$birth.yr, dat$deficit.fix.2014)
#> [1] -0.004956423
#Correlation of birth year and change in deficit opinion
dat$deficit.change <- dat$deficit.fix.2014 - dat$deficit.fix.2010
cor(dat$birth.yr, dat$deficit.change)
#> [1] 0.05502341
The correlation between birth year and deficit opinion is weakly negative in both 2010 and 2014 (and very close to zero in 2014), indicating that older Americans are slightly less likely than younger Americans to prefer tax increases to spending cuts as a way to reduce the deficit. The correlation between birth year and the change in this measure is positive, however. This indicates that older Americans were more likely to shift towards wanting tax increases as their method of reducing the deficit between 2010 and 2014, while younger Americans were more likely to shift towards wanting spending cuts over the same period.
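The correlation with the level of an opinion and the correlation with the change in that opinion can have opposite signs. The sketch below illustrates this with five made-up respondents (the `toy` data frame is hypothetical and its values are chosen only to show the pattern).

```r
# Hypothetical toy data (not the exam data): five respondents
toy <- data.frame(birth.yr         = c(1940, 1955, 1970, 1985, 1995),
                  deficit.fix.2010 = c(80, 70, 60, 50, 40),
                  deficit.fix.2014 = c(70, 65, 60, 55, 50))

# 2014 minus 2010: positive values mean a move towards spending cuts
toy$deficit.change <- toy$deficit.fix.2014 - toy$deficit.fix.2010

# Later birth years go with lower 2010 scores: negative correlation
cor.2010 <- cor(toy$birth.yr, toy$deficit.fix.2010)

# Later birth years go with larger (more positive) changes: positive correlation
cor.change <- cor(toy$birth.yr, toy$deficit.change)
```

In this toy example the older respondents start out more conservative but move towards tax increases, while the younger respondents start out less conservative but move towards spending cuts.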
4. What proportion of men born before or during 1975 moved in a conservative direction (towards wanting spending cuts) from 2010 to 2014? What proportion of men born after 1975 moved in a conservative direction? What does this tell you? Hint: What does the mean of a boolean variable represent?
mean(dat$deficit.change[dat$birth.yr<=1975 & dat$gender=="Male"]>0)
#> [1] 0.3318109
mean(dat$deficit.change[dat$birth.yr>1975 & dat$gender=="Male"]>0)
#> [1] 0.4512821
Men born after 1975 were far more likely to move in a conservative direction than men born in or before 1975 (about 45% versus 33%). This means that younger men (those with a later birth year) were more likely to become more conservative in this period than older men (those with an earlier birth year).
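The hint about the mean of a boolean variable can be illustrated with a short sketch. The vector `change` below is made up for illustration and is not taken from the exam data.

```r
# Hypothetical changes in opinion for five respondents (not the exam data)
change <- c(-10, 0, 5, 12, -3)

# Comparing to 0 gives TRUE/FALSE for "moved in a conservative direction"
# (a change of exactly 0 counts as FALSE)
moved.conservative <- change > 0

# R treats TRUE as 1 and FALSE as 0, so the mean of a boolean vector
# is the proportion of TRUE values
prop.conservative <- mean(moved.conservative)
```

Here two of the five values are positive, so `prop.conservative` is the same as `sum(moved.conservative) / length(moved.conservative)`.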
5. Edit the code below to answer this question: For each unique county in the dataset, find the maximum value for deficit.change and the minimum value for deficit.change. Which county has the biggest difference between the individual with the maximum and minimum change? Hint: an easy mistake to make here is to not identify all the unique counties in the data.
#county <- ?????
#max.deficit.change <- rep(NA,???????)
#min.deficit.change <- rep(NA,???????)
###
#??????
###
#deficit.delta <- max.deficit.change - min.deficit.change
county <- unique(dat$county_fips)
max.deficit.change <- rep(NA,length(county))
min.deficit.change <- rep(NA,length(county))
for(i in 1:length(county)){
max.deficit.change[i] <- max(dat$deficit.change[dat$county_fips==county[i]])
min.deficit.change[i] <- min(dat$deficit.change[dat$county_fips==county[i]])
}
deficit.delta <- max.deficit.change - min.deficit.change
which(deficit.delta==max(deficit.delta))
#> [1] 35
county[35]
#> [1] "04013"
#Look up the county name that goes with this FIPS code
unique(dat$county[dat$county_fips==county[35]])
#> [1] "Maricopa"
6. Below I calculate the mean and standard deviation of deficit.fix.2010. What is the standard error of this sample mean? In words, what does that mean?
The standard error of the sample mean is \(\frac{sd(x)}{\sqrt{n}}\), which in this case is about 0.32.
Every time we take a new sample of 7729 people, we are going to get a slightly different set of respondents. That means that the mean of deficit.fix.2010 will be slightly different in each sample. The standard error tells us how far, on average, these sample means will fall from the true mean in the population. So each time we take a sample, we expect the sample mean to miss the truth by around 0.32 points. On a scale of 0-100 that’s not that much!
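The standard error formula can be sketched in R as follows. The helper `standard.error` and the vector `x` are hypothetical names and made-up values used for illustration; they are not part of the exam code or data.

```r
# Hypothetical helper: standard error of a sample mean, sd(x) / sqrt(n)
standard.error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Made-up sample of four scores on a 0-100 scale (not the exam data)
x <- c(20, 40, 60, 80)
se.x <- standard.error(x)

# Assuming dat has been loaded as in the exam, the same helper could be
# applied to the real variable:
# standard.error(dat$deficit.fix.2010)
```

Because the sample size is under a square root in the denominator, larger samples give smaller standard errors, which is why a sample of 7729 people yields such a precise estimate of the mean.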