Final Project Instructions and Rubric

For the final project of PSCI3802 you are going to be a one-person survey firm.

While you worked in teams to write questions, the final project will be done solo. I am fine with you discussing things with your classmates, but all work handed in must be your own. In particular, no code should be shared.

Provided to you is the survey instrument that we fielded (on Canvas) along with the raw data I downloaded from Qualtrics. Truly: besides stripping out a couple of pieces of metadata, this is exactly what I downloaded from the website.

The raw data file has been uploaded to GitHub. You can load it using the code and URL below. For simplicity, I’ve uploaded the data as a list of the 5 datasets. As such, the start of your code should look something like:

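# Load the list of five raw datasets directly from GitHub
all.modules <- rio::import("https://github.com/marctrussler/IIS-Data/raw/refs/heads/main/F2025ProjectRawData.Rds", trust = TRUE)
names(all.modules)
#> [1] "imm"   "econ"  "ai"    "democ" "elect"
# Pull out the dataset for your own module (here, "imm")
dat <- all.modules$imm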

Note that you can access information about the variables using the attributes() function.
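
As a rough sketch of what that looks like, the snippet below builds a tiny stand-in data frame and inspects its attributes. The column names, the "label" and "labels" attributes, and the question text are all made up for illustration; check what attributes() actually returns on the real data before relying on any particular attribute name.

```r
# Toy stand-in for one module's data; the columns and attributes here are hypothetical
toy <- data.frame(q1 = c(1, 2, 2, 1), age = c(34, 51, 27, 62))
attr(toy$q1, "label") <- "Hypothetical question text"
attr(toy$q1, "labels") <- c(Agree = 1, Disagree = 2)

# Everything attached to a single variable
attributes(toy$q1)

# Collect one attribute across all columns (NA where a column has none)
var_labels <- vapply(toy, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else as.character(lab)
}, character(1))
var_labels
```

On the real data you would swap toy for dat and use whichever attribute names you find there.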

From these data you will generate three documents:

  1. An article detailing the results of your module of the survey. You can write this using Rmarkdown or in a normal word processor. If you do the latter, please submit it as a PDF. I want you to strive to make this similar to the things we put out for NBC. See examples here, here, and here. You do not need to discuss each and every number from your module in this article. Indeed, that would be a bad call. I want you to use your judgement to figure out what the interesting things in the survey are, and to write those up in a way that normal people would want to read. Also included with this document will be a methodology statement that clearly explains the data cleaning and weighting steps that you took, and discusses the sources and magnitude of error in the poll. For examples of this, see the methodology statements at the bottom of this document, or from the NYT.

  2. A results document that shows the topline results for each question as well as crosstabs for whatever demographic and political groups you think are relevant. See an example from NBC here, and NYTimes here. Your crosstabs do not need to be as extensive as these, but you should strive to have results broken down into categories people would want to see.

  3. An R script that starts from the raw data and clearly and accurately produces the numbers that are present in both (1) and (2). This file should start with the exact input you download, and then walk through the steps of cleaning, weighting, and analyzing the data. You are welcome to use Rmarkdown, in which case you can do some version of combining documents 1 & 3. You can either have a cleaning script that exports cleaned and weighted data that is then used in an Rmarkdown file to create the article, or simply have all your code in the Rmarkdown file. In either case, make sure that (1) there is no raw R code or output in your final article; and (2) all constituent parts (the R script and the Rmd file) will run/knit on my computer.

Rubric

Written Component (50 points)

  • Article looks professional, and contains no code or raw R output.

  • Article presents a thoughtfully curated set of results that tells a cohesive story.

  • At least one main result is analyzed with subgroups.

  • Article is written in a way that would be interesting and digestible to a lay audience.

  • Jargon is minimized, and if present, explained to a lay audience.

  • Figures are simple, clearly labeled, and explained in the text.

  • Methodology statement discusses in clear terms the cleaning and weighting decisions that went into the poll, as well as the target population and sample size. Statement includes discussion of the potential sources of error including, but not limited to, sampling error.
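
For the sampling error piece, one common approximation (not the only defensible one) inflates the usual margin of error by Kish's design effect, which grows as the weights become more variable. The sketch below assumes a made-up weight vector; in your script you would use the weights you actually computed.

```r
# Hypothetical weights for five respondents; replace with your real weight column
w <- c(0.5, 1.2, 0.8, 1.5, 1.0)
n <- length(w)

# Kish's approximate design effect due to unequal weighting
deff <- n * sum(w^2) / sum(w)^2

# Effective sample size after accounting for weighting
n_eff <- n / deff

# 95% margin of error for a proportion, using the conservative p = 0.5
moe <- 1.96 * sqrt(0.25 / n_eff)

round(c(deff = deff, n_eff = n_eff, moe_pct = 100 * moe), 2)
```

This covers sampling error only; coverage, nonresponse, and measurement error still need to be discussed in words.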

Topline/Crosstabs Document (25 points)

  • Document looks professional and is easy to read and get the necessary information from.

  • Topline and crosstab numbers are given for every question asked of the respondents (i.e., the questions in your module and all common questions).

  • Results are clearly labeled with question text.

  • Crosstabs are presented for all questions across an appropriate set of demographic and political variables (probably 3-4 demographic categories and 2-3 political categories, but use your best judgement). Crosstabs include the (weighted) size of each group.
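
To make the topline/crosstab distinction concrete, here is a minimal base R sketch on invented data. The column names (answer, party, weight) and all values are hypothetical, and you are free to use other tools to build your tables.

```r
# Invented respondents; answer, party, and weight are hypothetical column names
toy <- data.frame(
  answer = c("Yes", "No", "Yes", "Yes", "No", "No"),
  party  = c("Dem", "Dem", "Rep", "Rep", "Dem", "Rep"),
  weight = c(1.2, 0.8, 1.0, 0.5, 1.5, 1.0)
)

# Weighted topline: percent giving each answer among all respondents
topline <- round(100 * prop.table(xtabs(weight ~ answer, data = toy)), 1)

# Weighted crosstab: percent giving each answer within each party (columns sum to 100)
crosstab <- round(100 * prop.table(xtabs(weight ~ answer + party, data = toy), margin = 2), 1)

# Weighted size of each group, to report alongside the crosstab
group_n <- xtabs(weight ~ party, data = toy)

topline
crosstab
group_n
```

Summing the weights rather than counting rows is what makes these weighted estimates.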

R script (25 points)

  • The R script loads the raw data from GitHub using the code and link above. When I press Run on the R script, the whole script runs with no errors. 10 points are deducted immediately if the file produces an error and does not run.

    • If any additional data sources are used: (1) include those files in your submission; and (2) have your R script load those files at the very top of the script so that I can load them in too.

  • Code is well organized and easy to read.

  • Code comments are used to explain what each step is doing.

  • Survey is weighted to appropriate population targets using the “pewmethods” package.

  • Code checks that survey weights were appropriately applied.

  • All topline and crosstab estimates are calculated using the survey weights.

  • Code produces all numbers that are present in the article and results document.
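
One way to check that weights were appropriately applied is to confirm that the weighted sample matches each population target you weighted to. The sketch below does this in base R on invented data; the weights here are typed in by hand rather than produced by pewmethods, and the sex variable and 50/50 target are hypothetical.

```r
# Invented sample: 2 women and 3 men, with hand-made weights standing in for raked weights
toy <- data.frame(
  sex    = c("F", "F", "M", "M", "M"),
  weight = c(1.25, 1.25, 5/6, 5/6, 5/6)
)

# Hypothetical population target for the weighting variable
target <- c(F = 0.5, M = 0.5)

# Weighted share of each category in the sample
weighted_share <- prop.table(xtabs(weight ~ sex, data = toy))

# Largest gap between the weighted sample and the target
max_gap <- max(abs(as.numeric(weighted_share[names(target)]) - target))

# Halt the script if any category is more than one point off target
stopifnot(max_gap < 0.01)

# Also worth a look: the spread of the weights (extreme values signal trouble)
summary(toy$weight)
```

Repeating this comparison for every weighting variable, and glancing at the range of the weights, covers the basics.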