Syllabus

Current as of 2026-02-16


Lecture

MW 12:00-1:00 (ARCH 208)

Recitations

R: 10:15-11:15 (PCPE 202)

R: 12:00-1:00 (PCPE 202)

F: 10:15-11:15 (PCPE 225)

F: 12:00-1:00 (PCPE 225)


Dr. Marc Trussler

TAs:

Course Description

Understanding and interpreting large, quantitative data sets is increasingly central in social science and the business world. Whether one seeks to understand political communication, international trade, inter-group conflict, or a host of other issues, the availability of large quantities of digital data has revolutionized how questions are asked and answered. The ability to quickly and accurately find, collect, manage, and analyze data is now a fundamental skill for quantitative researchers. The answers to a range of important questions lie in publicly available data sets, whether they are election returns, survey results, journalists’ dispatches, or a range of other data types.

Becoming an effective data scientist requires two related, but distinct, skill sets: technical proficiency and theoretical knowledge of statistics. Most courses try to teach both at once. This course, instead, will focus primarily on the first: building your skills in data acquisition, management, and visualization. Leaving this course, students will be able to acquire, format, analyze, and visualize various types of data using the statistical programming language R.

A secondary learning goal of this class is to be able to write and talk about statistics in a concise and clear fashion. Being able to run the most complicated statistics in the world is unhelpful if you cannot explain (particularly to non-specialists) what you have found and why they should care. Too many high school and college classes emphasize long essays, when the primary skill you will need is to write short reports (or, let’s be honest, emails) to quickly communicate an idea or finding. In this class we will emphasize this type of writing.

While this course is not a statistics class, we will discuss (in non-technical terms) the fundamental nature of inferential statistics. The expectation is that you take further courses to build on this knowledge. PSCI 1801 Statistical Methods, PSCI 3800 Applied Data Science, and PSCI 3802 Political Polling are designed to be direct follow-ups to this course.

While no background in statistics, political science, or computer science is required, students are expected to be generally familiar with contemporary computing environments (e.g. know how to use a computer, download new software, find the path to saved files etc.) and have a willingness to learn a wide variety of data science tools. Instructions will follow on software to be installed prior to the first class.

Expectations and policies

Course Slack Channel

We will use Slack to communicate as a class. You will receive an invitation to join our channel shortly after the start of class. One of the better things to come about through the pandemic is the use of Slack for classroom communications. It is a really good tool to allow us to send quick and informal messages to individual students or groups (or for you to message us). Similarly, it allows you to collaborate with other students in the class, and is a great place to get simple questions answered.

Because I will be making announcements exclusively via Slack, it is extremely important you get this set up.

Format/Attendance

The course will have two components: weekly lectures and a recitation.

The lectures will be in person. While they will be more instructional/lecture based in format, there is an expectation of some amount of participation and feedback. The lectures will not be recorded, though this textbook contains my notes, and the accompanying R code will be provided. There is no need to inform me if you are going to miss class.

While it is your decision whether or not to attend class, you are responsible for knowing the things that I say in class. This includes any announcements or clarifications I make regarding assignments or tests.

The recitations will also be in person. Attendance will not be taken, though you are highly encouraged to participate. The purpose of the recitations is to provide a smaller class format for you to ask questions, practice techniques, and debug code with the TA. The answers to problem sets will also be covered in these sessions.

Academic integrity

I expect all students to abide by the rules of the University and to follow the Code of Academic Integrity.

For Problem Sets: Collaboration on problem sets is permitted. However, the write-up and code that you turn in must be your own creation. Code cannot be copy/pasted between students.

For Exams: Exams will be taken individually in-person without collaboration. The use of “Chat-GPT” or other AI software on exams is prohibited.

Re-grading of assignments

All student work will be assessed using fair criteria that are uniform across the class. If, however, you are unsatisfied with the grade you received on a particular assignment (beyond simple clerical errors), you can request a re-grade using the following protocol. First, you may not send any grade complaints or requests for re-grades until at least 24 hours after the graded assignment was returned to you. After that, you must document your specific grievances in writing by submitting a PDF or Word Document to the teaching staff. In this document you should explain exactly which parts of the assignment you believe were mis-graded, and provide documentation for why your answers were correct. Dr. Trussler will then re-score the entire assignment (including portions for which you did not have grievances), and the new score will be the one you receive on the assignment (even if it is lower than your original score).

Late policy

Notwithstanding everything below: exceptions to all of these policies will be made for health reasons, extraordinary family circumstances, religious holidays, etc. I am extremely reasonable and lenient, as long as you discuss potential issues with me before the deadline.

For problem sets: You are granted 5 “grace days” throughout the semester. Over the course of the semester you can use these when you need to turn problem sets in late. You can only use 3 grace days on any given assignment. You do not have to ask to use these days. This is counted in whole days, rounded up. If a problem set is turned in at 7:01pm the day it is due (i.e. 1 minute late) you will have used 1 grace day. If you turn the problem set in at 7:01pm the day after it is due (i.e. 24 hours and 1 minute late) you will have used 2 grace days etc. Choosing to not complete a problem set (see policy below) does not affect your grace days.

Outside of this grace day policy (including when you have used up your grace days) I do not accept late work.
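As a rough sketch of the grace-day counting described above (the symbols \(m\) and \(d\) are illustrative notation, not part of the official policy): if a problem set is submitted \(m\) minutes after the 7pm deadline, with \(m > 0\), the number of grace days \(d\) it uses is the lateness in days, rounded up.

\[\begin{aligned} d &= \left\lceil \frac{m}{1440} \right\rceil \\ m = 1: \quad d &= \lceil 1/1440 \rceil = 1 \\ m = 1441: \quad d &= \lceil 1441/1440 \rceil = 2 \end{aligned}\]

Under this reading, a submission is accepted only if \(d \leq 3\) for that assignment and the total of \(d\) across all problem sets stays at or below 5.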

Assessment and grading

  • Participation (6%)

    This portion of your grade mixes three components, each worth 2% of your final grade:

    1. Traditional participation including: asking and answering questions in lecture and in recitations, asking and answering questions on the course Slack, attending office hours, or working with teaching staff on your final paper.

    2. The completion of weekly “check-in” quizzes on Canvas. These will be available each week, will only take a few minutes, and will be graded by completion (not correctness).

    3. You will share the “main finding” of your final paper in recitation on Thursday, April 23rd or Friday, April 24th (whichever day your recitation meets) and receive peer feedback.

  • Problem sets (16%)

    • Four problem sets.

    • Completed using Rmarkdown. Submissions will include a knitted html file and the associated .RMD file.

    • Scored out of 100. Answers that merely produce the “correct” output from R will earn a grade of 90/100. Grades above 90 are reserved for submissions that have all the correct answers, have code that is cleanly and effectively written, and have written explanations that clearly and concisely articulate the findings.

    • There are many ways to do things in R. This course follows a particular sequence of knowledge designed to maximize your potential as an R user. As such, in problem sets you will be assessed on the degree to which you have learned the content of this course specifically.

    • You are free to do as many of the problem sets as you like. If you do not complete a problem set, the percentage points for that assignment will be transferred to the midterm (for PS1 and PS2), or the final exam (for PS3, PS4). For example if you don’t complete PS2, the midterm would then be worth 27% of your final grade (23% + 4%). If you don’t complete PS3 & PS4, the final exam would be worth 43% of your final grade (35% + 4% + 4%).

  • Midterm (23%)

    • An in-class exam that will take place during our usual class period on March 4th.

    • The test is open notes. You can use any course material you wish, including this textbook, your problem sets, the problem set answer keys, and your own notes. You cannot Google things or use any external sources.

    • You may not use Chat-GPT or any other AI tool to answer the questions. Anyone caught using these tools will get a 0 on the exam and will be immediately referred to the office of student affairs.

  • Paper (20%)

    • Due: April 29th.

    • The final paper for this course is a short (fewer than 600 words) data-journalism style blog post. For this project you will find your own data and use it to produce an article suitable for a non-technical audience. Please see the assignment details and rubric on Canvas for specifics.

    • This project brings together the two learning goals of this course: the technical ability to find, clean, and present data, and the ability to write about your findings in a clear and persuasive way. Accordingly, you will be graded on both the quality and rigor of your statistical findings and the coding, presentation, and writing of the piece. To emphasize: a major component of your grade is determined by how you code and how you write up your results.

    • 600 words is short for a final paper, so I highly encourage you to start work on this early. Many undergrads spend 95% of their time writing and 5% of their time editing. (In your working life post-undergrad these two percentages will be almost exactly flipped!) Given the amount of time and the light word count, my expectation is that you meet with the teaching team to talk about your research question relatively early, and spend the majority of your time editing your work, not writing.

  • Final Exam (35%)

    • An in-person exam that will take place during the final exam period.

    • The test is open notes. You can use any course material you wish, including this textbook, your problem sets, the problem set answer keys, and your own notes. You cannot Google things or use any external sources.

    • You may not use Chat-GPT or any other AI tool to answer the questions. Anyone caught using these tools will get a 0 on the exam and will be immediately referred to the office of student affairs.
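The reweighting for skipped problem sets described above can be written compactly. This is only a sketch: the symbols \(s_1\) and \(s_2\) are illustrative notation, and the 4% weight per problem set is inferred from the 16% total split across four problem sets. Let \(s_1\) be the number of problem sets skipped among PS1 and PS2, and \(s_2\) the number skipped among PS3 and PS4 (each is 0, 1, or 2).

\[\begin{aligned} \text{Midterm weight} &= 23\% + 4\% \times s_1 \\ \text{Final exam weight} &= 35\% + 4\% \times s_2 \\ \text{Problem set weight} &= 16\% - 4\% \times (s_1 + s_2) \end{aligned}\]

For example, skipping all four problem sets (\(s_1 = s_2 = 2\)) would give a midterm worth 31%, a final exam worth 43%, and problem sets worth 0%. Together with participation (6%) and the paper (20%), the total is still 100%.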

Grade scale

Letter grades at the conclusion of the class will be assigned using the following scale. I do not round grades. If your grade is in one of the bands below you will receive that grade.

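\[\begin{aligned} 97 \leq \text{Grade}: &\ \text{A+}\\ 93 \leq \text{Grade} <97: &\ \text{A}\\ 90 \leq \text{Grade} <93: &\ \text{A-}\\ 87 \leq \text{Grade} <90: &\ \text{B+}\\ 83 \leq \text{Grade} <87: &\ \text{B}\\ 80 \leq \text{Grade} <83: &\ \text{B-}\\ 77 \leq \text{Grade} <80: &\ \text{C+}\\ 73 \leq \text{Grade} <77: &\ \text{C}\\ 70 \leq \text{Grade} <73: &\ \text{C-}\\ 67 \leq \text{Grade} <70: &\ \text{D+}\\ 63 \leq \text{Grade} <67: &\ \text{D}\\ 60 \leq \text{Grade} <63: &\ \text{D-}\\ \text{Grade} <60: &\ \text{F} \end{aligned}\]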

Computing

The course will require students to have access to a personal computer in order to run the statistics software. If this is not possible, please consult with one of the instructors as soon as possible. Support to cover course costs is available through Student Financial Services.

We will use R in this class, which you can download for free at https://www.r-project.org/. R is completely open source and has an almost endless set of resources online. Virtually any data science job you could apply to nowadays will require some background in R programming.

While R is the language we will use, RStudio is a free program that makes it considerably easier to work with R. After installing R, install RStudio from https://www.rstudio.com. Please have both R and RStudio installed by the end of the first week of classes.

If you’re having trouble installing either program, there are more detailed installation instructions on the course Canvas page.

Textbooks

The reading load for this course will be relatively light, with the expectation that your primary task outside of class hours will be working on problem sets and reviewing material. That being said, textbook chapters that supplement the lectures are included, and reading through them before lecture will be helpful.

We will be using three supplementary textbooks for this course. The first two are available for free online through the library website; the third is a free online textbook.

Three additional books that I have found helpful in my development as a data scientist:

  • Data Analysis for Social Science: A Friendly and Practical Introduction by Elena Llaudet & Kosuke Imai (This is also the textbook for PSCI 1801)

  • The Functional Art: An introduction to information graphics and visualization by Alberto Cairo

  • On Writing Well by William Zinsser

Class Schedule

Week 0: (No Monday Class) January 14

What is Data Science?

Week 1: (No Monday Class) January 21

What is Data?

Leonard Mlodinow. The Drunkard’s Walk: How Randomness Rules Our Lives. (Excerpts on Canvas)

Week 2: January 26 - January 28

Basic R & RMarkdown

Davies Chapter 2

R Markdown: The Definitive Guide Chapter 2 Basics

January 27: course selection period ends

Week 3: February 2 - February 4

Conditional Logic and Sub-Setting

Freeman and Ross Chapter 7.3-7.5

Week 4: February 9 - February 11

Dataframes

Davies 5.2

Problem Set 1 Due Wednesday 7pm.

Week 5: February 16 - February 18

Cleaning and Reshaping

Freeman and Ross Chapter 12 (tidyr reshaping)

Week 6: February 23 - February 25

For Loops

Davies Chapter 10

Problem Set 2 Due Wednesday 7pm.

February 23: Drop period ends

Week 7: March 2 - March 4

Review/Midterm

Midterm Exam in-class Wednesday, March 4.

Week 8: March 9 - March 11

Spring Break

Week 9: March 16 - March 18

Collecting and Merging Data

Freeman and Ross Chapter 11.5

Week 10: March 23 - March 25

If/Functions

Problem Set 3 Due Wednesday 7pm.

March 23: Grade type change deadline.

Week 11: March 30 - April 1

Tidyverse I

R4DS Chapter 3

March 31: Withdrawal deadline

Week 12: April 6 - April 8

Tidyverse II

R4DS Chapter 5; Chapter 12.5; Chapter 19; Chapter 25; Chapter 26

Week 13: April 13 - April 15

Writing and Visualizing

Zinsser. On Writing Well. (Excerpts on Canvas).

Badger et al. 2018. NYT Upshot.

Problem Set 4 Due Wednesday 7pm.

Week 14: April 20 - April 22

Regression I

Final Paper Peer Presentations During your Recitation Times.

Slides to be submitted by Wednesday 7pm.

Week 15: April 27 - April 29

Regression II

Davies 20.1 - 20.3 & 20.5

Final Paper due Wednesday, April 29th at 11:59pm.

Final Exam

To be scheduled by the university.