Syllabus

Current as of 2025-01-09


Lecture: MW 12-1:30pm (PCPE 101)


Dr. Marc Trussler

TA: Dylan Radley

  • dradley@sas.upenn.edu

  • Fox-Fels Hall 35 (3814 Walnut Street)

  • Office Hours

    • Monday 10:30-11:30

    • Wednesday 2:30-4:30

Course Description

The first step of many data science sequences is to learn a great deal about how to work with individual data sets: cleaning, tidying, merging, describing, and visualizing data. These are crucial skills in data analytics (and are covered extensively in PSCI 1800 and PSCI 3800), but describing a data set is not our ultimate goal. The ultimate goal of data science is to make inferences about the world based on the small sample of data that we have.

PSCI 1801 shifts focus to this goal of inference. Using a methodology that emphasizes intuition and simulation over mathematics, this course will cover the key statistical concepts of probability, sampling, distributions, hypothesis testing, and covariance. The goal of the class is for students to ultimately have the knowledge and ability to perform, customize, and explain bivariate and multivariate regression. Students who have not taken PSCI 1800 should have basic familiarity with R, including working with vectors and matrices, basic summary statistics, visualizations, and for() loops.

Expectations and policies

Prerequisite knowledge

PSCI 1800 or similar R course. To help us better understand the nature of inferential statistics, we will be running quite a lot of simulations in R. Students entering the class should have a working knowledge of the R programming language, and in particular know how to use square brackets to index vectors and to run for() loops. We will be doing a short refresher on these concepts in the first two weeks of class.
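As a rough self-check of the R skills described above, the sketch below shows square-bracket indexing on a vector and a for() loop used to run a small simulation. The vector values, the coin-flip setup, and all variable names are illustrative assumptions rather than course material.

```r
# A made-up vector, used only to illustrate indexing
x <- c(4, 8, 15, 16, 23, 42)

x[2]        # the second element
x[c(1, 3)]  # the first and third elements
x[x > 10]   # every element greater than 10

# A small simulation with a for() loop:
# flip 10 fair coins, record the share of heads, and repeat many times
set.seed(1801)                      # arbitrary seed so results are reproducible
n_sims <- 1000
heads_share <- rep(NA, n_sims)      # empty vector to store each result

for (i in 1:n_sims) {
  flips <- sample(c(0, 1), size = 10, replace = TRUE)  # 1 = heads, 0 = tails
  heads_share[i] <- mean(flips)                        # save result in slot i
}

mean(heads_share)  # average share of heads across all simulations
```

If every line here looks familiar, you likely have the background the course assumes; if not, the refresher in the first two weeks covers the same ground.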

While math will be minimized in this class, I assume knowledge of basic algebra (decimals, fractions, exponents, equations, variables, inequalities). There will be a small amount of calculus in the course, but I assume no previous knowledge of it.

Course Slack Channel

We will use Slack to communicate with the class. You will receive an invitation to join our channel shortly after the start of class. One of the better things to come out of the pandemic is the use of Slack for classroom communications. It is a really good tool that allows us to send quick and informal messages to individual students or groups (or for you to message us). Similarly, it allows you to collaborate with other students in the class, and is a great place to get simple questions answered. Because we will be making announcements via Slack, it is extremely important that you get this set up.

Format/Attendance

The lectures will be in person. While this is not a discussion-based class, there is an expectation of some amount of participation and feedback. Attendance will not be recorded, though do note you are scored on participation.

Computers

The course will require students to have access to a personal computer in order to run the statistics software. If this is not possible, please consult with one of the instructors as soon as possible. Support to cover course costs is available through Student Financial Services (https://srfs.upenn.edu/sfs).

Academic integrity

We expect all students to abide by the rules of the University and to follow the Code of Academic Integrity.

For Problem Sets: Collaboration on problem sets is permitted. Ultimately, however, the write-up and code that you turn in must be your own creation.

For Exams: Exams will be taken individually and in person, without collaboration. The use of ChatGPT or other AI software on exams is prohibited.

Late policy

Late work will not be accepted. Notwithstanding that, the teaching staff are extremely reasonable and lenient. If there are circumstances such as religious holidays or family emergencies that are impeding your ability to keep up with the class please let us know. Importantly, we are much better at helping you if we are made aware of potential issues before deadlines.

Assessment and grading

  • Participation (5%)

    • Traditional participation including: asking and answering questions in lecture and in recitations, asking and answering questions on the course Slack, or attending office hours. All of these are complements for one another.
  • Problem sets (45%)

    • The purpose of the problem sets is for you to have an opportunity to do self-directed learning: applying the material we have covered in class and preparing for the exams.

    • Five problem sets (roughly every two weeks)

    • Feedback will be given on all problems.

    • Any question in which you make a good-faith attempt to provide an answer (even if that answer is wrong or incomplete) will be given a score of 100%.

    • That’s right: as long as you try to answer the question (or at least discuss your thought process) you will get 100%. I want to incentivize learning, not have you worry about getting things right. The problem sets are there for you to explore, try, and practice. They are absolutely the best preparation for the….

  • In-class midterm (20%)

    • An in-class exam that will take place during our usual class period.

    • The test is open book. You can use any material you wish, including this textbook, your problem sets, the problem set answer keys, and your own notes. You can Google things – though you are almost certainly better off just using the class notes.

    • You may not use ChatGPT or any other AI tool to answer the questions. Anyone caught using these tools will get a 0 on the exam and will be referred to the office of student affairs.

  • Final Exam (30%)

    • In-person final exam during the university final-exam period.

    • The test is open book. You can use any material you wish, including this textbook, your problem sets, the problem set answer keys, and your own notes. You can Google things – though you are almost certainly better off just using the class notes.

    • You may not use ChatGPT or any other AI tool to answer the questions. Anyone caught using these tools will get a 0 on the exam and will be referred to the office of student affairs.

Grade scale

Letter grades at the conclusion of the class will be assigned using the following scale. I do not round grades. If your grade meets or exceeds one of the thresholds below, you will receive that grade.

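\[\begin{aligned}
97 \leq \text{Grade}: &\ \text{A+}\\
93 \leq \text{Grade} < 97: &\ \text{A}\\
90 \leq \text{Grade} < 93: &\ \text{A-}\\
87 \leq \text{Grade} < 90: &\ \text{B+}\\
83 \leq \text{Grade} < 87: &\ \text{B}\\
80 \leq \text{Grade} < 83: &\ \text{B-}\\
77 \leq \text{Grade} < 80: &\ \text{C+}\\
73 \leq \text{Grade} < 77: &\ \text{C}\\
70 \leq \text{Grade} < 73: &\ \text{C-}\\
67 \leq \text{Grade} < 70: &\ \text{D+}\\
63 \leq \text{Grade} < 67: &\ \text{D}\\
60 \leq \text{Grade} < 63: &\ \text{D-}\\
\text{Grade} < 60: &\ \text{F}
\end{aligned}\]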
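To make the scale concrete, here is a minimal sketch in R of how a course grade could be worked out. It takes the weights from the assessment list above and assumes the components combine as a simple weighted sum; the component scores and the names (weights, scores, to_letter) are hypothetical and used only for illustration.

```r
# Weights from the assessment list in this syllabus
weights <- c(participation = 0.05, problem_sets = 0.45,
             midterm = 0.20, final = 0.30)

# Hypothetical component scores (out of 100) for one student
scores <- c(participation = 100, problem_sets = 100,
            midterm = 82, final = 88)

# Assumed combination rule: a weighted sum of the component scores
final_grade <- sum(weights * scores[names(weights)])

# Map a numeric grade to a letter using the thresholds in the scale above.
# right = FALSE makes each interval closed on the left: [lower, upper)
to_letter <- function(grade) {
  cutoffs <- c(-Inf, 60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97, Inf)
  grade_labels <- c("F", "D-", "D", "D+", "C-", "C", "C+",
                    "B-", "B", "B+", "A-", "A", "A+")
  as.character(cut(grade, breaks = cutoffs, labels = grade_labels,
                   right = FALSE))
}

final_grade             # about 92.8 for these made-up scores
to_letter(final_grade)  # "A-", since 92.8 falls below the 93 threshold
```

Because grades are not rounded, the 92.8 in this made-up example stays an A- rather than becoming an A.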

Computing

We will use R in this class, which you can download for free at https://www.r-project.org/. R is completely open source and has an almost endless set of resources online. Virtually any data science job you could apply to nowadays will require some background in R programming.

While R is the language we will use, RStudio is a free program that makes it considerably easier to work with R. After installing R, you should install RStudio from https://www.rstudio.com. Please have both R and RStudio installed by the end of the first week of classes.

If you’re having trouble installing either program, there are more detailed installation instructions on the course Canvas page.

Textbook

There is one mandatory textbook for this course and two optional:

  • Data Analysis for Social Science: A Friendly and Practical Introduction. Elena Llaudet & Kosuke Imai. (Mandatory).

    • I have chosen this book because it does a really good job of weaving in the basics of statistics with the use of R. Generally speaking the assigned readings from this book will be slightly less technical than what is in the class notes. This book is available at the bookstore and from Amazon. There is only one edition, but be sure to get the (way cheaper) paperback version.
  • Quantitative Social Science: An Introduction. Kosuke Imai. (Optional).

    • This is the original, graduate-level textbook on which the Llaudet and Imai textbook is based. The chapters are largely the same, but this textbook is much more math intensive. I have included below the equivalent readings (labeled QSS) if you want to go into greater detail. These readings are completely optional.
  • Statistics: Fourth Edition. Freedman, Pisani, Purves. (Optional).

    • This textbook has a slightly more conversational and intuitive approach, but does not integrate those lessons with R. While having this book is not mandatory, I really like its style and common-sense explanations. It’s a great companion to have around.

Class Schedule

Week 1: August 28 (No Monday Class)

Introduction: The population is the point.

Excerpt from Mlodinow (on Canvas).

Week 2: September 4 (No Monday Class)

R Review

Llaudet & Imai 1

Week 3: September 9 - September 11

Probability

Llaudet & Imai 6.1,6.2,6.7

(QSS 4.11, 6.1)

September 10: course selection period ends

Week 4: September 16 (No Wednesday Class)

Conditional probability and independence

(QSS 6.3)

Week 5: September 23 - September 25

Random Variables I: Discrete

Llaudet & Imai 6.4.1

(QSS 6.3)

Problem Set 1 Due Wednesday 7pm.

Week 6: September 30 - October 2

Random Variables II: Continuous

Llaudet & Imai 6.4.2-6.4.4

(QSS 6.4)

Week 7: October 7 - October 9

Sampling and confidence intervals

Llaudet & Imai 6.5.1,6.5.2

(QSS 7.1)

October 7: Drop period ends

Problem Set 2 Due Wednesday 7pm.

Week 8: October 14 - October 16

Review

Midterm exam on Wednesday, during our usual class period.

Week 9: October 21 - October 23

Standard error of the mean/Hypothesis Tests

Llaudet & Imai 6.5.3

October 25: Grade type change deadline.

Week 10: October 28 - October 30

Standard error of the mean/Hypothesis Tests

Llaudet & Imai 7.1,7.3,7.4

(QSS 7.2)

On Monday the 28th we will take a class field trip to the NBC News Decision Desk.

Problem Set 3 Due Wednesday 7pm.

Week 11: November 4 - November 6

Power

November 4: Withdrawal deadline

November 6 class may be cancelled as it’s the day after the election.

Week 12: November 11 - November 13

Two continuous variables and covariation

Llaudet & Imai 3.5

(QSS 3.6)

Week 13: November 18 - November 20

Correlation and bivariate regression

Llaudet & Imai 4.3

(QSS 4.2)

Problem Set 4 Due Wednesday 7pm.

Week 14: November 25 (No Wednesday Class)

Multivariate Regression I

Llaudet & Imai 2.1-2.4

Week 15: December 2 - December 4

Multivariate regression II

Llaudet & Imai 5.1-5.5

Week 16: December 9

Interaction and Prediction with regression

Excerpt from Kam and Franzese (Canvas)

Llaudet & Imai 4.5-4.6

(QSS 7.3.1,7.3.2)

Problem Set 5 Due Monday 7pm.

Final Exam period: December 12 - December 19

In Person Final Exam. December 13, 3pm.

(Check the final exam schedule but as of this writing that is when it is scheduled.)