PSCI1800: Intro to Data Science
Undergraduate course, University of Pennsylvania, Program on Opinion Research and Election Studies, 2024
Understanding and interpreting large, quantitative data sets is increasingly central in social science and the business world. Whether one seeks to understand political communication, international trade, inter-group conflict, or a host of other issues, the availability of large quantities of digital data has revolutionized how questions are asked and answered. The ability to quickly and accurately find, collect, manage, and analyze data is now a fundamental skill for quantitative researchers. The answers to many important questions lie in publicly available data sets, whether they are election returns, survey results, journalists’ dispatches, or other types of data.
Becoming an effective data scientist requires two related but distinct skill sets: technical proficiency and theoretical knowledge of statistics. Most courses try to teach both at once. This course, instead, will focus primarily on the first: building your skills in data acquisition, management, and visualization. By the end of this course, students will be able to acquire, format, analyze, and visualize various types of data using the statistical programming language R.
A secondary learning goal of this class is to be able to write and talk about statistics concisely and clearly. Being able to run the most complicated statistics in the world is unhelpful if you cannot explain (particularly to non-specialists) what you have found and why they should care. Too many high school and college classes emphasize long essays, whereas the primary skill you will need is writing short reports (or, let’s be honest, emails) that quickly communicate an idea or finding. In this class we will emphasize this type of writing.
While this course is not a statistics class, we will discuss (in non-technical terms) the fundamental nature of statistics, particularly the important concepts of uncertainty and causality. The expectation is that you will take further courses to build on this knowledge. PSCI 3800 “Applied Data Science” and PSCI 1801 “Statistical Methods” are designed as direct follow-ups to this course.