Problem Set 1
Problem set due Wednesday, February 12, at 7pm on Canvas.
For this problem set you will hand in a .Rmd file and the knitted HTML output. There is a .Rmd template file on the assignment page on Canvas that you can use to write your answers.
Please make life easy for us by using comments to clearly mark each question and sub-question.
Comments are not mandatory, but extra points will be given for code that clearly explains what is happening and gives descriptions of the results of the various tests and functions.
Reminder: Put only your student number on the assignments, as we will grade anonymously.
Collaboration on problem sets is permitted. Ultimately, however, the write-up and code that you turn in must be your own creation. You can discuss coding strategies or code debugging with your classmates, but you should not share code in any way. Please write the names of any students you worked with at the top of each problem set.
Note: As I mentioned in class, a key skill in R is using square brackets to access and subset information. The purpose of this assignment is to get used to indexing vectors and matrices with square brackets and row and column numbers, and that is what you should do in every question. If you use the `$` operator to answer any of these questions, you will not receive full credit.
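As a reminder of the kind of indexing this note has in mind, here is a minimal sketch on a small matrix. The objects `x`, `y`, and `toy` are hypothetical, made-up values and are not part of the assignment data.

```r
# Hypothetical toy data, NOT the assignment data
x <- c(10, 20, 30, 40)
y <- c(1, 4, 9, 16)

# cbind() places the vectors side by side as the columns of a matrix
toy <- cbind(x, y)

toy[2, ]    # the entire 2nd row
toy[, 2]    # the entire 2nd column
toy[3, 1]   # the single value in row 3, column 1

# A logical vector inside the brackets keeps only the rows where it is TRUE
toy[toy[, 1] > 15, ]                  # rows where column 1 exceeds 15
toy[toy[, 1] > 15 & toy[, 2] < 16, ]  # rows meeting BOTH conditions
# (use | instead of & to keep rows meeting EITHER condition)

# Summary functions accept any indexed subset
mean(toy[, 2])
```

The same `[row, column]` syntax also works on data frames such as `soc.media`.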
Question 1 (45 points)
Below is a table of data on electric vehicle registrations and population for 5 US states. These data are from the US Census Bureau (https://www.census.gov/quickfacts/fact/table/NY,PA,CO,FL,WA/PST045222) and the Department of Energy’s Alternative Fuels Data Center (https://afdc.energy.gov/data).
state | pop | electric.vehicles |
---|---|---|
1 | 19.67 | 84670 |
2 | 12.97 | 47440 |
3 | 5.84 | 59910 |
4 | 22.25 | 167990 |
5 | 7.78 | 104050 |
(a) Using R, create three vectors, `state`, `pop` (the state population in millions), and `electric.vehicles`, that correspond to the data in this table.
(b) Create an object `cars` that combines these three vectors into a matrix, with one vector per column.
(c) Using built-in R functions, report the mean, median, max, and min of the 2nd column of `cars`. In words, what do these numbers represent?
(d) Using built-in R functions, report the mean, median, max, and min of the 2nd row of `cars`. In words, what do these numbers represent?
(e) Create a new vector, `electric.vehicles.per.1000`, that is the number of electric vehicles per 1000 people in the state. Add this vector to your matrix.
(f) Create a scatterplot with population on the x-axis and `electric.vehicles.per.1000` on the y-axis. (Bonus: using the `text()` function, replace the “dots” with the state abbreviations. In table order, the states are “NY”, “PA”, “CO”, “FL”, and “WA”.)
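For the bonus, a minimal sketch of how `text()` can stand in for the plotting symbols. The vectors `a`, `b`, and `labs` are hypothetical, made-up values, not the state data.

```r
# Hypothetical toy data, NOT the assignment data
a    <- c(1, 2, 3)
b    <- c(5, 3, 8)
labs <- c("AA", "BB", "CC")

# type = "n" draws the axes and labels but no points
plot(a, b, type = "n", xlab = "a", ylab = "b")

# text() writes each label at its (x, y) coordinate, in vector order
text(a, b, labels = labs)
```

Because `text()` pairs labels with coordinates by position, the label vector has to be in the same order as the rows of the data.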
(g) What would you say about the relationship between the size of a state and the adoption of electric vehicles in these data? Are you confident that this relationship is true in the real world?
Question 2 (45 points)
Run the following code to load the data frame `soc.media` into R:
library(rio)
soc.media <- import("https://github.com/marctrussler/IDS-Data/raw/main/PS1Data.Rds", trust = TRUE)
(a) Use R to report the class of each variable in `soc.media`.
(b) How many men spend 3 or more hours per day on social media? What is the preferred network of men who spend more than 3 hours per day on social media?
(c) How many women over 45 say their favorite network is facebook or instagram?
(d) Create and interpret a scatterplot where age is on the x-axis and daily social media use is on the y-axis.
(e) Using the variable `much.time` (which indicates whether or not the respondent thinks they spend too much time on social media), alter the scatterplot created in (d) so that we can see the relationship between age and daily social media use separately for those who think they spend too much time on social media and those who do not.
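One possible approach, sketched with made-up data: build a color vector with bracket indexing so that each group gets its own color in a single scatterplot. The names `age`, `hours`, `too.much`, and `pt.col`, and the "Yes"/"No" coding, are hypothetical assumptions; check how `much.time` is actually coded in `soc.media`.

```r
# Hypothetical toy data, NOT the assignment data
age      <- c(22, 35, 47, 58, 63, 29)
hours    <- c(5, 3, 2, 1, 1, 4)
too.much <- c("Yes", "No", "Yes", "No", "No", "Yes")  # assumed coding

# Start every point blue, then use bracket indexing to recolor one group red
pt.col <- rep("blue", length(too.much))
pt.col[too.much == "Yes"] <- "red"

plot(age, hours, col = pt.col, pch = 16,
     xlab = "Age", ylab = "Hours per day on social media")

# A legend so readers can tell the two groups apart
legend("topright", legend = c("Too much: Yes", "Too much: No"),
       col = c("red", "blue"), pch = 16)
```

Drawing two separate plots, one per group, would also satisfy the question; coloring within one plot makes the two relationships easier to compare directly.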
Question 3 (10 points)
Pretend I run a monthly survey of Americans where I determine the proportion of Americans who approve of Donald Trump. Below is a table for two months (`m`) that gives the number of people in the survey (`n`), the proportion who support Trump (`p`), and the standard deviation of the amount of support for Trump (`s`).
m | n | p | s |
---|---|---|---|
1 | 10 | 0.56 | 0.25 |
2 | 10000 | 0.51 | 0.25 |
In which month are you more confident that the true level of support for Trump among all Americans is not 50%? You do not need a precise answer to this question, but you should do a couple of calculations to help you decide.