# 代做report | assignment | R代写 | 统计代写 – Homework # 2 Assignment

### Homework # 2 Assignment

For this assignment (and all assignments in this class), you are to turn in your work as an R Markdown document. Submit your solutions to GradeScope in PDF format.

#### Problem 1

For this problem, we will be using the netflix_titles.csv file which is a dataset from a Netflix Movies and TV Shows study. Most of the column names are reasonably descriptive. The only columns that may be confusing is AgeCHDdiag which is the age the patient was diagnosed as having CHD ( Chronic heart disease ) and MRW which stands for Metropolitan Relative Weight , an old medical index of a ratio of a persons weight compared to a desirable weight for that persons height and gender.

``````a. For each attribute, identify (by replacing the xxxin the table) whether it is a measure (quantitative) or a
category (qualitative) and then for the measures whether it is of type interval or ratio and for the categories
whether it is of type ordinal or nominal: (10 points)
``````
``````netflix <- read.csv("netflix_titles.csv")
sapply(netflix, class)
``````
``````## show_id type title director cast country
## "character" "character" "character" "character" "character" "character"
## date_added release_year rating duration listed_in
## "character" "integer" "character" "character" "character"
``````
``````Attribute Category/Measure Interval/Ratio or Nominal/Ordinal
``````
``````show_id Category Nominal
``````
``````type Category Nominal
``````
``````title Category Nominal
``````
``````director Category Nominal
``````
``````cast Category Nominal
``````
``````country Category Nominal
``````
``````date_added Measure Interval
``````
``````release_year Measure Interval
``````
``````rating Category Nominal
``````
``````duration Measure Interval
``````
``````listed_in Category Nominal
``````
``````b. Read the file into a tibble, convert the char columns to factors, and add a new binary column PG_13 that
has a value of 1 if the movie is rated as PG-13 and a 0 otherwise (5 points)
``````
``````library (tidyverse)
``````
``````##  Attaching packages
tidyverse 1.3.1
``````
``````##  ggplot2 3.3.5  purrr 0.3.
##  tibble 3.1.6  dplyr 1.0.
##  tidyr 1.1.4  stringr 1.4.
``````
``````##  Conflicts
tidyverse_conflicts()
``````
``````netflix <- as_tibble(read.csv("netflix_titles.csv"))
for (col in colnames(netflix)){
if (class(netflix[[col]]) == "character"){
netflix[[col]] <- as.factor(netflix[[col]])
}
}
netflix[["PG-13"]] <- 0
netflix[netflix[["rating"]] == "PG-13","PG-13"] <- 1
``````
``````c. Write a command to calculate the average duration for TV Shows and Movies. (5 points)
``````
``````get_duration <- function (duration){
return (as.numeric(strsplit(duration, split = " ")[[ 1 ]][ 1 ]))
}
``````
``````digit_duration <- NULL
for (row in c( 1 :nrow(netflix))){
digit_duration <- c(digit_duration, get_duration(as.character(netflix[["duration"]][ro
w])))
}
netflix[["digit_duration"]] <- digit_duration
aggregate(netflix[["digit_duration"]], by=list(netflix[["type"]]), mean, na.rm=T)
``````
``````Group.
<fct>
``````
``````x
<dbl>
``````
``````Movie 99.
``````
``````TV Show 1.
``````

2 rows

``````Previous 1 2 3 4 5 6 ... 19 Next
``````
``````d. Write a command to calculate the total number of movies in each country and the number of those
movies that are rated PG-13. Return the result in descending order of number of PG-13 movies. (5 points)
``````
``````get_countries <- function (country){
return (strsplit(country, split=",")[[ 1 ]])
}
``````
``````movie_df <- netflix[netflix[["type"]] == "Movie",]
countries <- NULL
for (country in movie_df[["country"]]){
countries <- c(countries, get_countries(country))
}
movie_tibble <- as_tibble(table(countries))
``````
``````movie_pg_13_df <- movie_df[movie_df[["PG-13"]] == 1 , ]
countries <- NULL
for (country in movie_pg_13_df[["country"]]){
countries <- c(countries, get_countries(country))
}
movie_pg_13_tibble <- as_tibble(table(countries))
country_movie <- merge(movie_tibble, movie_pg_13_tibble, by.x = "countries", by.y="count
ries", all.x=T)
colnames(country_movie) <- c("Country", "Movie_num", "PG-13_movie_num")
country_movie[is.na(country_movie[["PG-13_movie_num"]]), "PG-13_movie_num"] = 0
country_movie <- country_movie[order(country_movie\$`PG-13_movie_num`, decreasing = T),]
country_movie
``````
``````Country
<chr>
``````
``````Movie_num
<int>
``````
``````PG-13_movie_num
<dbl>
``````
``````182 United States 2364 362
``````
``````104 United States 388 71
``````
``````181 United Kingdom 382 51
``````
``````103 United Kingdom 152 33
``````
``````33 France 155 27
``````
``````19 Canada 132 25
``````
``````34 Germany 104 24
``````
``````22 China 55 9
``````
``````8 Australia 33 8
``````
``````128 France 148 8
``````
``````1-10 of 187 rows
``````

Problem 2

``````Previous 1 2 3 4 5 Next
``````

The file video_game_data.csv contains a variety of data about video games. Read this data into a tibble and convert it to be in tidy format.

``````video_games <- as_tibble(read.csv("video_game_data.csv"))
spread(video_games, key = "details", value = "value")
``````
``````VideoGame
<chr>
``````
``````client
<chr>
``````
``````onsale
<chr>
``````
``````price
<chr>
``````
``````publish_year
<chr>
``````
``````sales
<chr>
``````
``````big friend war PC FALSE 39.99 2020 20
``````
``````Flying Mario PC FALSE 39.99 2021 20
``````
``````Human Falls4 Mobile TRUE 6.99 2019 3025
``````
``````3 rows
``````

Problem 3

The file orders.csv contains the quantities of clothing items purchased by various companies. Read this data into a tibble and then convert it to be in tidy format.

``````orders <- as_tibble(read.csv("Orders.csv"))
orders_tibble <- aggregate(orders[, c( 2 : 10 )], by=list(orders\$Company), sum)
gather(orders_tibble, type, value, -Group.1)
``````
``````Group.
<chr>
``````
``````type
<chr>
``````
``````value
<int>
``````
``````Company A Hat.Small 753
``````
``````Company B Hat.Small 285
``````
``````Company C Hat.Small 475
``````
``````Company D Hat.Small 803
``````
``````Company E Hat.Small 532
``````
``````Company A Hat.Medium 793
``````
``````Company B Hat.Medium 487
``````
``````Company C Hat.Medium 495
``````
``````Company D Hat.Medium 524
``````
``````Company E Hat.Medium 659
``````
``````1-10 of 45 rows
``````

Problem 4

The file 10_year_sales.xlsx contains weekly sales data for a 10 year period by city. Note that there are 10 tabs in this file.

``````a. Use the tidyverse read_excel function to read in each of these tabs and consolidate them into a single
tibble that is in a tidy format.
``````
``````library (tidyverse)
``````
``````get_df <- function (file, sheet){
year <- rep(as.numeric(sheet), nrow(df)- 1 )
month <- df[ 2 :nrow(df), 1 ]
day <- df[ 2 :nrow(df), 2 ]
for (col in ( 3 :ncol(df))){
if (col %% 2 == 1 ){
city <- rep(colnames(df)[col], nrow(df)- 1 )
price <- df[ 2 :nrow(df), col]
quantity <- df[ 2 :nrow(df), col+ 1 ]
temp <- tibble(year, month, day, city, price, quantity)
colnames(temp) <- c("year", "month", "day", "city", "price", "quantity")
if (col == 3 ){
result <- temp
} else {
result <- rbind(result, temp)
}
}
}
return (result)
}
``````
``````for (sheet in excel_sheets("10_year_sales.xlsx")){
if (sheet == "2010"){
sales <- get_df("10_year_sales.xlsx", sheet)
} else {
sales <- rbind(sales, get_df("10_year_sales.xlsx", sheet))
}
}
``````
``````## New names:
``````

#### ## * “ -> …

``````## New names:
``````

## * …

sales

``````year
<dbl>
``````
``````month
<chr>
``````
``````day
<chr>
``````
``````city
<chr>
``````
``````price
<chr>
``````
``````quantity
<chr>
``````
``````2010 1 1 Noblesville 7.6498290378267 2529.
``````
``````2010 1 8 Noblesville 12.8198861460368 2342.
``````
``````2010 1 15 Noblesville 10.009212503725 2403.
``````
``````2010 1 22 Noblesville 12.5136330390549 2264.
``````
``````2010 1 29 Noblesville 11.8081908578277 2312.
``````
``````2010 2 5 Noblesville 6.14827970115427 2582.
``````
``````2010 2 12 Noblesville 9.37422641842372 2448.
``````
``````2010 2 19 Noblesville 12.8221199240939 2306.
``````
``````2010 2 26 Noblesville 9.2498885747989 2413.
``````
``````2010 3 5 Noblesville 12.1420270377188 2338.
``````
``````1-10 of 1,380 rows Previous 1 2 3 4 5 6 ... 138 Next
``````
``````b. Generate a tibble showing the total revenue (price times quantity) for each city from the years 2015 through
2019.
``````
``````select_sales <- sales[sales[["year"]] > 2014 , ]
select_sales[["revenue"]] <- as.numeric(select_sales[["price"]]) * as.numeric(select_sal
es[["quantity"]])
city_revenue <- aggregate(select_sales[["revenue"]], by=list(select_sales[["city"]]), su
m, na.rm=T)
colnames(city_revenue) <- c("city", "revenue")
city_revenue
``````
``````city
<chr>
``````
``````revenue
<dbl>
``````
``````Bloomington 10191661
``````
``````Fort Wayne 5447501
``````
``````Muncie 10767048
``````
``````Noblesville 9447614
``````
``````South Bend 8085843
``````
``````5 rows
``````

Problem 5

You are to use the tables found in Orders – Data Integration.csv and Product Costs – Data Integration.csv to generate a report of the total profit from each of your customers. Perform the following steps:

``````a. Read in the two files and determine if either (or both) of the files are not in Tidy format. (5 points)
``````
``````orders <- as_tibble(read.csv("Orders - Data Integration.csv"))
products <- as_tibble(read.csv("Product Costs - Data Integration.csv"))
``````

Summarize below which table(s) are not in Tidy format and why:

``````The Product Costs - Data Integration table is not in Tidy format because the column headers 2019 Cost
and 2020 Cost contain values.
b. If either table is not in Tidy format, correct it. Also, convert the costs from character to numeric types.
``````
``````colnames(products) <- c("Product.Code", "2019", "2020")
products <- gather(products, key="Year", value="Cost", "2019", "2020")
products
``````
``````Product.Code
<chr>
``````
``````Year
<chr>
``````
``````Cost
<dbl>
``````
``````Previous 1 2 3 4 Next
``````
``````Previous 1 2 Next
``````
``````Product.Code
<chr>
``````
``````Year
<chr>
``````
``````Cost
<dbl>
``````
``````X1388 2019 540.
``````
``````ZZ52 2019 196.
``````
``````X1189 2019 244.
``````
``````Y12 2019 322.
``````
``````HCK15 2019 335.
``````
``````ZY13 2019 303.
``````
``````ZZ522 2019 448.
``````
``````HC155 2019 282.
``````
``````BW243 2019 367.
``````
``````GG2554 2019 428.
``````

1-10 of 32 rows

``````c. Join the two tables (be careful which type of join to use)
``````
``````total <- merge(orders, products, by.x=c("Year.Purchased", "Product.Code"), by.y=c("Year"
, "Product.Code"))
total
``````
``````Year.Purchased
<int>
``````
``````Product.Code
<chr>
``````
``````Order_N...
<int>
``````
``````Customer
<chr>
``````
``````Line.Item
<int>
``````
``````Quantity
<int>
``````
``````Unit.Price
<int>
``````
``````Cost
<dbl>
``````
``````2019 A33 1000 Customer A 2 63 698 467.
``````
``````2019 BW243 1000 Customer A 3 75 483 367.
``````
``````2019 X1189 1000 Customer A 1 65 314 244.
``````
``````2019 X1388 1000 Customer A 4 20 684 540.
``````
``````2019 X1388 1003 Customer D 1 62 684 540.
``````
``````2019 Y12 1000 Customer A 5 82 474 322.
``````
``````2020 BW243 1001 Customer B 2 29 483 380.
``````
``````2020 GG2554 1001 Customer B 3 70 595 447.
``````
``````2020 GG2554 1002 Customer C 5 98 595 447.
``````
``````2020 HC155 1002 Customer C 2 73 404 303.
``````

1-10 of 16 rows

``````d. Create a new column in the joined table that is equal to the total profit for that line item (note: profit equals
the price that you charge for an item minus the cost that you have to pay to get that item from your
``````
``````supplier) (5 points)
``````
``````total[["total_profit"]] <- total[["Quantity"]] * (total[["Unit.Price"]] - total[["Cost"
]])
total
``````
``````e. Generate a table that contains the total profit received from each of your customers. Print out that table in a
nice format using the knitr::kable() function. (5 points)
``````
``````library (knitr)
tables <- aggregate(total[["total_profit"]], by=list(total[["Customer"]]), sum)
colnames(tables) <- c("Customer", "Customer Profit")
kable(tables)
``````

Customer Customer Profit

Customer A 43006.

Customer B 23080.

Customer C 68533.

Customer D 8905.

``````f. Extra credit: Install the formattable library to display the Customer Profit column in your table in a
currency format
``````
``````#install.packages("formattable")
library (formattable)
currency(tables\$`Customer Profit`)
``````
``````Previous 1 2 Next
``````
``````Year.Purchased
<int>
``````
``````Product.Code
<chr>
``````
``````Order_N...
<int>
``````
``````Customer
<chr>
``````
``````Line.Item
<int>
``````
``````Quantity
<int>
``````
``````Unit.Price
<int>
``````
``````Cost
<dbl>
``````
``````2019 A33 1000 Customer A 2 63 698 467.
``````
``````2019 BW243 1000 Customer A 3 75 483 367.
``````
``````2019 X1189 1000 Customer A 1 65 314 244.
``````
``````2019 X1388 1000 Customer A 4 20 684 540.
``````
``````2019 X1388 1003 Customer D 1 62 684 540.
``````
``````2019 Y12 1000 Customer A 5 82 474 322.
``````
``````2020 BW243 1001 Customer B 2 29 483 380.
``````
``````2020 GG2554 1001 Customer B 3 70 595 447.
``````
``````2020 GG2554 1002 Customer C 5 98 595 447.
``````
``````2020 HC155 1002 Customer C 2 73 404 303.
``````
``````1-10 of 16 rows
``````
``````## [1] \$43,006.18 \$23,080.80 \$68,533.48 \$8,905.
``````

Problem 6

For this problem, you are to use the tables found in files Roster.csv, ISE535 Spring 2021.csv, ISE599 Fall 2020.csv, and ISE535 Fall 2021.csv

Perform the following steps:

``````a. Read the four csv files into tibbles and combine them into a single table that is in tidy format (call it
gradebook). Load the lubridate library and use it to convert the Date of Birth attribute to be a date type
and covert the grade to a factor type. (15 points)
``````
``````roster <- as_tibble(read.csv("Roster.csv"))
``````
``````colnames(roster) <- roster[ 1 , ]
roster <- roster[ 2 :nrow(roster), ]
``````
``````colnames(ise535_spring) <- c("Student ID", "Spring Semester 2021")
ise535_spring <- ise535_spring[ 2 :nrow(ise535_spring),]
``````
``````colnames(ise599_fall) <- c("Student ID", "Fall Semester 2020")
ise599_fall <- ise599_fall[ 2 :nrow(ise599_fall),]
``````
``````colnames(ise535_fall) <- c("Student ID", "Fall Semester 2021")
ise535_fall <- ise535_fall[ 2 :nrow(ise535_fall),]
``````
``````gradebook <- merge(roster, ise535_spring, all.x = T)
``````
``````library (lubridate)
``````
``````##
## Attaching package: 'lubridate'
``````
``````## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
``````
``````gradebook[["Date of Birth"]] <- mdy(gradebook[["Date of Birth"]])
``````
``````b. Add a column that contains the grade point value for the class (A = 4, B = 3, C = 2) and then use that
column to display the GPA for each student. Hint: look at the Tidyverse case_when command. (10 points)
``````
``````get_point <- function (grade){
return ( 0 )
} else if (grade == "A"){
return ( 4 )
} else if (grade == "B"){
return ( 3 )
} else if (grade == "C"){
return ( 2 )
} else {
return ( 0 )
}
}
``````
``````grade_book_long <- gather(gradebook, key="semester", value="level", "Spring Semester 202
1", "Fall Semester 2020", "Fall Semester 2021")
points = NULL
for (row in c( 1 :nrow(grade_book_long))){
}
mean)
colnames(results) <- c("Student Name", "GPA")
results
``````
``````Previous 1 2 3 Next
``````
``````Student ID
<chr>
``````
``````Student Name
<chr>
``````
``````Home Country
<chr>
``````
``````Date of Birth
<date>
``````
``````Spring Semester 2021
<fct>
``````
``````Fall Seme
<fct>
``````
``````Student ID
<chr>
``````
``````Student Name
<chr>
``````
``````Home Country
<chr>
``````
``````Date of Birth
<date>
``````
``````Spring Semester 2021
<fct>
``````
``````Fall Seme
<fct>
``````
``````161677 Danika Floyd USA 1994-08-21 A A
``````
``````180376 Helena Gibbs USA 1994-06-21 A B
``````
``````239998 Frankie Shields Japan 1992-03-
``````
``````246074 Elliott Watts Mexico 1998-03-
``````
``````265902 Dario Holt India 1996-03-
``````
``````327654 Jadyn Henry India 1992-11-
``````
``````383626 Keely Beasley USA 1995-10-13 B C
``````
``````457169 Grayson Riddle Mexico 1990-02-21 A
``````
``````476440 Gary Cohen India 1990-05-15 B
``````
``````542015 Alfred Lang India 1994-11-18 A B
``````

1-10 of 24 rows | 1-6 of 7 columns

##### NA
``````Previous 1 2 3 Next
``````
``````Student Name
<chr>
``````
##### GPA
``````<dbl>
``````
``````Student Name
<chr>
``````
##### GPA
``````<dbl>
``````
``````Jadyn Henry 0.
``````
``````Janiyah Downs 0.
``````
``````Tyree Levine 0.
``````
``````Alfred Lang 2.
``````
``````Charlie Franklin 0.
``````
``````Ciara Hendrix 2.
``````
``````Danika Floyd 3.
``````
``````Dario Holt 0.
``````
``````Darryl Rodriguez 0.
``````
``````Elliana Patton 2.
``````

1-10 of 24 rows