R Programming Assignment: Analysing London using Open Data

1. Correlation Analysis
Calculate the Pearson, Spearman and Kendall correlations between “Average GCSE score” and “Turnout
at Mayoral election 2012” per ward, and between “Employment rate” and “Happiness score” per borough.
# Read the ward and borough profile files
ward <- read.csv("ward-profiles-excel-version.csv", fileEncoding = "iso-8859-1",
                 header = TRUE, sep = ",", stringsAsFactors = FALSE, check.names = TRUE)
borough <- read.csv("london-borough-profiles.csv", fileEncoding = "iso-8859-1",
                    header = TRUE, sep = ",", stringsAsFactors = FALSE, check.names = TRUE)
Ward dataset
Column 54: "Average.GCSE.capped.point.scores...2014"
Column 67: "Turnout.at.Mayoral.election...2012"
Borough dataset
Column 29: "Employment.rate......2014."
Column 75: "Happiness.score.2011.14..out.of.10."
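The dotted column names above are what read.csv produces with check.names = TRUE, which passes each header through make.names and replaces invalid characters with dots. The sketch below illustrates this on a hypothetical raw header (the actual CSV header text is an assumption) and shows how a column can be located by pattern rather than by a hard-coded position such as 54.

```r
# Hypothetical raw headers; the real CSV headers are assumed, not known.
raw_headers <- c("Ward name",
                 "Average GCSE capped point scores - 2014",
                 "Turnout at Mayoral election - 2012")

# make.names() is the sanitiser applied by read.csv(check.names = TRUE):
# spaces and hyphens each become a dot.
clean_headers <- make.names(raw_headers)
print(clean_headers)

# Toy data frame standing in for the ward table.
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)
names(toy) <- clean_headers

# Look the column up by a fixed substring instead of by index.
gcse_idx <- grep("GCSE", names(toy), fixed = TRUE)
print(gcse_idx)
```

Looking columns up by name keeps the analysis working if the publisher adds or reorders columns in a later release of the file.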
cor(as.numeric(ward[,54]), as.numeric(ward[,67]), method = "pearson", use = "complete.obs")
## Warning in is.data.frame(y): NAs introduced by coercion
## [1] 0.5410463
cor(as.numeric(ward[,54]), as.numeric(ward[,67]), method = "spearman", use = "complete.obs")
## Warning in is.data.frame(y): NAs introduced by coercion
## [1] 0.5240463
cor(as.numeric(ward[,54]), as.numeric(ward[,67]), method = "kendall", use = "complete.obs")
## Warning in is.data.frame(y): NAs introduced by coercion
## [1] 0.3706765
cor(as.numeric(borough[,29]), as.numeric(borough[,75]), method = "pearson", use = "complete.obs")
## [1] 0.3498277
cor(as.numeric(borough[,29]), as.numeric(borough[,75]), method = "spearman", use = "complete.obs")
## [1] 0.4452778
cor(as.numeric(borough[,29]), as.numeric(borough[,75]), method = "kendall", use = "complete.obs")
## [1] 0.3376605
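The "NAs introduced by coercion" warnings arise because the columns are read as character and contain at least one non-numeric entry. The following is a minimal sketch on made-up values (not taken from the London datasets) showing how as.numeric creates the NA and how use = "complete.obs" drops the affected pair before each coefficient is computed.

```r
# Made-up character columns; "n/a" stands in for a non-numeric cell.
gcse_raw    <- c("320.5", "341.2", "n/a", "298.7", "365.0", "310.3")
turnout_raw <- c("31.2", "38.5", "29.9", "27.4", "44.1", "33.0")

# as.numeric() converts unparseable strings to NA and warns;
# the warning is suppressed here only because it is expected.
gcse    <- suppressWarnings(as.numeric(gcse_raw))
turnout <- as.numeric(turnout_raw)

# use = "complete.obs" removes every pair in which either value is NA.
methods <- c("pearson", "spearman", "kendall")
results <- sapply(methods, function(m) {
  cor(gcse, turnout, method = m, use = "complete.obs")
})
print(results)
```

Pearson measures linear association on the raw values, while Spearman and Kendall work on ranks, so the three coefficients generally differ on the same data, as they do in the ward and borough results above.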
2. Regression Analysis
Perform regression analysis between the same variables (as used in exercise 1) per ward and per borough.
fit_ward <- lm(as.numeric(ward[,54]) ~ as.numeric(ward[,67]))
## Warning: NAs introduced by coercion
fit_borough <- lm(as.numeric(borough[,29]) ~ as.numeric(borough[,75]))
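The fitted objects can then be inspected with coef and summary. The sketch below uses simulated data (the variable names x and y and all values are assumptions, not the borough figures) and also checks a useful link back to exercise 1: in a simple linear regression, R-squared equals the squared Pearson correlation.

```r
set.seed(1)

# Simulated predictor and response; purely illustrative values.
x <- seq(60, 80, length.out = 30)
y <- 2 + 0.07 * x + rnorm(30, sd = 0.2)

fit <- lm(y ~ x)

# Intercept and slope of the fitted line.
print(coef(fit))

# Proportion of variance in y explained by x.
r_squared <- summary(fit)$r.squared
print(r_squared)

# For one predictor, R-squared is the square of Pearson's r.
print(all.equal(r_squared, cor(x, y)^2))
```

Calling summary(fit_ward) or summary(fit_borough) in the same way would report the slope estimates, their standard errors and p-values for the real data.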
3. Plotting
Plot the results of the regression analysis using the ggplot2 commands discussed during the lecture.
library("ggplot2")
## Warning: package ‘ggplot2’ was built under R version 3.2.4
ggplot(ward, aes(x = as.numeric(ward$Average.GCSE.capped.point.scores...2014),
                 y = as.numeric(ward$Turnout.at.Mayoral.election...2012))) +
  geom_point(shape = 1) + geom_smooth(method = lm) + xlab("Average GCSE score") +
  ylab("Turnout at Mayoral election 2012")
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: scatter plot with fitted regression line. x-axis: Average GCSE score; y-axis: Turnout at Mayoral election 2012]
ggplot(borough, aes(x = as.numeric(borough$Employment.rate......2014.),
                    y = as.numeric(borough$Happiness.score.2011.14..out.of.10.))) +
  geom_point(shape = 1) + geom_smooth(method = lm) +
  xlab("Employment rate") + ylab("Happiness score")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: scatter plot with fitted regression line. x-axis: Employment rate; y-axis: Happiness score]
4. Discussion of the Results
Starting from the results, briefly discuss your findings. In particular, consider that the results you are
observing show only correlation, not causation.
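One way to see why correlation alone does not establish causation is to simulate a hidden common cause. In the sketch below (all variables are synthetic and hypothetical, unrelated to the London data), a and b are both driven by z but have no direct effect on each other, yet they are clearly correlated until z is controlled for.

```r
set.seed(42)
n <- 500

# z is a hidden common cause; a and b each depend on z plus independent noise.
z <- rnorm(n)
a <- z + rnorm(n)
b <- z + rnorm(n)

# Raw correlation is clearly positive although a does not influence b.
r_ab <- cor(a, b)
print(r_ab)

# Partial correlation: correlate what is left of a and b after
# regressing each on z. This should be close to zero.
r_ab_given_z <- cor(resid(lm(a ~ z)), resid(lm(b ~ z)))
print(r_ab_given_z)
```

By analogy, a ward-level association between GCSE scores and turnout, or a borough-level one between employment and happiness, could reflect shared drivers that are not in the model rather than a direct effect of one variable on the other.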