Distributed and Cloud Computing Assignment 1

Due date: 15 March 2023 (18:00)

Write MapReduce programs that run on the big data cluster to solve the following problems.

Hints: Work through Labs 1 and 2 as a warm-up exercise for programming on the cluster. See Appendix A to prepare your deliverables. A possible marking scheme is presented in Appendix B.

  1. In the file vaccination-rates-over-time-by-age.txt on Moodle, each row has the following format:

     date, age-group, gender, Sinovac 1st dose, Sinovac 2nd dose, Sinovac 3rd dose, Sinovac 4th dose, Sinovac 5th dose, Sinovac 6th dose, BioNTech 1st dose, BioNTech 2nd dose, BioNTech 3rd dose, BioNTech 4th dose, BioNTech 5th dose, BioNTech 6th dose

     A row records the number of people of a given gender and age group who received the ith dose of Sinovac and the jth dose of BioNTech on that date, where i = 1,…, 6 and j = 1,…, 6. For example, consider

     2021-02-22,30-39,M,1,0,0,0,0,0,0,0,0,0,0,0

     It states that on 2021-02-22, 1 male aged 30-39 received the 1st dose of the Sinovac vaccine.

     Because of a data-entry problem, the age group 12-19 is transcribed as 19-Dec in the txt file. Write a MapReduce program to replace every 19-Dec with 12-19 in vaccination-rates-over-time-by-age.txt. Output the result to a new file named vaccination-rates-over-time-by-age-v2.txt for the subsequent questions.

     Hint: Use a Hadoop command to print the content of an output directory to the screen and a Unix redirection to save the screen output to a file:

     hadoop fs -cat output-dir/* > vaccination-rates-over-time-by-age-v2.txt

     (10 marks)
  2. Compute the total number of the ith dose of Sinovac (i = 1,…, 6) for each age group from 2021-02-22 to 2023-01-15. Similarly, compute the total number of the jth dose of BioNTech (j = 1,…, 6) for each age group during that period. (20 marks)
  3. To study vaccination around the fifth wave of Covid-19 in Hong Kong, compute the total number of people who received Sinovac (regardless of the dose number) in 12-2021, 01-2022, 02-2022 and 03-2022, respectively. Similarly, compute the totals for BioNTech in those months. (20 marks)
  4. Compute the number of days in 12-2021, 01-2022, 02-2022 and 03-2022, respectively, that have vaccination records. (20 marks)
  5. Compute the difference in the total number of vaccinations between every two consecutive months. For example, suppose the total number of vaccinations in 03-2021 is 484,400 and that in 04-2021 is 908,693. The difference is 424,293. (30 marks)
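
The row format and the month-based questions above can be illustrated with a minimal sketch in plain Java. It is not a model answer and uses no Hadoop classes: it only shows per-record logic that a mapper or reducer might call. The class name VaccinationSketch, its method names, the "yyyy-MM" month key and the sample row with the 19-Dec age group are all hypothetical choices made for illustration. It also assumes that the bad value appears exactly as 19-Dec in the file and that every month in the period has at least one record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Plain-Java sketch of per-record logic; no Hadoop classes are used here. */
class VaccinationSketch {

    // Column layout taken from the assignment text: date, age-group, gender,
    // then six Sinovac dose counts followed by six BioNTech dose counts.
    static final int AGE_GROUP = 1;
    static final int FIRST_SINOVAC = 3;
    static final int FIRST_BIONTECH = 9;
    static final int DOSES = 6;

    /** Question 1: repair the age-group field of one input line. */
    static String fixAgeGroup(String line) {
        String[] f = line.split(",", -1);
        // Assumption: the bad value appears exactly as "19-Dec" in the file.
        if (f.length > AGE_GROUP && f[AGE_GROUP].trim().equals("19-Dec")) {
            f[AGE_GROUP] = "12-19";
        }
        return String.join(",", f);
    }

    /** Sum the six dose columns that start at index first. */
    static long sumDoses(String[] f, int first) {
        long total = 0;
        for (int i = first; i < first + DOSES; i++) {
            total += Long.parseLong(f[i].trim());
        }
        return total;
    }

    /** Month key "yyyy-MM" from "yyyy-MM-dd"; it sorts chronologically as text. */
    static String monthKey(String date) {
        return date.trim().substring(0, 7);
    }

    /** Question 5: difference between each month's total and the previous one. */
    static List<String> monthlyDifferences(SortedMap<String, Long> totals) {
        List<String> out = new ArrayList<>();
        String prevMonth = null;
        long prevTotal = 0;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (prevMonth != null) {
                out.add(prevMonth + " -> " + e.getKey() + ": " + (e.getValue() - prevTotal));
            }
            prevMonth = e.getKey();
            prevTotal = e.getValue();
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row: the assignment's example with the age group set to 19-Dec.
        String row = "2021-02-22,19-Dec,M,1,0,0,0,0,0,0,0,0,0,0,0";
        String fixed = fixAgeGroup(row);
        String[] f = fixed.split(",", -1);
        System.out.println(fixed);
        System.out.println(monthKey(f[0]) + " Sinovac=" + sumDoses(f, FIRST_SINOVAC)
                + " BioNTech=" + sumDoses(f, FIRST_BIONTECH));

        // Monthly totals taken from the example in question 5.
        SortedMap<String, Long> totals = new TreeMap<>();
        totals.put("2021-03", 484_400L);
        totals.put("2021-04", 908_693L);
        System.out.println(monthlyDifferences(totals));
    }
}
```

In a Hadoop job, logic like fixAgeGroup, monthKey and sumDoses would typically sit inside a mapper's map method, while the step in monthlyDifferences needs all monthly totals in one place, for example in a single reducer or a second job. Keying on "yyyy-MM" rather than "MM-yyyy" is a design choice here so that sorting the keys as text puts the months in calendar order.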

Appendix A. Deliverables

  1. For each question, you are required to submit the Java source code, the jar file, and a README that contains the instructions/commands to run your program.
  2. A brief PDF report that contains (i) screenshots of the input and output of the program and (ii) an explanation of the major steps of your program.
  3. The output file of your program.
  4. Put ALL your deliverables into ONE zip file and submit it to Moodle on or before the deadline.

Appendix B. Marking scheme of each question

  - Completeness of the deliverables: 10%
  - Correct outputs: 10%
  - Runnable jar: 10%
  - Clear mapper logic: 30%
  - Clear reducer logic: 30%
  - Efficient key-value pair: 10%

*Smart solutions can be awarded a 10-mark bonus.