COMP4057 Distributed and Cloud Computing
Assignment 1
Answer the following 4 questions. The major steps are shown in Appendix A. See Appendix B to prepare for your deliverable.
- Each line of bible.txt is given as follows.
Book Chapter:Verse[tab]Content[Line return]
An example is shown below.
Genesis 1:1 In the beginning God created the heaven and the earth.
Genesis is the book name. The chapter is 1 and the verse is 1. The content is "In the beginning God
created the heaven and the earth."
Write a method that takes a line of bible.txt as input (assume it is a String object) and returns
a Line object, where Line contains the attributes book, chapter, verse and content.
(20 marks)
- Write down your pseudocode to output the books that have more than 100 chapters. Implement your pseudocode in Java. See Appendix B to prepare for your deliverable.
(20 marks)
- Write down your pseudocode to determine whether any words (case-insensitive, excluding stopwords) in the Bible occur more than 3000 times. Implement your pseudocode in Java. See Appendix B to prepare for your deliverable.
(30 marks)
- Write down your pseudocode to output the shortest verse (in terms of words, including stopwords) of each book of the Bible. Implement your pseudocode in Java. See Appendix B to prepare for your deliverable.
(30 marks)
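For Question 1, the line format described above could be parsed roughly as follows. This is only a sketch under stated assumptions: the reference and the content are separated by the first tab, and the chapter:verse pair is the last space-separated token of the reference, so that book names containing spaces still parse. The names LineParser and parse, and the choice of public final fields, are illustrative and not prescribed by the assignment.

```java
final class LineParser {

    /** Value class with the four attributes named in Question 1. */
    static final class Line {
        final String book;
        final int chapter;
        final int verse;
        final String content;

        Line(String book, int chapter, int verse, String content) {
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
            this.content = content;
        }
    }

    /** Parses one line of the form "Book Chapter:Verse[tab]Content". */
    static Line parse(String raw) {
        // Assumption: the first tab separates the reference from the content.
        int tab = raw.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("Missing tab separator: " + raw);
        }
        String reference = raw.substring(0, tab).trim();
        String content = raw.substring(tab + 1).trim();

        // Assumption: "Chapter:Verse" is the last token, so the book name
        // is everything before the last space of the reference.
        int space = reference.lastIndexOf(' ');
        int colon = reference.lastIndexOf(':');
        if (space < 0 || colon < space) {
            throw new IllegalArgumentException("Malformed reference: " + reference);
        }
        String book = reference.substring(0, space).trim();
        int chapter = Integer.parseInt(reference.substring(space + 1, colon));
        int verse = Integer.parseInt(reference.substring(colon + 1));
        return new Line(book, chapter, verse, content);
    }

    public static void main(String[] args) {
        Line line = parse("Genesis 1:1\tIn the beginning God created the heaven and the earth.");
        System.out.println(line.book + " | " + line.chapter + " | " + line.verse + " | " + line.content);
    }
}
```

Throwing IllegalArgumentException on malformed input is one design choice; inside a mapper it may be preferable to skip such lines instead.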
END
Appendix A. Major steps
- Write a MapReduce program that runs on the big data cluster to solve the problem. Work through Lab 1 as a warm-up exercise for programming on the cluster.
- Download bible.txt.zip from Moodle. Unzip it to get bible.txt, which contains the text of the Bible. Transfer it to a CSR machine of the cluster and copy it to HDFS.
- You may convert all letters to lower case for ease of processing.
- Download Stopwords.java. Stopwords.isOneOfThem(String in) returns true if in is a stopword. In Question 3, stopwords do not need to be considered. That is, stopwords can be filtered out.
- To count the occurrences of words and filter the stopwords in the Bible, convert all words to lower case.
- Filter out the digits (0-9) and punctuation in the Bible. That is, do not process commas, full stops, numbers, etc.
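The cleaning steps above (lower-casing, dropping digits and punctuation, filtering stopwords) might be combined in a small helper like the one below. It is a sketch, not a required design: VerseTokenizer, tokenize and DEMO_STOPWORDS are hypothetical names, and the tiny stopword set only stands in for the supplied Stopwords.isOneOfThem(String) so that the example compiles on its own. It also assumes the text is plain English, so every character outside a-z is treated as a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

final class VerseTokenizer {

    // Stand-in stopword list for illustration only; in the assignment,
    // Stopwords::isOneOfThem would be passed as the predicate instead.
    static final Set<String> DEMO_STOPWORDS =
            new HashSet<>(Arrays.asList("in", "the", "and"));

    /**
     * Lower-cases the text, replaces every run of characters outside a-z
     * (digits, punctuation, whitespace) with a single space, and returns
     * the remaining words that the predicate does not flag as stopwords.
     * Note that an apostrophe splits a word, e.g. "God's" gives "god" and "s".
     */
    static List<String> tokenize(String content, Predicate<String> isStopword) {
        String cleaned = content.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z]+", " ")
                .trim();
        List<String> words = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return words;
        }
        for (String word : cleaned.split(" ")) {
            if (!isStopword.test(word)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String verse = "In the beginning God created the heaven and the earth.";
        // Stopwords removed, as allowed for Question 3.
        System.out.println(tokenize(verse, DEMO_STOPWORDS::contains));
        // Stopwords kept, as required for the word count in Question 4.
        System.out.println(tokenize(verse, w -> false).size());
    }
}
```

Passing the stopword check as a Predicate lets the same helper serve both Question 3 (stopwords filtered) and Question 4 (stopwords counted).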
Appendix B. Deliverables
- The Java sources, a runnable jar file, and a README that contains the instructions/commands to
run your program.
- A brief PDF report that contains (i) screenshots of the input and output of the program and (ii) an explanation of the major steps of your program.
- The output file of your program.
- Put all your deliverables into one zip file and submit it to Moodle on or before the deadline.
A possible marking scheme:
Completeness of the deliverables 10%
Correct outputs 10%
Runnable jar 10%
Clear mapper logic 30%
Clear reducer logic 30%
Efficient key-value pair 10%
Smart solutions may be awarded a bonus of 10 marks.