Assignment 2: Advanced Map-Reduce, Similar
Network | network代做 | Algorithm | java代写 | oop代做 | project | hadoop | assignment – 该题目是一个常规的hadoop的练习题目代写, 是比较典型的Network/network/Algorithm/java/oop/hadoop等代写方向, 这是值得参考的assignment代写的题目
Items and Data Streams
Formative, Weight (15%), Learning objectives (1, 2 ,3),
Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)
Due date:11 : 59pm, 11 April, 2022
1 Overview
This assignment must be done in groups consisting ofTWOstudents. You MUST use the created groups for assignment 2, accessible from the People tab on MyUni to enrol yourself to a group before submission. Submissions made without enrolling in a group on MyUni may NOT be marked. Every group need to make ONLY ONE submission. If you have problems/questions regarding grouping or require assistance, please use the discussion forum, or email the teaching assistants directly.
Exercise 1S-curve (exercise 3.4.1 in Leskovec, Rajaraman and Ullman) ( points)
Write a program that evaluates and plots the S-curve 1(1sr)bfor the valuess[0,1], for the following values of r and b (one separate evaluation for each of the three):
- r=3 and b=10.
- r=6 and b=20.
- r=5 and b=50.
Note for this exercise you are allowed to use either of the following programming languages: C++, java or Python. Note the readibility of the plots is important here, i.e. Make sure you have approperiate axes names and range of values in your visualisations. Please submit both your code and include the visualisations in your report.
Exercise 2Filtering Streams (similar to Exercises of 4.3 in Leskovec, Rajara- man and Ullman) (15 points)
Solve the following problems:
- For the situation of our running example of Section 4.3.1 with changed conditions (10 billion bits, 2 billion members of the setS), calculate the false-positive rate when using three hash functions. Do the same for four hash functions.
- As a function ofn, the number of bits andmthe number of members in the setS, what number of hash functions minimizes the false-positive rate?
Exercise 3Map-Reduce: Friend Recommendation System (40 points)
Write a MapReduce program in Had oop that implements a People You Might Know social network friendship recommendation algorithm. The key idea is that if two people have a lot of mutual friends, then the system will recommend them to one another to get connected. You will have to run the program in Hadoop, using the system setup completed in Assignment 1 in order to receive points for this exercise.
Input File: Download the input file from the assignment page, to your Friend Recommendation System. This input file contains the adjacency list of friends and has multiple lines in the following format:
Here,
Algorithm: You will need to design an Algorithm such that, for each user U, the algorithm recommendsN= 10 users who are not already friends with U, but have the most number of mutual friends in common with U.
Expected Output: The output has to contain one line per user in the following format:
where
is a comma separated list of unique IDs corresponding to the algorithms recom- mendation of people that
Note: It is possible to solve this question with a single MapReduce job. But if your solution requires multiple map reduce jobs, then that is fine too.
Your submission is expected to include the following:
- Include all your source code, in the exact system setup that you have run the program. This includes all the log files, input/ outputs etc.
- Write-up: describe the algorithm you have designed to solve this problem in enough details to understand your approach. Note this is different to writing comments about your code. It should describe the high level idea of your algorithm. It may include diagrams, graphs etc. for ease of understanding.
- Results: in your write-up include the recommendations for the users with following IDs: 924, 8941, 8942, 9019, 9020, 9021, 9022, 9990, 9992, 9993.
Exercise 4Data streams (10 points)
Follow the scenario 1 and 2 below and answer the related questions regarding the Flajolet-Martin Algorithm. The hash functions are of the formh(x) =ax+b mod 32 for some a and b. You should treat the result as a 5-bit binary integer
- Scenario 1: Suppose a data stream consists of the integers 3, 1, 4, 6, 5, 9.
Determine (a) the maximum tail length for each stream element and (b)
the resulting estimate of the number of distinct elements for the hash
functions in Question 1-3 below.
- Question 1: Hash function:h(x) = (2x+ 1) mod 32
- Question 2: Hash function:h(x) = (3x+ 7) mod 32
- Question 3: Hash function:h(x) = 4xmod 32
- Scenario 2: Suppose a data stream consists of the integers 4, 5, 6, 7, 10, 15.
Determine (a) the maximum tail length for each stream element and (b)
the resulting estimate of the number of distinct elements for the hash
functions in Question 4-6 below.
- Question 4: Hash function:h(x) = (6x+ 2) mod 32
- Question 5: Hash function:h(x) = (2x+ 5) mod 32
- Question 6: Hash function:h(x) = 2xmod 32
Exercise 5Summary of 3.6 and 3.7 (20 points)
For this exercise you have to read Section 3.6 and 3.7 in Leskovec, Rajaraman, Ullman (third edition, 2020).
- Summarize the content of 3.6 in your own words (approx. 600 words).
- Summarize the content of 3.7 in your own words (approx. 600 words).
Note: it is expected that you demonstrate understanding of the above Sections. You may do so by explaining the content in your own words (i.e. paraphrasing) and using digrams/ figures the convey the concept.
2 General assignment submission guidelines
As stated in the beginning of the assignment, work MUST be submited using the groups interface on MyUni, and a single submission per group, ONLY. The submissions will include the following, at minimum:
- PDF file of your solutions for theoretical exercises, descriptions of the coding exercises or the results as requested per each exercise above.
- all source files, in the exact original form that is used on your system to run the program. This includes your code, all the hadoop log files and all related project files.
- a README.txt file containing instructions to run the code, the two group members names, student IDs, and email addresses.
- the submissions that do not follow the above guidlines may lose points accordingly.
Please do not hesitate to reach out using the discussion forum, workshops, or the contact details of the teaching assistants on the home page of MyUni, should you have any questions or concerns.