Network代做 | 代写network | Algorithm | java | oop代做 | project | hadoop | assignment – Assignment 2: Advanced Map-Reduce, Similar

Assignment 2: Advanced Map-Reduce, Similar

Network代做 | 代写network | Algorithm | java | oop代做 | project | hadoop | assignment – 该题目是一个常规的hadoop的练习题目代写, 涵盖了Network/network/Algorithm/java/oop/hadoop等程序代做方面, 这是值得参考的assignment代写的题目

hadoop代写 代做hadoop 大数据代写 代做大数据 big data代做

Items and Data Streams

Formative, Weight (15%), Learning objectives (1, 2 ,3),
Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)
Due date:11 : 59pm, 11 April, 2022

1 Overview

This assignment must be done in groups consisting ofTWOstudents. You MUST use the created groups for assignment 2, accessible from the People tab on MyUni to enrol yourself to a group before submission. Submissions made without enrolling in a group on MyUni may NOT be marked. Every group need to make ONLY ONE submission. If you have problems/questions regarding grouping or require assistance, please use the discussion forum, or email the teaching assistants directly.

Exercise 1S-curve (exercise 3.4.1 in Leskovec, Rajaraman and Ullman) ( points)

Write a program that evaluates and plots the S-curve 1(1sr)bfor the valuess[0,1], for the following values of r and b (one separate evaluation for each of the three):

  1. r=3 and b=10.
  2. r=6 and b=20.
  3. r=5 and b=50.

Note for this exercise you are allowed to use either of the following programming languages: C++, java or Python. Note the readibility of the plots is important here, i.e. Make sure you have approperiate axes names and range of values in your visualisations. Please submit both your code and include the visualisations in your report.

Exercise 2Filtering Streams (similar to Exercises of 4.3 in Leskovec, Rajara- man and Ullman) (15 points)

Solve the following problems:

  1. For the situation of our running example of Section 4.3.1 with changed conditions (10 billion bits, 2 billion members of the setS), calculate the false-positive rate when using three hash functions. Do the same for four hash functions.
  2. As a function ofn, the number of bits andmthe number of members in the setS, what number of hash functions minimizes the false-positive rate?

Exercise 3Map-Reduce: Friend Recommendation System (40 points)

Write a MapReduce program in Had oop that implements a People You Might Know social network friendship recommendation algorithm. The key idea is that if two people have a lot of mutual friends, then the system will recommend them to one another to get connected. You will have to run the program in Hadoop, using the system setup completed in Assignment 1 in order to receive points for this exercise.

Input File: Download the input file from the assignment page, to your Friend Recommendation System. This input file contains the adjacency list of friends and has multiple lines in the following format:

Here, is a unique integer ID corresponding to a unique user and is a comma separated list of unique IDs corresponding to the friends of the user with the unique ID. Note that the friendships are mutual (i.e., edges are undirected): if A is friend with B then B is also friend with A.

Algorithm: You will need to design an Algorithm such that, for each user U, the algorithm recommendsN= 10 users who are not already friends with U, but have the most number of mutual friends in common with U.

Expected Output: The output has to contain one line per user in the following format:

whereis a unique ID corresponding to a user and

is a comma separated list of unique IDs corresponding to the algorithms recom- mendation of people thatmight know, ordered in decreasing number of mutual friends. Even if a user has less than 10 second-degree friends, output all of them in decreasing order of the number of mutual friends. If there are recommended users with the same number of mutual friends, then output those user IDs in numerically ascending order. Also, please provide a description of how you are going to use MapReduce jobs to solve this problem.

Note: It is possible to solve this question with a single MapReduce job. But if your solution requires multiple map reduce jobs, then that is fine too.

Your submission is expected to include the following:

  • Include all your source code, in the exact system setup that you have run the program. This includes all the log files, input/ outputs etc.
  • Write-up: describe the algorithm you have designed to solve this problem in enough details to understand your approach. Note this is different to writing comments about your code. It should describe the high level idea of your algorithm. It may include diagrams, graphs etc. for ease of understanding.
  • Results: in your write-up include the recommendations for the users with following IDs: 924, 8941, 8942, 9019, 9020, 9021, 9022, 9990, 9992, 9993.

Exercise 4Data streams (10 points)

Follow the scenario 1 and 2 below and answer the related questions regarding the Flajolet-Martin Algorithm. The hash functions are of the formh(x) =ax+b mod 32 for some a and b. You should treat the result as a 5-bit binary integer

  1. Scenario 1: Suppose a data stream consists of the integers 3, 1, 4, 6, 5, 9.
Determine (a) the maximum tail length for each stream element and (b)
the resulting estimate of the number of distinct elements for the hash
functions in Question 1-3 below.
  • Question 1: Hash function:h(x) = (2x+ 1) mod 32
  • Question 2: Hash function:h(x) = (3x+ 7) mod 32
  • Question 3: Hash function:h(x) = 4xmod 32
  1. Scenario 2: Suppose a data stream consists of the integers 4, 5, 6, 7, 10, 15.
Determine (a) the maximum tail length for each stream element and (b)
the resulting estimate of the number of distinct elements for the hash
functions in Question 4-6 below.
  • Question 4: Hash function:h(x) = (6x+ 2) mod 32
  • Question 5: Hash function:h(x) = (2x+ 5) mod 32
  • Question 6: Hash function:h(x) = 2xmod 32

Exercise 5Summary of 3.6 and 3.7 (20 points)

For this exercise you have to read Section 3.6 and 3.7 in Leskovec, Rajaraman, Ullman (third edition, 2020).

  1. Summarize the content of 3.6 in your own words (approx. 600 words).
  2. Summarize the content of 3.7 in your own words (approx. 600 words).

Note: it is expected that you demonstrate understanding of the above Sections. You may do so by explaining the content in your own words (i.e. paraphrasing) and using digrams/ figures the convey the concept.

2 General assignment submission guidelines

As stated in the beginning of the assignment, work MUST be submited using the groups interface on MyUni, and a single submission per group, ONLY. The submissions will include the following, at minimum:

  • PDF file of your solutions for theoretical exercises, descriptions of the coding exercises or the results as requested per each exercise above.
  • all source files, in the exact original form that is used on your system to run the program. This includes your code, all the hadoop log files and all related project files.
  • a README.txt file containing instructions to run the code, the two group members names, student IDs, and email addresses.
  • the submissions that do not follow the above guidlines may lose points accordingly.

Please do not hesitate to reach out using the discussion forum, workshops, or the contact details of the teaching assistants on the home page of MyUni, should you have any questions or concerns.