Lab 3: A Simple MapReduce-Style Wordcount Application

CMPSC 473, SUMMER 2022

Released on July 21, 2022, due on August 04, 2022, 11:59:59pm

Raj Pandey and Bhuvan Urgaonkar

1 Purpose and Background

This project is designed to give you experience in writing multi-threaded programs by implementing a simplified MapReduce-style wordcount application. By working on this project:

  • You will learn to write multi-threaded code that correctly deals with race conditions.
  • You will carry out a simple performance evaluation to examine the performance impact of (i) the degree of parallelism in the mapper stage and (ii) the size of the shared buffer that the two stages of your application use to communicate.

[Figure 1 diagram: Input File → (read) → Mappers → (produce) → Buffer → (consume) → Reducer → (write) → Output File]
Figure 1: Overview of our MapReduce-style multi-threaded wordcount application.

The wordcount application takes a text file as input and produces as output the count of each unique word in the input file, arranged in alphabetically increasing order. We will assume that the words in our input files contain only letters of the English alphabet and the digits 0-9 (i.e., no punctuation marks or other special characters). Our wordcount will consist of two stages. The first stage, called "mapper,"