Create & Execute First Hadoop MapReduce Project in Eclipse

This article provides a step-by-step guide to creating a Hadoop MapReduce project in Java with Eclipse. It covers the complete workflow: creating the project, building the jar, executing the application, and browsing the result.

Let us now start building the Hadoop MapReduce WordCount Project.

Hadoop MapReduce Project in Java With Eclipse

Prerequisites:

  1. Hadoop 3: If Hadoop is not installed on your system, then follow the Hadoop 3 installation guide to install and configure Hadoop.
  2. Eclipse: Download Eclipse
  3. Java 8: Download Java

Here are the steps to create the Hadoop MapReduce Project in Java with Eclipse:

Step 1. Launch Eclipse and set the Eclipse workspace.

Step 2. To create the Hadoop MapReduce project, click File >> New >> Java Project.

Provide the project name.

Click Finish to create the project.

Step 3. Create a new package: right-click the project name >> New >> Package.

Provide the package name. The code in this article uses com.projectgurukul.wc; if you choose a different name, change the package line at the top of each class to match.

Click Finish to create the package.

Step 4. Add the Hadoop libraries (jars).

To do so, right-click the project name >> Build Path >> Configure Build Path.

Click Add External JARs.

In the file dialog, go to hadoop-3.1.2 >> share >> hadoop.

A. Add the client jar files.

Select the client jar files and click Open.

B. Add the common jar files.

Select the common jar files and click Open.

Also add the common/lib libraries.

Select all common/lib jars and click Open.

C. Add the YARN jar files.

Select the YARN jar files and click Open.

D. Add the MapReduce jar files.

Select the MapReduce jar files and click Open.

E. Add the HDFS jar files.

Select the HDFS jar files and click Open.

Click Apply and Close to add all the Hadoop jar files to the build path.

All the required Hadoop jar files are now part of the project.
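You may want to confirm that the build path is complete before writing any MapReduce code. One way is to run a small class like the one below from the project. It is a hypothetical helper, and the names ClasspathCheck and missingClasses are illustrative only. It uses reflection (Class.forName), so it compiles even when the jars are missing. It then reports which of the Hadoop classes imported by this article's mapper, reducer, and driver the class loader cannot find.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical build-path check: lists the Hadoop classes used in this article that cannot be loaded.
public class ClasspathCheck {

    // Every Hadoop class imported by WordCountMapper, WordCountReducer and WordCount.
    static final List<String> REQUIRED = Arrays.asList(
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.io.IntWritable",
            "org.apache.hadoop.io.LongWritable",
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.mapreduce.Mapper",
            "org.apache.hadoop.mapreduce.Reducer",
            "org.apache.hadoop.mapreduce.lib.input.FileInputFormat",
            "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat",
            "org.apache.hadoop.util.GenericOptionsParser");

    // Returns the names the class loader cannot find; classes are loaded without being initialized.
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED);
        System.out.println(missing.isEmpty() ? "All Hadoop classes found" : "Missing: " + missing);
    }
}
```

If a class is reported missing, the jar that contains it was probably skipped in one of the sub-steps above.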

Step 5. Now create a new class that performs the map job.

In this article, the mapper class is named WordCountMapper.

Right-click the package name >> New >> Class.

Provide the class name, WordCountMapper.

Click Finish.

Step 6. Copy the following code into the WordCountMapper class created above.

package com.projectgurukul.wc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: (byte offset of the line, line text); output: (word, 1) for every word in the line
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
  private static final IntWritable ONE = new IntWritable(1); // reused count value
  private final Text wordToken = new Text();                 // reused output key

  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
  {
    // Divide the line into whitespace-separated tokens
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens())
    {
      wordToken.set(tokens.nextToken());
      context.write(wordToken, ONE); // emit (word, 1)
    }
  }
}

Press Ctrl+S to save the code.
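You can check the mapper's core logic without a cluster. The plain-Java sketch below has no Hadoop dependency, and the names MapStepSketch and mapLine are hypothetical. It mimics what WordCountMapper.map does with one input line. StringTokenizer splits only on whitespace (space, tab, newline, carriage return, form feed), and each token is emitted paired with 1. Punctuation stays attached and case is preserved, so "Hadoop," and "hadoop" become different keys.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java stand-in for WordCountMapper.map (no Hadoop classes).
public class MapStepSketch {

    // Emits one (word, 1) pair per whitespace-separated token, like the mapper does.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line); // default delimiters: " \t\n\r\f"
        while (tokens.hasMoreTokens()) {
            emitted.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Punctuation and case are kept, so "Hadoop," and "hadoop" are different keys.
        for (Map.Entry<String, Integer> pair : mapLine("Hadoop, hadoop\tgurukul  gurukul")) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

To count words case-insensitively or without punctuation, you would have to normalize each token before emitting it. The article's mapper does not do this.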

Step 7. In the same way as above, create another class, this time for the reduce task.

In this article, the reducer class is named WordCountReducer.

Click Finish.

Step 8. Copy the following code into the WordCountReducer class created above.

package com.projectgurukul.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: (word, [counts...]); output: (word, total count)
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
  private final IntWritable count = new IntWritable(); // reused output value

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
  {
    // e.g. key = "gurukul", values = [1, 1, 1, ...], or partial sums when run as the combiner
    int valueSum = 0;
    for (IntWritable val : values)
    {
      valueSum += val.get();
    }
    count.set(valueSum);
    context.write(key, count); // emit (word, total)
  }
}

Press Ctrl+S to save the code.
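Between the map and reduce phases, Hadoop's shuffle groups all values emitted for the same key and sorts the keys. As a result, reduce receives, for example, "gurukul" together with [1, 1, 1]. The plain-Java sketch below models that flow; the names ShuffleReduceSketch, shuffle, and reduce are hypothetical. A TreeMap stands in for the framework's grouping and sorting. For ASCII words this gives the same order as Hadoop's byte-wise comparison of Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of the shuffle plus WordCountReducer.reduce (no Hadoop classes).
public class ShuffleReduceSketch {

    // Shuffle: collect each emitted 1 under its word; TreeMap keeps the words sorted.
    static TreeMap<String, List<Integer>> shuffle(List<String> emittedWords) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : emittedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: the same summing loop as WordCountReducer, for one key's values.
    static int reduce(Iterable<Integer> values) {
        int valueSum = 0;
        for (int val : values) {
            valueSum += val;
        }
        return valueSum;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("gurukul", "hadoop", "gurukul", "gurukul");
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            System.out.println(group.getKey() + " " + group.getValue() + " -> " + reduce(group.getValue()));
        }
    }
}
```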

Step 9. Now create the driver class, which contains the main method. In this article, the driver class is named “WordCount”.

Click Finish.

Step 10. Copy the following code into the WordCount driver class.

package com.projectgurukul.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount
{
  public static void main(String[] args) throws Exception
  {
    Configuration conf = new Configuration();
    // Strip generic Hadoop options (e.g. -D key=value) and keep only the path arguments
    String[] pathArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (pathArgs.length < 2)
    {
      System.err.println("MR Project Usage: wordcount <input-path> [...] <output-path>");
      System.exit(2);
    }
    Job wcJob = Job.getInstance(conf, "MapReduce WordCount");
    wcJob.setJarByClass(WordCount.class);           // tells Hadoop which jar holds the job classes
    wcJob.setMapperClass(WordCountMapper.class);
    wcJob.setCombinerClass(WordCountReducer.class); // map-side pre-summing; safe because addition is associative and commutative
    wcJob.setReducerClass(WordCountReducer.class);
    wcJob.setOutputKeyClass(Text.class);            // output key type: the word
    wcJob.setOutputValueClass(IntWritable.class);   // output value type: the count
    // Every argument except the last is an input path
    for (int i = 0; i < pathArgs.length - 1; ++i)
    {
      FileInputFormat.addInputPath(wcJob, new Path(pathArgs[i]));
    }
    // The last argument is the output directory; it must not exist yet
    FileOutputFormat.setOutputPath(wcJob, new Path(pathArgs[pathArgs.length - 1]));
    // Submit the job, print progress, and exit with 0 on success or 1 on failure
    System.exit(wcJob.waitForCompletion(true) ? 0 : 1);
  }
}

Press Ctrl+S to save the code.
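The driver registers WordCountReducer as the combiner as well as the reducer. Hadoop may run a combiner zero, one, or several times on map-side output, so this reuse is only correct if pre-summing partial groups cannot change the final total. The plain-Java sketch below illustrates that property; the names CombinerSketch, sum, and combineThenReduce are hypothetical. It compares summing all the 1s directly against first summing arbitrary chunks, as a combiner would, and then summing the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical check that combining chunks of values first does not change the final count.
public class CombinerSketch {

    // The same summation WordCountReducer performs, used here as both combiner and reducer.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Combine each map-side chunk into a partial sum, then reduce the partial sums.
    static int combineThenReduce(List<List<Integer>> mapSideChunks) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> chunk : mapSideChunks) {
            partialSums.add(sum(chunk)); // combiner output: (word, partial count)
        }
        return sum(partialSums);          // reducer output: (word, total count)
    }

    public static void main(String[] args) {
        List<Integer> allOnes = Arrays.asList(1, 1, 1, 1, 1);
        List<List<Integer>> chunks = Arrays.asList(Arrays.asList(1, 1), Arrays.asList(1, 1, 1));
        System.out.println("without combiner: " + sum(allOnes));
        System.out.println("with combiner:    " + combineThenReduce(chunks));
    }
}
```

A reducer that computed an average, for example, could not be reused as a combiner this way, because an average of partial averages is generally not the overall average.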

Step 11. Creating the Jar File of the Project

Before running the Hadoop MapReduce word count application, we have to package it as a jar file.

To do so, right-click the project name >> Export.

Select the JAR file option and click Next.

Provide the jar file name and location (this article uses /home/gurukul/WordCount.jar).

Click Next.

Click Next.

Now select the class of the application entry point.

In this article, the application entry point is the WordCount class (com.projectgurukul.wc.WordCount).

Click Finish.
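Selecting the entry-point class makes Eclipse record a Main-Class entry in the jar's META-INF/MANIFEST.MF. When a jar's manifest names a Main-Class, hadoop jar runs that class and passes all remaining arguments to it, which is why the Step 12 command names no class. A class name added anyway would reach main as an extra argument, and this driver would treat it as an input path. To confirm what the export recorded, you could use a hypothetical helper like ManifestCheck below. It reads the entry with the standard java.util.jar API, and its default path is the example path used in this article.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: prints the Main-Class recorded in a jar's manifest.
public class ManifestCheck {

    // Returns the Main-Class attribute, or null if the jar has no manifest or no entry point.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue("Main-Class");
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the example jar path used in this article; pass another path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "/home/gurukul/WordCount.jar";
        String mainClass = mainClassOf(jarPath);
        System.out.println(mainClass == null ? "No Main-Class entry" : "Main-Class: " + mainClass);
    }
}
```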

Step 12. Run the Hadoop MapReduce word count application with the following command:

hadoop jar <project jar file path> <input file path> <output directory>

For example:

hadoop jar /home/gurukul/WordCount.jar /wc_input /wc_output

In this command:

  • <project jar file path> is the path of the jar file created in Step 11.
  • <input file path> is the HDFS path of the input file for the word count job. If the file is still on your local disk, copy it into HDFS first, for example with hadoop fs -put <local file> /wc_input.
  • <output directory> is the HDFS directory where the job writes its output. It must not exist before the job runs; otherwise the job fails.
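The driver treats every argument except the last as an input path and the last one as the output directory. The command therefore also accepts several inputs, for example hadoop jar /home/gurukul/WordCount.jar /wc_input /wc_input2 /wc_output, where /wc_input2 is a hypothetical second input. The plain-Java sketch below mirrors that split using the driver's loop bounds; the class name PathArgsSketch and its methods are hypothetical. It leaves out the GenericOptionsParser step, which removes generic Hadoop options before this split happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the WordCount driver's path-argument handling (no Hadoop classes).
public class PathArgsSketch {

    // All arguments except the last are input paths (same loop bounds as the driver).
    static List<String> inputPaths(String[] pathArgs) {
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < pathArgs.length - 1; ++i) {
            inputs.add(pathArgs[i]);
        }
        return inputs;
    }

    // The last argument is the output directory.
    static String outputPath(String[] pathArgs) {
        return pathArgs[pathArgs.length - 1];
    }

    public static void main(String[] args) {
        // Falls back to the article's example paths when fewer than two arguments are given.
        String[] pathArgs = args.length >= 2 ? args : new String[] {"/wc_input", "/wc_output"};
        System.out.println("inputs: " + inputPaths(pathArgs));
        System.out.println("output: " + outputPath(pathArgs));
    }
}
```

With fewer than two path arguments, the real driver prints its usage message and exits with status 2 instead.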

This starts the MapReduce job, and Hadoop prints the job's progress to the console.

Once the MapReduce job has completed successfully, let us check the result.

Step 13. Browse the Hadoop MapReduce Word Count Project Output.

The job's output directory in HDFS contains two files: _SUCCESS, an empty marker file indicating that the job completed successfully, and part-r-00000.

The word counts are in the part-r-00000 file: each line holds a word, a tab, and that word's count.

You can view the result with the following command:

hadoop fs -cat <output directory>/part-r-00000

For example:

hadoop fs -cat /wc_output/part-r-00000
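Hadoop's default TextOutputFormat writes each line of part-r-00000 as the key, a tab character, and the value. With the job's single reducer, the words appear in sorted order. Suppose you copy the file to the local disk, for example with hadoop fs -get /wc_output/part-r-00000 . In that case, a sketch like the one below can load it into a map for further processing; the class name WordCountOutputReader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for word count output lines of the form "word<TAB>count".
public class WordCountOutputReader {

    static Map<String, Integer> parse(Reader source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the file's (sorted) order
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.lastIndexOf('\t'); // the count follows the tab
                if (tab < 0) {
                    continue; // skip lines that are not key<TAB>value
                }
                counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a local copy of the output file; pass its path as the first argument.
        String file = args.length > 0 ? args[0] : "part-r-00000";
        Map<String, Integer> counts = parse(Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8));
        counts.forEach((word, count) -> System.out.println(word + " = " + count));
    }
}
```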

Summary

We have successfully created a Hadoop MapReduce project in Java with Eclipse and executed the MapReduce job on Ubuntu. If you have questions about any of the steps, ask in the comment section.
