MongoDB MapReduce Example: The Complete Guide

To some extent, the map and reduce phases were inspired by map and reduce, the higher-order functions that are widely used in functional programming and are well known in JavaScript. As the name MapReduce implies, the map job is always performed before the reduce job.
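For comparison, here is the functional-programming version in plain JavaScript, using the built-in Array.prototype.map() and Array.prototype.reduce() on the same sample people that appear in the example later in this tutorial. It is only an analogy: it runs in Node.js and does not involve MongoDB.

```javascript
// Sample people from this tutorial's example, as plain JavaScript objects.
const people = [
  { name: "Krunal", age: 26 },
  { name: "Krunal", age: 25 },
  { name: "Ankit", age: 24 },
  { name: "Ankit", age: 25 },
  { name: "Rushabh", age: 26 },
  { name: "Rushabh", age: 27 },
];

// map: transform every element independently (person object -> age).
const ages = people.map((person) => person.age);

// reduce: fold all elements into one accumulated value (here, a sum).
const totalAge = ages.reduce((sum, age) => sum + age, 0);

console.log(ages.join(", ")); // 26, 25, 24, 25, 26, 27
console.log(totalAge);        // 153
```

MapReduce adds one step between these two: the mapped values are grouped by key, and reduce runs once per key instead of once over the whole array.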

The mapper starts by reading the documents in the collection and emitting key-value pairs that contain only the fields we need to process; the values that share the same key are then grouped into one array. Each key and its array of values is then fed into the reducer, which combines the values into a result.

MapReduce is a framework that allows the processing of large data sets to be parallelized across many physical or virtual servers. A typical MapReduce program consists of two phases:

  1. map phase: filter / transform / convert data
  2. reduce phase: perform aggregations over the data
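To make the two phases concrete, here is a minimal in-memory sketch in plain JavaScript. The helper name runMapReduce is hypothetical and this is not how MongoDB implements the command; it only shows the data flow. Because the mapper may return zero pairs for a record, the map phase can filter as well as transform.

```javascript
// Minimal in-memory sketch of the two MapReduce phases (not MongoDB's implementation).
// mapper(record) returns an array of zero or more [key, value] pairs.
// reducer(key, values) folds all values grouped under one key into one result.
function runMapReduce(records, mapper, reducer) {
  // Map phase: emit pairs and group their values by key.
  const groups = new Map();
  for (const record of records) {
    for (const [key, value] of mapper(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }

  // Reduce phase: aggregate each key's values into a single result.
  const results = new Map();
  for (const [key, values] of groups) {
    results.set(key, reducer(key, values));
  }
  return results;
}

const people = [
  { name: "Krunal", age: 26 },
  { name: "Krunal", age: 25 },
  { name: "Ankit", age: 24 },
  { name: "Ankit", age: 25 },
  { name: "Rushabh", age: 26 },
  { name: "Rushabh", age: 27 },
];

// Count the people aged 25 or older under each name.
const counts = runMapReduce(
  people,
  (person) => (person.age >= 25 ? [[person.name, 1]] : []), // filter + transform
  (name, ones) => ones.reduce((sum, one) => sum + one, 0)    // aggregate
);

console.log(Object.fromEntries(counts)); // { Krunal: 2, Ankit: 1, Rushabh: 2 }
```

Ankit ends up with a count of 1 because the mapper emitted nothing for the 24-year-old record.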

MongoDB MapReduce

MapReduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB exposes it through the mapReduce database command and the db.collection.mapReduce() shell helper. In simple terms, the command takes two primary inputs: a map function and a reduce function. Note that starting in MongoDB 5.0, map-reduce is deprecated, and the MongoDB documentation recommends an aggregation pipeline instead.

Example

Let us say we have the following data.

[
    { name: "Krunal", age: 26 },
    { name: "Krunal", age: 25 },
    { name: "Ankit", age: 24 },
    { name: "Ankit", age: 25 },
    { name: "Rushabh", age: 26 },
    { name: "Rushabh", age: 27 }
]

Now, we apply MapReduce to this data and see what we get as output.

We want to add up the ages of all the customers who share the same name, so we run this data through the mapper function and then through the reducer.

When the mapper processes the above data without any conditions, emitting the name as the key and the age as the value, and the emitted pairs are grouped by key, we get the following result.

Key       Values
Krunal    [26, 25]
Ankit     [24, 25]
Rushabh   [26, 27]

So, we have applied the map method and generated key-value pairs, where the key is the name and the value is an array of the age values grouped under that key.
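The map-and-group step above can be sketched in plain JavaScript as follows. This is a local illustration in Node.js, not MongoDB code: the variable names are hypothetical, and MongoDB performs the grouping internally.

```javascript
const customers = [
  { name: "Krunal", age: 26 },
  { name: "Krunal", age: 25 },
  { name: "Ankit", age: 24 },
  { name: "Ankit", age: 25 },
  { name: "Rushabh", age: 26 },
  { name: "Rushabh", age: 27 },
];

// Map phase: emit one [key, value] pair per document (name -> age).
const emitted = customers.map((customer) => [customer.name, customer.age]);

// Grouping: collect all values emitted under the same key into one array.
const grouped = {};
for (const [name, age] of emitted) {
  if (!grouped[name]) grouped[name] = [];
  grouped[name].push(age);
}

console.log(JSON.stringify(grouped));
// {"Krunal":[26,25],"Ankit":[24,25],"Rushabh":[26,27]}
```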

Now, we can apply the reduce method to add up the ages for each key and return one reduced value per key. The reducer receives the first row of the table above (the key Krunal with the values [26, 25]), iterates through all the values, and adds them, producing the sum for that row. Next, it receives the second row and does the same thing, continuing until all the rows are processed.

Name      Total
Krunal    51
Ankit     49
Rushabh   53
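The reduce step that produces this table can be sketched in plain JavaScript as below. It starts from the grouped key-value table written out as a literal and calls a hypothetical reducer once per key; it runs in Node.js and only mirrors the logic described above.

```javascript
// Output of the map/group step, written out literally.
const grouped = { Krunal: [26, 25], Ankit: [24, 25], Rushabh: [26, 27] };

// Reducer: called once per key with all of that key's values; sums them.
function reducer(name, ages) {
  let total = 0;
  for (const age of ages) {
    total += age;
  }
  return total;
}

// Feed each row (key + values) to the reducer, one row at a time.
const totals = {};
for (const [name, ages] of Object.entries(grouped)) {
  totals[name] = reducer(name, ages);
}

console.log(totals); // { Krunal: 51, Ankit: 49, Rushabh: 53 }
```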

So now you can see why the mapper function is called a Mapper (it creates a map of the data in the form of key-value pairs) and why the reducer is called a Reducer (it reduces the data that the mapper generated to a simpler form).

MapReduce on MongoDB

First, create a MongoDB database and collection using the following commands.

use mapreduce
db.createCollection("blogs")

Now, insert some documents.

db.blogs.insertOne({
  "title" : "AppDividend",
  "published" : "2017-03-27",
  "authors": [
      { "firstName" : "Krunal",  "lastName" : "Lathiya" }
  ],
  "categories" : [ "Angular", "React", "Vue" ]
})

db.blogs.insertOne({
  "title" : "Demonuts",
  "published" : "2016-12-10",
  "authors": [
      { "firstName" : "Krunal",  "lastName" : "Lathiya" }
  ],
  "categories" : [ "Android", "PHP" ]
})

db.blogs.insertOne({
  "title" : "Scotch.io",
  "published" : "2011-12-10",
  "authors": [
      { "firstName" : "Chris",  "lastName" : "Sevilleja" }
  ],
  "categories" : [ "React", "Laravel", "Vue", "Angular", "MongoDB", "PHP", "Android" ]
})

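The mapReduce operation first selects documents from the collection (optionally filtered by a query) and runs the map function on each of them to emit key-value pairs; the reduce function then combines the values emitted for each key.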

The syntax is as follows.

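db.collection.mapReduce(
   function() { emit(key, value); },                // map function
   function(key, values) { return reducedValue; },  // reduce function
   {
      out: collection,   // where to write the results
      query: document,   // optional: filter the input documents
      sort: document,    // optional: sort the input documents
      limit: number      // optional: cap the number of input documents
   }
)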
  1. map is a JavaScript function that associates a value with a key and emits the key-value pair.
  2. reduce is a JavaScript function that combines all the values emitted for the same key into a single value.
  3. out specifies where to put the result of the map-reduce operation: a collection, or { inline: 1 } to return the results in memory.
  4. query specifies optional selection criteria for the input documents.
  5. sort specifies an optional sort order for the input documents.
  6. limit specifies an optional maximum number of input documents passed to the map function.
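The order in which these options take effect can be imitated locally. The sketch below is a hypothetical, in-memory simulation in Node.js, not MongoDB's implementation: simulateMapReduce and the local emit() are made-up stand-ins, query is a predicate function instead of a query document, and sort is a comparator instead of a sort document. It applies query, then sort, then limit, then map, then reduce, and returns { _id, value } documents as inline output does. The map function is a regular function rather than an arrow function so that this refers to the current document.

```javascript
// Hypothetical stand-in for MongoDB's server-side emit(): groups values by key.
let emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Hypothetical imitation of the option order:
// query -> sort -> limit -> map -> group -> reduce -> inline output.
function simulateMapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit } = options;

  let input = docs.filter(query);                         // query: select input documents
  if (sort) input = [...input].sort(sort);                // sort: order the input documents
  if (limit !== undefined) input = input.slice(0, limit); // limit: cap documents sent to map

  emitted = new Map();
  for (const doc of input) map.call(doc); // map sees the current document as `this`

  const results = [];
  for (const [key, values] of emitted) {
    // As in MongoDB, reduce is not called for a key with a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results; // shaped like the documents returned by out: { inline: 1 }
}

const people = [
  { name: "Krunal", age: 26 },
  { name: "Krunal", age: 25 },
  { name: "Ankit", age: 24 },
  { name: "Ankit", age: 25 },
  { name: "Rushabh", age: 26 },
  { name: "Rushabh", age: 27 },
];

const result = simulateMapReduce(
  people,
  function () { emit(this.name, this.age); },                              // map
  function (key, values) { return values.reduce((s, v) => s + v, 0); },    // reduce
  {
    query: (doc) => doc.age >= 25,  // sketch: a predicate, not a query document
    sort: (a, b) => b.age - a.age,  // sketch: a comparator, not a sort document
    limit: 3,
  }
);

console.log(JSON.stringify(result));
// [{"_id":"Rushabh","value":53},{"_id":"Krunal","value":26}]
```

Because limit applies to the input documents, only three documents (Rushabh 27, Krunal 26, Rushabh 26) ever reach map, so Krunal's value is the single emitted 26, passed through without calling reduce.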

Now, let us find how many blogs each author has, keyed by the author's first and last name.

The command for that is as follows.

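db.runCommand( {
    mapReduce: "blogs",
    // map: runs once per document; `this` is the current blog post.
    // Emits "firstName lastName" as the key and 1 as the value for each author.
    map: function(){
        for (let index = 0; index < this.authors.length; ++index) {
            let author = this.authors[ index ];
            emit( author.firstName + " " + author.lastName, 1 );
        }
    },
    // reduce: receives one author key and the array of counters emitted for it.
    reduce: function(author, counters){
        let count = 0;  // declared with let to avoid creating an implicit global

        for (let index = 0; index < counters.length; ++index) {
            count += counters[index];
        }

        return count;
    },
    // Return the results in memory instead of writing them to a collection.
    out: { inline: 1 }
} )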
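To see what this command computes without a running server, the same map and reduce logic can be exercised locally in Node.js against the three blog documents inserted earlier. The emit() function here is a hypothetical stand-in for the one MongoDB provides, and the output order reflects this sketch, not necessarily the order the server returns. The last line illustrates why the reducer sums the counters instead of returning counters.length: MongoDB may call reduce more than once for the same key, feeding an earlier reduce result back in as one of the values, so the reduce output must have the same shape as the emitted value.

```javascript
// The three blog documents inserted above (only the fields map() reads).
const blogs = [
  { title: "AppDividend", authors: [{ firstName: "Krunal", lastName: "Lathiya" }] },
  { title: "Demonuts", authors: [{ firstName: "Krunal", lastName: "Lathiya" }] },
  { title: "Scotch.io", authors: [{ firstName: "Chris", lastName: "Sevilleja" }] },
];

// Hypothetical local stand-in for the server-side emit(); collects values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Same map and reduce logic as the command above.
function map() {
  for (let index = 0; index < this.authors.length; ++index) {
    const author = this.authors[index];
    emit(author.firstName + " " + author.lastName, 1);
  }
}

function reduce(author, counters) {
  let count = 0;
  for (let index = 0; index < counters.length; ++index) {
    count += counters[index];
  }
  return count;
}

for (const blog of blogs) map.call(blog); // `this` is the current blog

const results = [];
for (const [key, values] of emitted) {
  // MongoDB skips reduce for keys with a single emitted value.
  results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
}

console.log(JSON.stringify(results));
// [{"_id":"Krunal Lathiya","value":2},{"_id":"Chris Sevilleja","value":1}]

// Re-reduce: feeding a partial result back in still gives the right total.
console.log(reduce("Krunal Lathiya", [reduce("Krunal Lathiya", [1, 1]), 1])); // 3
```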


So, this is how we can process large data sets with the mapReduce command in MongoDB.

That’s it for this tutorial.
