Introduction to MapReduce

RapidQL is an efficient tool for fetching data from multiple data sources. Combined with MapReduce, it lets you aggregate the fetched data, so you can conveniently analyze data spread across several sources.

A MapReduce pipeline is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies) [Wikipedia].
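The student example above can be sketched in plain JavaScript. This is only an illustration of the map and reduce steps using built-in array methods, not RapidQL syntax; the `students` array and the helper names `mapStep` and `reduceStep` are hypothetical and chosen for this example.

```javascript
// Hypothetical sample data, invented for illustration only.
const students = [
  { firstName: "Ana", lastName: "Lopez" },
  { firstName: "Ben", lastName: "Katz" },
  { firstName: "Ana", lastName: "Silva" },
];

// Map step: emit one (key, value) pair per student,
// keyed by first name (the "queue" the student belongs to).
function mapStep(student) {
  return { key: student.firstName, value: 1 };
}

// Reduce step: sum the values for each key,
// which yields the number of students per first name.
function reduceStep(counts, pair) {
  counts[pair.key] = (counts[pair.key] || 0) + pair.value;
  return counts;
}

const nameFrequencies = students.map(mapStep).reduce(reduceStep, {});
console.log(nameFrequencies); // { Ana: 2, Ben: 1 }
```

Keeping the map and reduce steps as separate functions mirrors the two-stage structure described above: the map step decides the grouping key, and the reduce step decides how each group is summarized.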

Fig 1. MapReduce Overview