## Saturday, 27 August 2016

In this post I will be exploring how can we use MapReduce to implement Gradient Descent algorithm in Hadoop for large scale data. As we know Hadoop is capable of handling peta-byte scale/size of the data.

• Writable

Before starting, first we need to understand what is Gradient Descent and where can we use it. Below is an excerpt from Wikipedia:
Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
Gradient descent is also known as steepest descent, or the method of steepest descent. Gradient descent should not be confused with the method of steepest descent for approximating integrals.

If you look at the algorithm, it is an iterative optimisation algorithm. So if we are talking about millions of observations, then we need to iterate those millions of observations and adjust out parameter (theta).

Some mathematical notations:

where

And

Now, the question is how can we leverage Hadoop to distribute the work load to minimize the cost function and find the theta parameter?

MapReduce programming model comprises two phases. 1 Map, 2. Reduce shown in below picture. Hadoop gives programmer to only focus on map and reduce phase and rest of the workload is taken care by Hadoop. Programmers do not need to think how I am going to split data etc.

When user uploads data to HDFS, the data are splited and saved in various data nodes.
Now we know Hadoop will provide subset of data to each Mapper. So we can program our mapper to emit PartialGradientDescent serializable object. For instance if one split has 50 observations, then that mapper will return 50 partial gradient descent objects.

One more thing, there is only ONE reducer in this example, so reducer will get whole lot of data, it would be better to introduce combiner so that reducer will get low number of PartialGradientDescent objects or you can apply in-memory combining design pattern for MapReduce which I will cover in next post.

Now let’s get into java map reduce program. Before reading further it would be better you understand the Writable concept in Hadoop and some matrix algebra.

Mapper code:

We can see that map task is emitting partialGradientDescent object with lot of information. Like sum0, sum1 and 1. These information will be required in reducer to update the theta.

Now let's have a look at reducer code:

We can see from Reducer code that we are summing up all given partial gradients. This can be improved if we supply combiner that does some partial sum before reaching to reducer. For instance if we have 50 mapper, then after each mapper the combiner will sum and send to reducer in that case reducer will get 50 partial gradient objects.