Saturday, 27 August 2016

Implementing Gradient Descent Algorithm in Hadoop for large scale data

In this post I will explore how we can use MapReduce to implement the Gradient Descent algorithm in Hadoop for large-scale data. As we know, Hadoop is capable of handling petabyte-scale data.

In this article I will use the following concepts:
  • Gradient Descent algorithm
  • Hadoop MapReduce job
  • Writable
  • Reading from HDFS via the Hadoop API
  • Hadoop Counter

Before starting, we need to understand what Gradient Descent is and where we can use it. Below is an excerpt from Wikipedia:
Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
Gradient descent is also known as steepest descent, or the method of steepest descent. Gradient descent should not be confused with the method of steepest descent for approximating integrals.

As the definition says, it is an iterative optimisation algorithm. So if we are talking about millions of observations, every iteration has to go through all of those observations to adjust our parameter (theta).

Some mathematical notations:
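The notation images from the original post are not reproduced here. Assuming single-feature linear regression (an assumption, but one consistent with the sum0, sum1 and count values the mapper emits), the standard formulation with learning rate alpha over m observations is:

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x \\
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
\theta_0 &:= \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) \\
\theta_1 &:= \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
\end{aligned}
```

The first sum corresponds to sum0, the second to sum1, and m to the total count. Because a sum over all observations equals the sum of the per-split sums, each mapper can work on its own split and the reducer only has to add the partial results before updating both parameters simultaneously.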



Now the question is: how can we leverage Hadoop to distribute the workload of minimising the cost function and finding the theta parameter?

The MapReduce programming model comprises two phases, 1. Map and 2. Reduce, as shown in the picture below. Hadoop lets the programmer focus only on the map and reduce phases while it takes care of the rest of the work, so programmers do not need to think about how to split the data etc.

Please visit to know about MapReduce framework.

When a user uploads data to HDFS, the data is split into blocks and stored on various data nodes.
Hadoop then provides a subset of the data (an input split) to each mapper, so we can program our mapper to emit a serializable PartialGradientDescent object per observation. For instance, if one split has 50 observations, that mapper will emit 50 partial gradient descent objects.

One more thing: there is only ONE reducer in this example, so that reducer receives every object the mappers emit. It would be better to introduce a combiner so that the reducer receives fewer PartialGradientDescent objects, or to apply the in-mapper combining design pattern for MapReduce, which I will cover in the next post.
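The in-mapper (in-memory) combining pattern mentioned above is left for the next post, but the idea fits in a few lines. The sketch below is plain Java with no Hadoop dependency: map() stands in for Hadoop's Mapper.map() being called once per observation, and cleanup() for Mapper.cleanup(), called once at the end of the split. Single-feature linear regression and the class name InMapperCombiningSketch are assumptions for illustration.

```java
/**
 * Plain-Java sketch of in-mapper combining (no Hadoop dependency).
 * Assumes single-feature linear regression with parameters theta0 and theta1.
 */
class InMapperCombiningSketch {
    private final double theta0;
    private final double theta1;

    // Running totals for the whole split, kept in memory between map() calls.
    private double sum0 = 0.0;
    private double sum1 = 0.0;
    private long count = 0;

    InMapperCombiningSketch(double theta0, double theta1) {
        this.theta0 = theta0;
        this.theta1 = theta1;
    }

    /** Accumulates one observation instead of emitting an object for it. */
    void map(double x, double y) {
        double error = theta0 + theta1 * x - y; // h_theta(x) - y
        sum0 += error;
        sum1 += error * x;
        count++;
    }

    /** Returns the single {sum0, sum1, count} record this split would emit. */
    double[] cleanup() {
        return new double[] {sum0, sum1, count};
    }

    public static void main(String[] args) {
        InMapperCombiningSketch split = new InMapperCombiningSketch(0.0, 1.0);
        split.map(1.0, 2.0);
        split.map(2.0, 3.0);
        System.out.println(java.util.Arrays.toString(split.cleanup())); // prints [-2.0, -3.0, 2.0]
    }
}
```

With this pattern, a split of 50 observations sends one record to the reducer instead of 50, at the cost of holding three numbers in memory per mapper.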

Now let’s get into the Java MapReduce program. Before reading further, it would help to understand the Writable concept in Hadoop and some matrix algebra.

Mapper code:

We can see that the map task emits a PartialGradientDescent object carrying the information the reducer needs to update theta: sum0, sum1 and a count of 1.
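Since the original mapper code is not shown, here is a minimal plain-Java sketch of the per-observation computation. PartialGradient is a hypothetical, simplified stand-in for PartialGradientDescent, and the "x,y" input line format is an assumption. In a real Hadoop mapper the result would be written with context.write() under a single shared key so that everything reaches the one reducer; how the original does this is not shown.

```java
/** Hypothetical stand-in for PartialGradientDescent: two gradient terms plus a count. */
class PartialGradient {
    final double sum0;  // sum of (h_theta(x) - y)
    final double sum1;  // sum of (h_theta(x) - y) * x
    final long count;   // number of observations represented

    PartialGradient(double sum0, double sum1, long count) {
        this.sum0 = sum0;
        this.sum1 = sum1;
        this.count = count;
    }
}

/** Plain-Java sketch of the map step for single-feature linear regression. */
class GradientMapSketch {
    static PartialGradient map(String line, double theta0, double theta1) {
        String[] parts = line.split(",");
        double x = Double.parseDouble(parts[0].trim());
        double y = Double.parseDouble(parts[1].trim());
        double error = theta0 + theta1 * x - y; // h_theta(x) - y
        // One observation: emit both gradient terms and a count of 1.
        return new PartialGradient(error, error * x, 1);
    }

    public static void main(String[] args) {
        PartialGradient p = map("2.0,5.0", 1.0, 1.0);
        System.out.println(p.sum0 + " " + p.sum1 + " " + p.count); // prints -2.0 -4.0 1
    }
}
```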

Now let's have a look at reducer code:

We can see from the reducer code that we are summing up all the partial gradients it receives. This can be improved by supplying a combiner that computes partial sums before the data reaches the reducer. For instance, if we have 50 mappers, the combiner sums each mapper's output before it is sent on, so in the best case the reducer receives only 50 partial gradient objects. (Hadoop may run a combiner zero or more times, so the reducer must not rely on it having run.)
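The reducer's job can be sketched in plain Java as well. This assumes each partial gradient arrives as a hypothetical {sum0, sum1, count} array and that theta is updated with the averaged gradient and a learning rate alpha; it is an illustration, not the post's reducer.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of the single reducer: sum all partials, then take one step. */
class GradientReduceSketch {
    static double[] reduce(List<double[]> partials, double theta0, double theta1, double alpha) {
        double sum0 = 0.0, sum1 = 0.0, count = 0.0;
        for (double[] p : partials) {   // each p is {sum0, sum1, count}
            sum0 += p[0];
            sum1 += p[1];
            count += p[2];
        }
        if (count == 0) {
            return new double[] {theta0, theta1}; // nothing to learn from
        }
        // Average the gradient over all m observations and step against it.
        double newTheta0 = theta0 - alpha * sum0 / count;
        double newTheta1 = theta1 - alpha * sum1 / count;
        return new double[] {newTheta0, newTheta1};
    }

    public static void main(String[] args) {
        List<double[]> partials = Arrays.asList(
            new double[] {-2.0, -4.0, 1},
            new double[] {-4.0, -12.0, 1});
        System.out.println(Arrays.toString(reduce(partials, 0.0, 0.0, 0.1)));
    }
}
```

Because addition is associative and commutative, the summing loop (without the theta update) is also a valid combiner; the update itself has to stay in the single reducer because it needs the global count.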

And here is the custom Writable (i.e. PartialGradientDescent):
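As the original class is not shown, here is a minimal sketch of what a PartialGradientDescent-style Writable might look like. Hadoop's org.apache.hadoop.io.Writable interface declares write(DataOutput) and readFields(DataInput); to keep the example runnable without Hadoop on the classpath, the class below implements those two methods without declaring the interface. The fields sum0, sum1 and count follow the mapper description; the class name and getters are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical PartialGradientDescent-style writable, written as plain Java. */
class PartialGradientWritable {
    private double sum0;
    private double sum1;
    private long count;

    /** Hadoop creates writables reflectively, so a no-argument constructor is needed. */
    PartialGradientWritable() {
    }

    PartialGradientWritable(double sum0, double sum1, long count) {
        this.sum0 = sum0;
        this.sum1 = sum1;
        this.count = count;
    }

    /** Serialises the fields in a fixed order (same signature as Writable.write). */
    public void write(DataOutput out) throws IOException {
        out.writeDouble(sum0);
        out.writeDouble(sum1);
        out.writeLong(count);
    }

    /** Reads the fields back in exactly the order write() used. */
    public void readFields(DataInput in) throws IOException {
        sum0 = in.readDouble();
        sum1 = in.readDouble();
        count = in.readLong();
    }

    double getSum0() { return sum0; }
    double getSum1() { return sum1; }
    long getCount() { return count; }

    public static void main(String[] args) throws IOException {
        PartialGradientWritable original = new PartialGradientWritable(-2.0, -4.0, 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        PartialGradientWritable copy = new PartialGradientWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.getSum0() + " " + copy.getSum1() + " " + copy.getCount());
    }
}
```

The essential contract is that readFields() reads the fields in exactly the order write() wrote them; a mismatch silently produces wrong values rather than an error.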

The last piece of the puzzle is the driver program, which triggers the Hadoop job once for each iteration you need.
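The iteration logic of the driver can be simulated in memory to check that the map/reduce split of the work really converges. In this plain-Java sketch each loop iteration stands in for one Hadoop job. In a real driver each iteration would typically configure and submit a new Job (for example with job.waitForCompletion(true)) and pass the current theta to the mappers, for instance through the job's Configuration; how the original driver does this is not shown, so that part is an assumption. The data set is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** In-memory simulation of the driver: one loop iteration per "job". */
class GradientDescentDriverSketch {
    static double[] run(List<double[][]> splits, double alpha, int iterations) {
        double theta0 = 0.0, theta1 = 0.0;
        for (int iter = 0; iter < iterations; iter++) {
            double sum0 = 0.0, sum1 = 0.0;
            long count = 0;
            // "Map" (+ combine): every split contributes its partial sums for the current theta.
            for (double[][] split : splits) {
                for (double[] obs : split) {       // obs = {x, y}
                    double error = theta0 + theta1 * obs[0] - obs[1];
                    sum0 += error;
                    sum1 += error * obs[0];
                    count++;
                }
            }
            // "Reduce": average the gradient and update both parameters together.
            theta0 -= alpha * sum0 / count;
            theta1 -= alpha * sum1 / count;
        }
        return new double[] {theta0, theta1};
    }

    public static void main(String[] args) {
        // Hypothetical observations on the line y = 1 + 2x, spread over two splits.
        List<double[][]> splits = Arrays.asList(
            new double[][] {{0, 1}, {1, 3}},
            new double[][] {{2, 5}, {3, 7}});
        System.out.println(Arrays.toString(run(splits, 0.1, 2000)));
    }
}
```

Running it prints values very close to theta0 = 1 and theta1 = 2, which shows that summing per-split gradients gives the same update as processing all observations in one place.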

That's it for now. Stay tuned.

Tuesday, 15 March 2016

Timer job – from on premise to Cloud World (Azure) using WebJob

The SharePoint Timer service runs in the background to perform long-running tasks. It does some important SharePoint clean-up tasks, but it can also be used to provide useful functional tasks. For instance, you may want to send newsletters to your users on a regular basis or keep your customers up to date with some regularly timed information.

This is part two of the series. Please read the first post of the series, "From SharePoint On-Prem to Office365".

For this demo, the timer job will send an email to newly registered customers/users. The newly registered customers/users are stored in a SharePoint list with a status field capturing whether an email has been sent or not.
There are several implementation choices when replacing a SharePoint timer job:
  1. Azure Web Job
  2. Azure Worker Role
  3. Windows Service (can be hosted on premise or on a VM in the cloud)
  4. Task Scheduler (hosted on premise)
I am choosing a WebJob as it is free of cost and I can reuse my console application as the WebJob. Please check why to choose a WebJob.

An Azure WebJob does not live on its own; it runs under an Azure Web App. For this purpose I am going to create a dummy web app to host my Azure WebJob, and I will host all my CSOM code in this WebJob.

There are two types of WebJob:
  • Continuous: best fit for a queuing application that keeps receiving messages from a queue.
  • On Demand: can be scheduled to run hourly, weekly, monthly etc.
The WebJob hosts and executes the CSOM code that gets information about the users/customers from SharePoint and sends them an email. The following code snippets show what the WebJob is doing:

Querying SharePoint using CSOM and CAML Query:

Sending email using Office365 Web Exchange:

Composing the email using the Razor Engine templating engine:

And finally, updating the SharePoint list item using CSOM:

You can download the full source code from CodePlex:
When writing a WebJob, consider the following points to make it diagnosable and reusable:
  1. Do not swallow exceptions. Handle them first (e.g. log them), then rethrow to let the WebJob know something went wrong.
  2. Use interfaces so that dependencies can be mocked for unit testing.
  3. Always log major steps and errors, e.g. using Console.WriteLine.
  4. Write your code so that it can also run as a plain console application, e.g. from Task Scheduler.
  5. Avoid hardcoding and maximise the use of configuration. Settings can also be supplied from the Azure portal.
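Points 1, 2, 3 and 5 can be illustrated with a small sketch. The post's job is written in C#; this is Java, and every name in it (CustomerRepository, EmailSender, WelcomeEmailJob, the {email} placeholder) is hypothetical rather than taken from the downloadable source. The job depends only on interfaces, logs each step, takes its template from the caller (i.e. configuration), and lets exceptions propagate so a failed send is never marked as sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the SharePoint customer list (CSOM in the real job). */
interface CustomerRepository {
    List<String> findCustomersWithoutEmail(); // addresses whose status is "not sent"
    void markEmailSent(String email);
}

/** Hypothetical abstraction over the mail transport (Office365 in the real job). */
interface EmailSender {
    void send(String to, String body);
}

/** Job logic that depends only on the interfaces, so it can be unit tested with fakes. */
class WelcomeEmailJob {
    private final CustomerRepository repository;
    private final EmailSender sender;
    private final String template; // supplied from configuration rather than hardcoded

    WelcomeEmailJob(CustomerRepository repository, EmailSender sender, String template) {
        this.repository = repository;
        this.sender = sender;
        this.template = template;
    }

    /** Emails each pending customer; any exception propagates so the host sees the failure. */
    int run() {
        List<String> pending = repository.findCustomersWithoutEmail();
        System.out.println("Found " + pending.size() + " customer(s) to email");
        int sent = 0;
        for (String email : pending) {
            sender.send(email, template.replace("{email}", email));
            repository.markEmailSent(email); // only reached if send() did not throw
            System.out.println("Sent and marked: " + email);
            sent++;
        }
        return sent;
    }
}

/** In-memory fake used instead of SharePoint when testing. */
class InMemoryCustomerRepository implements CustomerRepository {
    final List<String> pending = new ArrayList<>();
    final List<String> marked = new ArrayList<>();

    public List<String> findCustomersWithoutEmail() {
        return new ArrayList<>(pending);
    }

    public void markEmailSent(String email) {
        marked.add(email);
    }
}

class WelcomeEmailJobDemo {
    public static void main(String[] args) {
        InMemoryCustomerRepository repo = new InMemoryCustomerRepository();
        repo.pending.add("new.customer@example.com");
        List<String> outbox = new ArrayList<>();
        WelcomeEmailJob job = new WelcomeEmailJob(repo, (to, body) -> outbox.add(to), "Welcome, {email}!");
        System.out.println(job.run() + " email(s) sent");
    }
}
```

The same WelcomeEmailJob class can be driven from a console entry point, a scheduled WebJob or a unit test; only the implementations passed to its constructor change.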
Now it is time to publish this WebJob. There are lots of articles out there on how to create a schedule for a WebJob; I will simply use Visual Studio to create the schedule before publishing it. In Visual Studio, right-click the project and click “Publish as Azure Web Job…”, and it will launch a UI to specify your schedule, as shown below:
Schedule settings
That’s it. Happy SharePointing :)

Tuesday, 7 April 2015

From SharePoint On-Premise to Office365

In this post I will show you how you can convert your SharePoint farm solutions (.wsp) into solutions that work with Office365/the cloud.
The following is the road map from Microsoft that shows how Microsoft is transforming Office products for every platform:

Picture copied from Microsoft Office365 Developer site.
It looks like there is now a wider audience of developers; in other words, we have more options to develop solutions that target Microsoft Office products.
The following are common tasks we usually perform when developing SharePoint farm solutions:
1.      Site Definition
2.      Site creation
3.      Item Receiver code
4.      Feature (i.e. to create site columns, content types, lists or document libraries etc.)
5.      Site Columns and Content Types
6.      List or document library creation
7.      Workflow
8.      File upload using Module
9.      Timer job
10.   Querying sites, lists and other SharePoint objects
11.   Item creation in lists
12.   Branding (customising master pages, page layouts etc.)
13.   WebPart development
To make farm solution code cloud-compatible, we have the following choices for hosting the code:
1.      ASP.NET MVC application
2.      ASP.NET Web Forms application
3.      Console Application (for a continuous integration environment)
4.      Windows Phone/Tablet client
5.      PHP application
6.      Android / iOS application
I will start with a console application that queries SharePoint objects from an on-premise SharePoint farm as well as from Office365. This way we can prepare ourselves for the cloud.
Let’s start writing some code that works in both environments.
using System;
using System.Net;
using System.Security;
using Microsoft.SharePoint.Client;

bool forCloud = true;

Console.WriteLine("Connecting to Office365 at");

// Open connection to the site (Office365 tenant or on-premise farm)
ClientContext clientContext = new ClientContext("");
clientContext.AuthenticationMode = ClientAuthenticationMode.Default;

if (forCloud)
{
    // Build a SecureString from the Office365 password, one character at a time
    SecureString password = new SecureString();
    foreach (char c in Office365Password)
    {
        password.AppendChar(c);
    }

    clientContext.Credentials = new SharePointOnlineCredentials(Office365UserId, password);
}
else
{
    // On-premise: comment this line out if you want to use your default network credentials
    clientContext.Credentials = new NetworkCredential("UserName", "Password", "Domain");
}

Console.WriteLine("Executing query...");

// Request the web title and the lists, then send one batched request to the server
Web web = clientContext.Web;
ListCollection lists = web.Lists;
clientContext.Load(web, w => w.Title);
clientContext.Load(lists);
clientContext.ExecuteQuery();

// Display the title of the web
Console.WriteLine("Web title: {0}", web.Title);

Console.WriteLine("Loading lists...");
foreach (List list in lists)
{
    Console.WriteLine("List title: {0}", list.Title);
}


For on-premise, just set forCloud to false; the rest of the code is the same for both environments. When executed, it shows the site's title and all lists within the site.
Using this simple technique we can develop an app part (which I will blog about soon) to show aggregated data from various lists in Office365.
In the next post, I will show you how we can leverage an ASP.NET MVC application to host SharePoint CSOM code that creates site definitions and other artefacts.

That's it for now. Please leave your valuable feedback.

Wednesday, 18 March 2015

Deploy Content Type with Lookup field

In this post I am going to show how we can deploy a content type declaratively (through XML) with fields that use known GUIDs.

One of the challenges with a lookup field is that it binds to its lookup list via the ListId (i.e. a GUID), which is only assigned when the list is created. There are a couple of ways to handle this. One is to use the SharePoint Object Model to create the fields and content types via feature code.

First, you need to create the lookup list via a site-scoped feature. You might be wondering why I am creating a list in a site-scoped feature. The reason is simple: I want to create the lookup list in the root web of the site collection, where it will be accessible to the site columns.

Here is the project structure, shown in the image:

The LookupListProvisioner feature receiver will create a list that can be used by the lookup field. Here is the code:

Next, you need to create the site column using the SiteColumnsProvisioner feature receiver, with the code below:

As you can see in the code, we are using a known field GUID that can then be referenced by the content type. In the ContentTypeProvisioner feature, I have used an Elements.xml file to create the content type, as shown below:

As you can see in the code above, I have used a predefined ContentTypeId and field GUID. I have written a tool that generates XML for fields and content types from a SharePoint environment, which can be found at
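A predefined ContentTypeId follows SharePoint's content type ID convention: one way to derive a child content type's ID is to take the parent's ID, append "00", and then append a GUID written as 32 hexadecimal digits. The hypothetical helper below builds such an ID, for example to paste into Elements.xml; it is an illustration only, not the tool mentioned above.

```java
import java.util.Locale;
import java.util.UUID;

/** Hypothetical helper: child content type ID = parent ID + "00" + GUID as 32 hex digits. */
class ContentTypeIdSketch {
    static String childContentTypeId(String parentId, UUID guid) {
        // UUID.toString() gives 8-4-4-4-12 hex groups; drop the hyphens to get 32 hex digits.
        String hex = guid.toString().replace("-", "").toUpperCase(Locale.ROOT);
        return parentId + "00" + hex;
    }

    public static void main(String[] args) {
        // 0x01 is the ID of the built-in Item content type.
        String id = childContentTypeId("0x01", UUID.randomUUID());
        System.out.println(id); // "0x0100" followed by 32 hex digits
    }
}
```

Generating the GUID once and keeping the resulting ID in source control is what makes it "known": every deployment then produces the same content type ID.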
That’s it for now. Happy coding :)

Monday, 24 November 2014

PowerView - Getting Started

In this post I am going to show you how to use PowerView with a basic data model. In the next post I will demonstrate how you can use PowerView to visualise data from SQL Server, Analysis Services or other data sources.

Assume we have the following data in an Excel workbook called “CountryNStates”, as shown in the image below:

Now add these tables to the Data Model using the option available on the ribbon, as shown in the image below:

Once all your tables are added to the Data Model, you can design it with the proper relationships. Below is the relationship for the two tables above:

Now it’s time to create some PowerView report.

To add a PowerView report in Excel, go to the “Insert” tab and click the PowerView icon. You will see a new sheet for the PowerView report. You should also see all the tables you added to the Data Model.

Here is a snapshot of the report:

Without much effort, PowerView provides filters, automatic measures (e.g. SUM) etc. In the report above the numbers are not formatted, so I will show you how to format them.

Go back to the Data Model design page by clicking the Manage button on the “POWERPIVOT” tab and use the formatting options, as shown in the image below:

When you update it, Excel will notify you in the report section that you have updated your Data Model. Refresh the report and you will see that the numbers are formatted.

Now let’s add a measure that will show the percentage contribution to sales. I will call it “% of Sales”. Here we will be using DAX (Data Analysis Expressions) to achieve this.

To create a measure, you need to go back to the Data Model page. The formula for the percentage is:

% Sale of Queensland = ( Total Sales for Queensland / Total Sales in Australia ) * 100

Now let’s calculate it.

Below are the measures in DAX:

Total Sales:=CALCULATE(SUM(Sales[Sales]), ALL(Sales))

% of Sales:=SUM(Sales[Sales]) / Sales[Total Sales]

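Make sure the measure is displayed as a percentage; the Percentage format takes care of the “* 100” in the formula above. You can do this by right-clicking the measure, selecting the formatting option and then selecting “Percentage”.

Once you have done this, you should see the following measures in the measure area of the Sales table, as shown below:

The beauty of the % of Sales measure is that it knows the context: SUM(Sales[Sales]) is evaluated in the current filter context (for example, Queensland), while ALL(Sales) makes Total Sales ignore that context, so we don’t need a separate calculation for Queensland etc.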
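The arithmetic behind the two measures can be mimicked outside Excel. The plain-Java analogy below (not DAX, and not how the DAX engine evaluates anything) computes the denominator once over every row, like ALL(Sales), and the numerator per state, like SUM(Sales[Sales]) in that state's filter context. The sales figures are hypothetical, not the data from the screenshots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java analogy for the Total Sales and % of Sales measures. */
class PercentOfTotalSketch {
    static Map<String, Double> percentOfSales(Map<String, Double> salesByState) {
        // Like Total Sales := CALCULATE(SUM(Sales[Sales]), ALL(Sales)): ignores the per-state context.
        double total = 0.0;
        for (double sales : salesByState.values()) {
            total += sales;
        }
        // Like % of Sales := SUM(Sales[Sales]) / [Total Sales], evaluated once per state.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : salesByState.entrySet()) {
            result.put(entry.getKey(), entry.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical figures for illustration only.
        Map<String, Double> sales = new LinkedHashMap<>();
        sales.put("Queensland", 300.0);
        sales.put("New South Wales", 500.0);
        sales.put("Victoria", 200.0);
        // Results are fractions; the Percentage format would display 0.3 as 30%.
        System.out.println(percentOfSales(sales));
    }
}
```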

Now add the % of Sales measure to the report. Here is the final report after adding it.

Please leave a comment whether you like it or dislike it. In the next post I will show you how to create a BI Semantic Model using Visual Studio, deploy it to Analysis Services in Tabular mode and consume it from PowerView in a SharePoint environment.

That’s it for now. Happy SharePointing and PowerViewing :)

Sunday, 3 August 2014

Reading Settings from a SharePoint List in Nintex Workflow

Recently one of my clients asked me to create a notification service that sends an email asking for content to be refreshed after a certain period (x number of days). But I was not allowed to write any server-side code; otherwise I could have written a SharePoint service.

As I was not allowed to write server-side code, I decided to create a Nintex workflow, as the client already uses Nintex in their environment. If anyone has another technique for this task, please leave a comment.

Here is my settings list:

and its edit form:

Note: I am using a Choice column for KeyName so that users cannot modify it by mistake.

And below is a snapshot of my test workflow that only logs the setting value:


When you configure the Query List action against the Workflow Settings list, you can create a workflow variable from there to hold the setting value. You need one workflow variable per setting value.

Here is a snapshot of the Workflow Variables popup:

And below is a screenshot of the Query List action configuration:


In the Filter area, you need to specify an exact match on the key so that the Query List action can fetch the value for that setting.

For instance, I have a key name “FrequencyInDays” in the list, as you can see in the screenshot of the settings list.

Once you have filtered on the key, select the field that holds the value for that setting and click the “Add” button. After you click the button, that field name (i.e. KeyValue) will appear at the bottom, as I have highlighted in red; then select the workflow variable that will hold the value of that field.

By using this technique, you can read all the required setting values from the list once the workflow has initialised, or query each value just before using it in other workflow actions.

That’s it.

Monday, 12 May 2014

Customising SharePoint 2013 People Search Result

1.       Copy the existing People display template, which resides in the Master Page and Page Layouts gallery under Display Templates\Search\Item_People.html.

2.       Name it whatever you want, for instance Item_KI_People.html.

3.       To check the exact name of the property you are adding to ManagedPropertyMapping, go to your Search Service Application, click Search Schema and view the name of your desired managed property, as shown below:


4.       Edit the Item_KI_People.html file in Notepad and add the following highlighted/boxed text:


5.       Upload it as Item_KI_People.html and provide values for the metadata, as shown in the image below:


6.       Navigate to Site Settings > Result Types

7.       Click the existing Person result type and select Copy.

8.       Provide values as shown in image:

9.       Click Save.

10.   Now go to http://kispapp-dev:8080/Search/Pages/peopleresults.aspx?k=masood, replacing kispapp-dev:8080 with the site hosting your search center.

11.   Edit the peopleresults.aspx page.

12.   Edit the People Search Core Results web part and select your desired Item Display Template, as shown in the image below:

13.   Click OK and perform a search; you should see the new changes, as shown below:

14.   That’s it. Enjoy, and leave a comment if it saved you time.