Pages

Monday, 22 October 2018

Use of Power BI PushDatasets


In PowerBI, real-time streaming can be consumed in three (please visit https://docs.microsoft.com/en-us/power-bi/service-real-time-streaming ) different ways listed below:

  •  Consuming streaming data from Azure Stream Analytic
  •  Consuming streaming data from PubNub
  •  Consuming streaming data vis PushDataset


If you have got sensors which are emitting data and you want to visualize it in real-time, you can use PowerBI in conjunction with Azure Stream Analytics to build the dashboard. But if data is less frequent and you want to have a dashboards that auto-refreshes then you can use any one of three methods. In this post I will show how Push Datasets can be used to develop a dashboard.

Below is a simple architecture for this post:













We need following components to build a complete end-to-end simple solution shown in above diagram:

  1. Data Generator (to simulate data is coming to db at every x interval). This could your source which generates data less frequently.
  2. Console App (that pushes data to PowerBI PushDataset)
  3. PowerBI Dashboard


I have created a sample code that generates some data and inserts into database. Please see below code snippet:



The console application will be leveraging PowerBI REST API programming interface for pushing data. For this reason, console app needs to authenticate/authorize with PowerBI. So you need to register your console app with Azure AD in order to get OAuth token from Azure Authorization server. Please follow https://docs.microsoft.com/en-us/power-bi/developer/walkthrough-push-data-register-app-with-azure-ad to register your app.

OAuth provides four different OAuth flows based how you want to authenticate/authorize your application with Authorization server. Please visit https://auth0.com/docs/api-auth/which-oauth-flow-to-use to know which flow is best suited for your scenario.

I will be using Client Credential Flow (an OAuth flow which can be read at https://oauth.net/2/grant-types/client-credentials/ ) as console app will be treated as trusted application and also there would not be any human interaction if any authorization popup appears it would not be able to deal with.

Below is the code to get oauth token using Microsoft.IdentityModel.Clients.ActiveDirectory version 2.29.0 of nuget package:



I treat this scenario as syncing two systems (from source to target but target is PowerBI). Most of the syncing solution, we need to maintain what we have synced so far so that next time system should pick delta of data.
For this purpose, we are using ROWVERSION datatype which is auto generated by database. Please visit https://www.mssqltips.com/sqlservertip/4545/synchronizing-sql-server-data-using-rowversion/ for how to use rowversion for syncing scenario.

To maintain what has been synced, I have created a table to keep track the last row version, console application has sent to PowerBI, against a table like shown below:












For the first time, last row version should 0x000.

I also created a stored procedure that returns delta with the help of last row version and table name. Below is the stored procedure code:



Now, we got the data (delta amount), we need to send it to PushDataset in PowerBI. Every PushDataset has a unique id, and data needs to be sent to correct id.

I have created a dataset called “DeviceTelemetry” using REST API. To find the dataset id, you need to call the Power BI REST API like shown below:










And result is shown like below:








Now we got the Dataset Id as GUID, we need to use it to send data to Power BI. We will use PowerBI REST API to do this. You can do it in your console app to fetch all the datasets and grab the id for which you want to send to. For demonstration purpose I have shown you how you achieve it.

Now, the console app can you use dataset id and keep pushing data to it. Again you can leverage Power BI REST API to send data into batches or one by one. Below is a snapshot how I am sending data:









Here is the code that wraps to add rows to PowerBI PushDatasets leveraging api wrapper:


Here is the code for PowerBI Rest Api wrapped around a nice method:


Once your console app start sending data, you can go to PowerBI.com and start creating reports and dashboard like shown below:




Note that the dataset is listed as Push dataset.
Click on red boxed area (create report link) to create report as I created reports and composed them into one dashboard shown below:














That’s it so far.

Monday, 3 September 2018

Setting up an environment for Monte Carlo Simulation in Docker

In this blog I will walk you through to install JAGS inside a docker container. You might be thinking why I have chosen docker for this. The answer is very simple, when I was install JAGS on my personal computer, the OS did not recognise as a trusted software so I did not take a risk of installing on my personal computer.

If you want to play with JAGS and you don't want to install it in your computer, then Docker is the best option as I can play with the package/software and then I can delete the container.

Now you got the idea why I have chosen Docker container for this. Let's proceed to setup an environment for Monte Carlo simulation. Make sure you have got Docker installed. Follow below steps to setup the environment:

1. Open Command prompt with administrative privilege and issue follow command:
$ docker run --name mybox -t -p 8004:8004 opencpu/rstudio

Above command will download the opencpu/rstudio image locally.

2. Issue below command to start/run the container:
$ docker container start mybox

3. Open browser in your host computer and point http://localhost:8004/rstudio/ and provide opencpu as username and password like shown below:






4. Now, you need to connect to container, by issuing below command in your command prompt, to install JAGS - a tool that generate Gibbs Sampling:

$ docker exec -I -t mybox /bin/bash

You will be taken to terminal of container like shown below:



5. Issue below commands to terminal of container:
$ sudo apt-get update
$ sudo apt-get install jags

5. Now go the browser (you opened in step 3) and install "rjags" and "runjags" packages like shown below and you are done. Now you use this environment to create a simulation using Monte Carlo.


That's it so far. Stay tuned.

Wednesday, 9 May 2018

Azure IoT and IoT Edge - Part 2 (Building a Machine Learning model using generated data)


This blog is part 2 of Azure IoT Edge series. Please see http://blog.mmasood.com/2018/03/azure-iot-and-iot-edge-part-1.html if you have not read part 1.

In this blog I will cover the how we can build a logistic regression model in R using the data the captured in tables storage via IoT Hub.

We can run the simulated devices (all three at once) and wait for data to be generated and save it to table storage. But for the simplicity I have created an R script to generate the data so that I can build the model and deploy it to IoT Edge and hence we can leverage the this Edge device to apply Machine Learning model on the data it is receiving from the downstream devices.

I am using exactly the same minimum temperatures, pressure and humidity as our simulated device was using. Please see http://blog.mmasood.com/2018/03/azure-iot-and-iot-edge-part-1.html here are few lines of R script.




Let’s plot the data and see how it looks like. There are only 3 fields/feature so I will plot  Temperature vs Pressure using ggplot2:







Output of above R commands:

















We can see as the temperature and pressure increases the device is becoming bad or getting away from the good devices. For the simplicity the simulation generates higher number for temperature and pressure if device is flagged as defective.

Now let’s build a simple logistic regression model to find out the probability of device being defective.
I am using caret package for building model. Here is the code to split the training and test data:








The proportion of good vs bad for original data is: 66% (good)/33% (bad). So we make sure we don’t have skewness in the data.







Now applying glm function to data using R script shown below:







Here is the summary of the model:

















We can see from above output, the pressure is not statistically significant. The idea of this post is to have a model that we will be using in IoT Edge device.

Let’s test this model on test data set and find out the best threshold to separate the bad from good. I could have used cross-validation to find the best threshold. Use cross validation set to fine tune the parameters (eg. threshold or lambda if ridge regression is used etc).

Below is the confusion matrix when I use threshold 0.5:







Let’s construct a data frame which contains actual, predicted and calculated probability using below code:



And view first and last 5 records:

















The higher (or closer to 1) the probability the device is good.

With threshold 0.65, the confusion matrix look like below:








So we can see from above two confusion matrix, the best threshold should be 0.50 as it miss-classifies only 4 instances but when 0.65 is used it miss-classifies 5 instances.

The final model is given below:










So far I have got the model built. I will use this model in IoT Edge module which will make Edge intelligent, which I will post soon so stay tuned and happy IoTing J

Tuesday, 13 March 2018

Azure IoT and IoT Edge (Part 1)

In this blog post I will walk you through how and IoT device (for IoT Hub and IoT Edge gateway) can be created.

The simulated device will generate telemetry data that will be used by IoT Edge Module (e.g. Clustering) to find out which device need to be replace or restart it etc.

I will be posting few more blogs to achieve below:




We can see from above diagram, the main components are:
  •            IoT Hub
  •          Configuration of IoT Edge device as gateway
  •          IoT Edge Module
  •          Downstream devices


I will develop a Machine Learning model (k-means clustering) in R and will leverage in MachineLearningModel Edge Module to find which device need to be replaced or need to restart etc.

For the simplicity, the downstream device will generate following telemetry data:
  • Temperature
  • Humidity
  • Pressure


Let’s develop a downstream device that generates above random data. Follow https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-csharp-csharp-getstarted to setup you IoT Hub. I have got my IoT Hub setup now, I am creating a .Net console app that acts as device which generates some random data.

Here are some code snippets:

Here is the Main method:
Here is an example batch file to run as device1:


Now you need to register 3 devices in Azure Portal in IoT Hub here are the steps:
  •       Navigate to Azure Portal then your IoT Hub
  •       Navigate to IoT Devices
  •       Click on Add button and fill the details for the device like shown below:





  •       Now, go to the device you just created and copy the primary connection string to respective .bat file.
  •       Repeat for 3 times to create 3 devices.


Once you have created three devices, start running device1.bat, device2.bat and device3.bat. It will start sending data to IoT Hub like shown below:



And your IoT Hub will show number of messages received like shown below:




So far we have created/simulated 3 devices that started sending temperature and other data to IoT Hub. These devices will be used to send the data to IoT Edge gateway (by appending GatewayHostName=<your-gateway-host-name> to device connection string) and I will explain in next blog so stay tuned.