Azure Databricks (an Apache Spark-based implementation on Azure) is a big data analytics platform for the Microsoft cloud, Azure. This fast service offers a collaborative workspace for data scientists and business analysts, and integrates seamlessly with other Azure services as well as BI tools such as Power BI and Tableau.
Power BI Desktop is a powerful tool that provides rich, interactive visualizations through a plethora of default and custom visuals. It is not limited to creating visualizations: it also lets you transform and clean data, and publish reports to the cloud-based Power BI Service.
In short:
Both Azure Databricks and Power BI are powerful platforms for big data exploration, analysis, and visualization.
Using Databricks data in Power BI Desktop lets business users leverage its fast performance. Since not all business users are comfortable working directly in Azure Databricks, Power BI Desktop, with its drag-and-drop interface, offers a relatively simpler way for them to consume that data.
Pre-requisites:
Familiarity with Azure Databricks, including how to create clusters and notebooks in it.
Make sure you have a Databricks cluster up and running, and a notebook (either Python or Scala) in place. Here, I have created a cluster (azdbpowerbicluster) with a Python notebook (azdbpython).
Getting our data:
Import the sample data into the Databricks service (trial version). To do this, click the Data icon on the left vertical menu bar and select Add Data:
Browse and upload your file, as shown below. In case you want to refer to the file used here in this article, you can get it from
This is Sales data per region for different items and channels.
The following code reads our CSV file into the mydf data frame (on current Databricks runtimes, the spark session object is the idiomatic entry point, replacing the legacy sqlContext):
mydf = spark.read.csv("/drive/git/1000_Sales_Records-d540d/1000_Sales_Records-d540d.csv", header=True)
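Before wiring things up in Power BI, it can help to sanity-check the file's shape locally with plain Python. The sketch below sums revenue per region for a tiny CSV snippet that mimics this dataset; the column names used here (Region, Total Revenue, and so on) are assumptions for illustration, not taken verbatim from the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mimicking the sales file's layout; the real file's
# headers may differ, so treat these column names as assumptions.
sample = """Region,Item Type,Sales Channel,Total Revenue
Europe,Cereal,Online,1000.50
Asia,Fruits,Offline,250.00
Europe,Fruits,Online,499.50
"""

def revenue_by_region(csv_text):
    """Sum the Total Revenue column per Region from raw CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Region"]] += float(row["Total Revenue"])
    return dict(totals)

print(revenue_by_region(sample))
# {'Europe': 1500.0, 'Asia': 250.0}
```

The same aggregation is what you would later build visually in Power BI, which makes it a quick cross-check that the numbers line up.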
Connecting Azure Databricks data to Power BI Desktop
We need to make sure the Databricks cluster is up and running. The following are the steps to integrate Azure Databricks with Power BI Desktop.
Step 1 – Constructing the connection URL
Go to the cluster and click on Advanced Options, as shown below:
First, replace jdbc:spark with https.
Next, we will have to delete a few sections from it: first from default;transportMode… to ..Path=, and again from AuthMech… to …token>.
Your final URL should look something like this – https://<region>.azuredatabricks.net:<port>/sql/protocolv1/o/<key>/<key>
Save it somewhere; we will be using it in Power BI for the connection
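The manual string surgery above can also be sketched as a small Python helper. The example JDBC URL below is made up for illustration, and the exact order of the semicolon-separated options in your cluster's JDBC string may differ, so treat this as a sketch rather than a guaranteed parser.

```python
def jdbc_to_https(jdbc_url):
    """Convert a Databricks JDBC URL into the HTTPS server URL Power BI expects.

    Assumes the classic 'jdbc:spark://host:port/default;...;httpPath=...'
    shape; other runtimes may format the string differently.
    """
    # Drop the 'jdbc:spark://' scheme, then split host from the options.
    body = jdbc_url[len("jdbc:spark://"):]
    host_and_db, _, options = body.partition(";")
    host_port = host_and_db.split("/")[0]  # discard the '/default' part

    # Locate the httpPath=... option among the rest.
    http_path = ""
    for opt in options.split(";"):
        key, _, value = opt.partition("=")
        if key.lower() == "httppath":
            http_path = value
            break
    return f"https://{host_port}/{http_path}"

# Hypothetical JDBC URL for illustration only.
example = ("jdbc:spark://westus.azuredatabricks.net:443/default;"
           "transportMode=http;ssl=1;"
           "httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abc123;"
           "AuthMech=3;UID=token;PWD=<personal-access-token>")
print(jdbc_to_https(example))
# https://westus.azuredatabricks.net:443/sql/protocolv1/o/1234567890123456/0123-456789-abc123
```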
Step 2 – Generate a personal access token
To connect to our cluster, we will need a personal access token in Databricks. To generate one, go to the Databricks portal and click the user profile icon in the top right corner, as shown below:
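When Power BI Desktop later prompts for credentials, the user name is typically the literal word token and the password is this generated token. The same token also serves as a Bearer credential for the Databricks REST API, which is a handy way to verify it works. The sketch below only builds the request; the host and token values are placeholders, and the network call itself is not executed here.

```python
from urllib.request import Request

DATABRICKS_HOST = "https://westus.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # placeholder; never hardcode a real token

def bearer_headers(token):
    """Databricks REST APIs authenticate with a Bearer token in the
    Authorization header; this is the same token Power BI uses as password."""
    return {"Authorization": f"Bearer {token}"}

# A request against the clusters API could be built like this (not sent here):
req = Request(f"{DATABRICKS_HOST}/api/2.0/clusters/list",
              headers=bearer_headers(TOKEN))

print(req.get_header("Authorization"))
# Bearer dapiXXXXXXXXXXXXXXXX
```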
Step 3 – Connect with Power BI Desktop
For the integration, we will first have to open the Power BI Desktop app. We will use the same CSV file (1000 Sales Records.csv) used earlier in this article, uploaded to the Databricks portal as a table via the Create Table with UI option.
In Power BI Desktop, select Get Data and choose the Spark connector. On the Spark dialog box, paste the URL constructed in Step 1 into the Server field. Select HTTP as the Protocol and DirectQuery as the Data Connectivity mode, and click OK:
If everything is in place, you should see all the tables available in your Databricks cluster in the Power BI Navigator dialog. Select the table(s) you need and choose Load to load the data, or Edit to transform it before loading it into Power BI Desktop:
Summary
We can connect Azure Databricks data to Power BI Desktop quickly, enabling rich visualizations that yield better insights into the data.