Azure Databricks is a platform built on top of Apache Spark and deployed to Microsoft's Azure cloud. Its web-based interface makes it simple to create and scale clusters of Spark servers and to deploy jobs and notebooks to those clusters. Spark provides a general-purpose compute engine that is well suited to big data, thanks to its built-in parallelization engine.

In the last article in this series, I showed how to create a new Databricks Cluster in a Microsoft Azure Databricks Workspace.

In this article, I will show how to create a notebook and run it on that cluster.

Navigate to the Databricks service, as shown in Fig. 1.

db01-OverviewBlade
Fig. 1

Click the [Launch Workspace] button (Fig. 2) to open the Azure Databricks page, as shown in Fig. 3.

db02-LaunchWorkspaceButton
Fig. 2

db03-DatabricksHomePage
Fig. 3

Click the "New Notebook" link under "Common Tasks" to open the "Create Notebook" dialog, as shown in Fig. 4.

db04-CreateNotebookDialog
Fig. 4

In the "Name" field, enter a name for your notebook. The name must be unique within this workspace.

In the "Language" dropdown, select the default language for your notebook. Current options are Python, Scala, SQL, and R. Selecting a language does not limit you to that language within this notebook; you can override the language in any given cell.
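
To make the per-cell override concrete: a cell switches language when its first line is a magic command such as %python, %scala, %sql, or %r. The sketch below is not part of the original walkthrough. It builds the text of a small Python notebook in the "source" export format, in which cells are separated by a COMMAND marker and non-default-language cells are written behind a MAGIC prefix. The cell contents and the table name "numbers" are hypothetical, chosen only for illustration.

```python
# Marker lines used in Databricks "source" exports of a Python notebook.
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC "


def render_cell(code, language=None):
    """Return the source text of one cell.

    language=None means the cell uses the notebook's default language
    (Python here). Otherwise the cell starts with a magic command such as
    %sql, and every line is written behind the MAGIC prefix so that the
    exported file is still valid Python.
    """
    if language is None:
        return code
    lines = ["%" + language] + code.splitlines()
    return "\n".join(MAGIC_PREFIX + line for line in lines)


def render_notebook(cells):
    """Join (code, language) pairs into one notebook source string."""
    body = ("\n\n" + CELL_SEPARATOR + "\n\n").join(
        render_cell(code, language) for code, language in cells
    )
    return HEADER + "\n" + body + "\n"


# Hypothetical cells: a default-language (Python) cell followed by a SQL cell.
notebook_source = render_notebook([
    ("df = spark.range(5)\ndf.createOrReplaceTempView('numbers')", None),
    ("SELECT id, id * 2 AS doubled FROM numbers", "sql"),
])
print(notebook_source)
```

In the notebook editor itself you would not type the MAGIC prefix; you would simply start the second cell with %sql on its own line and write the query beneath it.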

Click the [Create] button to create the new notebook. A blank notebook displays, as shown in Fig. 5.

db05-BlankNotebook
Fig. 5
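
The dialog is the route this article follows. As an aside, notebook creation can also be scripted. The sketch below only builds the JSON body that, to my understanding, the Workspace API's import endpoint (POST /api/2.0/workspace/import) expects; it does not send anything. The notebook path and its contents are hypothetical, and authentication and the workspace URL are deliberately left out.

```python
import base64
import json


def build_import_request(path, source, language="PYTHON", overwrite=False):
    """Build the JSON body for a workspace import call.

    The notebook source is sent base64-encoded in the "content" field.
    The language must match the notebook's default language.
    """
    return {
        "path": path,
        "format": "SOURCE",
        "language": language,
        "content": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "overwrite": overwrite,
    }


# Hypothetical path and one-line notebook body, for illustration only.
payload = build_import_request(
    "/Users/someone@example.com/MyFirstNotebook",
    "print('Hello from Databricks')\n",
)
print(json.dumps(payload, indent=2))
```

The "language" value plays the same role as the "Language" dropdown in the dialog: it sets the default language for the notebook.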

Fig. 6 shows a notebook with some simple code added to the first two cells.

db06-Notebook
Fig. 6

You can add, move, or manipulate cells by clicking the cell menu at the top right of an existing cell, as shown in Fig. 7.

db07-AddCell
Fig. 7

To run your notebook, you will need to attach it to an existing, running cluster. Click the "Attach to" dropdown and select from the clusters in the current workspace, as shown in Fig. 8. See the previous article in this series for information on how to create a cluster.

db08-AttachCluster
Fig. 8
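
If you want to check from a script which clusters are available to attach to, the Clusters API offers a list endpoint (GET /api/2.0/clusters/list). The sketch below assumes the response shape I am familiar with, a "clusters" array whose entries carry "cluster_id", "cluster_name", and "state", and filters it for running clusters. The sample response is hypothetical and no request is actually made.

```python
def running_clusters(list_response):
    """Return (cluster_id, cluster_name) pairs for clusters in the RUNNING state."""
    return [
        (cluster["cluster_id"], cluster["cluster_name"])
        for cluster in list_response.get("clusters", [])
        if cluster.get("state") == "RUNNING"
    ]


# Hypothetical response body; the IDs and names are made up.
sample_response = {
    "clusters": [
        {"cluster_id": "0001-000000-abc123", "cluster_name": "demo-cluster", "state": "RUNNING"},
        {"cluster_id": "0002-000000-def456", "cluster_name": "old-cluster", "state": "TERMINATED"},
    ]
}

print(running_clusters(sample_response))
# prints [('0001-000000-abc123', 'demo-cluster')]
```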

You can run all the cells in a notebook by clicking the "Run all" button in the toolbar, as shown in Fig. 9.

db09-RunAll
Fig. 9
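
The "Run all" button is the interactive way to execute a notebook. A rough scripted counterpart is a one-time run submitted through the Jobs API (POST /api/2.1/jobs/runs/submit), which runs the whole notebook on an existing cluster. The sketch below only builds the request body as I understand it; the notebook path, cluster ID, run name, and task key are hypothetical, and sending the request (with a token and your workspace URL) is left out.

```python
import json


def build_run_submit_request(notebook_path, cluster_id, run_name="notebook-run"):
    """Build the JSON body for a one-time notebook run on an existing cluster."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                # task_key is an arbitrary label for this task within the run.
                "task_key": "run_notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


# Hypothetical notebook path and cluster ID, for illustration only.
request_body = build_run_submit_request(
    "/Users/someone@example.com/MyFirstNotebook",
    "0001-000000-abc123",
)
print(json.dumps(request_body, indent=2))
```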

Use the "Run" menu in the top right of a cell to run only that cell or the cells above or below it, as shown in Fig. 10.

db10-RunCell
Fig. 10
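
The three choices in the "Run" menu differ only in which cells they select. The toy model below is my own illustration, not Databricks code. It assumes, as I understand the Databricks documentation, that running the cells below includes the current cell while running the cells above does not. The function name and mode strings are hypothetical.

```python
def cells_to_run(cells, current, mode):
    """Return the cells selected by a Run menu choice.

    cells   -- list of cell contents, in notebook order
    current -- zero-based index of the cell whose menu was opened
    mode    -- "cell", "above", or "below"
    """
    if mode == "cell":
        return cells[current:current + 1]
    if mode == "above":
        # Assumption: the current cell is excluded.
        return cells[:current]
    if mode == "below":
        # Assumption: the current cell is included.
        return cells[current:]
    raise ValueError("unknown mode: " + mode)


notebook_cells = ["cell 1", "cell 2", "cell 3", "cell 4"]

print(cells_to_run(notebook_cells, 2, "cell"))   # prints ['cell 3']
print(cells_to_run(notebook_cells, 2, "above"))  # prints ['cell 1', 'cell 2']
print(cells_to_run(notebook_cells, 2, "below"))  # prints ['cell 3', 'cell 4']
```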

Fig. 11 shows a notebook after all cells have been run. Note the output displayed below each cell.

db11-NotebookWithResults
Fig. 11

In this article, I showed how to create, run, and manage a notebook in an Azure Databricks workspace.