# Tuesday, July 9, 2019

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft's Azure cloud platform. It provides an interface that makes it simple for users to create and scale clusters of Spark servers and to deploy jobs and Notebooks to those clusters. Spark provides a general-purpose compute engine ideal for working with big data, thanks to its built-in parallelization engine.

In the last article in this series, I showed how to create a new Databricks service in Microsoft Azure.

A cluster is a set of compute nodes that can work together. All Databricks jobs run in a cluster, so you will need to create one if you want to do anything with your Databricks service.

In this article, I will show how to create a cluster in that service.

Navigate to the Databricks service, as shown in Fig. 1.

db01-OverviewBlade
Fig. 1

Click the [Launch Workspace] button (Fig. 2) to open the Azure Databricks page, as shown in Fig. 3.

db02-LaunchWorkspaceButton
Fig. 2

db03-DatabricksHomePage
Fig. 3

Click the "New Cluster" link to open the "Create Cluster" dialog, as shown in Fig. 4.

db04-CreateCluster
Fig. 4

At the "Cluster Name" field, enter a descriptive name for your cluster.

At the "Cluster Mode" dropdown, select "Standard" or "High Concurrency". The "High Concurrency" option can run multiple jobs concurrently.

At the "Databricks Runtime Version" dropdown, select the runtime version you wish to support on this cluster. I recommend selecting the latest non-beta version.

At the "Python Version" dropdown, select the version of Python you wish to support. New code will likely be written in version 3, but you may be running old notebooks written in version 2.

I recommend checking the "Enable autoscaling" checkbox. This allows the cluster to automatically spin up the number of nodes required for a job, effectively balancing cost and performance.

I recommend checking the "Terminate after ___ minutes" checkbox and entering a reasonable period of inactivity (I usually set this to 60 minutes), after which the cluster shuts down. Running a cluster is an expensive operation, so you will save a lot of money if you shut clusters down when they are not in use. Because it takes a long time to spin up a cluster, consider how frequently a new job is required before setting this value too low. You may need to experiment with this value to get it right for your situation.
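To make the trade-off concrete, here is a rough sketch of the arithmetic. The hourly rate and node count below are hypothetical figures chosen only for illustration; substitute the actual prices for your region and machine size.

```python
def idle_cost(hourly_rate_per_node: float, nodes: int, idle_minutes: float) -> float:
    """Cost of keeping `nodes` machines running for `idle_minutes` with no work to do."""
    return hourly_rate_per_node * nodes * idle_minutes / 60.0


# Hypothetical example: 4 nodes at 0.50 (in your currency) per node-hour.
for timeout in (15, 60, 240):
    print(f"{timeout:>3} idle minutes -> {idle_cost(0.50, 4, timeout):.2f}")
```

The idle cost grows linearly with the timeout, so the right value depends mostly on how often jobs arrive and how long you are willing to wait for a cluster to restart.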

At the "Worker Type" dropdown, select the size of machines to include in your cluster. If you enabled autoscaling, you can set the minimum and maximum number of worker nodes as well. If you did not enable autoscaling, you can only set a fixed number of worker nodes. My experience is that more nodes and smaller machines tend to be more cost-effective than fewer nodes and more powerful machines; but you may want to experiment with your jobs to find the optimum setting for your organization.

At the "Driver Type" dropdown, select "Same as worker".

You can expand the "Advanced Options" section to pass specific data to your cluster, but this is usually not necessary.

Click the [Create Cluster] button to create this cluster. It will take a few minutes to create and start a new cluster.
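If you later want to script this instead of using the dialog, the same choices can be written down as a cluster specification. The sketch below only builds the dictionary; it does not call any service. The field names follow my understanding of the Databricks Clusters REST API, and the runtime and machine size values are placeholders, so check both against the current documentation before relying on them.

```python
import json
from typing import Optional


def build_cluster_spec(name: str, runtime_version: str, worker_type: str,
                       min_workers: int, max_workers: Optional[int] = None,
                       auto_terminate_minutes: int = 60) -> dict:
    """Mirror the "Create Cluster" dialog choices as a dictionary."""
    spec = {
        "cluster_name": name,
        "spark_version": runtime_version,
        "node_type_id": worker_type,
        "driver_node_type_id": worker_type,  # "Same as worker"
        "autotermination_minutes": auto_terminate_minutes,
    }
    if max_workers is None:
        # Autoscaling disabled: a fixed number of workers.
        spec["num_workers"] = min_workers
    else:
        if max_workers < min_workers:
            raise ValueError("max_workers must be >= min_workers")
        spec["autoscale"] = {"min_workers": min_workers, "max_workers": max_workers}
    return spec


# Placeholder values, for illustration only.
print(json.dumps(build_cluster_spec("demo-cluster", "5.4.x-scala2.11",
                                    "Standard_DS3_v2", 2, 8), indent=2))
```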

When the cluster is created, you will see it listed, as shown in Fig. 5, with a state of "Running".

db05-Clusters
Fig. 5

You are now ready to create jobs and run them on this cluster. I will cover this in a future article.

In this article, you learned how to create a cluster in an existing Azure Databricks workspace.

Tuesday, July 9, 2019 9:37:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, July 8, 2019

Episode 571

Jon Galloway on the .NET Foundation

The .NET Foundation recently expanded its board and its goals. Jon Galloway discusses what the Foundation does and what it strives to do.

Monday, July 8, 2019 9:54:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, July 7, 2019

7/7
Today I am grateful for my first visit to the Chicago Architecture Center yesterday.

7/6
Today I am grateful for my first visit to the National Museum of Mexican Art yesterday.

7/5
Today I am grateful for an afternoon at the Lincoln Park Zoo.

7/4
Today I am grateful I was born in the United States.

7/3
Today I am grateful for trivia night at the Twisted Hippo.

7/2
Today I am grateful for a late Father's Day dinner celebration last night with my son and his girlfriend.

7/1
Today I am grateful for all the parks in the city of Chicago.

6/29
Today I am grateful for a bike ride around downtown St. Joseph yesterday.

6/28
Today I am grateful for a good personal trainer.

6/27
Today I am grateful to attend my first Chicago Sky WNBA game yesterday.

6/26
Today I am grateful to see the Rolling Stones in concert last night - the first time I've seen them in my life.

6/25
Today I am grateful for nice weather.

6/24
Today I am grateful for my first visit to Traverse City in 21 years.

6/23
Today I am grateful to celebrate Kristen and Scott's wedding with them in Traverse City yesterday.

6/22
Today I am grateful for breakfast yesterday with Austin in Amsterdam. We had not seen one another in decades.

6/21
Today I am grateful to spend yesterday at the Van Gogh Museum and the Rijksmuseum in Amsterdam.

6/20
Today I am grateful for a boat ride last night along the Amsterdam canals with Brent.

6/19
Today I am grateful that my son's Achilles tendon surgery went well yesterday.

6/18
Today I am grateful for dinner at a French restaurant in Amsterdam with my international team last night.

6/17
Today I am grateful for dinner with Mike in Amsterdam last night.

6/16
Today I am grateful to visit Amsterdam for the first time.

6/15
Today I am grateful to kick off a new project with a new customer yesterday.

6/14
Today I am grateful for a chiropractor and masseuse across the street from my home.

6/13
Today I am grateful for a graduation celebration dinner last night.

6/12
Today I am grateful to attend Larissa's high school graduation ceremony yesterday.

6/11
Today I am grateful to work on my balcony on a beautiful summer morning.

6/10
Today I am grateful for a walk around North Park Village Nature Center yesterday.

6/9
Today I am grateful for:
-Hearing some great live music with Thad at the Chicago Blues Festival
-A visit to the Printers Row Lit Fest

6/8
Today I am grateful for new furniture on my balcony.

6/7
Today I am grateful that my son's injury is not as severe as we originally thought.

6/6
Today I am grateful that Raffaele and the folks at IT Camp are thinking of me in Transylvania.

6/5
Today I am grateful to receive an email with some kind words yesterday.

6/4
Today I am grateful to discover the exhibits at the Chicago Public Library yesterday.

6/3
Today I am grateful to visit the Fabyan Japanese Gardens and Fox River in Batavia, IL yesterday.

Sunday, July 7, 2019 3:28:22 PM (GMT Daylight Time, UTC+01:00)
# Saturday, July 6, 2019

Housekeeping by Marilynne Robinson is filled with water and filled with tragedy.

It opens with a train crash into a lake, killing hundreds, including the grandfather of Ruthie and Lucille. Later, Lucille and Ruthie's mother commits suicide by driving her car into the same lake. Abandoned years earlier by their father, the girls grow up under the care of their grandmother and aunts until eccentric Aunt Sylvie shows up and moves in.

Sylvie is a former transient, who sometimes falls asleep on park benches. She is not cut out for motherhood and the girls withdraw into one another, making no friends other than each other. They skip school, and the local authorities begin to question their situation, forcing everyone in this family to make a choice.

Housekeeping is a simple story, built on the strength of the characters. Robinson presents humor and tragedy in an eloquent style that keeps the reader engaged. For such a short novel, we see a full picture of the three main characters. It is worth the time to read.

Saturday, July 6, 2019 9:12:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, July 5, 2019

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft's Azure cloud platform.

Databricks provides a web-based interface that makes it simple for users to create and scale clusters of Spark servers and deploy jobs and Notebooks to those clusters. Spark provides a general-purpose compute engine ideal for working with big data, thanks to its built-in parallelization engine.

Apache Spark is open source and Databricks is owned by the Databricks company, but Microsoft adds value by providing the hardware and fabric on which these tools are deployed, including capacity on which to scale and built-in fault tolerance.

To create an Azure Databricks environment, navigate to the Azure Portal, log in, and click the [Create Resource] button (Fig. 1).

db01-CreateResourceButton
Fig. 1

From the menu, select Analytics | Azure Databricks, as shown in Fig. 2.

db02-NewDataBricksMenu
Fig. 2

The "Azure Databricks service" blade displays, as shown in Fig. 3.

db03-NewDataBricksBlade
Fig. 3

At the "Workspace name" field, enter a unique name for the Databricks workspace you will create.

At the "Subscription" field, select the subscription associated with this workspace. Most of you will have only one subscription.

At the "Resource group" field, click the "Use existing" radio button and select an existing Resource Group from the dropdown below; or click the "Create new" button and enter the name and region of a new Resource Group when prompted.

At the "Location" field, select the location in which to store your workspace. Considerations include the location of the data on which you will be working and the location of developers and users who will access this workspace.

At the "Pricing Tier" dropdown, select the desired pricing tier. The Pricing Tier options are shown in Fig. 4.

db04-PricingTier
Fig. 4

If you wish to deploy this workspace to a particular virtual network, select the "Yes" radio button at this question.

When completed, the blade should look similar to Fig. 5.

db05-NewDataBricksBlade-Completed
Fig. 5

Click the [Create] button to create the new Databricks service. This may take a few minutes.
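For repeatable deployments, the fields on this blade can also be captured as a resource definition in the style of an ARM template. The sketch below only builds that definition as a dictionary. The resource type, API version, pricing tier names, and the managed resource group property are stated from memory rather than from this article, and the subscription ID and names are placeholders, so verify them before use.

```python
import json

ALLOWED_TIERS = ("standard", "premium", "trial")  # assumed tier names


def databricks_workspace_resource(workspace_name: str, location: str,
                                  pricing_tier: str, subscription_id: str,
                                  managed_rg_name: str) -> dict:
    """Describe a Databricks workspace as an ARM-style resource dictionary."""
    if pricing_tier not in ALLOWED_TIERS:
        raise ValueError(f"pricing_tier must be one of {ALLOWED_TIERS}")
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",
        "name": workspace_name,
        "location": location,
        "sku": {"name": pricing_tier},
        "properties": {
            # Resource group that Azure manages on behalf of the workspace.
            "managedResourceGroupId":
                f"/subscriptions/{subscription_id}/resourceGroups/{managed_rg_name}",
        },
    }


# Placeholder values, for illustration only.
resource = databricks_workspace_resource(
    "my-workspace", "eastus", "standard",
    "00000000-0000-0000-0000-000000000000", "my-workspace-managed-rg")
print(json.dumps(resource, indent=2))
```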

Navigate to the Databricks service, as shown in Fig. 6.

db06-OverviewBlade
Fig. 6

Click the [Launch Workspace] button (Fig. 7) to open the Azure Databricks page, as shown in Fig. 8.

db07-LaunchWorkspaceButton
Fig. 7

db08-DatabricksHomePage
Fig. 8

In this article, I showed you how to create a new Azure Databricks service. In future articles, I will show how to create clusters, notebooks, and otherwise make use of your Databricks service.

Friday, July 5, 2019 9:00:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, July 4, 2019

GCast 55:

GitHub Deployment to an Azure Web App

Learn how to set up automated deployment from a GitHub repository to an Azure Web App

Thursday, July 4, 2019 9:58:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, July 3, 2019

Source control is an important part of software development - from collaborating with other developers to enabling continuous integration and continuous deployment to providing the ability to roll back changes.

Azure Data Factory (ADF) provides the ability to integrate with source control systems GitHub or Azure DevOps.

I will walk you through doing this, using GitHub.

Before you get started, you must have the following:

A GitHub account (Free at https://github.com)

A GitHub repository created in your account, with at least one file in it. You can easily add a "readme.md" file to a repository from within the GitHub portal.

Create an ADF service, as described in this article.

Open the "Author & Monitor" page (Fig. 1) and click the "Set up Code Repository" button (Fig. 2).

ar01-ADFOverviewPage
Fig. 1

ar02-SetupCodeRepositoryButton
Fig. 2

The "Repository Settings" blade displays, as shown in Fig. 3.

ar03-RepositoryType
Fig. 3

At the "Repository Type" dropdown, select the type of source control you are using. The current options are "Azure DevOps Git" and "GitHub". For this demo, I have selected "GitHub".

When you select a Repository type, the rest of the dialog expands with prompts relevant to that type. Fig. 4 shows the prompts when you select "GitHub".

ar04-RepositoryName
Fig. 4

I don't have a GitHub Enterprise account, so I left this checkbox unchecked.

At the "GitHub Account" field, enter the name of your GitHub account. You don't need the full URL - just the name. For example, my GitHub account name is "davidgiard", which you can find online at https://github.com/davidgiard; so, I entered "davidgiard" into the "GitHub Account" field.

The first time you enter this account, you may be prompted to sign in and to authorize Azure to access your GitHub account.

Once you enter a valid GitHub account, the "Git repository name" dropdown is populated with a list of your repositories. Select the repository you created to hold your ADF assets.

After you select a repository, you are prompted for more specific information, as shown in Fig. 5.

ar05-RepositorySettings
Fig. 5

At the "Collaboration branch" field, select "master". If you are working in a team environment or with multiple releases, it might make sense to check into a different branch in order to control when changes are merged. To do this, you will need to create a new branch in GitHub.

At the "Root folder" field, select a folder of the repository in which to store your ADF assets. I typically leave this at "/" to store everything in the root folder; but, if you are storing multiple ADF services in a single repository, it might make sense to organize them into separate folders.

Check the "Import existing Data Factory resources to repository" checkbox. This causes any current assets in this ADF service to be added to the repository as soon as you save. If you have not yet created any pipelines, this setting is irrelevant.

At the "Branch to import resources into" radio buttons, select "Use Collaboration".

Click the [Save] button to save your changes and push any current assets into the GitHub repository.
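The values entered in this blade correspond to a small repository configuration object. Below is a sketch of that object as a dictionary, with nothing sent anywhere. The property names reflect my understanding of the Data Factory REST API's GitHub repository configuration, and the repository name "adf-assets" is hypothetical, so treat both as assumptions to verify.

```python
def github_repo_configuration(account: str, repository: str,
                              collaboration_branch: str = "master",
                              root_folder: str = "/") -> dict:
    """Collect the "Repository Settings" choices into one dictionary."""
    if not root_folder.startswith("/"):
        raise ValueError("root_folder should be a path starting with '/'")
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account,              # account name only, not the full URL
        "repositoryName": repository,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }


# "adf-assets" is a hypothetical repository name.
print(github_repo_configuration("davidgiard", "adf-assets"))
```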

Within seconds, any pipelines, linked services, or datasets in this ADF service will be pushed into GitHub. You can refresh the repository, as shown in Fig. 6.

ar06-GitHub
Fig. 6

Fig. 7 shows a pipeline asset. Notice that it is saved as JSON, which can easily be deployed to another server.

ar07-GitHub
Fig. 7
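Because each asset is plain JSON, it is easy to inspect with ordinary tools. The sketch below lists the activities in a pipeline file. The sample document is a made-up minimal pipeline, assuming the usual layout of a "name" plus "properties" containing an "activities" array; it is not copied from the repository shown above.

```python
import json

# Hypothetical, minimal pipeline document for illustration.
SAMPLE_PIPELINE = """
{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {"name": "CopyFromBlob", "type": "Copy"},
      {"name": "RunNotebook", "type": "DatabricksNotebook"}
    ]
  }
}
"""


def list_activities(pipeline_json: str) -> list:
    """Return (name, type) pairs for each activity in a pipeline document."""
    pipeline = json.loads(pipeline_json)
    activities = pipeline.get("properties", {}).get("activities", [])
    return [(a.get("name"), a.get("type")) for a in activities]


for name, kind in list_activities(SAMPLE_PIPELINE):
    print(f"{name}: {kind}")
```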

In this article, you learned how to connect your ADF service to a GitHub repository, storing and versioning all ADF assets in source control.

Wednesday, July 3, 2019 6:56:40 PM (GMT Daylight Time, UTC+01:00)
# Tuesday, July 2, 2019

GitHub provides a good way to create and manage source code repositories.

But, how do you delete a repository when you no longer need it? I found this to be non-intuitive when I needed to delete one.

Here are the steps.

Log into GitHub and open your repository, as shown in Fig. 1.

dr01-repo
Fig. 1

Click the [Settings] tab (Fig. 2) near the top of the page.

dr02-SettingsButton
Fig. 2

The "Settings" page displays, as shown in Fig. 3.

dr03-SettingsPage
Fig. 3

Scroll to the "Danger Zone" section at the bottom of the "Settings" page, as shown in Fig. 4.

dr04-DangerZone
Fig. 4

Click the [Delete this repository] button.

A confirmation popup (Fig. 5) displays, warning you that this action cannot be undone (which is why it is in the Danger Zone).

dr05-AreYouSure
Fig. 5

If you are sure you want to delete this repository, type the repository name in the textbox and click the [I understand the consequences, delete this repository] button.

If all goes well, a confirmation message displays indicating that your repository was successfully deleted, as shown in Fig. 6.

dr06-Confirmation
Fig. 6
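The same deletion can also be done through the GitHub REST API, which exposes a DELETE request on a repository. The sketch below only constructs the request and deliberately does not send it, since deletion cannot be undone. The owner, repository, and token values are placeholders, and I am assuming a personal access token that has permission to delete repositories.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a GitHub repository."""
    url = f"https://api.github.com/repos/{quote(owner, safe='')}/{quote(repo, safe='')}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return urllib.request.Request(url, headers=headers, method="DELETE")


# Placeholder values; nothing is sent by this script.
request = build_delete_request("some-owner", "some-repo", "YOUR_TOKEN")
print(request.get_method(), request.full_url)

# To actually delete the repository you would pass `request` to
# urllib.request.urlopen(...). Do that only when you are sure.
```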

Congratulations! Your repository is no more! It is an ex-repository!

Tuesday, July 2, 2019 9:55:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, July 1, 2019

Episode 570

Laurent Bugnion on Migrating Data to Azure

Laurent Bugnion describes how he migrated from on-premises MongoDB and SQL Server databases to CosmosDB and Azure SQL Database running in Microsoft Azure, using both native tools and the Database Migration Service.

Monday, July 1, 2019 9:39:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, June 30, 2019

From a distance, you would swear he was half his 75 years. You would never guess he underwent heart surgery two months ago. Only the cracks in his face revealed Mick Jagger's age. Not his body, which gyrated and strutted and danced for 2 hours as the legendary Rolling Stones performed before an overflowing Soldier Field in Chicago Tuesday night.

Most viewers saw him from a distance in the cavernous stadium. But the energy was high, and the audience sang and danced along with the band. Mick, Keith Richards, Ron Wood, and Charlie Watts have been recording and touring together for decades. People may think of them as the new members, but Darryl Jones (of south Chicago), who joined the band when bassist Bill Wyman retired in 1993, and Chuck Leavell, the former Allman Brothers keyboardist who has been with the band since 1982, also have a tenure longer than most bands exist. Yet, they are newbies when compared with their septuagenarian teammates.

This was my first time seeing the Stones and it may be their last visit to Chicago. Now in their 50th year, this year's "No Filter" tour will take them to 13 cities in the U.S. And they chose to open in Chicago, after Jagger's illness forced them to reshuffle the tour schedule.

To the delight of the crowd, they heard many references to Chicago. Mick noted the band had played the city nearly 40 times. And he introduced the new Chicago mayor and governor, who were in attendance, noting that Governor Pritzker had signed legislation that day legalizing cannabis in January. "Some of you may have jumped the gun," he quipped.

Of course, the Rolling Stones drew heavily from their catalog of hit songs - from opening with "Jumpin' Jack Flash" and "It's Only Rock 'n' Roll" to their encores: "Gimme Shelter" and "Satisfaction". But they included a few deeper album tracks, like "Bitch" and "Slipping Away".

It was mostly an evening of high energy rock and roll and blues; but a highlight of the night was when Mick, Keith, Ron, and Charlie brought their instruments (including a small drum kit) to a platform that extended 30 yards out into the audience to play two acoustic numbers: "Play with Fire" and "Sweet Virginia".

When the evening ended, it felt like they had given all they had and all we needed.

After 50 years, the band knew every note by heart, but still brought energy and made us feel they were having a good time after all this time.

Sunday, June 30, 2019 9:45:00 AM (GMT Daylight Time, UTC+01:00)