# Wednesday, July 3, 2019

Source control is an important part of software development - from collaborating with other developers to enabling continuous integration and continuous deployment to providing the ability to roll back changes.

Azure Data Factory (ADF) provides the ability to integrate with source control systems, such as GitHub or Azure DevOps.

I will walk you through doing this, using GitHub.

Before you get started, you must have the following:

A GitHub account (Free at https://github.com)

A GitHub repository created in your account, with at least one file in it. You can easily add a "readme.md" file to a repository from within the GitHub portal.

Create an ADF service, as described in this article.

Open the "Author & Monitor" page (Fig. 1) and click the "Set up Code Repository" button (Fig. 2)

Fig. 1

Fig. 2

The "Repository Settings" blade displays, as shown in Fig. 3.

Fig. 3

At the "Repository Type", dropdown, select the type of source control you are using. The current options are "Azure DevOps Git" and "GitHub". For this demo, I have selected "GitHub".

When you select a Repository type, the rest of the dialog expands with prompts relevant to that type. Fig. 4 shows the prompts when you select "GitHub".

Fig. 4

The first prompt asks whether you are using GitHub Enterprise. I don't have a GitHub Enterprise account, so I left this checkbox unchecked.

At the "GitHub Account" field, enter the name of your GitHub account. You don't need the full URL - just the name. For example, my GitHub account name is "davidgiard", which you can find online at https://github.com/davidgiard; so, I entered "davidgiard" into the "GitHub Account" field.

The first time you enter this account, you may be prompted to sign in and to authorize Azure to access your GitHub account.

Once you enter a valid GitHub account, the "Git repository name" dropdown is populated with a list of your repositories. Select the repository you created to hold your ADF assets.

After you select a repository, you are prompted for more specific information, as shown in Fig. 5.

Fig. 5

At the "Collaboration branch", select "master". If you are working in a team environment or with multiple releases, it might make sense to check into a different branch in order control when changes are merged. To do this, you will need to create a new branch in GitHub.

At the "Root folder", select a folder of the repository in which to store your ADF assets. I typically leave this at "/" to store everything in the root folder; but, if you are storing multiple ADF services in a single repository, it might make sense to organize them into separate folders.

Check the "Import existing Data Factory resources to repository" checkbox. This causes any current assets in this ADF asset to be added to the repository as soon as you save. If you have not yet created any pipelines, this setting is irrelevant.

At the "Branch to import resources into" radio buttons, select "Use Collaboration".

Click the [Save] button to save your changes and push any current assets into the GitHub repository.

Within seconds, any pipelines, linked services, or datasets in this ADF service will be pushed into GitHub. You can refresh the repository, as shown in Fig. 6.

Fig. 6

Fig. 7 shows a pipeline asset. Notice that it is saved as JSON, which can easily be deployed to another server.

Fig. 7
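As a rough illustration of what gets committed, the snippet below parses a minimal, hypothetical pipeline definition. The names and exact schema here are illustrative (the real schema varies by activity type), but the overall shape - a named pipeline whose properties contain a list of activities - matches what ADF stores in the repository.

```python
import json

# A minimal, hypothetical ADF pipeline definition, similar in shape to
# what gets committed to the Git repository. All names are illustrative.
pipeline_json = """
{
  "name": "CopyCustomersPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyCustomers",
        "type": "Copy",
        "inputs": [ { "referenceName": "SqlCustomers", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "BlobCustomers", "type": "DatasetReference" } ]
      }
    ]
  }
}
"""

pipeline = json.loads(pipeline_json)
print(pipeline["name"])                                  # CopyCustomersPipeline
print(pipeline["properties"]["activities"][0]["type"])   # Copy
```

Because the asset is plain JSON, deploying it to another Data Factory is a matter of pushing the same file - which is exactly what makes source-controlled ADF assets portable.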

In this article, you learned how to connect your ADF service to a GitHub repository, storing and versioning all ADF assets in source control.

Wednesday, July 3, 2019 6:56:40 PM (GMT Daylight Time, UTC+01:00)
# Tuesday, July 2, 2019

GitHub provides a good way to create and manage source code repositories.

But, how do you delete a repository when you no longer need it? I found this to be non-intuitive when I needed to delete one.

Here are the steps.

Log into GitHub and open your repository, as shown in Fig. 1.

Fig. 1

Click the [Settings] tab (Fig. 2) near the top of the page.

Fig. 2

The "Settings" page displays, as shown in Fig. 3

Fig. 3

Scroll to the "Danger Zone" section at the bottom of the "Settings" page, as shown in Fig. 4.

Fig. 4

Click the [Delete this repository] button.

A confirmation popup (Fig. 5) displays, warning you that this action cannot be undone (which is why it is in the Danger Zone).

Fig. 5

If you are sure you want to delete this repository, type the repository name in the textbox and click the [I understand the consequences, delete this repository] button.

If all goes well, a confirmation message displays indicating that your repository was successfully deleted, as shown in Fig. 6.

Fig. 6

Congratulations! Your repository is no more! It is an ex-repository!
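If you prefer scripting, GitHub's REST API exposes the same operation: a DELETE request to `/repos/{owner}/{repo}`, authenticated with a personal access token that has the `delete_repo` scope. Here is a sketch using only the Python standard library; the owner, repository name, and token below are placeholders, and the request is constructed but deliberately not sent.

```python
import urllib.request

# Placeholder values -- substitute your own account, repository, and token.
owner, repo, token = "davidgiard", "my-old-repo", "ghp_exampletoken"

req = urllib.request.Request(
    url=f"https://api.github.com/repos/{owner}/{repo}",
    method="DELETE",
    headers={
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json",
    },
)

print(req.method, req.full_url)
# To actually delete the repository (irreversible!), uncomment:
# urllib.request.urlopen(req)
```

Like the portal's Danger Zone, the API call cannot be undone, so double-check the repository name before sending it.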

Tuesday, July 2, 2019 9:55:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, July 1, 2019

Episode 570

Laurent Bugnion on Migrating Data to Azure

Laurent Bugnion describes how he migrated from on-premises MongoDB and SQL Server databases to Cosmos DB and Azure SQL Database running in Microsoft Azure, using both native tools and the Azure Database Migration Service.

Monday, July 1, 2019 9:39:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, June 30, 2019

From a distance, you would swear he was half his 75 years. You would never guess he underwent heart surgery two months ago. Only the cracks in his face revealed Mick Jagger's age. Not his body, which gyrated and strutted and danced for 2 hours as the legendary Rolling Stones performed before an overflowing Soldier Field in Chicago Tuesday night.

Most viewers saw him from a distance in the cavernous stadium. But the energy was high, and the audience sang and danced along with the band. Mick, Keith Richards, Ron Wood, and Charlie Watts have been recording and touring together for decades. People may think of Darryl Jones (of south Chicago), who joined the band when bassist Bill Wyman retired in 1993, and Chuck Leavell, the former Allman Brothers keyboardist, who has been with the band since 1982, as the new members; but each has a tenure longer than most bands exist. Yet they are newbies compared with their septuagenarian teammates.

This was my first time seeing the Stones and it may be their last visit to Chicago. Now in their 50th year, this year's "No Filter" tour will take them to 13 cities in the U.S. And they chose to open in Chicago, after Jagger's illness forced them to reshuffle the tour schedule.

To the delight of the crowd, Mick made many references to Chicago. He noted the band had played the city nearly 40 times. And he introduced the new Chicago mayor and the governor, who were in attendance, noting that Governor Pritzker had signed legislation that day legalizing cannabis in January. "Some of you may have jumped the gun," he quipped.

Of course, the Rolling Stones drew heavily from their catalog of hit songs - from opening with "Jumpin' Jack Flash" and "It's Only Rock 'n' Roll" to their encores: "Gimme Shelter" and "Satisfaction". But they included a few deeper album tracks, like "Bitch" and "Slipping Away".

It was mostly an evening of high energy rock and roll and blues; but a highlight of the night was when Mick, Keith, Ron, and Charlie brought their instruments (including a small drum kit) to a platform that extended 30 yards out into the audience to play two acoustic numbers: "Play with Fire" and "Sweet Virginia".

When the evening ended, it felt like they had given all they had and all we needed.

After 50 years, the band knew every note by heart, but still brought energy and made us feel they were having a good time after all this time.

Sunday, June 30, 2019 9:45:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, June 29, 2019

Last month, I visited Copenhagen, Denmark for the first time and came away thinking that I'd never seen a city as bicycle-friendly.

But Amsterdam has Copenhagen beat by far in this regard. During morning rush hour, bicycle commuters easily outnumber automobiles; and I was told that Amsterdam has more bicycles than people.

I was mostly on foot, but I kept my head on a swivel as I walked around town, looking one way for cars and in the other direction for bicycles each time I ventured across the street.

It was my first visit to Amsterdam, and I came to work at an OpenHack - a workshop designed to teach a specific technology via problem-solving and hands-on experience. The OpenHack was a great success for everyone. Nearly all the feedback we received was positive and people seemed to appreciate my coaching and enjoyed a presentation I delivered on Azure Data Factory. In addition, I led two "Envisioning Sessions" - an exploration of a project a customer is considering that involves use of cloud technology.

I arrived Sunday and spent most of the day resting before meeting up with Mike Amundsen, an old friend from Cincinnati, who happened to be in Amsterdam to speak at the GOTO conference.

The next two days consisted of hard work during the day, followed by dinners with my teammates in the evening. I made up for the rich food by walking miles around the city.

When the OpenHack wrapped up on Day 3, I headed over to the museum area and spent about an hour exploring the Stedelijk Museum of Modern Art, before meeting Brent and David at an Indonesian restaurant. The Indonesian food was amazing, consisting of samples of dozens of different dishes.

After dinner, Brent and I took a boat ride along the canals. It featured a pre-recorded guided tour of the city. Because the canals cover almost all of Amsterdam, we got to see nearly all the city in this way.

Friday I had to myself, so I bought tickets to 2 museums: The Van Gogh Museum, which features the works of the famous Dutch painter, along with those who influenced him; and the Rijksmuseum, which primarily features works of classic European artists from the Middle Ages to the present. It has a particularly impressive Rembrandt collection.

Friday morning, I had a special treat as I learned that Austin Haeberle - a friend I grew up with - was arriving in Amsterdam the same morning I was flying home. We had not seen each other in 30 years, so we met at the airport for breakfast and picked up where we left off.

If I had more time and energy, I could have spent days just exploring museums. I did not see the house where Anne Frank famously hid from the Nazis and wrote in her diary, nor the National Maritime Museum. I also did not get a chance to view Amsterdam's (in)famous Red-Light district. When I return, I will try to visit these places, as well as the rest of the Netherlands, which is a small enough country that one could drive across it in a couple of hours.

The point is that I would love to return.


More photos

Saturday, June 29, 2019 9:11:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, June 28, 2019

Azure Data Factory (ADF) is an example of an Extract, Transform, and Load (ETL) tool, meaning that it is designed to extract data from a source system, optionally transform its format, and load it into a different destination system.

The source and destination data can reside in different locations, in different data stores, and can support different data structures.

For example, you can extract data from an Azure SQL database and load it into an Azure Blob storage container.

To create a new Azure Data Factory, log into the Azure Portal, click the [Create a resource] button (Fig. 1) and select Integration | Data Factory from the menu, as shown in Fig. 2.

Fig. 1

Fig. 2

The "New data factory" blade displays, as shown in Fig. 3.

Fig. 3

At the "Name" field, enter a unique name for this Data Factory.

At the Subscription dropdown, select the subscription with which you want to associate this Data Factory. Most of you will only have one subscription, making this an easy choice.

At the "Resource Group" field, select an existing Resource Group or create a new Resource Group which will contain your Data Factory.

At the "Version" dropdown, select "V2".

At the "Location" dropdown, select the Azure region in which you want your Data Factory to reside. Consider the location of the data with which it will interact and try to keep the Data Factory close to this data, in order to reduce latency.

Check the "Enable GIT" checkbox, if you want to integrate your ETL code with a source control system.

After the Data Factory is created, you can search for it by name or within the Resource Group containing it. Fig. 4 shows the "Overview" blade of a Data Factory.

Fig. 4

To begin using the Data Factory, click the [Author & Monitor] button in the middle of the blade.

The "Azure Data Factory Getting Started" page displays in a new browser tab, as shown in Fig. 5.

Fig. 5

Click the [Copy Data] button (Fig. 6) to display the "Copy Data" wizard, as shown in Fig. 7.

Fig. 6

Fig. 7

This wizard steps you through the process of creating a Pipeline and its associated artifacts. A Pipeline performs an ETL on a single source and destination and may be run on demand or on a schedule.

At the "Task name" field, enter a descriptive name to identify this pipeline later.

Optionally, you can add a description to your task.

You have the option to run the task on a regular or semi-regular schedule (Fig. 8); but you can set this later, so I prefer to select "Run once now" until I know it is working properly.

Fig. 8
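Under the hood, a schedule is stored as a trigger asset in JSON, much like the pipeline itself. The snippet below parses a hypothetical hourly schedule trigger; the property names are from memory and may differ slightly from what ADF emits, so treat this as a sketch of the shape rather than an exact schema.

```python
import json

# A hypothetical ADF schedule trigger definition (illustrative property
# names): run the associated pipeline once per hour, starting July 1.
trigger_json = """
{
  "name": "HourlyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2019-07-01T00:00:00Z",
        "timeZone": "UTC"
      }
    }
  }
}
"""

trigger = json.loads(trigger_json)
recurrence = trigger["properties"]["typeProperties"]["recurrence"]
print(recurrence["frequency"], recurrence["interval"])  # Hour 1
```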

Click the [Next] button to advance to the "Source data store" page, as shown in Fig. 9.

Fig. 9

Click the [+ Create new connection] button to display the "New Linked Service" dialog, as shown in Fig. 10.


Fig. 10

This dialog lists all the supported data stores.

At the top of the dialog is a search box and a set of links, which allow you to filter the list of data stores, as shown in Fig. 11.

Fig. 11

Fig. 12 shows the next dialog if you select Azure SQL Database as your data source.

Fig. 12

In this dialog, you can enter information specific to the database from which you are extracting data. When complete, click the [Test connection] button to verify your entries are correct; then click the [Finish] button to close the dialog.

After successfully creating a new connection, the connection appears in the "Source data store" page, as shown in Fig. 13.

Fig. 13

Click the [Next] button to advance to the next page in the wizard, which asks questions specific to the type of data in your data source. Fig. 14 shows the page for Azure SQL databases, which allows you to select which tables to extract.

Fig. 14

Click the [Next] button to advance to the "Destination data store", as shown in Fig. 15.

Fig. 15

Click the [+ Create new connection] button to display the "New Linked Service" dialog, as shown in Fig. 16.

Fig. 16

As with the source data connection, you can filter this list via the search box and top links, as shown in Fig. 17. Here we are selecting Azure Data Lake Storage Gen2 as our destination data store.

Fig. 17

After selecting a service, click the [Continue] button to display a dialog requesting information about the data service you selected. Fig. 18 shows the page for Azure Data Lake. When complete, click the [Test connection] button to verify your entries are correct; then click the [Finish] button to close the dialog.

Fig. 18

After successfully creating a new connection, the connection appears in the "Destination data store" page, as shown in Fig. 19.

Fig. 19

Click the [Next] button to advance to the next page in the wizard, which asks questions specific to the type of data in your data destination. Fig. 20 shows the page for Azure Data Lake, which allows you to select the destination folder and file name.

Fig. 20

Click the [Next] button to advance to the "File format settings" page, as shown in Fig. 21.

Fig. 21

At the "File format" dropdown, select a format in which to structure your output file. The prompts change depending on the format you select. Fig.  21 shows the prompts for a Text format file.

Complete the page and click the [Next] button to advance to the "Settings" page, as shown in Fig. 22.

Fig. 22

The important question here is "Fault tolerance". When an error occurs, do you want to abort the entire activity, skipping the remaining records, or do you want to log the error, skip the bad record, and continue with the remaining records?
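The difference between the two behaviors can be sketched in a few lines of Python. This is a local simulation of the idea, not ADF itself: abort-on-error stops at the first bad record, while skip-and-log keeps going and collects the failures.

```python
# Local simulation of the two fault-tolerance modes. A "bad" record here
# is one that fails a simple parse; ADF applies the same idea to rows.
records = ["1", "2", "oops", "4"]

def copy_abort(records):
    """Abort the whole activity on the first bad record."""
    copied = []
    for r in records:
        if not r.isdigit():
            raise ValueError(f"Bad record: {r}")  # stop everything
        copied.append(int(r))
    return copied

def copy_skip(records):
    """Log and skip bad records; copy the rest."""
    copied, errors = [], []
    for r in records:
        if not r.isdigit():
            errors.append(r)  # log the error, skip the record
            continue
        copied.append(int(r))
    return copied, errors

print(copy_skip(records))   # ([1, 2, 4], ['oops'])
try:
    copy_abort(records)
except ValueError as e:
    print(e)                # Bad record: oops
```

Choose abort when partial loads would corrupt downstream data; choose skip-and-log when a few bad rows should not block the rest of the load.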

Click the [Next] button to advance to the "Summary" page as shown in Fig. 23.

Fig. 23

This page lists the selections you have made to this point. You may edit a section if you want to change any settings. When satisfied with your changes, click the [Next] button to kick off the activity and advance to the "Deployment complete" page, as shown in Fig. 24.

Fig. 24

You will see progress of the major steps in this activity as they run. You can click the [Monitor] button to see a more detailed real-time progress report or you can click the [Finish] button to close the wizard.

In this article, you learned about the Azure Data Factory and how to create a new data factory with an activity to copy data from a source to a destination.

Friday, June 28, 2019 9:04:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, June 27, 2019

GCast 54:

Azure Storage Replication

Learn about the data replication options in Azure Storage and how to set the option appropriate for your needs.

Azure | Database | GCast | Screencast | Video
Thursday, June 27, 2019 4:16:00 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, June 26, 2019

Azure IoT Hub allows you to route incoming messages to specific endpoints without having to write any code.

Refer to previous articles (here, here, and here) to learn how to create an Azure IoT Hub and how to add a device to that hub.

To perform automatic routing, you must

  1. Create an endpoint
  2. Create and configure a route that points to that endpoint
  3. Specify the criteria to invoke that route

Navigate to the Azure Portal and log in.

Open your IoT Hub, as shown in Fig. 1.

Fig. 1

Click the [Message routing] button (Fig. 2) under the "Messaging" section to open the "Routing" tab, as shown in Fig. 3.

Fig. 2

Fig. 3

Click the [Add] button to open the "Add a route" blade, as shown in Fig. 4.

Fig. 4

At the "Name" field, enter a name for your route. I like to use something descripting, like "SendAllMessagesToBlobContainer".

At the "Endpoint" field, you can select an existing endpoint to which to send messages. An Endpoint is a destination to send any messages that meet the specified criteria. By default, only the "Events" endpoint exists. For a new hub, you will probably want to create a new endpoint. To create a new endpoint, click the [Add] button. This displays the "Add Endpoint" dialog, as shown in Fig. 5.

Fig. 5

At the "Endpoint" dropdown, select the type of endpoint you want to create. Fig. 6 shows the "Add a storage endpoint" dialog that displays if you select "Blob Storage".

Fig. 6

At the "Endpoint name", enter a descriptive name for the new endpoint.

Click the [Pick a container] button to display a list of Storage accounts, as shown in Fig. 7.

Fig. 7

Select an existing storage account or click the [+ Storage account] button to create a new one. After you select a storage account, the "Containers" dialog displays, listing all blob containers in the selected storage account, as shown in Fig. 8.

Fig. 8

Select an existing container or click the [+Container] button to create a new container. Messages matching the specified criteria will be stored in this blob container.

Back at the "Add a storage endpoint" dialog (Fig. 6), you have options to set the Batch frequency, Chunk size window, and Blob file name format.

Multiple messages are bundled together into a single blob.

The Batch frequency determines how frequently messages get bundled together. Lowering this value decreases latency; but doing so creates more files and requires more compute resources.

Chunk size window sets the maximum size of a blob. If a bundle of messages would exceed this value, the messages will be split into separate blobs.

The Blob file name format allows you to specify the name and folder structure of the blob. Each value within curly braces ({}) represents a variable. Each of the variables shown is required, but you can reorder them or remove slashes to change folders into file name parts or add more to the name, such as a file extension.
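The token substitution is straightforward to picture. The sketch below mimics it in Python: the format string is the portal's default, and the hub name and partition values are made-up examples, not anything from a real hub.

```python
from datetime import datetime, timezone

# The default format shown in the portal: every token is required, but
# tokens can be reordered and slashes turn into folder boundaries.
fmt = "{iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm}"

def blob_name(fmt, iothub, partition, when):
    """Substitute the IoT Hub naming tokens into the format string."""
    return (fmt.replace("{iothub}", iothub)
               .replace("{partition}", str(partition))
               .replace("{YYYY}", f"{when.year:04d}")
               .replace("{MM}", f"{when.month:02d}")
               .replace("{DD}", f"{when.day:02d}")
               .replace("{HH}", f"{when.hour:02d}")
               .replace("{mm}", f"{when.minute:02d}"))

when = datetime(2019, 6, 26, 8, 55, tzinfo=timezone.utc)
print(blob_name(fmt, "myhub", 0, when))
# myhub/0/2019/06/26/08/55
```

Appending a literal suffix like `.json` to the format string, or moving the slashes, changes where each batch of messages lands in the container.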

Click the [Create] button to create the endpoint and return to the "Add a route" blade, as shown in Fig. 9.

Fig. 9

At the "Endpoint" dropdown, select the endpoint you just created.

At the "Data source" dropdown, you can select exactly what data gets routed to the endpoint. Choices are "Device Telemetry Messages"; "Device Twin Change Events"; and "Device Lifecycle Events".

The "Routing query" field allows you to specify the conditions under which messages will be routed to this endpoint.

If you leave this value as 'true', all messages will be routed to the specified endpoint.

But you can filter which messages are routed by entering something else in the "Routing query" field. Query syntax is described here.
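The idea behind a routing query is a per-message predicate: each incoming message is tested against the condition, and only matches go to the endpoint. IoT Hub has its own query language for this (for example, a body filter such as `$body.temperature > 40` on JSON messages); the Python below is just a local toy that mimics that behavior, with made-up message data.

```python
# Toy illustration of routing-query behavior: filter messages with a
# predicate, keeping only those that should be routed to the endpoint.
messages = [
    {"deviceId": "dev1", "temperature": 22},
    {"deviceId": "dev2", "temperature": 48},
]

def route(messages, condition):
    """Return only the messages that satisfy the routing condition."""
    return [m for m in messages if condition(m)]

# Analogous to a routing query like: $body.temperature > 40
hot = route(messages, lambda m: m["temperature"] > 40)
print(hot)  # [{'deviceId': 'dev2', 'temperature': 48}]
```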

Click the [Save] button to create this route.

In this article, you learned how to perform automatic routing for an Azure IoT Hub.

Wednesday, June 26, 2019 8:55:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, June 25, 2019

Data Lake storage is a type of Azure Storage that supports a hierarchical structure.

There are no pre-defined schemas in a Data Lake, so you have a lot of flexibility on the type of data you want to store. You can store structured data or unstructured data or both. In fact, you can store data of different data types and structures in the same Data Lake.

Typically a Data Lake is used for ingesting raw data in order to preserve that data in its original format. The low cost, lack of schema enforcement, and optimization for inserts make it ideal for this. From the Microsoft docs: "The idea with a data lake is to store everything in its original, untransformed state."

After saving the raw data, you can then use ETL tools, such as SSIS or Azure Data Factory to copy and/or transform this data in a more usable format in another location.
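As a toy example of that raw-then-transform pattern (pure Python standing in for an ETL tool, with made-up sample data): the raw zone keeps records exactly as ingested, and a curated CSV is derived from them without ever modifying the original.

```python
import csv
import io
import json

# Raw zone: records preserved exactly as ingested (hypothetical data).
raw = '[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]'

# Curated zone: the same data reshaped into CSV for downstream use;
# the raw string above is left untouched.
records = json.loads(raw)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

Because the raw copy is never altered, you can re-run or change the transformation later without re-ingesting the source data.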

Like most solutions in Azure, it is inherently highly scalable and highly reliable.

Data in Azure Data Lake is stored in a Data Lake Store.

Under the hood, a Data Lake Store is simply an Azure Storage account with some specific properties set.

To create a new Data Lake storage account, navigate to the Azure Portal, log in, and click the [Create a Resource] button (Fig. 1).

Fig. 1

From the menu, select Storage | Storage Account, as shown in Fig. 2.

Fig. 2

The "Create Storage Account" dialog with the "Basic" tab selected displays, as shown in Fig. 3.

Fig. 3

At the "Subscription" dropdown, select the subscription with which you want to associate this account. Most of you will have only one subscription.

At the "Resource group" field, select a resource group in which to store your service or click "Create new" to store it in a newly-created resource group. A resource group is a logical container for Azure resources.

At the "Storage account name" field, enter a unique name for the storage account.

At the "Location" field, select the Azure Region in which to store this service. Consider where the users of this service will be, so you can reduce latency.

At the "Performance" field, select the "Standard" radio button. You can select the "Premium" performance button to achieve faster reads; however, there may be better ways to store your data if performance is your primary objective.

At the "Account kind" field, select "Storage V2"

At the "Replication" dropdown, select your preferred replication. Replication is explained here.

At the "Access tier" field, select the "Hot" radio button.

Click the [Next: Advanced>] button to advance to the "Advanced" tab, as shown in Fig. 4.

Fig. 4

The important field on this tab is "Hierarchical namespace". Select the "Enabled" radio button at this field.

Click the [Review + Create] button to advance to the "Review + Create" tab, as shown in Fig. 5.

Fig. 5

Verify all the information on this tab; then click the [Create] button to begin creating the Data Lake Store.

After a minute or so, a storage account is created. Navigate to this storage account and click the [Data Lake Gen2 file systems] button, as shown in Fig. 6.

Fig. 6

The "File Systems" blade displays, as shown in Fig. 7.

Fig. 7

Data Lake data is partitioned into file systems, so you must create at least one file system. Click the [+ File System] button and enter a name for the file system you wish to create, as shown in Fig. 8.

Fig. 8

Click the [OK] button to add this file system and close the dialog. The newly-created file system displays, as shown in Fig. 9.

Fig. 9

If you double-click the file system in the list, a page displays where you can set access control and read about how to manage the files in this Data Lake Storage, as shown in Fig. 10.

Fig. 10

In this article, you learned how to create a Data Lake Storage and a file system within it.

Tuesday, June 25, 2019 10:10:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 24, 2019

Episode 569

John Alexander on ML.NET

John Alexander describes how .NET developers can use ML.NET to build and consume Machine Learning solutions.

Monday, June 24, 2019 9:01:00 AM (GMT Daylight Time, UTC+01:00)