# Friday, June 28, 2019

Azure Data Factory (ADF) is an example of an Extract, Transform, and Load (ETL) tool, meaning that it is designed to extract data from a source system, optionally transform its format, and load it into a different destination system.

The source and destination data can reside in different locations and in different data stores, and can have different structures.

For example, you can extract data from an Azure SQL database and load it into an Azure Blob storage container.
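To make the ETL idea concrete, here is a minimal, hand-rolled Python sketch of that same extract-and-load step, using pyodbc and the Azure Blob Storage SDK. The connection strings, table, container, and file names are placeholders rather than values from this walkthrough; a Data Factory pipeline performs this kind of work declaratively, without you writing or hosting code like this.

```python
import csv
import io

import pyodbc                                 # pip install pyodbc
from azure.storage.blob import BlobClient     # pip install azure-storage-blob

# Placeholder connection details -- substitute your own.
SQL_CONN_STR = ("Driver={ODBC Driver 17 for SQL Server};"
                "Server=tcp:myserver.database.windows.net;Database=mydb;"
                "Uid=myuser;Pwd=<password>;Encrypt=yes;")
BLOB_CONN_STR = ("DefaultEndpointsProtocol=https;AccountName=myaccount;"
                 "AccountKey=<key>;EndpointSuffix=core.windows.net")

# Extract: read rows from a table in an Azure SQL database.
with pyodbc.connect(SQL_CONN_STR) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT CustomerID, Name, City FROM dbo.Customers")
    rows = cursor.fetchall()

# Transform: reshape the rows into a CSV document.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["CustomerID", "Name", "City"])
writer.writerows(rows)

# Load: write the CSV into an Azure Blob storage container.
blob = BlobClient.from_connection_string(
    BLOB_CONN_STR, container_name="exports", blob_name="customers.csv")
blob.upload_blob(buffer.getvalue(), overwrite=True)
```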

To create a new Azure Data Factory, log into the Azure Portal, click the [Create a resource] button (Fig. 1) and select Integration | Data Factory from the menu, as shown in Fig. 2.

df01-CreateResource
Fig. 1

df02-IntegrationDataFactory
Fig. 2

The "New data factory" blade displays, as shown in Fig. 3.

df03-NewDataFactory
Fig. 3

At the "Name" field, enter a unique name for this Data Factory.

At the Subscription dropdown, select the subscription with which you want to associate this Data Factory. Most of you will only have one subscription, making this an easy choice.

At the "Resource Group" field, select an existing Resource Group or create a new Resource Group which will contain your Data Factory.

At the "Version" dropdown, select "V2".

At the "Location" dropdown, select the Azure region in which you want your Data Factory to reside. Consider the location of the data with which it will interact and try to keep the Data Factory close to this data, in order to reduce latency.

Check the "Enable GIT" checkbox, if you want to integrate your ETL code with a source control system.

After the Data Factory is created, you can search for it by name or within the Resource Group containing it. Fig. 4 shows the "Overview" blade of a Data Factory.

df04-OverviewBlade
Fig. 4

To begin using the Data Factory, click the [Author & Monitor] button in the middle of the blade.

The "Azure Data Factory Getting Started" page displays in a new browser tab, as shown in Fig. 5.

df05-GetStarted
Fig. 5

Click the [Copy Data] button (Fig. 6) to display the "Copy Data" wizard, as shown in Fig. 7.

df06-CopyDataIcon
Fig. 6

df07-Properties
Fig. 7

This wizard steps you through the process of creating a Pipeline and its associated artifacts. A Pipeline performs an ETL on a single source and destination and may be run on demand or on a schedule.
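Under the hood, the wizard produces a pipeline containing a Copy activity that points at a source dataset and a sink dataset. The sketch below, continuing from the factory-creation sketch above (reusing adf_client, resource_group, and factory_name), shows roughly what that looks like through the Python management SDK. The dataset names are hypothetical and assumed to exist already, and model class names (for example DelimitedTextSink) can differ between SDK versions, so treat this as illustrative rather than the exact definition the wizard generates.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSource, CopyActivity, DatasetReference, DelimitedTextSink, PipelineResource)

# Hypothetical datasets created earlier, one per linked service.
source_ref = DatasetReference(reference_name="SqlCustomersDataset")
sink_ref = DatasetReference(reference_name="DataLakeCustomersCsv")

copy_activity = CopyActivity(
    name="CopyCustomersToDataLake",
    inputs=[source_ref],
    outputs=[sink_ref],
    source=AzureSqlSource(),
    sink=DelimitedTextSink())

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyCustomersPipeline", pipeline)

# Run the pipeline on demand ("Run once now"); a trigger would handle scheduled runs.
run = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyCustomersPipeline", parameters={})
print(run.run_id)
```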

At the "Task name" field, enter a descriptive name to identify this pipeline later.

Optionally, you can add a description to your task.

You have the option to run the task on a regular or semi-regular schedule (Fig. 8); but you can set this later, so I prefer to select "Run once now" until I know it is working properly.

df08-Schedule
Fig. 8

Click the [Next] button to advance to the "Source data store" page, as shown in Fig. 9.

df09-Source
Fig. 9

Click the [+ Create new connection] button to display the "New Linked Service" dialog, as shown in Fig. 10.

df10-NewLinkedService
Fig. 10

This dialog lists all the supported data stores.
At the top of the dialog is a search box and a set of links, which allow you to filter the list of data stores, as shown in Fig. 11.

df11-AzureSql
Fig. 11

Fig. 12 shows the next dialog if you select Azure SQL Database as your data source.

df12-AzureSqlDetails
Fig. 12

In this dialog, you can enter information specific to the database from which you are extracting data. When complete, click the [Test connection] button to verify your entries are correct; then click the [Finish] button to close the dialog.
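A connection created in this dialog is stored on the factory as a linked service. For reference, this is roughly how the same Azure SQL connection could be defined with the management SDK (again reusing adf_client, resource_group, and factory_name from the earlier sketch). The connection string is a placeholder; in practice you would keep the password in Azure Key Vault rather than embed it.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService, LinkedServiceResource, SecureString)

sql_linked_service = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value="Server=tcp:myserver.database.windows.net;Database=mydb;"
                  "User ID=myuser;Password=<password>;Encrypt=True;")))

adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureSqlSourceConnection", sql_linked_service)
```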

After successfully creating a new connection, the connection appears in the "Source data store" page, as shown in Fig. 13.

df13-Source
Fig. 13

Click the [Next] button to advance to the next page in the wizard, which asks questions specific to the type of data in your data source. Fig. 14 shows the page for Azure SQL databases, which allows you to select which tables to extract.

df14-SelectTables
Fig. 14

Click the [Next] button to advance to the "Destination data store", as shown in Fig. 15.

df15-Destination
Fig. 15

Click the [+ Create new connection] button to display the "New Linked Service" dialog, as shown in Fig. 16.

df16-NewLinkedService
Fig. 16

As with the source data connection, you can filter this list via the search box and top links, as shown in Fig. 17. Here we are selecting Azure Data Lake Storage Gen2 as our destination data store.

df17-NewLinkedService-ADL
Fig. 17

After selecting a service, click the [Continue] button to display a dialog requesting information about the data service you selected. Fig. 18 shows the page for Azure Data Lake. When complete, click the [Test connection] button to verify your entries are correct; then click the [Finish] button to close the dialog.

df18-ADLDetails
Fig. 18

After successfully creating a new connection, the connection appears in the "Destination data store" page, as shown in Fig. 19.

df19-Destination
Fig. 19

Click the [Next] button to advance to the next page in the wizard, which asks questions specific to the type of data in your data destination. Fig. 20 shows the page for Azure Data Lake, which allows you to select the destination folder and file name.

df20-ChooseOutput
Fig. 20

Click the [Next] button to advance to the "File format settings" page, as shown in Fig. 21.

df21-FileFormatSettings
Fig. 21

At the "File format" dropdown, select a format in which to structure your output file. The prompts change depending on the format you select. Fig.  21 shows the prompts for a Text format file.

Complete the page and click the [Next] button to advance to the "Settings" page, as shown in Fig. 22.

df22-Settings
Fig. 22

The important question here is "Fault tolerance". When an error occurs, do you want to abort the entire activity, skipping the remaining records, or do you want to log the error, skip the bad record, and continue with the remaining records?
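If you script the copy activity rather than using the wizard, this choice appears (as best I can tell) as fault-tolerance settings on the copy activity itself. The snippet below extends the earlier copy-activity sketch; the property name follows the azure-mgmt-datafactory models as I understand them and may differ by SDK version.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSource, CopyActivity, DatasetReference, DelimitedTextSink)

# Skip rows that fail to copy and keep going, instead of aborting the whole activity.
tolerant_copy = CopyActivity(
    name="CopyCustomersToDataLake",
    inputs=[DatasetReference(reference_name="SqlCustomersDataset")],
    outputs=[DatasetReference(reference_name="DataLakeCustomersCsv")],
    source=AzureSqlSource(),
    sink=DelimitedTextSink(),
    enable_skip_incompatible_row=True)
```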

Click the [Next] button to advance to the "Summary" page as shown in Fig. 23.

df23-Summary
Fig. 23

This page lists the selections you have made to this point. You may edit a section if you want to change any settings. When satisfied with your changes, click the [Next] button to kick off the activity and advance to the "Deployment complete" page, as shown in Fig. 24.

df24-DeploymentComplete
Fig. 24

You will see progress of the major steps in this activity as they run. You can click the [Monitor] button to see a more detailed real-time progress report, or you can click the [Finish] button to close the wizard.

In this article, you learned about Azure Data Factory and how to create a new data factory with an activity that copies data from a source to a destination.

Friday, June 28, 2019 9:04:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, June 27, 2019

GCast 54:

Azure Storage Replication

Learn about the data replication options in Azure Storage and how to set the option appropriate for your needs.

Azure | Database | GCast | Screencast | Video
Thursday, June 27, 2019 4:16:00 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, June 26, 2019

Azure IoT Hub allows you to route incoming messages to specific endpoints without having to write any code.

Refer to previous articles (here, here, and here) to learn how to create an Azure IoT Hub and how to add a device to that hub.

To perform automatic routing, you must

  1. Create an endpoint
  2. Create and configure a route that points to that endpoint
  3. Specify the criteria to invoke that route

Navigate to the Azure Portal and log in.

Open your IoT Hub, as shown in Fig. 1.

ir01-IotHubOverviewBlade
Fig. 1

Click the [Message routing] button (Fig. 2) under the "Messaging" section to open the "Routing" tab, as shown in Fig. 3.

ir02-RoutingButton
Fig. 2

ir03-RoutingBlade
Fig. 3

Click the [Add] button to open the "Add a route" blade, as shown in Fig. 4.

ir04-AddRouteBlade
Fig. 4

At the "Name" field, enter a name for your route. I like to use something descripting, like "SendAllMessagesToBlobContainer".

At the "Endpoint" field, you can select an existing endpoint to which to send messages. An Endpoint is a destination to send any messages that meet the specified criteria. By default, only the "Events" endpoint exists. For a new hub, you will probably want to create a new endpoint. To create a new endpoint, click the [Add] button. This displays the "Add Endpoint" dialog, as shown in Fig. 5.

ir05-AddEndpoint
Fig. 5

At the "Endpoint" dropdown, select the type of endpoint you want to create. Fig. 6 shows the "Add a storage endpoint" dialog that displays if you select "Blob Storage".

ir06-AddStorageEndpointBlade
Fig. 6

At the "Endpoint name", enter a descriptive name for the new endpoint.

Click the [Pick a container] button to display a list of Storage accounts, as shown in Fig. 7.

ir07-PickStorageAccount
Fig. 7

Select an existing storage account or click the [+ Storage account] button to create a new one. After you select a storage account, the "Containers" dialog displays, listing all blob containers in the selected storage account, as shown in Fig. 8.

ir08-PickContainer
Fig. 8

Select an existing container or click the [+Container] button to create a new container. Messages matching the specified criteria will be stored in this blob container.

Back at the "Add a storage endpoint" dialog (Fig. 6), you have options to set the Batch frequency, Chunk size window, and Blob file name format.

Multiple messages are bundled together into a single blob.

The Batch frequency determines how frequently messages get bundled together. Lowering this value decreases latency; but doing so creates more files and requires more compute resources.

Chunk size window sets the maximum size of a blob. If a bundle of messages would exceed this value, the messages will be split into separate blobs.

The Blob file name format allows you to specify the name and folder structure of the blob. Each value within curly braces ({}) represents a variable. Each of the variables shown is required, but you can reorder them, remove slashes to turn folders into parts of the file name, or add more to the name, such as a file extension.
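For reference, the same three settings appear as properties when a storage-container endpoint is defined through the azure-mgmt-iothub management SDK. The sketch below only shows the shape of such an endpoint definition (it would still need to be attached to the hub's routing configuration); the names and connection string are placeholders, and model class names may differ between SDK versions.

```python
from azure.mgmt.iothub.models import RoutingStorageContainerProperties

# Shape of a blob-storage routing endpoint; values are placeholders.
storage_endpoint = RoutingStorageContainerProperties(
    name="MessagesToBlobContainer",
    connection_string="DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<key>;",
    container_name="iothubmessages",
    batch_frequency_in_seconds=300,      # Batch frequency: how often messages are bundled into a blob
    max_chunk_size_in_bytes=104857600,   # Chunk size window: maximum size of a single blob (100 MB here)
    encoding="avro",
    file_name_format="{iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm}")  # default folder/file layout
```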

Click the [Create] button to create the endpoint and return to the "Add a route" blade, as shown in Fig. 9.

ir09-SaveRoute
Fig. 9

At the "Endpoint" dropdown, select the endpoint you just created.

At the "Data source" dropdown, you can select exactly what data gets routed to the endpoint. Choices are "Device Telemetry Messages"; "Device Twin Change Events"; and "Device Lifecycle Events".

The "Routing query" field allows you to specify the conditions under which messages will be routed to this endpoint.

If you leave this value as 'true', all messages will be routed to the specified endpoint.
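For completeness, here is roughly how the finished route looks when expressed through the azure-mgmt-iothub models instead of the portal. The route and endpoint names are the placeholders used above, "DeviceMessages" is the value behind the portal's "Device Telemetry Messages" choice, and class names may vary by SDK version.

```python
from azure.mgmt.iothub.models import RouteProperties

# A route that sends all device telemetry to the blob-container endpoint defined earlier.
route = RouteProperties(
    name="SendAllMessagesToBlobContainer",
    source="DeviceMessages",                     # "Device Telemetry Messages" in the portal
    condition="true",                            # route every message
    endpoint_names=["MessagesToBlobContainer"],
    is_enabled=True)
```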

But you can filter which messages are routed by entering something else in the "Routing query" field. Query syntax is described here.
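As a concrete example, suppose devices flag hot readings by adding a temperatureAlert application property to each message; a routing query of temperatureAlert = "true" would then route only the flagged messages to the endpoint. The sketch below, using the azure-iot-device Python SDK, shows the device-side half of that; the device connection string is a placeholder.

```python
from azure.iot.device import IoTHubDeviceClient, Message   # pip install azure-iot-device

# Placeholder device connection string, copied from the device's details in the IoT Hub.
CONN_STR = "HostName=myhub.azure-devices.net;DeviceId=mydevice;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.connect()

msg = Message('{"temperature": 78.2}')
# Application property that the routing query  temperatureAlert = "true"  will match.
msg.custom_properties["temperatureAlert"] = "true"
client.send_message(msg)

client.disconnect()
```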

Click the [Save] button to create this route.

In this article, you learned how to perform automatic routing for an Azure IoT Hub.

IoT
Wednesday, June 26, 2019 8:55:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, June 25, 2019

Data Lake storage is a type of Azure Storage that supports a hierarchical structure.

There are no pre-defined schemas in a Data Lake, so you have a lot of flexibility on the type of data you want to store. You can store structured data or unstructured data or both. In fact, you can store data of different data types and structures in the same Data Lake.

Typically a Data Lake is used for ingesting raw data in order to preserve that data in its original format. The low cost, lack of schema enforcement, and optimization for inserts make it ideal for this. From the Microsoft docs: "The idea with a data lake is to store everything in its original, untransformed state."

After saving the raw data, you can then use ETL tools, such as SSIS or Azure Data Factory, to copy and/or transform this data into a more usable format in another location.

Like most solutions in Azure, it is inherently highly scalable and highly reliable.

Data in Azure Data Lake is stored in a Data Lake Store.

Under the hood, a Data Lake Store is simply an Azure Storage account with some specific properties set.

To create a new Data Lake storage account, navigate to the Azure Portal, log in, and click the [Create a Resource] button (Fig. 1).

dl01-CreateResource
Fig. 1

From the menu, select Storage | Storage Account, as shown in Fig. 2.

dl02-MenuStorageAccount
Fig. 2

The "Create Storage Account" dialog with the "Basic" tab selected displays, as shown in Fig. 3.

dl03-Basics
Fig. 3

At the “Subscription” dropdown, select the subscription with which you want to associate this account. Most of you will have only one subscription.

At the "Resource group" field, select a resource group in which to store your service or click "Create new" to store it in a newly-created resource group. A resource group is a logical container for Azure resources.

At the "Storage account name" field, enter a unique name for the storage account.

At the "Location" field, select the Azure Region in which to store this service. Consider where the users of this service will be, so you can reduce latency.

At the "Performance" field, select the "Standard" radio button. You can select the "Premium" performance button to achieve faster reads; however, there may be better ways to store your data if performance is your primary objective.

At the "Account kind" field, select "Storage V2"

At the "Replication" dropdown, select your preferred replication. Replication is explained here.

At the "Access tier" field, select the "Hot" radio button.

Click the [Next: Advanced>] button to advance to the "Advanced" tab, as shown in Fig. 4.

dl04-Advanced
Fig. 4

The important field on this tab is "Hierarchical namespace". Select the "Enabled" radio button at this field.
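That setting is what turns an ordinary StorageV2 account into a Data Lake Storage Gen2 account. If you create the account from code instead of the portal, it corresponds to a single flag; this sketch uses the azure-mgmt-storage package with placeholder names, and the method name (begin_create vs. create) depends on the SDK version you have installed.

```python
from azure.identity import DefaultAzureCredential       # pip install azure-identity
from azure.mgmt.storage import StorageManagementClient  # pip install azure-mgmt-storage
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

storage_client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

params = StorageAccountCreateParameters(
    sku=Sku(name="Standard_LRS"),      # "Standard" performance; replication explained in an earlier post
    kind="StorageV2",                  # "Account kind"
    location="eastus",                 # "Location"
    access_tier="Hot",                 # "Access tier"
    is_hns_enabled=True)               # "Hierarchical namespace: Enabled" -- the Data Lake Gen2 switch

poller = storage_client.storage_accounts.begin_create(
    "MyResourceGroup", "mydatalakeaccount", params)
print(poller.result().provisioning_state)
```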

Click the [Review + Create] button to advance to the "Review + Create" tab, as shown in Fig. 5.

dl05-Review
Fig. 5

Verify all the information on this tab; then click the [Create] button to begin creating the Data Lake Store.

After a minute or so, a storage account is created. Navigate to this storage account and click the [Data Lake Gen2 file systems] button, as shown in Fig. 6.

dl06-Services
Fig. 6

The "File Systems" blade displays, as shown in Fig. 7.

dl07-FileSystem
Fig. 7

Data Lake data is partitioned into file systems, so you must create at least one file system. Click the [+ File System] button and enter a name for the file system you wish to create, as shown in Fig. 8.

dl08-AddFileSystem
Fig. 8

Click the [OK] button to add this file system and close the dialog. The newly-created file system displays, as shown in Fig. 9.

dl09-FileSystem
Fig. 9
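The same file system, and files within it, can also be created from code using the azure-storage-file-datalake package. Here is a minimal sketch with a placeholder account URL and key; it creates a file system, a folder, and a small file.

```python
from azure.storage.filedatalake import DataLakeServiceClient   # pip install azure-storage-file-datalake

# Placeholder account URL and key; note the "dfs" endpoint for Data Lake Gen2.
service = DataLakeServiceClient(
    account_url="https://mydatalakeaccount.dfs.core.windows.net",
    credential="<account-key>")

# Create a file system, then a folder and a file inside it.
file_system = service.create_file_system(file_system="rawdata")
directory = file_system.create_directory("sales/2019/06")
file_client = directory.create_file("orders.csv")
file_client.upload_data(b"OrderId,Amount\n1,19.99\n", overwrite=True)
```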

If you double-click the file system in the list, a page displays where you can set access control and read about how to manage the files in this Data Lake Storage, as shown in Fig. 10.

dl10-FileSystem
Fig. 10

In this article, you learned how to create a Data Lake Storage and a file system within it.

Tuesday, June 25, 2019 10:10:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 24, 2019

Episode 569

John Alexander on ML.NET

John Alexander describes how .NET developers can use ML.NET to build and consume Machine Learning solutions.

Monday, June 24, 2019 9:01:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, June 23, 2019

Frank and April Wheeler were living the 1950s American dream. Frank had a steady - if unfulfilling - job in New York City, April was the attractive wife he always wanted, and they owned a large home in a quiet neighborhood in suburban Connecticut.

But, like nearly all their neighbors, the Wheelers were far from happy.

They were bored suburbanites, working dead-end jobs, in loveless marriages, talking about their dreams.

They talked of how they didn't belong - of how they were so much better than the rest of the sheep who surrendered to the conformity of the world. But they take no action to correct their circumstances, and the fact is that they are not nearly as superior as they believe.

April suggests that the Wheelers move to Paris and start a new life, so that Frank can explore his potential. But Frank is not interested in his potential or in self-exploration. He likes the low expectations that come with his job. And, when he is given an opportunity for a promotion, he leaps at the chance.

Frank and April are self-aware enough to believe they are superior to their neighbors and co-workers, but not self-aware enough to realize they are not. They either don't know themselves or they refuse to see themselves.

They are under the illusion that their problems are easily fixable - move to Paris; get a promotion; have an affair. News flash: They are not.

Instead they continue their pretentious life of drunken lunches and adultery and deluding themselves that they are destined for more. No one takes responsibility for his or her own actions, choosing instead to blame others or the expectations of society.

The only honest person in the book is John Givings, a son of the Wheelers' neighbors, who has been literally certified insane and institutionalized. But John is so shockingly rude that it's difficult for anyone to listen to him or to take him seriously.

Inevitably, the story ends in tragedy, with no lessons learned and everyone continuing to face their troubles alone.

Don't read Revolutionary Road by Richard Yates to feel good about yourself. Read it as a warning about buying too much into the American dream. The sad part is how relevant this warning feels today.

Sunday, June 23, 2019 7:29:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, June 22, 2019

NeverLetMeGo

It isn't obvious until well into Never Let Me Go by Kazuo Ishiguro that this is a story of a dystopian society. Ishiguro drops hints throughout the story, slowly revealing the situation in which the characters find themselves. Words like "donations", "Possible", and "Completion" are introduced, and we know they have some mysterious meaning, but are not told that meaning until much later.

Kathy H is a 31-year-old “Carer” looking back on her life - particularly her time at Hailsham - a boarding school in rural England. Life is good at Hailsham, but the students are secluded and are given almost no knowledge of the outside world, other than being told they will someday have a special place in it.

Everyone has a name like "Kathy H" or "Tommy D". At first, I thought this was a literary device, with the author pretending to protect identities; but, on reflection, I think the students were not given last names as one more way to dehumanize them.

Never Let Me Go is a story of false hope; of what it means to be human and to have a soul; and of how much control each of us has over our destiny. It is told in a believable manner in a world not very different from ours and referencing technology that does not sound far-fetched.

It is a dystopian nightmare, disguised as a coming-of-age story.

Saturday, June 22, 2019 9:56:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, June 20, 2019

GCast 53:

Creating a Data Warehouse in Azure

Learn how to create a new SQL Server data warehouse in Microsoft Azure.

Thursday, June 20, 2019 9:24:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, June 18, 2019

CTRL+V has been in Windows since the beginning: After copying something to the Windows clipboard (via CTRL+C or some other method), hold down the CTRL key and press V to insert that something at the current cursor location.

But I learned today about a new feature: WINDOWS + V.

Hold down the WINDOWS key (Fig. 1) and press V.

wv01-WindowsKey
Fig. 1

This will bring up a context menu, listing the last few items added to the clipboard, as shown in Fig. 2.

wv02-ContextMenu
Fig. 2

You can then select from this list which item to insert at the current cursor position.

The context menu even lists the time the item was added to the clipboard.

This is useful if you need to copy several items before pasting them. But the most useful case is when you accidentally copy something to the clipboard without thinking, overwriting a previous item copied there. Now you can still retrieve and use that previously overwritten item.

I'm not sure how long items stay in this clipboard list, but I like this feature.

Tuesday, June 18, 2019 2:11:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 17, 2019

Episode 568

Heather Wilde on Anticipatory Design

Heather Wilde discusses how to combine machine learning with user interfaces and user experience to craft a more personalized experience between a person and the products and services they use.

https://twitter.com/heathriel

Monday, June 17, 2019 8:21:00 AM (GMT Daylight Time, UTC+01:00)