# Tuesday, June 25, 2019

Data Lake storage is a type of Azure Storage that supports a hierarchical structure.

There are no pre-defined schemas in a Data Lake, so you have a lot of flexibility in the type of data you store. You can store structured data, unstructured data, or both, with different data types and structures in the same Data Lake.

Typically a Data Lake is used for ingesting raw data in order to preserve that data in its original format. The low cost, lack of schema enforcement, and optimization for inserts make it ideal for this. From the Microsoft docs: "The idea with a data lake is to store everything in its original, untransformed state."

After saving the raw data, you can then use ETL tools, such as SSIS or Azure Data Factory, to copy and/or transform this data into a more usable format in another location.

Like most solutions in Azure, it is inherently highly scalable and highly reliable.

Data in Azure Data Lake is stored in a Data Lake Store.

Under the hood, a Data Lake Store is simply an Azure Storage account with some specific properties set.

To create a new Data Lake storage account, navigate to the Azure Portal, log in, and click the [Create a Resource] button (Fig. 1).

dl01-CreateResource
Fig. 1

From the menu, select Storage | Storage Account, as shown in Fig. 2.

dl02-MenuStorageAccount
Fig. 2

The "Create Storage Account" dialog with the "Basic" tab selected displays, as shown in Fig. 3.

dl03-Basics
Fig. 3

At the “Subscription” dropdown, select the subscription with which you want to associate this account. Most of you will have only one subscription.

At the "Resource group" field, select a resource group in which to store your service or click "Create new" to store it in a newly-created resource group. A resource group is a logical container for Azure resources.

At the "Storage account name" field, enter a unique name for the storage account.

At the "Location" field, select the Azure Region in which to store this service. Consider where the users of this service will be, so you can reduce latency.

At the "Performance" field, select the "Standard" radio button. You can select the "Premium" performance button to achieve faster reads; however, there may be better ways to store your data if performance is your primary objective.

At the "Account kind" field, select "StorageV2 (general purpose v2)".

At the "Replication" dropdown, select your preferred replication. Replication is explained here.

At the "Access tier" field, select the "Hot" radio button.

Click the [Next: Advanced>] button to advance to the "Advanced" tab, as shown in Fig. 4.

dl04-Advanced
Fig. 4

The important field on this tab is "Hierarchical namespace". Select the "Enabled" radio button at this field.

Click the [Review + Create] button to advance to the "Review + Create" tab, as shown in Fig. 5.

dl05-Review
Fig. 5

Verify all the information on this tab; then click the [Create] button to begin creating the Data Lake Store.

After a minute or so, a storage account is created. Navigate to this storage account and click the [Data Lake Gen2 file systems] button, as shown in Fig. 6.

dl06-Services
Fig. 6

The "File Systems" blade displays, as shown in Fig. 7.

dl07-FileSystem
Fig. 7

Data Lake data is partitioned into file systems, so you must create at least one file system. Click the [+ File System] button and enter a name for the file system you wish to create, as shown in Fig. 8.

dl08-AddFileSystem
Fig. 8

Click the [OK] button to add this file system and close the dialog. The newly-created file system displays, as shown in Fig. 9.

dl09-FileSystem
Fig. 9

If you double-click the file system in the list, a page displays where you can set access control and read about how to manage the files in this Data Lake Storage, as shown in Fig. 10.

dl10-FileSystem
Fig. 10

In this article, you learned how to create a Data Lake Storage account and a file system within it.

Tuesday, June 25, 2019 10:10:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 24, 2019

Episode 569

John Alexander on ML.NET

John Alexander describes how .NET developers can use ML.NET to build and consume Machine Learning solutions.

Monday, June 24, 2019 9:01:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, June 23, 2019

Frank and April Wheeler were living the 1950s American dream. Frank had a steady - if unfulfilling - job in New York City, April was the attractive wife he always wanted, and they owned a large home in a quiet neighborhood in suburban New Jersey.

But, like nearly all their neighbors, the Wheelers were far from happy.

They were bored suburbanites, working dead-end jobs, in loveless marriages, talking about their dreams.

They talk of how they don't belong - of how they are so much better than the rest of the sheep who surrender to the conformity of the world. But they take no action to correct their circumstances. The fact is that they are not as much "better" as they believe.

April suggests that the Wheelers move to Paris and start a new life, so that Frank can explore his potential. But Frank is not interested in his potential or in self-exploration. He likes the low expectations that come with his job. And, when he is given an opportunity at a promotion, he leaps at the chance.

Frank and April are self-aware enough to believe they are superior to their neighbors and co-workers, but not self-aware enough to realize they are not. They either don't know themselves or they refuse to see themselves.

They are under the illusion that their problems are easily fixable - move to Paris; get a promotion; have an affair. News flash: They are not.

Instead they continue their pretentious life of drunken lunches and adultery and deluding themselves that they are destined for more. No one takes responsibility for his or her own actions, choosing instead to blame others or the expectations of society.

The only honest person in the book is John Givings, a son of the Wheelers' neighbors, who has been literally certified insane and institutionalized. But John is so shockingly rude that it's difficult for anyone to listen to him or to take him seriously.

Inevitably, the story ends in tragedy, with no lessons learned and everyone continuing to face their troubles alone.

Don't read Revolutionary Road by Richard Yates to feel good about yourself. Read it as a warning about buying too much into the American dream. The sad part is how relevant this warning feels today.

Sunday, June 23, 2019 7:29:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, June 22, 2019

It isn't obvious until well into Never Let Me Go by Kazuo Ishiguro that this is a story of a dystopian society. Ishiguro drops hints throughout the story, slowly revealing the situation in which the characters find themselves. Words like "donations", "Possible", and "Completion" are introduced, and we know they have some mysterious meaning, but are not told that meaning until much later.

Kathy H is a 31-year-old “Carer” looking back on her life - particularly her time at Hailsham - a boarding school in rural England. Life is good at Hailsham, but the students are secluded and are given almost no knowledge of the outside world, other than being told they will someday have a special place in it.

Everyone has a name like "Kathy H" or "Tommy D". At first, I thought this was a literary device, with the author pretending to protect identities; but, on reflection, I think the students were not given last names as one more way to dehumanize them.

Never Let Me Go is a story of false hope; of what it means to be human and to have a soul; and of how much control each of us has over our destiny. It is told in a believable manner in a world not very different from ours and referencing technology that does not sound far-fetched.

It is a dystopian nightmare, disguised as a coming-of-age story.

Saturday, June 22, 2019 9:56:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, June 20, 2019

GCast 53:

Creating a Data Warehouse in Azure

Learn how to create a new SQL Server data warehouse in Microsoft Azure.

Thursday, June 20, 2019 9:24:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, June 18, 2019

CTRL+V has been in Windows since the beginning: After copying something to the Windows clipboard (via CTRL+C or some other method), hold down the CTRL key and press V to insert that something at the current cursor location.

But I learned today about a new feature: WINDOWS + V.

Hold down the WINDOWS key (Fig. 1) and press V.

wv01-WindowsKey
Fig. 1

This will bring up a context menu, listing the last few items added to the clipboard, as shown in Fig. 2.

wv02-ContextMenu
Fig. 2

You can then select from this list which item to insert at the current cursor position.

The context menu even lists the time the item was added to the clipboard.

This is useful if you need to copy several items before pasting them. But the most useful case is when you accidentally copy something to the clipboard and overwrite an item you still needed. The overwritten item remains in the list, so you can still paste it.

I'm unclear how long items stay in this clipboard list, but I like this advantage.

Tuesday, June 18, 2019 2:11:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 17, 2019

Episode 568

Heather Wilde on Anticipatory Design

Heather Wilde discusses how to combine machine learning with user interfaces and user experience to craft a more personalized experience between a person and the products and services they use.

https://twitter.com/heathriel

Monday, June 17, 2019 8:21:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, June 16, 2019

Arrow of God is part of Chinua Achebe's African Trilogy. Although it is Achebe's third novel, it is chronologically second in the trilogy. The first book - Things Fall Apart - took place as the English colonizers were arriving in west Africa; book 2 - No Longer At Ease - takes place near the end of the colonization period; and the story of Arrow of God happens in the 1920s - at the height of European colonization.

Unlike the first two books, this one does not focus on Okonkwo or his descendants.

Arrow of God tells the story of Ezeulu, a high priest of the god Ulu in colonial Nigeria. Ezeulu's tribe goes to war with another tribe - a result of a perceived insult and a dispute over land ownership. The British governor steps in, ends the conflict, and burns everyone's guns. It is this assertion of control by the British over the local population that is at the heart of the novel.

The British are in Africa to impose their legal system, their culture, and their religion on the native population, and they use any means to do so. They implement "Local Rule", installing an African in a position of authority, but controlling that man so he works in their interests.

The British government bureaucracy makes the colonizers terribly inefficient, making their job harder; but the infighting among individuals, villages, and tribes of the Africans makes the local population easier to manipulate. As part of England's efforts at influencing the local population, missionaries are trying to convert the natives to Christianity. Many resist this change in culture; but the missionaries are aided when a failure of the yam crop leads to famine and praying to the god Ulu does not help. The religious conversions are a microcosm of the colonizers' efforts to impose their culture on the Africans, subsuming the existing culture.

Ezeulu stands in the center of it all. He wants to better understand the British and sends his son to work with them to gain more information; but he rejects their offer to make him a Local Ruler. Still, his own people are suspicious of his motives, feeling he has given too much to the white invaders. There is conflict between Ezeulu and

Arrow of God is the story of conflict and influence; of people trying to hold onto their traditions in the face of a tidal wave of change; of the hope on which people place their faith; and of how a culture was supplanted in a short time. It is the story of the loss of the Igbo cultural identity.

Achebe was probably the first English language writer to portray events in Africa from the point of view of the Africans.

The book is told in an eloquent style, peppered with many African proverbs. This one sums it up best:

"When brothers fight to death a stranger inherits their father's estate"

Sunday, June 16, 2019 9:06:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, June 15, 2019

Sometimes, I dream about traveling to new places with no responsibilities and no worries about money.

The Sheltering Sky by Paul Bowles may have cured me of that dream.

"The Sheltering Sky" tells the story of 3 idle rich people - husband and wife, Port and Kit, along with their friend Tunner - who decide to explore north Africa after World War II. They don't have a plan and so drift from city to city seeing the sights and trying to overcome their boredom. They insist this makes them "travelers", rather than "tourists". They drift through a land of sweltering temperatures and dust storms and bedbugs and lice and biting flies.

The story turns tragic as the trio splits up, Port falls gravely ill, and Kit becomes lost in the desert. But this book isn't defined by plot twists or great character developments. It is an existential tale about people who appear to be drifting through life, with no purpose and no direction. What did this trio contribute to the world on their travels? What did they gain for themselves? What meaning did their lives have? They are searching for something, but even they don't know what that is. And none of them ever finds it.

Every character seems bent on self-destruction in this story. Almost everyone we meet is unlikable - from the 3 main characters to the constantly-arguing-and-probably-incestuous young man and his mother they encounter to a traveler who kidnaps and rapes Kit before adding her to his harem. Port and Kit are each unfaithful to one another during their trip, but only Kit feels any guilt about it.

Some will take offense at Kit's reaction to her rape (she accepts it and even comes to enjoy it), but I saw this as evidence of her decreasing mental stability.

If you are looking for a good travelogue of Algeria and Morocco, skip this one. If you want a commentary on the state of the human condition, this is a pretty good one.

Saturday, June 15, 2019 9:46:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, June 14, 2019

In the last few articles, I introduced the OCR Service of the Cognitive Services Computer Vision API. The OCR service is a general-purpose tool for detecting text in an image. But this tool is only useful if you want to do something with that text. Often it is easier to figure out how to process recognized text if you know something about the image.

Enter receipt-api - an open source project that builds on the Cognitive Services API to recognize information in a store receipt.

You can download this project at https://github.com/nzregs/receipt-api.

Compile and run it in Visual Studio and you have a web service that you can call by submitting an HTTP POST to the following URL:

http://localhost:xxxxx/api/values

where xxxxx is the port number on which the web service is running. You can spot this port easily because a browser launches when the app runs and the port number is in the URL, as shown in Fig. 1.

ra01-Browser
Fig. 1

Before Testing

In order to test the API, you will need an image of a receipt. You can take a photo with your phone and copy it to your computer.

You will also need to create a Computer Vision service in Azure, as described here.

Finally, you will also need to make a change to the receipt-api project. Open the solution and rename Sample-Secrets_cs.txt to Secrets.cs. The code in this file is shown in Listing 1.

Listing 1:

public class Secrets
{
    // rename this file to Secrets.cs
    // update the constants below with your API Key and API Endpoint
    public const string apikey = "28737opek;jwlbjksgui3y2[pik";
    public const string apiendpoint_ocr = @"https://australiaeast.api.cognitive.microsoft.com/vision/v1.0/ocr";
}
  

Replace the key and API endpoint with the key and endpoint of your own Computer Vision service.

Testing the Receipt API

A simple way to test any API is with Postman, a free tool available at https://www.getpostman.com/.

Download, install, and run Postman.

With the receipt-api service running, create a new request in Postman consisting of a POST to the receipt-api service URL, as shown in Fig. 2.

ra02-Postman
Fig. 2

On the "Headers" tab, add a header row with NAME = Content-Type and VALUE = application/octet-stream, as shown in Fig. 3.

ra03-Headers
Fig. 3

On the "Body" tab, click the [Select File] button and select the photo of the receipt from your computer, as shown in Fig. 4.

ra04-Body
Fig. 4

Click the [Send] button and wait for a response to appear. If all goes well, you will see something like Fig. 5.

ra05-Response
Fig. 5

If you have used the OCR service, you will notice that this response looks identical to the response from that service. But scroll to the bottom, as shown in Fig. 6, and you will see information specific to receipts.

ra06-Response
Fig. 6

This is from the receipt shown in Fig. 7.

ra07-Receipt
Fig. 7
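
The same request can also be sent from code instead of Postman. The following is a minimal sketch, not part of the receipt-api project: the class and method names are hypothetical, port 5000 and the file name receipt.jpg are placeholders for your own values, and it assumes the service accepts the raw image bytes exactly as in the Postman steps above.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper; not part of the receipt-api project.
public static class ReceiptApiClient
{
    // Builds the same request as the Postman steps:
    // a POST whose body is the raw image, sent as application/octet-stream.
    public static HttpRequestMessage BuildRequest(string serviceUrl, byte[] imageBytes)
    {
        var content = new ByteArrayContent(imageBytes);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return new HttpRequestMessage(HttpMethod.Post, serviceUrl) { Content = content };
    }

    // Sends the image file and returns the response body as a string.
    public static async Task<string> PostReceiptAsync(HttpClient client, string serviceUrl, string imagePath)
    {
        byte[] imageBytes = File.ReadAllBytes(imagePath);
        using (HttpRequestMessage request = BuildRequest(serviceUrl, imageBytes))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}

public static class Program
{
    public static async Task Main(string[] args)
    {
        // Placeholder values: replace the port and file name with your own.
        string serviceUrl = args.Length > 0 ? args[0] : "http://localhost:5000/api/values";
        string imagePath = args.Length > 1 ? args[1] : "receipt.jpg";

        using (var client = new HttpClient())
        {
            Console.WriteLine(await PostReceiptAsync(client, serviceUrl, imagePath));
        }
    }
}
```

Run it while the receipt-api service is running, passing the service URL and the path to your receipt image as arguments. Building the request in its own method keeps the header logic easy to check without a running service.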

How it works

The solution works by first calling the Cognitive Services OCR service, then looping through each line and word, looking for patterns. It uses regular expressions to find these patterns. Below is the code to find the date in the recognized text:

Listing 2:

// Requires: using System.Text.RegularExpressions;
static string ExtractDate(string line)
{
    string receiptdate = "";
    // match dates "01/05/2018" "01-05-2018" "01-05-18" "01 05 18" "01 05 2018"
    string pat = @"\s*((31([-/ .])((0?[13578])|(1[02]))\3(\d\d)?\d\d)|((([012]?[1-9])|([123]0))([-/ .])((0?[13-9])|(1[0-2]))\12(\d\d)?\d\d)|(((2[0-8])|(1[0-9])|(0?[1-9]))([-/ .])0?2\22(\d\d)?\d\d)|(29([-/ .])0?2\25(((\d\d)?(([2468][048])|([13579][26])|(0[48])))|((([02468][048])|([13579][26]))00))))\s*";
    foreach (Match match in Regex.Matches(line, pat))
    {
        // normalize the separators of a numeric date to "/"
        receiptdate = match.Value.Trim();
        receiptdate = receiptdate.Replace("-", "/");
        receiptdate = receiptdate.Replace(".", "/");
        receiptdate = receiptdate.Replace(" ", "/");
    }

    // Didn't find a date? Now we'll try searching with month names: 03 OCT 2017, 03 October 2017, etc.
    if (receiptdate == "")
    {
        pat = @"((31(?![-/ .](Feb(ruary)?|Apr(il)?|June?|(Sep(?=\b|t)t?|Nov)(ember)?)))|((30|29)(?![-/ .]Feb(ruary)?))|(29(?=[-/ .]Feb(ruary)?[-/ .](((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])[-/ .](Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sep(?=\b|t)t?|Nov|Dec)(ember)?)[-/ .]((1[6-9]|[2-9]\d)\d{2})";

        foreach (Match match in Regex.Matches(line, pat, RegexOptions.IgnoreCase))
        {
            // normalize the separators of a month-name date to "-"
            receiptdate = match.Value.Trim();
            receiptdate = receiptdate.Replace("/", "-");
            receiptdate = receiptdate.Replace(".", "-");
            receiptdate = receiptdate.Replace(" ", "-");
        }
    }

    return receiptdate;
}
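
The patterns in Listing 2 are dense, in part because they appear to check the day against the month and leap years as well. To show only the core idea (match a day, a separator, a month, the same separator again, and a year, then normalize the separator), here is a simplified sketch. The pattern, the class name, and the method name are illustrative assumptions rather than code from the project, and the sketch does no calendar validation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, simplified illustration; not the pattern used by receipt-api.
public static class DateSketch
{
    // day, separator, month, the SAME separator (via the named backreference), 2- or 4-digit year
    private static readonly Regex NumericDate = new Regex(
        @"\b(?<day>0?[1-9]|[12]\d|3[01])(?<sep>[-/ .])(?<month>0?[1-9]|1[0-2])\k<sep>(?<year>\d{4}|\d{2})\b");

    // Returns the first numeric date in the line as day/month/year, or "" if none is found.
    public static string ExtractNumericDate(string line)
    {
        Match match = NumericDate.Match(line);
        if (!match.Success)
        {
            return "";
        }

        return match.Groups["day"].Value + "/"
             + match.Groups["month"].Value + "/"
             + match.Groups["year"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractNumericDate("Date: 01-05-2018 14:32"));
    }
}
```

The backreference is what keeps a price such as 12.50 from being read as a date: after the day and month, the same separator must appear again before the year. Named groups make that intent easier to follow than numbered backreferences like \3 and \12, at the cost of the validation that the longer patterns perform.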
  

Limitations

This tool is not perfect.

It is incomplete. Although the model supports a Business Name and a Tax Total, it looks like the logic to extract this information has not yet been written.

Note that it is an Open Source project and you are welcome to contribute and submit a Pull Request. If this logic is important to your project, write it and share it with the world.

The solution is also limited by the capabilities of the OCR service it calls. However, my experience is that this service becomes more accurate as time goes on.

The results are best with a clear, high-contrast receipt. If your receipt is wrinkled or faded or has a watermark, the OCR will be degraded, affecting any analysis of the recognized text.

Strengths

The receipt-api project does provide several advantages:

  • It is simple to use.
  • It can scale when deployed to Azure.
  • It is an open source project, so other developers (including you) can improve it.
  • It is free.
  • It has an MIT license, which permits use with very few restrictions.

The receipt-api open source project provides a simple way to extract data from a receipt.

Friday, June 14, 2019 9:22:00 AM (GMT Daylight Time, UTC+01:00)