# Friday, July 12, 2019

From its earliest days, Microsoft Cognitive Services has had the ability to convert pictures of text into text - a process known as Optical Character Recognition (OCR). I wrote about using this service here and here.

Recently, Microsoft released a new service to perform OCR. Unlike the previous service, which requires only a single web service call, this service requires two calls: one to pass an image and start the text recognition process; and another to check the status of that process and return the transcribed text.

To get started, you will need to create a Computer Vision key, as described here.

Creating this service gives you a URI endpoint to call as a web service, and an API key, which must be passed in the header of web service calls.

Recognize Text

The first call is to the Recognize Text API. To call this API, send an HTTP POST to the following URL:

https://lllll.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=mmmmm

where:

lllll is the location selected when you created the Computer Vision Cognitive Service in Azure; and

mmmmm is "Printed" if the image contains printed text, as from a computer or typewriter; or "Handwritten" if the image contains a picture of handwritten text.

The header of an HTTP request can include name-value pairs. In this request, include the following name-value pairs:

| Name | Value |
| --- | --- |
| Ocp-Apim-Subscription-Key | The Computer Vision API key (from the Cognitive Service created above) |
| Content-Type | "application/json", if you plan to pass a URL pointing to an image on the public web; "application/octet-stream", if you are passing the actual image in the request body |

Details about the request body are described below.

You must pass the image or the URL of the image in the request body. What you pass must be consistent with the "Content-Type" value passed in the header.

If you set the Content-Type header value to "application/json", pass the following JSON in the request body:

{"url":"http://xxxx.com/xxx.xxx"}  

where http://xxxx.com/xxx.xxx is the URL of the image you want to analyze. This image must be accessible to the Cognitive Services service (e.g., it cannot be behind a firewall or password-protected).

If you set the Content-Type header value to "application/octet-stream", pass the binary image in the request body.

You will receive an HTTP response to your POST. A response code of "202" ("Accepted") indicates that the POST was successful and that the service is analyzing the image. An "Accepted" response will include an "Operation-Location" header. The value of this header is a URL that you can use to check whether the service has finished analyzing the image. The URL will look like the following:

https://lllll.api.cognitive.microsoft.com/vision/v2.0/textOperations/gggggggg-gggg-gggg-gggg-gggggggggggg

where

lllll is the location selected when you created the Computer Vision Cognitive Service in Azure; and

gggggggg-gggg-gggg-gggg-gggggggggggg is a GUID that uniquely identifies the analysis job.
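
If you want to see this first call in code before the future article, below is a minimal sketch using Python's requests library. The region ("westus"), the subscription key, and the image URL are placeholders, not real values.

```python
# A minimal sketch of the Recognize Text call using the Python "requests" library.
# The region, subscription key, and image URL below are placeholders.
import requests

subscription_key = "YOUR_COMPUTER_VISION_KEY"
endpoint = "https://westus.api.cognitive.microsoft.com"   # use your service's region

recognize_url = endpoint + "/vision/v2.0/recognizeText"
headers = {
    "Ocp-Apim-Subscription-Key": subscription_key,
    "Content-Type": "application/json",
}
body = {"url": "http://xxxx.com/xxx.xxx"}   # publicly accessible image

response = requests.post(recognize_url, headers=headers, params={"mode": "Printed"}, json=body)
print(response.status_code)                 # expect 202 (Accepted)

# The URL to poll for the result comes back in the Operation-Location header.
operation_url = response.headers["Operation-Location"]
```

If you want to send the image itself instead of a URL, change the Content-Type header to "application/octet-stream" and pass the image bytes via the data parameter rather than json.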

Get Recognize Text Operation Result

After you call the Recognize Text service, you can call the Get Recognize Text Operation Result service to determine if the OCR operation is complete.

To call this service, send an HTTP GET request to the "Operation-Location" URL returned in the request above.

In the header, send the following name-value pair:

| Name | Value |
| --- | --- |
| Ocp-Apim-Subscription-Key | The Computer Vision API key (from the Cognitive Service created above) |

This is the same value as in the previous request.

An HTTP GET request has no body, so there is nothing to send there.

If the request is successful, you will receive an HTTP "200" ("OK") response code. A successful response does not mean that the image has been analyzed. To know if it has been analyzed, you will need to look at the JSON object returned in the body of the response.

At the root of this JSON object is a property named "status". If the value of this property is "Succeeded", this indicates that the analysis is complete, and the text of the image will also be included in the same JSON object.

Other possible statuses are "NotStarted", "Running" and "Failed".
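
Continuing the sketch above (and assuming the operation_url and subscription_key variables from it), polling for the result might look like this:

```python
# A minimal polling sketch. It assumes "operation_url" and "subscription_key"
# were captured when the image was submitted (see the earlier sketch).
import time
import requests

headers = {"Ocp-Apim-Subscription-Key": subscription_key}

while True:
    result = requests.get(operation_url, headers=headers)   # expect 200 (OK)
    analysis = result.json()
    status = analysis["status"]        # NotStarted, Running, Succeeded, or Failed
    if status in ("Succeeded", "Failed"):
        break
    time.sleep(1)                      # still working - wait and ask again
```

In a real application you would also cap the number of retries rather than looping forever.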

A successful status will include the recognized text in the JSON document.

At the root of the JSON (at the same level as "status") is an object named "recognitionResult". This object contains a child property named "lines".

"lines" is an array of anonymous objects, each of which represents a line of text and contains a "boundingBox" array, a "text" string, and a "words" array.

The "boundingBox" array contains exactly 8 integers, representing the x,y coordinates of the corners of an invisible rectangle around the line.

The "text" string contains the full text of the line.

"words" is an array of anonymous objects, each of which represents a single word in the line and contains its own "boundingBox" array and "text" string.

This "boundingBox" array contains exactly 8 integers, representing the x,y coordinates of the corners of an invisible rectangle around the word.

The "text" string contains the word itself.

Below is a sample of a partial result:

```json
{
  "status": "Succeeded",
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [
          202,
          618,
          2047,
          643,
          2046,
          840,
          200,
          813
        ],
        "text": "The walrus and the carpenter",
        "words": [
          {
            "boundingBox": [
              204,
              627,
              481,
              628,
              481,
              830,
              204,
              829
            ],
            "text": "The"
          },
          {
            "boundingBox": [
              519,
              628,
              1057,
              630,
              1057,
              832,
              518,
              830
            ],
            "text": "walrus"
          },
          ...etc...
```

In this article, I showed details of the Recognize Text API. In a future article, I will show how to call this service from code within your application.

Friday, July 12, 2019 2:00:09 PM (GMT Daylight Time, UTC+01:00)
# Thursday, July 11, 2019

GCast 56:

Azure Web App Deployment Slots

Deployment slots allow you to test changes to your web application in a production-like environment before deploying to production.

Azure | GCast | Screencast | Video | Web
Thursday, July 11, 2019 9:27:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, July 10, 2019

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft's Azure cloud platform. It provides a web-based interface that makes it simple for users to create and scale clusters of Spark servers and to deploy jobs and Notebooks to those clusters. Spark provides a general-purpose compute engine ideal for working with big data, thanks to its built-in parallelization engine.

In the last article in this series, I showed how to create a new Databricks Cluster in a Microsoft Azure Databricks Workspace.

In this article, I will show how to create a notebook and run it on that cluster.

Navigate to the Databricks service, as shown in Fig. 1.

db01-OverviewBlade
Fig. 1

Click the [Launch Workspace] button (Fig. 2) to open the Azure Databricks page, as shown in Fig. 3.

db02-LaunchWorkspaceButton
Fig. 2

db03-DatabricksHomePage
Fig. 3

Click the "New Notebook" link under "Common Tasks" to open the "Create Notebook" dialog, as shown in Fig. 4.

db04-CreateNotebookDialog
Fig. 4

At the "Name" field, enter a name for your notebook. The name must be unique within this workspace.

At the "Language" dropdown, select the default language for your notebook. Current options are Python, Scala, SQL, and R. Selecting a language does not limit you to only using that language within this notebook. You can override the language in a given cell.

Click the [Create] button to create the new notebook. A blank notebook displays, as shown in Fig. 5.

db05-BlankNotebook
Fig. 5

Fig. 6 shows a notebook with some simple code added to the first 2 cells.

db06-Notebook
Fig. 6
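
The figure is not reproduced here, but the kind of simple code it shows could be as small as the following two cells. This is a hypothetical sketch: spark is the SparkSession that Databricks creates automatically in every notebook, and display is a Databricks helper that renders a DataFrame as a table.

```python
# --- Cell 1 ---
message = "Hello from Databricks"
print(message)

# --- Cell 2 ---
# "spark" is the SparkSession provided automatically by the notebook.
df = spark.range(10).toDF("n")   # a one-column DataFrame containing 0 through 9
display(df)                      # renders the DataFrame as a table below the cell
```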

You can add, move, or manipulate cells by clicking the cell menu at the top right of an existing cell, as shown in Fig. 7.

db07-AddCell
Fig. 7

In order to run your notebook, you will need to attach it to an existing, running cluster. Click the "Attach to" dropdown and select from the clusters in the current workspace, as shown in Fig. 8. See this article for information on how to create a cluster.

db08-AttachCluster
Fig. 8

You can run all the cells in a notebook by clicking the "Run all" button in the toolbar, as shown in Fig. 9.

db09-RunAll
Fig. 9

Use the "Run" menu in the top right of a cell to run only that cell or the cells above or below it, as shown in Fig. 10.

db10-RunCell
Fig. 10

Fig. 11 shows a notebook after all cells have been run. Note the output displayed below each cell.

db11-NotebookWithResults
Fig. 11

In this article, I showed how to create, run, and manage a notebook in an Azure Databricks workspace.

Wednesday, July 10, 2019 9:20:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, July 9, 2019

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft's Azure cloud platform. It provides a web-based interface that makes it simple for users to create and scale clusters of Spark servers and to deploy jobs and Notebooks to those clusters. Spark provides a general-purpose compute engine ideal for working with big data, thanks to its built-in parallelization engine.

In the last article in this series, I showed how to create a new Databricks service in Microsoft Azure.

A cluster is a set of compute nodes that can work together. All Databricks jobs run in a cluster, so you will need to create one if you want to do anything with your Databricks service.

In this article, I will show how to create a cluster in that service.

Navigate to the Databricks service, as shown in Fig. 1.

db01-OverviewBlade
Fig. 1

Click the [Launch Workspace] button (Fig. 2) to open the Azure Databricks page, as shown in Fig. 3.

db02-LaunchWorkspaceButton
Fig. 2

db03-DatabricksHomePage
Fig. 3

Click the "New Cluster" link to open the "Create Cluster" dialog, as shown in Fig. 4.

db04-CreateCluster
Fig. 4

At the "Cluster Name" field, enter a descriptive name for your cluster.

At the "Cluster Mode" dropdown, select "Standard" or "High Concurrency". The "High Concurrency" option can run multiple jobs concurrently.

At the "Databricks Runtime Version" dropdown, select the runtime version you wish to support on this cluster. I recommend selecting the latest non-beta version.

At the "Python Version" dropdown, select the version of Python you wish to support. New code will likely be written in version 3, but you may be running old notebooks written in version 2.

I recommend checking the "Enable autoscaling" checkbox. This allows the cluster to automatically spin up the number of nodes required for a  job, effectively balancing cost and performance.

I recommend checking the "Terminate after ___ minutes" checkbox and including a reasonable amount of time (I usually set this to 60 minutes) of inactivity to shut down your clusters. Running a cluster is an expensive operation, so you will save a lot of money if you shut them down when not in use. Because it takes a long time to spin up a cluster, consider how frequently a new job is required before setting this value too low. You may need to experiment with this value to get it right for your situation.

At the "Worker Type" node, select the size of machines to include in your cluster. If you enabled autoscaling, you can set the minimum and maximum worker nodes as well. If you did not enable autoscaling, you can only set the number of worker nodes. My experience is that more nodes and smaller machines tends to be more cost-effective than fewer nodes and more powerful machines; but you may want to experiment with your jobs to find the optimum setting for your organization.

At the "Driver Type" dropdown, select "Same as worker".

You can expand the "Advanced Options" section to pass specific data to your cluster, but this is usually not necessary.

Click the [Create Cluster] button to create this cluster. It will take a few minutes to create and start a new cluster.

When the cluster is created, you will see it listed, as shown in Fig. 5, with a state of "Running".

db05-Clusters
Fig. 5
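
As an aside, similar settings can also be expressed programmatically through the Databricks Clusters REST API. The sketch below is an assumption-heavy illustration rather than part of this walkthrough: the workspace URL, personal access token, runtime version string, and VM size are placeholders, and the field names should be verified against the Databricks REST API documentation.

```python
# A hypothetical sketch of creating a similar cluster via the Databricks Clusters REST API.
# All values below are placeholders; verify field names against the Databricks API docs.
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"   # placeholder
token = "<personal-access-token>"                                 # placeholder

cluster_spec = {
    "cluster_name": "demo-cluster",
    "spark_version": "5.4.x-scala2.11",                  # assumed runtime identifier
    "node_type_id": "Standard_DS3_v2",                   # assumed worker VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},   # the "Enable autoscaling" setting
    "autotermination_minutes": 60,                       # the "Terminate after ___ minutes" setting
}

response = requests.post(
    workspace_url + "/api/2.0/clusters/create",
    headers={"Authorization": "Bearer " + token},
    json=cluster_spec,
)
print(response.json())   # on success, includes the new cluster's id
```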

You are now ready to create jobs and run them on this cluster. I will cover this in a future article.

In this article, you learned how to create a cluster in an existing Azure Databricks workspace.

Tuesday, July 9, 2019 9:37:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, July 8, 2019

Episode 571

Jon Galloway on the .NET Foundation

The .NET Foundation recently expanded its board and its goals. Jon Galloway discusses what the Foundation does and what it strives to do.

Monday, July 8, 2019 9:54:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, July 7, 2019

7/7
Today I am grateful for my first visit to the Chicago Architectural Center yesterday.

7/6
Today I am grateful for my first visit to the National Museum of Mexican Art yesterday.

7/5
Today I am grateful for an afternoon at the Lincoln Park Zoo.

7/4
Today I am grateful I was born in the United States.

7/3
Today I am grateful for trivia night at the Twisted Hippo.

7/2
Today I am grateful for a late Father's Day dinner celebration last night with my son and his girlfriend.

7/1
Today I am grateful for all the parks in the city of Chicago.

6/29
Today I am grateful for a bike ride around downtown St. Joseph yesterday.

6/28
Today I am grateful for a good personal trainer.

6/27
Today I am grateful to attend my first Chicago Sky WNBA game yesterday.

6/26
Today I am grateful to see the Rolling Stones in concert last night - the first time I've seen them in my life.

6/25
Today I am grateful for nice weather.

6/24
Today I am grateful for my first visit to Traverse City in 21 years.

6/23
Today I am grateful to celebrate Kristen and Scott's wedding with them in Traverse City yesterday.

6/22
Today I am grateful for breakfast yesterday with Austin in Amsterdam. We had not seen one another in decades.

6/21
Today I am grateful to spend yesterday at the Van Gogh Museum and the Rijksmuseum in Amsterdam.

6/20
Today I am grateful for a boat ride last night along the Amsterdam canals with Brent.

6/19
Today I am grateful that my son's Achilles tendon surgery went well yesterday.

6/18
Today I am grateful for dinner at a French restaurant in Amsterdam with my international team last night.

6/17
Today I am grateful for dinner with Mike in Amsterdam last night.

6/16
Today I am grateful to visit Amsterdam for the first time.

6/15
Today I am grateful to kick off a new project with a new customer yesterday.

6/14
Today I am grateful for a chiropractor and masseuse across the street from my home.

6/13
Today I am grateful for a graduation celebration dinner last night.

6/12
Today I am grateful to attend Larissa's high school graduation ceremony yesterday.

6/11
Today I am grateful to work on my balcony on a beautiful summer morning.

6/10
Today I am grateful for a walk around North Park Village Nature Center yesterday.

6/9
Today I am grateful for:
-Hearing some great live music with Thad at the Chicago Blues Festival
-A visit to the Printers Row Lit Fest

6/8
Today I am grateful for new furniture on my balcony.

6/7
Today I am grateful that my son's injury is not as severe as we originally thought.

6/6
Today I am grateful that Raffaele and the folks at IT Camp are thinking of me in Transylvania.

6/5
Today I am grateful to receive an email with some kind words yesterday.

6/4
Today I am grateful to discover the exhibits at the Chicago Public Library yesterday.

6/3
Today I am grateful to visit the Fabyan Japanese Gardens and Fox River in Batavia, IL yesterday.

Sunday, July 7, 2019 3:28:22 PM (GMT Daylight Time, UTC+01:00)
# Saturday, July 6, 2019

Housekeeping by Marilynne Robinson is filled with water and filled with tragedy.

It opens with a train crash into a lake, killing hundreds, including the grandfather of Ruthie and Lucille. Later, Lucille and Ruthie's mother commits suicide by driving her car into the same lake. Abandoned years earlier by their father, the girls grow up under the care of their grandmother and aunts until eccentric Aunt Silvie shows up and moves in.

Silvie is a former transient who sometimes falls asleep on park benches. She is not cut out for motherhood, and the girls withdraw into one another, skipping school and making no friends other than each other. Eventually, the local authorities begin to question their situation, forcing everyone in this family to make a choice.

Housekeeping is a simple story, built on the strength of the characters. Robinson presents humor and tragedy in an eloquent style that keeps the reader engaged. For such a short novel, we see a full picture of the three main characters. It is worth the time to read.

Saturday, July 6, 2019 9:12:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, July 5, 2019

Azure Databricks is a web-based platform built on top of Apache Spark and deployed to Microsoft's Azure cloud platform.

Databricks provides a web-based interface that makes it simple for users to create and scale clusters of Spark servers and deploy jobs and Notebooks to those clusters. Spark provides a general-purpose compute engine ideal for working with big data, thanks to its built-in parallelization engine.

Apache Spark is open source and Databricks is owned by the Databricks company, but Microsoft adds value by providing the hardware and fabric on which these tools are deployed, including capacity on which to scale and built-in fault tolerance.

To create an Azure Databricks environment, navigate to the Azure Portal, log in, and click the [Create Resource] button (Fig. 1).

db01-CreateResourceButton
Fig. 1

From the menu, select Analytics | Azure Databricks, as shown in Fig. 2.

db02-NewDataBricksMenu
Fig. 2

The "Azure Databricks service" blade displays, as shown in Fig. 3.

db03-NewDataBricksBlade
Fig. 3

At the "Workspace name" field, enter a unique name for the Databricks workspace you will create.

At the "Subscription" field, select the subscription associated with this workspace. Most of you will have only one subscription.

At the "Resource group" field, click the "Use existing" radio button and select an existing Resource Group from the dropdown below; or click the "Create new" button and enter the name and region of a new Resource Group when prompted.

At the "Location" field, select the location in which to store your workspace. Considerations include the location of the data on which you will be working and the location of developers and users who will access this workspace.

At the "Pricing Tier" dropdown, select the desired pricing tier. The Pricing Tier options are shown in Fig. 4.

db04-PricingTier
Fig. 4

If you wish to deploy this workspace to a particular virtual network, select the "Yes" radio button for this question.

When completed, the blade should look similar to Fig. 5.

db05-NewDataBricksBlade-Completed
Fig. 5

Click the [Create] button to create the new Databricks service. This may take a few minutes.

Navigate to the Databricks service, as shown in Fig. 6.

db06-OverviewBlade
Fig. 6

Click the [Launch Workspace] button (Fig. 7) to open the Azure Databricks page, as shown in Fig. 8.

db07-LaunchWorkspaceButton
Fig. 7

db08-DatabricksHomePage
Fig. 8

In this article, I showed you how to create a new Azure Databricks service. In future articles, I will show how to create clusters and notebooks and otherwise make use of your Databricks service.

Friday, July 5, 2019 9:00:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, July 4, 2019

GCast 55:

GitHub Deployment to an Azure Web App

Learn how to set up automated deployment from a GitHub repository to an Azure Web App

Thursday, July 4, 2019 9:58:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, July 3, 2019

Source control is an important part of software development - from collaborating with other developers to enabling continuous integration and continuous deployment to providing the ability to roll back changes.

Azure Data Factory (ADF) provides the ability to integrate with source control systems such as GitHub or Azure DevOps.

I will walk you through doing this, using GitHub.

Before you get started, you must have the following:

- A GitHub account (free at https://github.com)
- A GitHub repository created in your account, with at least one file in it. You can easily add a "readme.md" file to a repository from within the GitHub portal.
- An ADF service, created as described in this article.

Open the "Author & Monitor" page (Fig. 1) and click the "Set up Code Repository" button (Fig. 2)

ar01-ADFOverviewPage
Fig. 1

ar02-SetupCodeRepositoryButton
Fig. 2

The "Repository Settings" blade displays, as shown in Fig. 3.

ar03-RepositoryType
Fig. 3

At the "Repository Type", dropdown, select the type of source control you are using. The current options are "Azure DevOps Git" and "GitHub". For this demo, I have selected "GitHub".

When you select a Repository type, the rest of the dialog expands with prompts relevant to that type. Fig. 4 shows the prompts when you select "GitHub".

ar04-RepositoryName
Fig. 4

The dialog includes a checkbox for GitHub Enterprise accounts. I don't have a GitHub Enterprise account, so I left this checkbox unchecked.

At the "GitHub Account" field, enter the name of your GitHub account. You don't need the full URL - just the name. For example, my GitHub account name is "davidgiard", which you can find online at https://github.com/davidgiard; so, I entered "davidgiard" into the "GitHub Account" field.

The first time you enter this account, you may be prompted to sign in and to authorize Azure to access your GitHub account.

Once you enter a valid GitHub account, the "Git repository name" dropdown is populated with a list of your repositories. Select the repository you created to hold your ADF assets.

After you select a repository, you are prompted for more specific information, as shown in Fig. 5.

ar05-RepositorySettings
Fig. 5

At the "Collaboration branch", select "master". If you are working in a team environment or with multiple releases, it might make sense to check into a different branch in order control when changes are merged. To do this, you will need to create a new branch in GitHub.

At the "Root folder", select a folder of the repository in which to store your ADF assets. I typically leave this at "/" to store everything in the root folder; but, if you are storing multiple ADF services in a single repository, it might make sense to organize them into separate folders.

Check the "Import existing Data Factory resources to repository" checkbox. This causes any current assets in this ADF asset to be added to the repository as soon as you save. If you have not yet created any pipelines, this setting is irrelevant.

At the "Branch to import resources into" radio buttons, select "Use Collaboration".

Click the [Save] button to save your changes and push any current assets into the GitHub repository.

Within seconds, any pipelines, linked services, or datasets in this ADF service will be pushed into GitHub. You can refresh the repository, as shown in Fig. 6.

ar06-GitHub
Fig. 6

Fig. 7 shows a pipeline asset. Notice that it is saved as JSON, which can easily be deployed to another server.

ar07-GitHub
Fig. 7

In this article, you learned how to connect your ADF service to a GitHub repository, storing and versioning all ADF assets in source control.

Wednesday, July 3, 2019 6:56:40 PM (GMT Daylight Time, UTC+01:00)