# Thursday, August 29, 2019

GCast 63:

Sentiment Analysis JavaScript Demo

In this video, I walk you through a JavaScript application that calls the Sentiment Analysis Cognitive Service.

Thursday, August 29, 2019 1:09:57 PM (GMT Daylight Time, UTC+01:00)
# Friday, August 23, 2019

GCast 62:

Sentiment Analysis Cognitive Service

This video explains the Sentiment Analysis service, which is part of the Text Analytics Cognitive Service.

Friday, August 23, 2019 4:47:19 AM (GMT Daylight Time, UTC+01:00)
# Friday, August 16, 2019

In the last article, I walked through the syntax of calling the Bing Spell Check service.

In this article, I will walk through a simple JavaScript application that calls this service.

If you want to follow along, this sample is part of my Cognitive Services demos, which you can find on GitHub at https://github.com/DavidGiard/CognitiveSvcsDemos

This project is found in the "SpellCheckDemo" folder.

Here is the main web page:

Listing 1:

<html>
<head>
    <title>Spell Check Demo</title>
    <script src="scripts/jquery-1.10.2.min.js"></script>
    <script src="scripts/script.js"></script>
    <script src="scripts/getkey.js"></script>
    <link rel="stylesheet" href="css/site.css">
</head>
 <body>
     <h1>Spell Check Demo</h1>
     <div>
         <textarea id="TextToCheck">Life ig buuutiful all the tyme
         </textarea>
     </div>
    <button id="SpellCheckButton">Check Spelling!</button>
     <div id="NewTextDiv"></div>
     <div id="OutputDiv"></div>

</body>
</html>
  

As you can see, the page consists of a text area containing some misspelled text, a button, and two empty divs.

The page looks like this when rendered in a browser:

scjs01-PageOnLoad
Fig. 1

When the user clicks the button, we want to call the Spell Check service, sending it the text in the text area.

We want to display the values in the web service response in the OutputDiv div; and we want to display some of the raw information in the response in the NewTextDiv div.

Below is the screen after clicking the [Check Spelling] button:

scjs02-PageAfterClick

Fig. 2

We need a reference to the outputDiv, so we can easily write to it.

Listing 2:

var outputDiv = document.getElementById("OutputDiv");
  

Next, we bind code to the button's click event, as shown in Listing 3.

Listing 3:

var spellCheckButton = document.getElementById("SpellCheckButton"); 
spellCheckButton.onclick = function () { 
    // Replace this with your Spell Check API key from Azure 
    var subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxx"; 

    outputDiv.innerHTML = "Thinking...";

    var textToCheck = document.getElementById("TextToCheck").textContent; 
    var webSvcUrl = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck/?text=" + textToCheck; 
    webSvcUrl = webSvcUrl + "&mode=proof&mkt=en-US";

    var httpReq = new XMLHttpRequest(); 
    httpReq.open("GET", webSvcUrl, true); 
    httpReq.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey) 
    httpReq.setRequestHeader("contentType", "application/json") 
    httpReq.onload = onSpellCheckSuccess; 
    httpReq.onerror = onSpellCheckError; 
    httpReq.send(null); 
};
  

This code gets the text from the text area and makes an asynchronous HTTP GET request to the Spell Check API, passing the API key in the header. When the API sends a response, this will call the onSpellCheckSuccess or onSpellCheckError function, depending on the success of the call.

Listing 4 shows the onSpellCheckSuccess function:

Listing 4:

function onSpellCheckSuccess(evt) { 
    var req = evt.srcElement; 
    var resp = req.response; 
    var data = JSON.parse(resp);

    var flaggedTokens = data.flaggedTokens; 
    if (data.flaggedTokens.length > 0) { 
        var newText = document.getElementById("TextToCheck").textContent; 
        ; 
        var outputHtml = ""; 
         flaggedTokens.forEach(flaggedToken => { 
            var token = flaggedToken.token; 
            var tokenType = flaggedToken.type; 
            var offset = flaggedToken.offset; 
            var suggestions = flaggedToken.suggestions; 
            outputHtml += "<div>" 
            outputHtml += "<h3>Token: " + token + "</h3>"; 
            outputHtml += "Type: " + tokenType + "<br/>"; 
            outputHtml += "Offset: " + offset + "<br/>"; 
             outputHtml += "<div>Suggestions</div>"; 
            outputHtml += "<ul>";

            if (suggestions.length > 0) { 
                 suggestions.forEach(suggestion => { 
                     outputHtml += "<li>" + suggestion.suggestion; 
                     outputHtml += " (" + (suggestion.score * 100).toFixed(2) + "%)" 
                }); 
                outputHtml += "</ul>"; 
                outputHtml += "</div>";

                newText = replaceTokenWithSuggestion(newText, token, offset, suggestions[0].suggestion) 
            } 
            else { 
                 outputHtml += "<ul><li>No suggestions for this token</ul>"; 
            } 
        });

        newText = "<h2>New Text:</h2>" + newText; 
        var newTextDiv = document.getElementById("NewTextDiv"); 
        newTextDiv.innerHTML = newText;

        outputHtml = "<h2>Details</h2>" + outputHtml; 
        outputDiv.innerHTML = outputHtml;

    } 
    else { 
        outputDiv.innerHTML = "No errors found."; 
    } 
};
  

As you can see, we parse out the JSON object from the response and retrieve each flaggedToken from that object. For each flaggedToken, we output information, such as the original text (or token), the tokenType, and suggested replacements, along with the score of each replacement.

If an error occurs when calling the API service, the onSpellCheckError function is called, as shown in Listing 5.

Listing 5:

function onSpellCheckError(evt) { 
    outputDiv.innerHTML = "An error has occurred!!!"; 
};
  

Finally, we replace each token with the first suggestion, using the code in Listing 6.

Listing 6*:

function replaceTokenWithSuggestion(originalString, oldToken, offset, newWord) { 
    var textBeforeToken = originalString.substring(0, offset);

    var textAfterToken = ""; 
    if (originalString.length > textBeforeToken.length + oldToken.length) { 
        textAfterToken = originalString.substring(offset + oldToken.length, originalString.length); 
    }

    var newString = textBeforeToken + newWord + textAfterToken;

    return newString; 
 }
  

Here is the full JavaScript:

Listing 7:

window.onload = function () {

    var outputDiv = document.getElementById("OutputDiv");
    // var subscriptionKey = getKey();

    var spellCheckButton = document.getElementById("SpellCheckButton");
    spellCheckButton.onclick = function () {
        var subscriptionKey = getKey();
        var textToCheck = document.getElementById("TextToCheck").textContent;

        var webSvcUrl = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck/?text=" + textToCheck;
        webSvcUrl = webSvcUrl + "&mode=proof&mkt=en-US";

        outputDiv.innerHTML = "Thinking...";

        var httpReq = new XMLHttpRequest();
        httpReq.open("GET", webSvcUrl, true);
        httpReq.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey)
        httpReq.setRequestHeader("contentType", "application/json")
        httpReq.onload = onSpellCheckSuccess;
        httpReq.onerror = onSpellCheckError;
        httpReq.send(null);
    };

    function onSpellCheckSuccess(evt) {
        var req = evt.srcElement;
        var resp = req.response;
        var data = JSON.parse(resp);

        var flaggedTokens = data.flaggedTokens;
        if (data.flaggedTokens.length > 0) {
            var newText = document.getElementById("TextToCheck").textContent;
            ;
            var outputHtml = "";
            flaggedTokens.forEach(flaggedToken => {
                var token = flaggedToken.token;
                var tokenType = flaggedToken.type;
                var offset = flaggedToken.offset;
                var suggestions = flaggedToken.suggestions;
                outputHtml += "<div>"
                outputHtml += "<h3>Token: " + token + "</h3>";
                outputHtml += "Type: " + tokenType + "<br/>";
                outputHtml += "Offset: " + offset + "<br/>";
                outputHtml += "<div>Suggestions</div>";
                outputHtml += "<ul>";

                if (suggestions.length > 0) {
                    suggestions.forEach(suggestion => {
                        outputHtml += "<li>" + suggestion.suggestion;
                        outputHtml += " (" + (suggestion.score * 100).toFixed(2) + "%)" 
                    });
                    outputHtml += "</ul>";
                    outputHtml += "</div>";

                    newText = replaceTokenWithSuggestion(newText, token, offset, suggestions[0].suggestion)
                }
                else {
                    outputHtml += "<ul><li>No suggestions for this token</ul>";
                }
            });

            newText = "<h2>New Text:</h2>" + newText;
            var newTextDiv = document.getElementById("NewTextDiv");
            newTextDiv.innerHTML = newText;

            outputHtml = "<h2>Details</h2>" + outputHtml;
            outputDiv.innerHTML = outputHtml;

        }
        else {
            outputDiv.innerHTML = "No errors found.";
        }
    };

    function onSpellCheckError(evt) {
        outputDiv.innerHTML = "An error has occurred!!!";
    };

    function replaceTokenWithSuggestion(originalString, oldToken, offset, newWord) {
        var textBeforeToken = originalString.substring(0, offset);

        var textAfterToken = "";
        if (originalString.length > textBeforeToken.length + oldToken.length) {
            textAfterToken = originalString.substring(offset + oldToken.length, originalString.length);
        }

        var newString = textBeforeToken + newWord + textAfterToken;

        return newString;
    }

};
  

Hopefully, this sample gives you an idea of how to get started building your first app that uses the Bing Spell Check API.



* This code currently has a bug in it: It only works if each suggestion is the same length as the token it replaces. I plan to fix this bug, but I'm publishing now because:

  1. It is not a fatal bug and
  2. It is not relevant to the call to the API, which is the primary point I'm showing in this article.
Friday, August 16, 2019 9:00:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, August 15, 2019

GCast 61:

Text Recognition C# Demo

In this video, I walk you through a C# application that calls the Text Recognition service, passing in an image of text and retrieving that text.

Thursday, August 15, 2019 8:47:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, August 14, 2019

In the last article, I showed how to create a Bing Spell Check service in Azure. Once you have created this service, you can now pass text to a web service to perform spell checking.

Given a text sample, the service checks the spelling of each token in the sample. A token is a word, or two words that should be a single word, such as "arti cle", which is a misspelling of the word "article".

It returns an array of unrecognized tokens, along with suggested replacements for these misspelled tokens.

URL and querystring arguments

The URL for the web service is
https://api.cognitive.microsoft.com/bing/v7.0/spellcheck

You can add some optional querystring parameters to this URL:

mode
Set this to "proof" if you want to check for spelling, grammar, and punctuation errors
Set it to "spell" if you only want to check for spelling errors.

If you omit the "mode" querystring argument, it defaults to "proof".

mkt
Set this to the Market Code of the country/language/culture you want to test. This is in the format [Language Code]-[Country Code], such as "en-US" for United States English. A full list of Market Codes can be found here.

The "Proof" mode supports only en-US,  es-ES, and pt-BR Market Codes.

If you omit the mkt argument, the service will guess the market based on the text. Therefore, it is a good idea to include this value, even though it is optional.

Below is an example of a URL with some querystring values set.

https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-us

POST vs GET

You have the option to submit either an HTTP POST or an HTTP GET request to the URL. We will discuss the differences below.

If you use the GET verb, you pass the text to check in the querystring, as in the following example:

https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-us&text=Life+ig+buuutifull+all+the+tyme

With the GET method, the text is limited to 1,500 characters.

If you use the POST verb, the text is passed in the body of the request, as in the following example:

text=Life+ig+buuutifull+all+the+tyme

With the POST method, you can send text up to 10,000 characters long.
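
If you are calling the service from .NET, a POST request might look something like the sketch below. This is only an illustration using HttpClient; the subscription key is a placeholder, and the sample text is the same misspelled sentence used elsewhere in this series.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class SpellCheckPostDemo
{
    static async Task Main()
    {
        // Placeholder: replace with your own Bing Spell Check API key
        var subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxx";
        var endpoint = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-US";

        var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // With POST, the text to check goes in the request body as a form-encoded "text" value
        var body = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            { "text", "Life ig buuutifull all the tyme" }
        });

        HttpResponseMessage response = await client.PostAsync(endpoint, body);
        string json = await response.Content.ReadAsStringAsync();
        Console.WriteLine(json);   // the JSON described in the "Results" section below
    }
}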

Results

If successful, the web service will return an HTTP 200 ("OK") response, along with the following data in JSON format in the body of the response:

_type: "SpellCheck"

An array of "flaggedTokens", representing spelling errors found

Each flaggedToken consists of the following information:

  • offset: The position of the offending token within the text
  • token: The token text
  • type: The reason this token is in this list (usually "UnknownToken")
  • suggestions: An array of suggested replacements for the offending token. Each suggestion consists of the following:
    • suggestion: The suggested replacement text
    • score: A value (0-1) indicating the likelihood that this suggestion is the appropriate replacement

Below is an example of a response:

{
   "_type": "SpellCheck",
   "flaggedTokens": [{
     "offset": 5,
     "token": "ig",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "is",
       "score": 0.8922398888897022
     }]
   }, {
     "offset": 8,
     "token": "buuutifull",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "beautiful",
       "score": 0.8922398888897022
     }]
   }, {
     "offset": 27,
     "token": "tyme",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "time",
       "score": 0.8922398888897022
     }]
   }]
 }
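
If you are consuming this response from .NET, you can deserialize the JSON above into a few simple classes. The classes below are only an illustration (they are not part of the sample code in this series) and assume the Newtonsoft.Json package:

using System.Collections.Generic;
using Newtonsoft.Json;

public class SpellCheckResponse
{
    [JsonProperty("_type")]
    public string Type { get; set; }                        // "SpellCheck"
    public List<FlaggedToken> FlaggedTokens { get; set; }   // one entry per spelling error
}

public class FlaggedToken
{
    public int Offset { get; set; }                         // position of the token within the text
    public string Token { get; set; }                       // the misspelled text
    public string Type { get; set; }                        // usually "UnknownToken"
    public List<Suggestion> Suggestions { get; set; }
}

public class Suggestion
{
    [JsonProperty("suggestion")]
    public string Text { get; set; }                        // the suggested replacement
    public double Score { get; set; }                       // 0-1 likelihood that this is the right replacement
}

// Usage: var result = JsonConvert.DeserializeObject<SpellCheckResponse>(json);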
  

In this article, I showed how to call the Bing Spell Check service with either a GET or POST HTTP request.

Wednesday, August 14, 2019 8:53:00 AM (GMT Daylight Time, UTC+01:00)

The Bing Spell Check API allows you to call a simple web service to perform spell checking on your text.

Before you get started, you must log into a Microsoft Azure account and create a new Bing Spell Check Service. Here are the steps to do this:

In the Azure Portal, click the [Create a resource] button (Fig. 1); then, search for and select "Bing Spell Check", as shown in Fig. 2.

sc01-CreateResourceButton
Fig. 1

sc02-SearchForBingSpellCheck
Fig. 2

The "Bing Spell Check" (currently on version 7) page displays, which describes the service and provides links to documentation and information about the service, as shown in Fig. 3

sc03-BingSpellCheckLandingPage
Fig. 3

Click the [Create] button to open the "Create" blade, as shown in Fig. 4.

sc04-CreateSpellCheckBlade
Fig. 4

At the "Name" field, enter a unique name for your service.

At the "Subscription" dropdown, select the subscription in which to create the service. Most of you will have only one subscription.

At the "Pricing Tier" dropdown, select the free or paid tier, as shown in Fig. 5.

sc05-PricingTiers
Fig. 5

The number of calls is severely limited for the free tier, so this is most useful for testing and learning the service. You may only create one free Spell Check service per subscription.

At the "Resource Group" field, select a resource group to associate with this service or click the "Create new" link to associate it with a newly-created resource group. A resource group provides a way to group together related service, making it easier to manage them together.

Click the [Create] button to begin creating the service. This process takes only a few seconds.

Open the service and select the "Keys" blade, as shown in Fig. 6.

sc06-KeysBlade
Fig. 6

Either one of the keys listed on this page must be passed in the header of your web service call.

Save a copy of one of these keys. You will need it when I show you how to call the Bing Spell Check Service in tomorrow’s article.

Wednesday, August 14, 2019 1:46:16 AM (GMT Daylight Time, UTC+01:00)
# Thursday, August 8, 2019

GCast 60:

Text Recognition Cognitive Service with Binary Images

The Text Recognition Service supports sending a binary image and reading any text in that image. This video shows you how.

Thursday, August 8, 2019 1:24:10 PM (GMT Daylight Time, UTC+01:00)
# Thursday, August 1, 2019

GCast 59:

Cognitive Services Text Recognition service

Learn to extract text from an image using the new Text Recognition service.

Thursday, August 1, 2019 11:53:50 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, July 17, 2019

In a recent article, I introduced you to the "Recognize Text" API that returns the text in an image - a process known as "Optical Character Recognition", or "OCR".

In this article, I will show how to call this API from a .NET application.

Recall that the "Recognize Text" API consists of two web service calls:

We call the "Recognize Text" web service and pass an image to begin the process.

We call the "Get Recognize Text Operation Result" web service to check the status of the processing and retrieive the resulting text, when the process is complete.

The sample .NET application

If you want to follow along, the code is available in the RecognizeTextDemo project found in this GitHub repository.

To get started, you will need to create a Computer Vision key, as described here.

Creating this service gives you a URI endpoint to call as a web service, and an API key, which must be passed in the header of web service calls.

The App

To run the app, you will need to copy the key created above into the App.config file. Listing 1 shows a sample config file:

Listing 1:

<configuration>
   <appSettings>
     <add key="ComputerVisionKey" value="5070eab11e9430cea32254e3b50bfdd5" />
   </appSettings>
 </configuration>
  

You will also need an image with some text in it. For this demo, we will use the image shown in Fig. 1.

rt01-Kipling
Fig. 1

When you run the app, you will see the screen in Fig. 2.

rt02-Form1
Fig. 2

Press the [Get File] button and select the saved image, as shown in Fig. 3.

rt03-SelectImage
Fig. 3

Click the [Open] button. The Open File Dialog closes, the full path of the image displays on the form, and the [Start OCR] button is enabled, as shown in Fig. 4.

rt04-Form2
Fig. 4

Click the [Start OCR] button to call a service that starts the OCR. If an error occurs, it is possible that you did not configure the key correctly or that you are not connected to the Internet.

When the service call returns, the URL of the "Get Text" service displays (beneath the "Location Address" label), and the [Get Text] button is enabled, as shown in Fig. 5.

rt05-Form3
Fig. 5

Click the [Get Text] button. This calls the Location Address service and displays the status. If the status is "Succeeded", it displays the text in the image, as shown in Fig. 6.

rt06-Form4
Fig. 6

## The code

Let's take a look at the code in this application. It is all written in C#. The relevant parts are the calls to the two web services: "Recognize Text" and "Get Recognize Text Operation Result". The first call kicks off the OCR job; the second call returns the status of the job and, when the job is complete, the text found.

The code is in the TextService static class.

This class has a constant: visionEndPoint, which is the base URL of the Computer Vision Cognitive Service you created above. The code in the repository is in Listing 2. You may need to modify the URL, if you created your service in a different region.

Listing 2:

const string visionEndPoint = "https://westus.api.cognitive.microsoft.com/";
  

### Recognize Text

The call to the "Recognize Text" API is shown in Listing 3:

Listing 3:

public static async Task<string> GetRecognizeTextOperationResultsFromFile(string imageLocation, string computerVisionKey)
{
    var cogSvcUrl = visionEndPoint + "vision/v2.0/recognizeText?mode=Printed";
    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
    HttpResponseMessage response;
    // Convert image to a Byte array
    byte[] byteData = null;
    using (FileStream fileStream = new FileStream(imageLocation, FileMode.Open, FileAccess.Read))
    {
        BinaryReader binaryReader = new BinaryReader(fileStream);
        byteData = binaryReader.ReadBytes((int)fileStream.Length);
    }

    // Call web service; pass image; wait for response
    using (ByteArrayContent content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(cogSvcUrl, content);
    }

    // Read results
    RecognizeTextResult results = null;
    if (response.IsSuccessStatusCode)
    {
        var data = await response.Content.ReadAsStringAsync();
        results = JsonConvert.DeserializeObject<RecognizeTextResult>(data);
    }
    var headers = response.Headers;
    var locationHeaders = response.Headers.GetValues("Operation-Location");
    string locationAddress = "";
    IEnumerable<string> values;
    if (headers.TryGetValues("Operation-Location", out values))
    {
        locationAddress = values.First();
    }
    return locationAddress;
}
  

The first thing we do is construct the specific URL of this service call.

Then we use the System.Net.Http library to submit an HTTP POST request to this URL, passing in the image as an array of bytes in the body of the request. For more information on passing a binary file to a web service, see this article.

When the response returns, we check the headers for the "Operation-Location", which is the URL of the next web service to call. The URL contains a GUID that uniquely identifies this job. We save this for our next call.

### Get Recognize Text Operation Result

After kicking off the OCR, we need to call a different service to check the status and get the results. The code in Listing 4 does this.

Listing 4:

public static async Task<RecognizeTextResult> GetRecognizeTextOperationResults(string locationAddress, string computerVisionKey) 
 { 
    var client = new HttpClient(); 
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey); 
    var response = await client.GetAsync(locationAddress); 
    RecognizeTextResult results = null; 
    if (response.IsSuccessStatusCode) 
    { 
        var data = await response.Content.ReadAsStringAsync(); 
        results = JsonConvert.DeserializeObject<RecognizeTextResult>(data); 
    } 
    return results; 
 }
  

This code is much simpler because it is an HTTP GET and we don't need to pass anything in the request body.

We simply submit an HTTP GET request, read the response content as a string, and use the Newtonsoft.Json library to deserialize it into a RecognizeTextResult object.
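
The RecognizeTextResult class lives in the project's Models folder and is not shown in this article. A minimal shape consistent with the JSON described in my earlier article about this API might look something like the sketch below; the actual classes in the repository may differ in detail.

using System.Collections.Generic;

public class RecognizeTextResult
{
    public string Status { get; set; }                      // "NotStarted", "Running", "Failed", or "Succeeded"
    public RecognitionResult RecognitionResult { get; set; }
}

public class RecognitionResult
{
    public List<RecognizedLine> Lines { get; set; }
}

public class RecognizedLine
{
    public List<int> BoundingBox { get; set; }               // 8 integers: corner coordinates of the line
    public string Text { get; set; }                         // the full text of the line
    public List<RecognizedWord> Words { get; set; }
}

public class RecognizedWord
{
    public List<int> BoundingBox { get; set; }
    public string Text { get; set; }
}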

Here is the complete code in the TextService class:

Listing 5:

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using TextLib.Models;

namespace TextLib
{

    public static class TextService
    {
        const string visionEndPoint = "https://westus.api.cognitive.microsoft.com/";

public static async Task<string> GetRecognizeTextOperationResultsFromFile(string imageLocation, string computerVisionKey)
{
    var cogSvcUrl = visionEndPoint + "vision/v2.0/recognizeText?mode=Printed";
    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
    HttpResponseMessage response;
    // Convert image to a Byte array
    byte[] byteData = null;
    using (FileStream fileStream = new FileStream(imageLocation, FileMode.Open, FileAccess.Read))
    {
        BinaryReader binaryReader = new BinaryReader(fileStream);
        byteData = binaryReader.ReadBytes((int)fileStream.Length);
    }

    // Call web service; pass image; wait for response
    using (ByteArrayContent content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(cogSvcUrl, content);
    }

    // Read results
    RecognizeTextResult results = null;
    if (response.IsSuccessStatusCode)
    {
        var data = await response.Content.ReadAsStringAsync();
        results = JsonConvert.DeserializeObject<RecognizeTextResult>(data);
    }
    var headers = response.Headers;
    var locationHeaders = response.Headers.GetValues("Operation-Location");
    string locationAddress = "";
    IEnumerable<string> values;
    if (headers.TryGetValues("Operation-Location", out values))
    {
        locationAddress = values.First();
    }
    return locationAddress;
}

        public static async Task<RecognizeTextResult> GetRecognizeTextOperationResults(string locationAddress, string computerVisionKey)
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
            var response = await client.GetAsync(locationAddress);
            RecognizeTextResult results = null;
            if (response.IsSuccessStatusCode)
            {
                var data = await response.Content.ReadAsStringAsync();
                results = JsonConvert.DeserializeObject<RecognizeTextResult>(data);
            }
            return results;
        }

    }
}
  

## The remaining code

There is other code in this application to do things like select the file from disk and loop through the JSON to concatenate all the text; but this code is very simple and (hopefully) self-documenting. You may choose other ways to get the file and handle the JSON in the response.
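
As a rough sketch, concatenating the recognized text might look like the following, assuming model classes shaped like the RecognizeTextResult sketch shown earlier (the code in the repository may do this differently):

using System.Text;

public static class TextFormatter
{
    // Concatenates the text of every recognized line into a single string
    public static string GetAllText(RecognizeTextResult result)
    {
        var sb = new StringBuilder();
        if (result?.RecognitionResult?.Lines != null)
        {
            foreach (var line in result.RecognitionResult.Lines)
            {
                sb.AppendLine(line.Text);   // each line's Text already contains its words joined together
            }
        }
        return sb.ToString();
    }
}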

In this article, I've focused on the code to manage the Cognitive Services calls and responses to those calls in order to get the text from a picture of text.

Wednesday, July 17, 2019 10:51:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, July 16, 2019

Sometimes a web service requires us to pass a binary file, such as an image, in the request body.

To do this, we need to submit the request with the POST verb, because other verbs - most notably "GET" - do not contain a body.

One simple web service that accepts a binary file is the Cognitive Services Image Analysis API. This API is fully documented here.

I created a console application (the simplest .NET app I can think of) to demonstrate how to pass the binary image to the web service. This application is named "ImageAnalysisConsoleAppDemo" and is included in my Cognitive Services demos, which you can download here.

Assumptions

Before you get started, you will need to create a Computer Vision Cognitive Service, as described here.

I have hard-coded the file name and location, along with the Cognitive Services URL, but you can change these to match what you are using. You will also need to add your API key to the App.config file.

The code

The first thing we need to do is to read the file and convert it into an array of bytes. The code to do this is in Listing 1 below.

Listing 1:

byte[] byteData;
using (FileStream fileStream = new FileStream(imageFilePath, FileMode.Open, FileAccess.Read))
{
    BinaryReader binaryReader = new BinaryReader(fileStream);
    byteData = binaryReader.ReadBytes((int)fileStream.Length);
}
  

Next, we call the web service, passing the byte array. The System.Net.Http client library helps us to make this call. Notice the "using" construct that converts the byte array into a ByteArrayContent object that is required by the library.

Within that "using", we make an asynchronous call to the web service and capture the results.

Listing 2 shows this code.

Listing 2:

var cogSvcUrl = "https://westus.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Description&language=en"; 
HttpClient client = new HttpClient(); 
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey); 
HttpResponseMessage response; 
using (ByteArrayContent content = new ByteArrayContent(byteData)) 
{ 
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream"); 
    response = await client.PostAsync(cogSvcUrl, content); 
}
  

Finally, we convert the results to a string, as shown in Listing 3. This web service returns JSON containing either information about the image or an error message.

Listing 3:

string webServiceResponseContent = await response.Content.ReadAsStringAsync();
  

Here is the full code:

Listing 4:

using System;
using System.Configuration;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace ImageAnalysisConsoleAppDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            MainAsync().Wait();
            Console.ReadLine();
        }

        static async Task MainAsync()
        {
            string key = GetKey();
            string imageFilePath = @"c:\test\kittens.jpg";
            if (!File.Exists(imageFilePath))
            {
                Console.WriteLine("File {0} does not exist", imageFilePath);
                return;
            }
            string results = await GetRecognizeTextOperationResultsFromFile(imageFilePath, key);
            Console.WriteLine(results);
        }


        public static async Task<string> GetRecognizeTextOperationResultsFromFile(string imageFilePath, string computerVisionKey)
        {
            // Convert file into Byte Array
            byte[] byteData;
            using (FileStream fileStream = new FileStream(imageFilePath, FileMode.Open, FileAccess.Read))
            {
                BinaryReader binaryReader = new BinaryReader(fileStream);
                byteData = binaryReader.ReadBytes((int)fileStream.Length);
            }

            // Make web service call. Pass byte array in body
            var cogSvcUrl = "https://westus.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Description&language=en";
            HttpClient client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
            HttpResponseMessage response;
            using (ByteArrayContent content = new ByteArrayContent(byteData))
            {
                content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                response = await client.PostAsync(cogSvcUrl, content);
            }

            // Get results
            string webServiceResponseContent = await response.Content.ReadAsStringAsync();
            return webServiceResponseContent;
        }

        public static string GetKey()
        {
            string computerVisionKey = ConfigurationManager.AppSettings["ComputerVisionKey"];
            return computerVisionKey;
        }

    }
}
  

Fig. 1 shows the output when analyzing the image displayed in Fig. 2 (saved in “c:\test\kittens.jpg”).

AnalyzeImage
Fig. 1

Kittens
Fig. 2

This code is not complex, but it is not intuitive (at least not to me). So, it's useful to understand how to write C# code to pass a binary file to a web service.

Tuesday, July 16, 2019 9:00:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, July 12, 2019

From its earliest days, Microsoft Cognitive Services has had the ability to convert pictures of text into text - a process known as Optical Character Recognition. I wrote about using this service here and here.

Recently, Microsoft released a new service to perform OCR. Unlike the previous service, which only requires a single web service call, this service requires two calls: one to pass an image and start the text recognition process; and another to ask for the status of that text recognition process and return the transcribed text.

To get started, you will need to create a Computer Vision key, as described here.

Creating this service gives you a URI endpoint to call as a web service, and an API key, which must be passed in the header of web service calls.

Recognize Text

The first call is to the Recognize Text API. To call this API, send an HTTP POST to the following URL:

https://lllll.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=mmmmm

where:

lllll is the location selected when you created the Computer Vision Cognitive Service in Azure; and

mmmmm is "Printed" if the image contains printed text, as from a computer or typewriter; or "Handwritten" if the image contains a picture of handwritten text.

The header of an HTTP request can include name-value pairs. In this request, include the following name-value pairs:

  • Ocp-Apim-Subscription-Key: The Computer Vision API key (from the Cognitive Service created above)
  • Content-Type: "application/json", if you plan to pass a URL pointing to an image on the public web; "application/octet-stream", if you are passing the actual image in the request body.

Details about the request body are described below.

You must pass the image or the URL of the image in the request body. What you pass must be consistent with the "Content-Type" value passed in the header.

If you set the Content-Type header value to "application/json", pass the following JSON in the request body:

{"url":"http://xxxx.com/xxx.xxx"}  

where http://xxxx.com/xxx.xxx is the URL of the image you want to analyze. This image must be accessible to Cognitive Service (e.g., it cannot be behind a firewall or password-protected).

If you set the Content-Type header value to "application/octet-stream", pass the binary image in the request body.

You will receive an HTTP response to your POST. If you receive a response code of "202" ("Accepted"), this is an indication that the POST was successful, and the service is analyzing the image. An "Accepted" response will include the "Operation-Location" header. The value of this header will contain a URL that you can use to query whether the service has finished analyzing the image. The URL will look like the following:

https://lllll.api.cognitiveservices.microsoft.com/vision/v2.0/textOperations/gggggggg-gggg-gggg-gggg-gggggggggggg

where

lllll is the location selected when you created the Computer Vision Cognitive Service in Azure; and

gggggggg-gggg-gggg-gggg-gggggggggggg is a GUID that uniquely identifies the analysis job.

Get Recognize Text Operation Result

After you call the Recognize Text service, you can call the Get Recognize Text Operation Result service to determine if the OCR operation is complete.

To call this service, send an HTTP GET request to the "Operation-Location" URL returned in the request above.

In the header, send the following name-value pair:

  • Ocp-Apim-Subscription-Key: The Computer Vision API key (from the Cognitive Service created above)

This is the same value as in the previous request.

An HTTP GET request has no body, so there is nothing to send there.

If the request is successful, you will receive an HTTP "200" ("OK") response code. A successful response does not mean that the image has been analyzed. To know if it has been analyzed, you will need to look at the JSON object returned in the body of the response.

At the root of this JSON object is a property named "status". If the value of this property is "Succeeded", this indicates that the analysis is complete, and the text of the image will also be included in the same JSON object.

Other possible statuses are "NotStarted", "Running" and "Failed".
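
Because the analysis runs asynchronously, a client typically polls this URL until the status reaches one of the terminal values. Below is a minimal sketch of that pattern in C#; it assumes the Newtonsoft.Json package and the Operation-Location URL returned by the previous call, and it is only an illustration, not code from my sample application.

using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class TextOperationPoller
{
    // Polls the Operation-Location URL until the job reaches a terminal status
    public static async Task<JObject> WaitForResultAsync(string operationLocation, string apiKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);
            while (true)
            {
                var response = await client.GetAsync(operationLocation);
                var json = JObject.Parse(await response.Content.ReadAsStringAsync());
                var status = (string)json["status"];
                if (status == "Succeeded" || status == "Failed")
                {
                    return json;   // "recognitionResult" is present when status is "Succeeded"
                }
                await Task.Delay(1000);   // "NotStarted" or "Running": wait, then check again
            }
        }
    }
}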

A successful status will include the recognized text in the JSON document.

At the root of the JSON (the same level as "status") is an object named "recognitionResult". This object contains a child object named "lines".

The "lines" object contains an array of anonymous objects, each of which contains a "boundingBox" object, a "text" object, and a "words" object. Each object in this array represents a line of text.

The "boundingBox" object contains an array of exactly 8 integers, representing the x,y coordinates of the corners an invisible rectangle around the line.

The "text" object contains a string with the full text of the line.

The "words" object contains an array of anonymous objects, each of which contains a "boundingBox" object and a "text" object. Each object in this array represents a single word in this line.

The "boundingBox" object contains an array of exactly 8 integers, representing the x,y coordinates of the corners an invisible rectangle around the word.

The "text" object contains a string with the word.

Below is a sample of a partial result:

{ 
  "status": "Succeeded", 
  "recognitionResult": { 
    "lines": [ 
      { 
        "boundingBox": [ 
          202, 
          618, 
          2047, 
          643, 
          2046, 
          840, 
          200, 
          813 
        ], 
        "text": "The walrus and the carpenter", 
         "words": [ 
          { 
            "boundingBox": [ 
               204, 
              627, 
              481, 
              628, 
              481, 
              830, 
              204, 
               829 
            ], 
            "text": "The" 
           }, 
          { 
            "boundingBox": [ 
              519, 
              628, 
              1057, 
              630, 
               1057, 
              832, 
              518, 
               830 
            ], 
           "text": "walrus" 
          }, 
          ...etc... 
  

In this article, I showed details of the Recognize Text API. In a future article, I will show how to call this service from code within your application.

Friday, July 12, 2019 2:00:09 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, June 12, 2019

In a previous article, I showed how to use the Microsoft Cognitive Services Computer Vision API to perform Optical Character Recognition (OCR) on a document containing a picture of text. We did so by making an HTTP POST to a REST service.

If you are developing with .NET languages, such as C#, Visual Basic, or F#, a NuGet package makes this call easier. Classes in this package abstract the REST call, so you can write less and simpler code; and strongly-typed objects allow you to make the call and parse the results more easily.


To get started, you will first need to create a Computer Vision service in Azure and retrieve the endpoint and key, as described here.

Then, you can create a new C# project in Visual Studio. I created a WPF application, which can be found and downloaded at my GitHub account. Look for the project named "OCR-DOTNETDemo". Fig. 1 shows how to create a new WPF project in Visual Studio.

od01-FileNewProject
Fig. 1

In the Solution Explorer, right-click the project and select "Manage NuGet Packages", as shown in Fig. 2.

od02-ManageNuGet
Fig. 2

Search for and install the "Microsoft.Azure.CognitiveServices.Vision.ComputerVision", as shown in Fig. 3.

od03-NuGet
Fig. 3

The important classes in this package are:

  • OcrResult
    A class representing the object returned from the OCR service. It consists of an array of OcrRegions, each of which contains an array of OcrLines, each of which contains an array of OcrWords. Each OcrWord has a text property, representing the text that is recognized. You can reconstruct all the text in an image by looping through each array.
  • ComputerVisionClient
    This class contains the RecognizePrintedTextInStreamAsync method, which abstracts the HTTP REST call to the OCR service.
  • ApiKeyServiceClientCredentials
    This class constructs credentials that will be passed in the header of the HTTP REST call.

Create a new class in the project named "OCRServices" and make its scope "internal" or "public".

Add the following "using" statements to the top of the class:

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using System.IO;
  


Add the following methods to this class:

Listing 1:

internal static async Task<OcrResult> UploadAndRecognizeImageAsync(string imageFilePath, OcrLanguages language) 
 { 
    string key = "xxxxxxx"; 
    string endPoint = "https://xxxxx.api.cognitive.microsoft.com/"; 
    var credentials = new ApiKeyServiceClientCredentials(key);

    using (var client = new ComputerVisionClient(credentials) { Endpoint = endPoint }) 
    { 
        using (Stream imageFileStream = File.OpenRead(imageFilePath)) 
        { 
             OcrResult ocrResult = await client.RecognizePrintedTextInStreamAsync(false, imageFileStream, language); 
            return ocrResult; 
        } 
    } 
}

internal static async Task<string> FormatOcrResult(OcrResult ocrResult) 
{ 
    var sb = new StringBuilder(); 
    foreach(OcrRegion region in  ocrResult.Regions) 
    { 
        foreach (OcrLine line in region.Lines) 
        { 
             foreach (OcrWord word in line.Words) 
            { 
                 sb.Append(word.Text); 
                sb.Append(" "); 
            } 
            sb.Append("\r\n"); 
        } 
         sb.Append("\r\n\r\n"); 
    } 
    return sb.ToString(); 
}
  

The UploadAndRecognizeImageAsync method calls the HTTP REST OCR service (via the NuGet library abstractions) and returns a strongly-typed object representing the results of that call. Replace the key and the endPoint in this method with those associated with your Computer Vision service.

The FormatOcrResult method loops through each region, line, and word of the service's return object. It concatenates the text of each word, separating words by spaces, lines by a carriage return and line feed, and regions by a double carriage return / line feed.

Add a Button and a TextBlock to the MainWindow.xaml form.

In the click event of that button add the following code.

Listing 2:

private async void GetText_Click(object sender, RoutedEventArgs e) 
{ 
    string imagePath = @"xxxxxxx.jpg"; 
    OutputTextBlock.Text = "Thinking…"; 
    var language = OcrLanguages.En; 
    OcrResult ocrResult =  await OCRServices.UploadAndRecognizeImageAsync(imagePath, language); 
     string resultText = await OCRServices.FormatOcrResult(ocrResult); 
    OutputTextBlock.Text = resultText; 
 }
  


Replace xxxxxxx.jpg with the full path of an image file on disk that contains pictures of text.

You will need to add the following using statement to the top of MainWindow.xaml.cs.

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
  

If you like, you can add code to allow users to retrieve an image and display that image on your form. This code is in the sample application from my GitHub repository, if you want to view it.

Running the form should look something like Fig. 4.

od04-RunningApp
Fig. 4

Wednesday, June 12, 2019 9:46:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, June 11, 2019

In a previous article, I described the details of the OCR Service, which is part of the Microsoft Cognitive Services Computer Vision API.

To make this API useful, you need to write some code and build an application that calls this service.

In this article, I will show an example of a JavaScript application that calls the OCR web service.

If you want to follow along, you can find all the code in the "OCRDemo" project, included in this set of demos.

To use this demo project, you will first need to create a Computer Vision API service, as described here.

Read the project's read.me file, which explains the setup you need to do in order to run this with your account.

If you open index.html in the browser, you will see that it displays an image of a poem, along with some controls on the left:

  • A dropdown list to change the poem image
  • A dropdown list to select the language of the poem text
  • A [Get Text] button that calls the web service.

Fig. 1 shows index.html when it first loads:

oj01-WebPage
Fig. 1

    Let's look at the JavaScript that runs when you click the [Get Text] button. You can find it in script.js

    $("#GetTextFromPictureButton").click(function () {
         var outputDiv = $("#OutputDiv");
         outputDiv.text("Thinking…");
         var url = $("#ImageUrlDropdown").val();
         var language = $("#LanguageDropdown").val();
    
        try {
             var computerVisionKey = getKey();
         }
         catch(err) {
             outputDiv.html(missingKeyErrorMsg);
             return;
         }
    
        var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr";
        webSvcUrl = webSvcUrl + "?language=" + language;
        $.ajax({
            type: "POST",
            url: webSvcUrl,
            headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
            contentType: "application/json",
            data: '{ "Url": "' + url + '" }'
        }).done(function (data) {
            outputDiv.text("");
    
            var regionsOfText = data.regions;
            for (var r = 0; r < regionsOfText.length; r++) {
                var linesOfText = data.regions[r].lines;
                for (var l = 0; l < linesOfText.length; l++) {
                    var output = "";
    
                    var thisLine = linesOfText[l];
                    var words = thisLine.words;
                    for (var w = 0; w < words.length; w++) {
                        var thisWord = words[w];
                        output += thisWord.text;
                        output += " ";
                    }
                    var newDiv = "<div>" + output + "</div>";
                    outputDiv.append(newDiv);
    
                }
                outputDiv.append("<hr>");
            }
    
        }).fail(function (err) {
            $("#OutputDiv").text("ERROR!" + err.responseText);
        });
    });
      

    This code uses jQuery to simplify selecting elements, but raw JavaScript would work just as well.

    On the page is an empty div with the id="OutputDiv"

    In the first two lines, we select this div and set its text to "Thinking…" while the web service is being called.

        var outputDiv = $("#OutputDiv");
        outputDiv.text("Thinking…");

    Next, we get the URL of the image containing the currently displayed poem and the selected language. These both come from the selected items of the two dropdowns.

        var url = $("#ImageUrlDropdown").val(); 
        var language = $("#LanguageDropdown").val();
      

    Then, we get the API key, which is in the getKey() function, which is stored in the getkey.js file. You will need to update this file yourself, adding your own key, as described in the read.me.

        try { 
            var computerVisionKey = getKey(); 
        } 
        catch(err) { 
            outputDiv.html(missingKeyErrorMsg); 
            return; 
        }
      

    Now, it's time to call the web service. My Computer Vision API service was created in the West Central US region, so I've hard-coded the URL. You may need to change this if you created your service in a different region.

    I add a querystring parameter to the URL to indicate the selected language.

    Then, I call the web service by submitting an HTTP POST request to the web service URL, passing in the appropriate headers and constructing a JSON document to pass in the request body.

        var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr";
        webSvcUrl = webSvcUrl + "?language=" + language;
        $.ajax({
            type: "POST",
            url: webSvcUrl,
            headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
            contentType: "application/json",
            data: '{ "Url": "' + url + '" }'
      

    Finally, I process the results when the HTTP response returns.

    JavaScript is a dynamic language, so I don't need to create any classes to identify the structure of the JSON that is returned; I just need to know the names of each property.

    The returned JSON contains an array of regions; each region contains an array of lines; and each line contains an array of words.

    In this simple example, I simply loop through each word in each line in each region, concatenating them together and adding some HTML to format line breaks.

    Then, I append this HTML to the outputDiv and follow it up with a horizontal rule to emphasize that it is the end.

        }).done(function (data) { 
            outputDiv.text("");
    
            var regionsOfText = data.regions; 
            for (var r = 0; r < regionsOfText.length; r++) { 
                var linesOfText = data.regions[r].lines; 
                for (var l = 0; l < linesOfText.length; l++) { 
                     var output = "";
    
                    var thisLine = linesOfText[l]; 
                    var words = thisLine.words; 
                     for (var w = 0; w < words.length; w++) { 
                         var thisWord = words[w]; 
                        output += thisWord.text; 
                        output += " "; 
                    } 
                     var newDiv = "<div>" + output + "</div>"; 
                     outputDiv.append(newDiv);
    
                } 
                outputDiv.append("<hr>"); 
            }
      

    I also catch errors that might occur, displaying a generic message in the outputDiv, where the returned text would have been.

        }).fail(function (err) {
            $("#OutputDiv").text("ERROR!" + err.responseText);
        });
      

    Fig. 2 shows the results after a successful web service call.

    oj02-Results
    Fig. 2

    Try this yourself to see it in action. The process is very similar in other languages.

    Tuesday, June 11, 2019 9:11:00 AM (GMT Daylight Time, UTC+01:00)
    # Friday, June 7, 2019

    The Microsoft Cognitive Services Computer Vision API contains functionality to infer a lot of information about a given image. One capability is to convert pictures of text into text, a process known as "Optical Character Recognition" or "OCR".

    Performing OCR on an image is simple and inexpensive. It is done through a web service call; but first, you must set up the Computer Vision Service, as described in this article.

    In that article, you were told to save two pieces of information about the service: The API Key and the URL. Here is where you will use them.

    HTTP Endpoint

    The OCR service is a web service. To call it, you send an HTTP POST request to an HTTP endpoint. The endpoint consists of the URL copied above, followed by "vision/v2.0/ocr", followed by some optional querystring parameters (which we will discuss later).

    So, if you create your service in the EAST US Azure region, the copied URL will be

    https://eastus.api.cognitive.microsoft.com/

    and the HTTP endpoint for the OCR service will be

    https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr

    Querystring Parameters

    The optional querystring parameters are

    language:

    The 2-character language code of the text you are recognizing. This helps the service more accurately and quickly match pictures of words to the words they represent. If you omit this parameter, the system will analyze the text and guess an appropriate language. Currently, the service supports 26 languages. The 2-character code of each supported language is listed in Appendix 1 at the bottom of this article.

    detectOrientation

    "true", if you want the service to adjust the orientation of the image before performing OCR. If you pass "false" or omitting this parameter, the service will assume the image is oriented correctly.

    If you have an image with English text and you want the service to detect and adjust the image's orientation, the above URL becomes:

    https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true

    HTTP Headers

    In the header of the HTTP request, you must add the following name/value pairs:

    Ocp-Apim-Subscription-Key

    The API key you copied above

    Content-Type

    The media type of the image you are passing to the service in the body of the HTTP request

    Possible values are:

    • application/json
    • application/octet-stream
    • multipart/form-data

    The value you pass must be consistent with the data in the body.

    If you select "application/json", you must pass in the request body a URL pointing to the image on the public Internet.

    If you select "application/json" or "application/octet-stream", you must pass the actual binary image in the request body.

    Body

    In the body of the HTTP request, you pass the image you want the service to analyze.

    If you selected "application/json" as the Content-Type in the header, pass a URL within a JSON document, with the following format:

    {"url":"image_url"}

    where image_url is a URL pointing to the image you want to recognize.

    Here is an example:

    {"url":"https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png"}

    If you selected "application/octet-stream" or "multipart/form-data" as the Content-Type in the header, pass the actual binary image in the body of the request.

    The service has some restrictions on the images it can analyze.

    It cannot analyze an image larger than 4MB.

    The width and height of the image must be between 50 and 4,200 pixels

    The image must be one of the following formats: JPEG, PNG, GIF, BMP.

    Sample call with Curl

    Here is an example of a call to the service, using Curl:

    curl -v -X POST "https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: f27c7436c3a64d91a177111a6b594537" --data-ascii "{'url' : 'https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png'}"

    (NOTE: I modified the key, so it will not work. You will need to replace it with your own key if you want this to work.)

    Response

    If all goes well, you will receive an HTTP 200 (OK) response.

    In the body of that response will be the results of the OCR in JSON format.

    At the top level are the language, textAngle, and orientation values.

    Below that is an array of 0 or more text regions. Each region represents a block of text within the image.

    Each region contains an array of 0 or more lines of text.

    Each line contains an array of 0 or more words.

    Each region, line, and word contains a bounding box, consisting of the left, top, width, and height of the word(s) within.

    Here is a partial example of the JSON returned from a successful web service call:

    {
        "language": "en",
        "textAngle": 0.0,
        "orientation": "Up",
        "regions": [
            {
                "boundingBox": "147,96,622,1095",
                "lines": [
                    {
                        "boundingBox": "408,96,102,56",
                        "words": [
                            {
                                "boundingBox": "408,96,102,56",
                                "text": "Hey"
                            }
                        ]
                    },
                    {
                        "boundingBox": "282,171,350,45",
                        "words": [
                            {
                                "boundingBox": "282,171,164,45",
                                "text": "Diddle"
                            },
                            {
                                "boundingBox": "468,171,164,45",
                                "text": "Diddle"
                            }
                        ]
                    },
                    etc...
                     }
                ]
            }
        ]
    }
      

    The full JSON can be found in Appendix 2 below.

    Errors

    If an error occurs, the response will not be HTTP 200. It will have an HTTP response code of 400 or greater. Additional error information will be in the body of the response.

    Common errors include:

    • Images too large or too small
    • Image not found (It might require a password or be behind a firewall)
    • Invalid image format
    • Incorrect API key
    • Incorrect URL (It must match the API key. If you have multiple services, it’s easy to mix them up)
    • Miscellaneous spelling errors (e.g., not entering a valid language code or misspelling a header parameter)

    In this article, I showed how to call the Cognitive Services OCR Computer Vision Service.

    Appendix 1: Supported languages

    zh-Hans (ChineseSimplified)
    zh-Hant (ChineseTraditional)
    cs (Czech)
    da (Danish)
    nl (Dutch)
    en (English)
    fi (Finnish)
    fr (French)
    de (German)
    el (Greek)
    hu (Hungarian)
    it (Italian)
    ja (Japanese)
    ko (Korean)
    nb (Norwegian)
    pl (Polish)
    pt (Portuguese)
    ru (Russian)
    es (Spanish)
    sv (Swedish)
    tr (Turkish)
    ar (Arabic)
    ro (Romanian)
    sr-Cyrl (SerbianCyrillic)
    sr-Latn (SerbianLatin)
    sk (Slovak)

    Appendix 2: JSON Response Example

    {
        "language": "en",
        "textAngle": 0.0,
        "orientation": "Up",
        "regions": [
            {
                "boundingBox": "147,96,622,1095",
                "lines": [
                    {
                        "boundingBox": "408,96,102,56",
                        "words": [
                            {
                                "boundingBox": "408,96,102,56",
                                "text": "Hey"
                            }
                        ]
                    },
                    {
                        "boundingBox": "282,171,350,45",
                        "words": [
                            {
                                "boundingBox": "282,171,164,45",
                                "text": "Diddle"
                            },
                            {
                                "boundingBox": "468,171,164,45",
                                "text": "Diddle"
                            }
                        ]
                    },
                    {
                        "boundingBox": "239,336,441,46",
                        "words": [
                            {
                                "boundingBox": "239,336,87,46",
                                "text": "Hey"
                            },
                            {
                                "boundingBox": "359,337,144,35",
                                "text": "diddle"
                            },
                            {
                                "boundingBox": "536,337,144,35",
                                "text": "diddle"
                            }
                        ]
                    },
                    {
                        "boundingBox": "169,394,576,35",
                        "words": [
                            {
                                "boundingBox": "169,394,79,35",
                                "text": "The"
                            },
                            {
                                "boundingBox": "279,402,73,27",
                                "text": "cat"
                            },
                            {
                                "boundingBox": "383,394,83,35",
                                "text": "and"
                            },
                            {
                                "boundingBox": "500,394,70,35",
                                "text": "the"
                            },
                            {
                                "boundingBox": "604,394,141,35",
                                "text": "fiddle"
                            }
                        ]
                    },
                    {
                        "boundingBox": "260,452,391,50",
                        "words": [
                            {
                                "boundingBox": "260,452,79,35",
                                "text": "The"
                            },
                            {
                                "boundingBox": "370,467,80,20",
                                "text": "cow"
                            },
                            {
                                "boundingBox": "473,452,178,50",
                                "text": "jumped"
                            }
                        ]
                    },
                    {
                        "boundingBox": "277,509,363,35",
                        "words": [
                            {
                                "boundingBox": "277,524,100,20",
                                "text": "over"
                            },
                            {
                                "boundingBox": "405,509,71,35",
                                "text": "the"
                            },
                            {
                                "boundingBox": "509,524,131,20",
                                "text": "moon."
                            }
                        ]
                    },
                    {
                        "boundingBox": "180,566,551,49",
                        "words": [
                            {
                                "boundingBox": "180,566,79,35",
                                "text": "The"
                            },
                            {
                                "boundingBox": "292,566,103,35",
                                "text": "little"
                            },
                            {
                                "boundingBox": "427,566,82,49",
                                "text": "dog"
                            },
                            {
                                "boundingBox": "546,566,185,49",
                                "text": "laughed"
                            }
                        ]
                    },
                    {
                        "boundingBox": "212,623,493,51",
                        "words": [
                            {
                                "boundingBox": "212,631,42,27",
                                "text": "to"
                            },
                            {
                                "boundingBox": "286,638,72,20",
                                "text": "see"
                            },
                            {
                                "boundingBox": "390,623,96,35",
                                "text": "such"
                            },
                            {
                                "boundingBox": "519,638,20,20",
                                "text": "a"
                            },
                            {
                                "boundingBox": "574,631,131,43",
                                "text": "sport."
                            }
                        ]
                    },
                    {
                        "boundingBox": "301,681,312,35",
                        "words": [
                            {
                                "boundingBox": "301,681,90,35",
                                "text": "And"
                            },
                            {
                                "boundingBox": "425,681,70,35",
                                "text": "the"
                            },
                            {
                                "boundingBox": "528,681,85,35",
                                "text": "dish"
                            }
                        ]
                    },
                    {
                        "boundingBox": "147,738,622,50",
                        "words": [
                            {
                                "boundingBox": "147,753,73,20",
                                "text": "ran"
                            },
                            {
                                "boundingBox": "255,753,114,30",
                                "text": "away"
                            },
                            {
                                "boundingBox": "401,738,86,35",
                                "text": "with"
                            },
                            {
                                "boundingBox": "519,738,71,35",
                                "text": "the"
                            },
                            {
                                "boundingBox": "622,753,147,35",
                                "text": "spoon."
                            }
                        ]
                    },
                    {
                        "boundingBox": "195,1179,364,12",
                        "words": [
                            {
                                "boundingBox": "195,1179,45,12",
                                "text": "Nursery"
                            },
                            {
                                "boundingBox": "242,1179,38,12",
                                "text": "Rhyme"
                            },
                            {
                                "boundingBox": "283,1179,36,9",
                                "text": "Charts"
                            },
                            {
                                "boundingBox": "322,1179,28,12",
                                "text": "from"
                            },
                            {
                                "boundingBox": "517,1179,11,10",
                                "text": "C"
                            },
                            {
                                "boundingBox": "531,1179,28,9",
                                "text": "2017"
                            }
                        ]
                    },
                    {
                        "boundingBox": "631,1179,90,12",
                        "words": [
                            {
                                "boundingBox": "631,1179,9,9",
                                "text": "P"
                            },
                            {
                                "boundingBox": "644,1182,6,6",
                                "text": "a"
                            },
                            {
                                "boundingBox": "655,1182,7,9",
                                "text": "g"
                            },
                            {
                                "boundingBox": "667,1182,7,6",
                                "text": "e"
                            },
                            {
                                "boundingBox": "690,1179,31,12",
                                "text": "7144"
                            }
                        ]
                    }
                ]
            }
        ]
    }
      
    Friday, June 7, 2019 9:09:00 AM (GMT Daylight Time, UTC+01:00)
    # Wednesday, June 5, 2019

    The Microsoft Cognitive Services Computer Vision API contains functionality to infer a lot of information about a given image.

    As of this writing, the API is on version 2.0 and supports the following capabilities:

    Analyze an Image

    Get general information about an image, such as the objects found, what each object is and where it is located. It can even identify potentially pornographic images.

    Analyze Faces

    Find the location of each face in an image and determine information about each face, such as its age, gender, and type of facial hair or glasses.

    Optical Character Recognition (OCR)

    Convert a picture of text into text.

    Recognize Celebrities

    Recognize famous people from photos of their faces.

    Recognize Landmarks

    Recognize famous landmarks, such as the Statue of Liberty or Diamond Head Volcano.

    Analyze Video

    Retrieve keywords to describe a video at different points in time as it plays.

    Generate a Thumbnail

    Change the size and shape of an image, without cropping out the main subject.

    Getting Started

    To get started, you need to create a Computer Vision Service. To do this, navigate to the Azure Portal, log in, click the [Create a resource] button (Fig. 1) and enter "Computer Vision" in the Search box, as shown in Fig. 2.

    cv01-CreateResource
    Fig. 1

    cv02-SearchForComputerVision
    Fig. 2

    A dialog displays, with information about the Computer Vision Service, as shown in Fig. 3.

    cv03-ComputerVisionSplashPage
    Fig. 3

    Click the [Create] button to display the Create Computer Vision Service blade, as shown in Fig. 4.

    cv04-NewSvc
    Fig. 4

    At the "Name" field, enter a name by which you can easily identify this service. This name must be unique among your services, but need not be globally unique.

    At the "Subscription" field, select the Subscription with which you want to associate this service. Most of you will only have one subscription.

    At the "Location" field, select the Azure Region in which to store this service. Consider where the users of this service will be, so you can reduce latency.

    At the "Pricing tier" field, select "F0" to use this service for free or "S1" to incur a small charge for each call to the service. If you select the free service, you will be limited in the number and frequency of calls that can be made.

    At the "Resource group" field, select a resource group in which to store your service or click "Create new" to store it in a newly-created resource group. A resource group is a logical container for Azure resources.

    Click the [Create] button to create the Computer Vision service.

    Usually, it takes less than a minute to create a Computer Vision Service. When Azure has created this service, you can navigate to it by its name or the name of the resource group.

    Two pieces of information are critical when using the service: The Endpoint and the API keys.

    The Endpoint can be found on the service's Overview blade, as shown in Fig. 5.

    cv05-OverviewBlade
    Fig. 5

    The API Keys can be found on the service's "Keys" blade, as shown in Fig. 6. There are 2 keys; if one key is compromised, you can use the other while the first is regenerated, minimizing downtime.

    cv06-KeysBlade
    Fig. 6

    Copy the URL and one of the API keys. You will need them to call the web services. We will describe how to make specific calls in future articles.
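    As a rough sketch of where these two values end up (the endpoint, key, and helper name shown here are placeholders of my own), your code attaches the key to every request along these lines:

    using System.Net.Http;

    static class ComputerVisionClientFactory
    {
        // Placeholder values: copy the real Endpoint and an API key from the Azure portal.
        private const string Endpoint = "https://<your-region>.api.cognitive.microsoft.com/vision/v2.0";
        private const string ApiKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

        // Requests target URLs built from the endpoint, e.g. Endpoint + "/ocr".
        public static string OcrUrl
        {
            get { return Endpoint + "/ocr"; }
        }

        // Returns an HttpClient that sends the subscription key with every request.
        public static HttpClient Create()
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", ApiKey);
            return client;
        }
    }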

    Wednesday, June 5, 2019 4:46:00 PM (GMT Daylight Time, UTC+01:00)
    # Thursday, January 24, 2019

    GCast 32:

    Handwriting OCR with Cognitive Services

    See how to perform OCR on images with handwritten text, using Microsoft Cognitive Services. I walk through the API and show sample JavaScript code.

    Thursday, January 24, 2019 8:21:00 AM (GMT Standard Time, UTC+00:00)
    # Thursday, January 17, 2019

    GCast 31:

    OCR with Cognitive Services

    Cognitive Services can automatically detect text from pictures of text. This video shows how.

    Thursday, January 17, 2019 8:17:00 AM (GMT Standard Time, UTC+00:00)
    # Thursday, January 10, 2019

    GCast 30:

    Creating Applications with the Analyze Image Cognitive Services API

    Learn how to create C# and node applications using the "Analyze Image" service of the Microsoft Cognitive Services Vision API.

    Thursday, January 10, 2019 7:28:00 AM (GMT Standard Time, UTC+00:00)
    # Thursday, January 3, 2019

    GCast 29:

    Introducing Cognitive Services and Computer Vision

    Microsoft Cognitive Services allow you to take advantage of Machine Learning without all the complexities of Machine Learning. In this video, I introduce Cognitive Services by showing how to use Computer Vision to analyze an image, automatically detecting properties of that image.

    Thursday, January 3, 2019 12:53:21 PM (GMT Standard Time, UTC+00:00)
    # Thursday, December 27, 2018

    GCast 28:

    Natural Language Processing with LUIS

    Learn how to use Microsoft Language Understanding Information Service (LUIS) to build models that provide Natural Language Processing (NLP) for your application.

    Thursday, December 27, 2018 9:53:00 AM (GMT Standard Time, UTC+00:00)
    # Monday, September 10, 2018
    Monday, September 10, 2018 9:29:00 AM (GMT Daylight Time, UTC+01:00)
    # Wednesday, August 15, 2018

    Here is my presentation "Building and Training your own Custom Image Recognition AI" that I delivered in June at NDC-Oslo in Norway.

    Building and Training your own Custom Image Recognition AI
    Wednesday, August 15, 2018 9:53:00 AM (GMT Daylight Time, UTC+01:00)
    # Saturday, December 30, 2017

    As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze images, sound, video, and language. One of these APIs is a REST web service that can determine the words and punctuation contained in a picture. This is accomplished by a simple REST web service call.

    The Cognitive Services Optical Character Recognition (OCR) service is part of the Computer Vision API. It takes as input a picture of text and returns the words found in the image.

    To get started, you will need an Azure account and a Cognitive Services Vision API key.

    If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.

    Once you have an Azure Account,  follow the instructions in this article to generate a Cognitive Services Computer Vision key.

    To use this API, you simply have to make a POST request to the following URL:
    https://[location].api.cognitive.microsoft.com/vision/v1.0/ocr

    where [location] is the Azure location where you created your API key (above).

    Optionally, you can add the following 2 querystring parameters to the URL:

    • language: the language code of the text (for example, “en” for English). Currently, 25 languages are supported. If omitted, the service will attempt to auto-detect the language.
    • detectOrientation: Set this to “true” if you want to support upside-down or rotated images. An example URL using both parameters appears below.
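    For example, a request for English text with orientation detection enabled would target a URL like the following (the region prefix depends on where you created your key):

    https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr?language=en&detectOrientation=true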

    The HTTP header of the request should include the following:

    Ocp-Apim-Subscription-Key.     
    The Cognitive Services Computer Vision key you generated above.

    Content-Type

    This tells the service how you will send the image. The options are:

    • application/json
    • application/octet-stream
    • multipart/form-data

    If the image is accessible via a public URL, set the Content-Type to application/json and send JSON in the body of the HTTP request in the following format

    {"url":"imageurl"}
    where imageurl is a public URL pointing to the image. For example, to perform OCR on an image of an Edgar Allan Poe poem, submit the following JSON:

    {"url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png"}

    DreamWithinADream

    If you plan to send the image itself to the web service, set the content type to either "application/octet-stream" or “multipart/form-data” and submit the binary image in the body of the HTTP request.
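    The raw request shown next uses a public image URL. If you would rather upload the image bytes, a minimal C# sketch (my own, not part of the original sample; the file path and westus region are assumptions) might look like this:

    using System.IO;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    static class OcrBinaryUpload
    {
        // Hedged sketch: POST a local image as application/octet-stream to the OCR endpoint.
        public static async Task<string> RecognizeAsync(string imagePath, string apiKey)
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);

            var bytes = File.ReadAllBytes(imagePath);
            using (var content = new ByteArrayContent(bytes))
            {
                content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                var response = await client.PostAsync(
                    "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr?language=en", content);
                return await response.Content.ReadAsStringAsync();
            }
        }
    }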

    The full request using a public image URL looks something like this:

    POST https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr HTTP/1.1
    Content-Type: application/json
    Host: westus.api.cognitive.microsoft.com
    Content-Length: 62
    Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    { "url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png" }

    For example, passing a URL with the following picture:

     DreamWithinADream
      (found online at http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png)

    returned the following data: 

    {
      "textAngle": 0.0,
      "orientation": "NotDetected",
      "language": "en",
      "regions": [
        {
          "boundingBox": "31,6,435,478",
          "lines": [
            {
              "boundingBox": "114,6,352,23",
              "words": [
                {
                  "boundingBox": "114,6,24,22",
                  "text": "A"
                },
                {
                  "boundingBox": "144,6,93,23",
                   "text": "Dream"
                },
                {
                   "boundingBox": "245,6,95,23",
                  "text": "Within"
                },
                {
                  "boundingBox": "350,12,14,16",
                  "text": "a"
                },
                {
                  "boundingBox": "373,6,93,23",
                  "text": "Dream"
                }
              ]
            },
            {
               "boundingBox": "31,50,187,16",
              "words": [
                 {
                  "boundingBox": "31,50,31,12",
                   "text": "Take"
                },
                {
                  "boundingBox": "66,50,23,12",
                  "text": "this"
                 },
                {
                  "boundingBox": "93,50,24,12",
                  "text": "kiss"
                },
                {
                   "boundingBox": "121,54,33,12",
                  "text": "upon"
                },
                {
                  "boundingBox": "158,50,19,12",
                  "text": "the"
                },
                 {
                  "boundingBox": "181,50,37,12",
                   "text": "brow!"
                }
              ]
            },
            {
              "boundingBox": "31,67,194,16",
              "words": [
                 {
                  "boundingBox": "31,67,31,15",
                   "text": "And,"
                },
                {
                  "boundingBox": "67,67,12,12",
                  "text": "in"
                 },
                {
                  "boundingBox": "82,67,46,16",
                  "text": "parting"
                },
                {
                  "boundingBox": "132,67,31,12",
                  "text": "from"
                },
                {
                  "boundingBox": "167,71,25,12",
                  "text": "you"
                },
                 {
                  "boundingBox": "195,71,30,11",
                   "text": "now,"
                }
              ]
            },
             {
              "boundingBox": "31,85,159,12",
              "words": [
                {
                  "boundingBox": "31,85,32,12",
                   "text": "Thus"
                },
                {
                   "boundingBox": "67,85,35,12",
                  "text": "much"
                },
                {
                  "boundingBox": "107,86,16,11",
                  "text": "let"
                },
                 {
                  "boundingBox": "126,89,20,8",
                  "text": "me"
                },
                {
                  "boundingBox": "150,89,40,8",
                  "text": "avow-"
                }
              ]
            },
            {
              "boundingBox": "31,102,193,16",
              "words": [
                {
                  "boundingBox": "31,103,26,11",
                  "text": "You"
                 },
                {
                  "boundingBox": "61,106,19,8",
                  "text": "are"
                },
                {
                   "boundingBox": "84,104,21,10",
                  "text": "not"
                },
                {
                  "boundingBox": "109,106,44,12",
                  "text": "wrong,"
                },
                 {
                  "boundingBox": "158,102,27,12",
                   "text": "who"
                },
                {
                  "boundingBox": "189,102,35,12",
                  "text": "deem"
                 }
              ]
            },
            {
              "boundingBox": "31,120,214,16",
              "words": [
                {
                   "boundingBox": "31,120,29,12",
                  "text": "That"
                },
                {
                  "boundingBox": "64,124,21,12",
                  "text": "my"
                },
                {
                  "boundingBox": "89,121,29,15",
                  "text": "days"
                },
                {
                  "boundingBox": "122,120,30,12",
                  "text": "have"
                },
                {
                  "boundingBox": "156,121,30,11",
                  "text": "been"
                },
                {
                   "boundingBox": "191,124,7,8",
                  "text": "a"
                },
                {
                  "boundingBox": "202,121,43,14",
                  "text": "dream;"
                }
               ]
            },
            {
              "boundingBox": "31,138,175,16",
              "words": [
                {
                  "boundingBox": "31,139,22,11",
                  "text": "Yet"
                },
                 {
                  "boundingBox": "57,138,11,12",
                   "text": "if"
                },
                {
                  "boundingBox": "70,138,31,16",
                  "text": "hope"
                 },
                {
                  "boundingBox": "105,138,21,12",
                  "text": "has"
                },
                {
                   "boundingBox": "131,138,37,12",
                  "text": "flown"
                },
                {
                  "boundingBox": "172,142,34,12",
                  "text": "away"
                }
              ]
            },
            {
              "boundingBox": "31,155,140,16",
              "words": [
                {
                  "boundingBox": "31,156,13,11",
                  "text": "In"
                 },
                {
                  "boundingBox": "48,159,8,8",
                   "text": "a"
                },
                {
                   "boundingBox": "59,155,37,16",
                  "text": "night,"
                },
                {
                  "boundingBox": "100,159,14,8",
                  "text": "or"
                },
                 {
                  "boundingBox": "118,155,12,12",
                  "text": "in"
                },
                {
                  "boundingBox": "134,159,7,8",
                  "text": "a"
                },
                 {
                  "boundingBox": "145,155,26,16",
                   "text": "day,"
                }
              ]
            },
             {
              "boundingBox": "31,173,144,15",
              "words": [
                {
                  "boundingBox": "31,174,13,11",
                  "text": "In"
                },
                {
                   "boundingBox": "48,177,8,8",
                  "text": "a"
                 },
                {
                  "boundingBox": "59,173,43,15",
                  "text": "vision,"
                },
                 {
                  "boundingBox": "107,177,13,8",
                  "text": "or"
                },
                {
                  "boundingBox": "124,173,12,12",
                  "text": "in"
                },
                {
                  "boundingBox": "140,177,35,11",
                   "text": "none,"
                }
              ]
            },
            {
              "boundingBox": "31,190,180,16",
              "words": [
                {
                  "boundingBox": "31,191,11,11",
                  "text": "Is"
                },
                {
                   "boundingBox": "47,190,8,12",
                  "text": "it"
                },
                {
                  "boundingBox": "59,190,58,12",
                  "text": "therefore"
                },
                 {
                  "boundingBox": "121,190,19,12",
                   "text": "the"
                },
                {
                   "boundingBox": "145,191,23,11",
                  "text": "less"
                 },
                {
                  "boundingBox": "173,191,38,15",
                  "text": "gone?"
                }
              ]
            },
            {
              "boundingBox": "31,208,150,12",
              "words": [
                {
                  "boundingBox": "31,208,20,12",
                  "text": "All"
                },
                 {
                  "boundingBox": "55,208,24,12",
                   "text": "that"
                },
                {
                  "boundingBox": "83,212,19,8",
                  "text": "we"
                 },
                {
                  "boundingBox": "107,212,19,8",
                  "text": "see"
                },
                {
                   "boundingBox": "131,212,13,8",
                  "text": "or"
                },
                {
                  "boundingBox": "148,212,33,8",
                  "text": "seem"
                }
               ]
            },
            {
              "boundingBox": "31,226,194,12",
              "words": [
                {
                  "boundingBox": "31,227,11,11",
                  "text": "Is"
                },
                 {
                  "boundingBox": "46,226,21,12",
                   "text": "but"
                },
                {
                  "boundingBox": "71,230,7,8",
                  "text": "a"
                 },
                {
                  "boundingBox": "82,226,40,12",
                  "text": "dream"
                },
                {
                   "boundingBox": "126,226,41,12",
                  "text": "within"
                },
                {
                  "boundingBox": "171,230,7,8",
                  "text": "a"
                },
                 {
                  "boundingBox": "182,226,43,12",
                   "text": "dream."
                }
              ]
            },
             {
              "boundingBox": "31,261,133,12",
              "words": [
                {
                  "boundingBox": "31,262,5,11",
                   "text": "I"
                },
                {
                   "boundingBox": "41,261,33,12",
                  "text": "stand"
                 },
                {
                  "boundingBox": "78,261,32,12",
                  "text": "amid"
                },
                {
                  "boundingBox": "114,261,19,12",
                  "text": "the"
                },
                {
                  "boundingBox": "137,265,27,8",
                  "text": "roar"
                }
              ]
            },
            {
              "boundingBox": "31,278,169,15",
              "words": [
                {
                  "boundingBox": "31,278,18,12",
                  "text": "Of"
                 },
                {
                  "boundingBox": "52,282,7,8",
                  "text": "a"
                },
                {
                   "boundingBox": "63,278,95,12",
                  "text": "surf-tormented"
                },
                {
                  "boundingBox": "162,278,38,15",
                  "text": "shore,"
                }
              ]
            },
            {
              "boundingBox": "31,296,174,15",
              "words": [
                {
                  "boundingBox": "31,296,28,12",
                  "text": "And"
                 },
                {
                  "boundingBox": "63,297,4,11",
                  "text": "I"
                },
                {
                   "boundingBox": "72,296,28,12",
                  "text": "hold"
                },
                {
                  "boundingBox": "104,296,41,12",
                  "text": "within"
                },
                 {
                  "boundingBox": "149,300,20,11",
                   "text": "my"
                },
                {
                  "boundingBox": "173,296,32,12",
                  "text": "hand"
                 }
              ]
            },
            {
              "boundingBox": "31,314,169,16",
              "words": [
                {
                   "boundingBox": "31,314,42,12",
                  "text": "Grains"
                },
                {
                  "boundingBox": "78,314,15,12",
                  "text": "of"
                },
                 {
                  "boundingBox": "95,314,19,12",
                  "text": "the"
                },
                {
                  "boundingBox": "119,315,43,15",
                  "text": "golden"
                 },
                {
                  "boundingBox": "167,314,33,12",
                  "text": "sand-"
                }
              ]
             },
            {
              "boundingBox": "31,331,189,16",
               "words": [
                {
                  "boundingBox": "31,332,31,11",
                  "text": "How"
                },
                 {
                  "boundingBox": "66,331,28,12",
                  "text": "few!"
                },
                {
                  "boundingBox": "99,333,20,14",
                  "text": "yet"
                },
                {
                  "boundingBox": "123,331,27,12",
                   "text": "how"
                },
                {
                   "boundingBox": "154,331,28,16",
                  "text": "they"
                },
                {
                  "boundingBox": "186,335,34,12",
                  "text": "creep"
                }
               ]
            },
            {
              "boundingBox": "31,349,206,16",
              "words": [
                {
                  "boundingBox": "31,349,55,16",
                  "text": "Through"
                },
                {
                  "boundingBox": "90,353,20,11",
                   "text": "my"
                },
                {
                   "boundingBox": "115,349,44,16",
                  "text": "fingers"
                },
                {
                  "boundingBox": "163,351,12,10",
                  "text": "to"
                },
                 {
                  "boundingBox": "179,349,20,12",
                   "text": "the"
                },
                {
                  "boundingBox": "203,350,34,15",
                  "text": "deep,"
                 }
              ]
            },
            {
              "boundingBox": "31,366,182,16",
              "words": [
                {
                   "boundingBox": "31,366,39,12",
                  "text": "While"
                },
                {
                  "boundingBox": "74,367,5,11",
                  "text": "I"
                },
                {
                  "boundingBox": "83,370,39,12",
                  "text": "weep-"
                },
                {
                  "boundingBox": "126,366,36,12",
                  "text": "while"
                 },
                {
                  "boundingBox": "166,367,5,11",
                  "text": "I"
                },
                {
                   "boundingBox": "175,367,38,15",
                  "text": "weep!"
                }
              ]
            },
            {
              "boundingBox": "31,384,147,16",
              "words": [
                {
                   "boundingBox": "31,385,11,11",
                  "text": "O"
                },
                {
                  "boundingBox": "47,384,31,12",
                  "text": "God!"
                },
                 {
                  "boundingBox": "84,388,21,8",
                   "text": "can"
                },
                {
                  "boundingBox": "110,385,4,11",
                  "text": "I"
                 },
                {
                  "boundingBox": "119,386,20,10",
                  "text": "not"
                },
                {
                   "boundingBox": "144,388,34,12",
                  "text": "grasp"
                }
              ]
            },
            {
              "boundingBox": "31,402,170,16",
              "words": [
                {
                  "boundingBox": "31,402,37,12",
                  "text": "Them"
                },
                {
                  "boundingBox": "72,402,29,12",
                  "text": "with"
                },
                {
                  "boundingBox": "105,406,7,8",
                   "text": "a"
                },
                {
                  "boundingBox": "116,402,42,16",
                  "text": "tighter"
                },
                {
                  "boundingBox": "162,403,39,15",
                  "text": "clasp?"
                }
               ]
            },
            {
              "boundingBox": "31,419,141,12",
              "words": [
                {
                  "boundingBox": "31,420,11,11",
                  "text": "O"
                },
                 {
                  "boundingBox": "47,419,31,12",
                   "text": "God!"
                },
                {
                  "boundingBox": "84,423,21,8",
                  "text": "can"
                 },
                {
                  "boundingBox": "110,420,4,11",
                  "text": "I"
                },
                {
                   "boundingBox": "119,421,20,10",
                  "text": "not"
                },
                {
                  "boundingBox": "144,423,28,8",
                  "text": "save"
                }
               ]
            },
            {
              "boundingBox": "31,437,179,16",
              "words": [
                {
                  "boundingBox": "31,438,26,11",
                  "text": "One"
                },
                {
                  "boundingBox": "62,437,31,12",
                   "text": "from"
                },
                {
                   "boundingBox": "97,437,19,12",
                  "text": "the"
                 },
                {
                  "boundingBox": "120,437,45,16",
                  "text": "pitiless"
                },
                 {
                  "boundingBox": "169,438,41,11",
                   "text": "wave?"
                }
              ]
            },
            {
              "boundingBox": "31,454,161,12",
              "words": [
                {
                  "boundingBox": "31,455,11,11",
                   "text": "Is"
                },
                {
                   "boundingBox": "47,454,15,12",
                  "text": "all"
                 },
                {
                  "boundingBox": "66,454,25,12",
                  "text": "that"
                },
                {
                  "boundingBox": "94,458,19,8",
                  "text": "we"
                },
                {
                  "boundingBox": "118,458,19,8",
                  "text": "see"
                },
                 {
                  "boundingBox": "142,458,13,8",
                   "text": "or"
                },
                {
                  "boundingBox": "159,458,33,8",
                  "text": "seem"
                 }
              ]
            },
            {
              "boundingBox": "31,472,185,12",
              "words": [
                {
                   "boundingBox": "31,473,23,11",
                  "text": "But"
                 },
                {
                  "boundingBox": "58,476,7,8",
                  "text": "a"
                },
                {
                   "boundingBox": "69,472,40,12",
                  "text": "dream"
                },
                {
                  "boundingBox": "113,472,41,12",
                  "text": "within"
                },
                {
                  "boundingBox": "158,476,7,8",
                   "text": "a"
                },
                {
                  "boundingBox": "169,472,47,12",
                  "text": "dream?"
                }
              ]
            }
          ]
        }
      ]
    }
      

    Note that the image is split into an array of regions; each region contains an array of lines; and each line contains an array of words. This is done so that you can replace or block out one or more specific words, lines, or regions.

    Below is a jQuery code snippet making a request to this service to perform OCR on images of text. You can download the full application at https://github.com/DavidGiard/CognitiveSvcsDemos.

    // Note: "url" (the image URL) and "outputDiv" (a jQuery-wrapped div) are defined elsewhere in the page's script.
    var language = $("#LanguageDropdown").val();
    var computerVisionKey = getKey() || "Copy your Subscription key here";
    var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr";
    webSvcUrl = webSvcUrl + "?language=" + language;

    $.ajax({
        type: "POST",
        url: webSvcUrl,
        headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
        contentType: "application/json",
        data: '{ "Url": "' + url + '" }'
    }).done(function (data) {
        outputDiv.text("");

        // The response nests words within lines within regions, so loop through each level.
        var regionsOfText = data.regions;
        for (var h = 0; h < regionsOfText.length; h++) {
            var linesOfText = regionsOfText[h].lines;
            for (var i = 0; i < linesOfText.length; i++) {
                var output = "";
                var words = linesOfText[i].words;
                for (var j = 0; j < words.length; j++) {
                    output += words[j].text + " ";
                }
                outputDiv.append("<div>" + output + "</div>");
            }
            outputDiv.append("<hr>");
        }
    }).fail(function (err) {
        $("#OutputDiv").text("ERROR!" + err.responseText);
    });

    You can find the full documentation – including an in-browser testing tool - for this API here.

    Sending requests to the Cognitive Services OCR API makes it simple to convert a picture of text into text.  

    Saturday, December 30, 2017 10:31:00 AM (GMT Standard Time, UTC+00:00)
    # Friday, December 29, 2017

    It's difficult enough for humans to recognize emotions in the faces of other humans. Can a computer accomplish this task? It can if we train it to and if we give it enough examples of different faces with different emotions.

    When we supply data to a computer with the objective of training that computer to recognize patterns and predict new data, we call that Machine Learning. And Microsoft has done a lot of Machine Learning with a lot of faces and a lot of data and they are exposing the results for you to use.

    As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze images, sound, video, and language.

    The Cognitive Services Emotions API looks at photographs of people and determines the emotion of each person in the photo. Supported emotions are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Each emotion is assigned a score between 0 and 1 - higher numbers indicate a high confidence that this is the emotion expressed in the face. If a picture contains multiple faces, the emotion of each face is returned.

    To get started, you will need an Azure account and a Cognitive Services Vision API key.

    If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.

    Once you have an Azure Account,  follow the instructions in this article to generate a Cognitive Services Computer Vision key.

    To use this API, you simply have to make a POST request to the following URL:
    https://[location].api.cognitive.microsoft.com/emotion/v1.0/recognize

    where [location] is the Azure location where you created your API key (above).

    The HTTP header of the request should include the following:

    Ocp-Apim-Subscription-Key.
    This is the Cognitive Services Computer Vision key you generated above.

    Content-Type

    This tells the service how you will send the image. The options are:

    • application/json
    • application/octet-stream

    If the image is accessible via a public URL, set the Content-Type to application/json and send JSON in the body of the HTTP request in the following format

    {"url":"imageurl"}
    where imageurl is a public URL pointing to the image. For example, to analyze the emotions in this picture of a happy face and a not-so-happy face,

    TwoEmotions

    submit the following JSON:

    {"url":"http://davidgiard.com/content/binary/Open-Live-Writer/Using-the-Cognitive-Services-Emotion-API_14A56/TwoEmotions_2.jpg"}

    If you plan to send the image itself to the web service, set the content type to "application/octet-stream" and submit the binary image in the body of the HTTP request.

    The full request looks something like this:

    POST https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize HTTP/1.1
    Content-Type: application/json
    Host: westus.api.cognitive.microsoft.com
    Content-Length: 62
    Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    { "url": "http://xxxx.com/xxxx.jpg" }

    For example, passing a URL with a picture below of 3 attractive, smiling people

    BrianAnnaDavid   

    (found online at https://giard.smugmug.com/Tech-Community/SpartaHack-2016/i-4FPV9bf/0/X2/SpartaHack-068-X2.jpg)

    returned the following data: 

    [
      {
        "faceRectangle": {
          "height": 113,
           "left": 285,
          "top": 156,
          "width": 113
        },
        "scores": {
          "anger": 1.97831262E-09,
          "contempt": 9.096525E-05,
          "disgust": 3.86221245E-07,
          "fear": 4.26409547E-10,
          "happiness": 0.998336,
          "neutral": 0.00156954059,
          "sadness": 8.370223E-09,
          "surprise": 3.06117772E-06
        }
      },
      {
        "faceRectangle": {
           "height": 108,
          "left": 831,
          "top": 169,
          "width": 108
        },
        "scores": {
          "anger": 2.63808062E-07,
          "contempt": 5.387114E-08,
          "disgust": 1.3360991E-06,
          "fear": 1.407629E-10,
          "happiness": 0.9999967,
          "neutral": 1.63170478E-06,
          "sadness": 2.52861843E-09,
          "surprise": 1.91028926E-09
        }
      },
      {
         "faceRectangle": {
          "height": 100,
          "left": 591,
          "top": 168,
          "width": 100
        },
        "scores": {
          "anger": 3.24157673E-10,
          "contempt": 4.90155344E-06,
          "disgust": 6.54665473E-06,
          "fear": 1.73284559E-06,
          "happiness": 0.9999156,
          "neutral": 6.42121E-05,
          "sadness": 7.02297257E-06,
          "surprise": 5.53670576E-09
        }
      }
    ]   

    The high values for the 3 happiness scores and the very low values for all the other scores suggest a very high degree of confidence that each person in this photo is happy.

    Here is the request in the popular HTTP analysis tool Fiddler [http://www.telerik.com/fiddler]:
    Request

    Em01-Fiddler-Request

    Response:
    Em02-Fiddler-Response 

    Below is a C# code snippet making a request to this service to analyze the emotions of the people in an online photograph. You can download the full application at https://github.com/DavidGiard/CognitiveSvcsDemos.

    // Assumes: using System.Net.Http; using System.Net.Http.Headers; using System.Text;
    // "imageUrl" (the public URL of the photo to analyze) is defined elsewhere in the application.
    string emotionApiKey = "XXXXXXXXXXXXXXXXXXXXXXX";
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", emotionApiKey);
    string uri = "https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize";
    HttpResponseMessage response;

    // Build the JSON body containing the image URL and POST it to the service.
    var json = "{'url': '" + imageUrl + "'}";
    byte[] byteData = Encoding.UTF8.GetBytes(json);
    using (var content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
        response = await client.PostAsync(uri, content);
    }

    if (response.IsSuccessStatusCode)
    {
        // "data" holds the JSON array of faces and emotion scores.
        var data = await response.Content.ReadAsStringAsync();
    }
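    The data variable above holds the raw JSON array of faces. As a hedged follow-on (the parsing approach below is mine, not part of the original sample), you could use Newtonsoft.Json to pick the dominant emotion for each face:

    using System;
    using System.Linq;
    using Newtonsoft.Json.Linq;

    // "data" is the JSON string returned by the Emotion service.
    var faces = JArray.Parse(data);
    foreach (var face in faces)
    {
        var scores = (JObject)face["scores"];
        // The dominant emotion is the score with the highest value.
        var dominant = scores.Properties().OrderByDescending(p => (double)p.Value).First();
        Console.WriteLine("Face at left={0}: {1} ({2:P1})",
            face["faceRectangle"]["left"], dominant.Name, (double)dominant.Value);
    }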

    You can find the full documentation – including an in-browser testing tool - for this API here.

    Sending requests to the Cognitive Services Emotion API makes it simple to analyze the emotions of people in a photograph.  

    Friday, December 29, 2017 10:43:00 AM (GMT Standard Time, UTC+00:00)
    # Thursday, December 28, 2017

    Generating a thumbnail image from a larger image sounds easy – just shrink the dimensions of the original, right? But it becomes more complicated if the thumbnail image is a different shape than the original. For example, the original image may be rectangular but we need the new image to be a square. Or we may need to generate a portrait-oriented thumbnail from a landscape-oriented original image. In these cases, we will need to crop or distort the original image when we create the thumbnail. Distorting the image tends to look very bad; and when we crop an image, we want to ensure that the primary subject of the image remains in the generated thumbnail. To do this, we need to identify the primary subject of the image. That's easy enough for a human observer to do, but a difficult thing for a computer to do. But if we want to automate this process, we will have to ask the computer to do exactly that.

    This is where machine learning can help. By analyzing many images, Machine Learning can figure out what parts of a picture are likely to be the main subject. Once this is known, it becomes a simpler matter to crop the picture in such a way that the main subject remains in the generated thumbnail.

    As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze images, sound, video, and language.

    The Cognitive Services Vision API uses Machine Learning so that you don't have to. It exposes a web service to return an intelligent thumbnail image from any picture.

    You can see this in action here.

    Scroll down to the section titled "Generate a thumbnail" to see the Thumbnail generator as shown in Figure 1.

    Th01
    Figure 1

    With this live, in-browser demo, you can either select an image from the gallery and view the generated thumbnails; or provide your own image - either from your local computer or from a public URL. The page uses the Thumbnail API to create thumbnails of 6 different dimensions.
     
    For your own application, you can either call the REST Web Service directly or (for a .NET application) use a custom library. The library simplifies development by abstracting away HTTP calls via strongly-typed objects.

    To get started, you will need an Azure account and a Cognitive Services Vision API key.

    If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.

    Once you have an Azure Account,  follow the instructions in this article to generate a Cognitive Services Computer Vision key.

         

    To use this API, you simply have to make a POST request to the following URL:
    https://[location].api.cognitive.microsoft.com/vision/v1.0/generateThumbnail?width=ww&height=hh&smartCropping=true

    where [location] is the Azure location where you created your API key (above) and ww and hh are the desired width and height of the thumbnail to generate.

    The “smartCropping” parameter tells the service to determine the main subject of the photo and to try to keep it in the thumbnail while cropping.

    The HTTP header of the request should include the following:

    Ocp-Apim-Subscription-Key.     
    The Cognitive Services Computer Vision key you generated above.

    Content-Type

    This tells the service how you will send the image. The options are:   

    • application/json    
    • application/octet-stream    
    • multipart/form-data

    If the image is accessible via a public URL, set the Content-Type to application/json and send JSON in the body of the HTTP request in the following format

    {"url":"imageurl"}
    where imageurl is a public URL pointing to the image. For example, to generate a thumbnail of this picture of a skier, submit the following JSON:

    {"url":"http://mezzotint.de/wp-content/uploads/2014/12/2013-skier-edge-01-Kopie.jpg"}

    Man skiing  alps

    If you plan to send the image itself to the web service, set the content type to either "application/octet-stream" or "multipart/form-data" and submit the binary image in the body of the HTTP request.

    Here is a sample console application that uses the service to generate a thumbnail from an online image and save it to disk. You can download the full source code at
    https://github.com/DavidGiard/CognitiveSvcsDemos

    Note: You will need to create the folder "c:\test" to store the generated thumbnail.

       

    // Assumes: using System; using System.Diagnostics; using System.IO; using System.Net.Http;
    //          using System.Net.Http.Headers; using System.Text; using System.Web;

    // TODO: Replace this value with your Computer Vision API Key
    string computerVisionKey = "XXXXXXXXXXXXXXXX";

    var client = new HttpClient();
    var queryString = HttpUtility.ParseQueryString(string.Empty);

    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);

    // Request a 300x300 thumbnail and let the service crop around the main subject.
    queryString["width"] = "300";
    queryString["height"] = "300";
    queryString["smartCropping"] = "true";
    var uri = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/generateThumbnail?" + queryString;

    HttpResponseMessage response;

    string originalPicture = "http://davidgiard.com/content/Giard/_DGInAppleton.png";
    var jsonBody = "{'url': '" + originalPicture + "'}";
    byte[] byteData = Encoding.UTF8.GetBytes(jsonBody);

    using (var content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
        response = await client.PostAsync(uri, content);
    }

    if (response.StatusCode == System.Net.HttpStatusCode.OK)
    {
        // Write thumbnail to file
        var responseContent = await response.Content.ReadAsByteArrayAsync();
        string folder = @"c:\test";
        string thumbnailFullPath = string.Format("{0}\\thumbnailResult_{1:yyyyMMddhhmmss}.jpg", folder, DateTime.Now);
        using (BinaryWriter binaryWrite = new BinaryWriter(new FileStream(thumbnailFullPath, FileMode.Create, FileAccess.Write)))
        {
            binaryWrite.Write(responseContent);
        }
        // Show BEFORE and AFTER to user
        Process.Start(thumbnailFullPath);
        Process.Start(originalPicture);
        Console.WriteLine("Done! Thumbnail is at {0}!", thumbnailFullPath);
    }
    else
    {
        Console.WriteLine("Error occurred. Thumbnail not created");
    }

    The result is shown in Figure 2 below.
    Th02Results
    Figure 2

    One thing to note: the Thumbnail API is part of the Computer Vision API. As of this writing, the free version of the Computer Vision API is limited to 5,000 transactions per month. If you want more than that, you will need to upgrade to the Standard version, which charges $1.50 per 1,000 transactions.

    But this should be plenty for you to learn this API for free and build and test your applications until you need to put them into production.
    The code above can be found on GitHub.

    You can find the full documentation – including an in-browser testing tool - for this API here.

    The Cognitive Services Computer Vision API provides a simple way to generate thumbnail images from pictures.

    Thursday, December 28, 2017 10:31:00 AM (GMT Standard Time, UTC+00:00)
    # Wednesday, December 27, 2017

    As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze images, sound, video, and language.

    Your application uses Cognitive Services by calling one or more RESTful web services. These services require you to pass a key in the header of each HTTP call. You can generate this key from the Azure portal.

    If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.

    Once you have an Azure Account, navigate to the Azure Portal.

    CsKey01-Portal
    Figure 1

    Here you can create a Cognitive Services API key. Click the button in the top left of the portal (Figure 2)

    CsKey02-New
    Figure 2

    It’s worth noting that the “New” button caption sometimes changes to “Create a Resource” (Figure 2a)

    CsKey02-CreateResourceButton
    Figure 2a

    From the flyout menu, select AI+Cognitive Services. A list of Cognitive Services displays. Select the service you want to call. For this demo, I will select Computer Vision API, as shown in Figure 3.

    CsKey03-AICogServices
    Figure 3

    The Computer Vision API blade displays, as shown in Figure 4.

    CsKey04-ComputerVisionBlade
    Figure 4

    At the Name textbox, enter a name for this service account.

    At the Subscription dropdown, select the Azure subscription to associate with this service.

    At the Location dropdown, select the region in which you want to host this service. You should select a region close to those who will be consuming the service. Make note of the region you selected.

    At the Pricing Tier dropdown, select the pricing tier you want to use. Currently, the choices are F0 (which is free, but limited to 20 calls per minute) and S1 (which is not free, but allows more calls). Click the "View full pricing details" link to see how much S1 will cost.

    At the Resource Group field, select or create an Azure Resource Group. Resource Groups allow you to logically group different Azure resources, so you can manage them together.

    Click the [Create] button to create the account. The creation typically takes less than a minute and a message displays when the service is created, as shown in Figure 5.

    CsKey05-GoToResourceButton
    Figure 5

    Click the [Go to resource] button to open a blade to configure the newly-created service. Alternatively, you can select "All Resources" on the left menu and search for your service by name. Either way, the service blade displays, as shown in Figure 6.

    CsKey06-ComputerVisionBlade
    Figure 6

    The important pieces of information in this blade are the Endpoint (on the Overview tab, Figure 7) and the Access Keys (on the Keys tab, as shown in Figure 8); a short sketch of how these two values are used follows Figure 8 below. Within this blade, you can also view log files and use other tools to help troubleshoot your service, and you can set authorization and other restrictions on your service.

    CsKey07-ComputerVisionOverview
    Figure 7

    CsKey08-ComputerVisionKeys
    Figure 8

    The process is almost identical when you create a key for any other Cognitive Service. The only difference is that you will select a different service set in the AI+Cognitive Services flyout.

    Wednesday, December 27, 2017 10:35:00 AM (GMT Standard Time, UTC+00:00)
    # Tuesday, December 26, 2017

    Microsoft Cognitive Services is a set of APIs that take advantage of Machine Learning to provide developers with an easy way to analyze images, speech, language, and more.

    If you have worked with or studied Machine Learning, you know that you can accomplish a lot, but that it requires a lot of computing power, a lot of time, and a lot of data. Since most of us have a limited amount of each of these, we can take advantage of the fact that Microsoft has data, time, and the computing power of Azure. They have used this power to analyze large data sets and expose the results via a set of web services, collectively known as Cognitive Services.

    The APIs of Cognitive Services are divided into 5 broad categories: Vision, Speech, Language, Knowledge, and Search.

    Vision APIs

    The Vision APIs provide information about a given photograph or video. For example, several Vision APIs are capable of recognizing faces in an image. One analyzes each face and deduces that person's emotion; another can compare two photographs and decide whether or not they show the same person; a third guesses the age of each person in a photo.

    Speech APIs

    The Speech APIs can convert speech to text or text to speech. They can also recognize the voice of a given speaker (you might use this to authenticate users, for example) and infer the intent of the speaker from his words and tone. The Translator Speech API supports translations between 10 different spoken languages.

    Language

    The Language APIs include a variety of services. A spell checker is smart enough to recognize common proper names and homonyms. And the Translator Text API can detect the language in which a text is written and translate that text into another language. The Text Analytics API analyzes a document for the sentiment expressed, returning a score based on how positive or negative the document's wording and tone are. The most interesting API in this group is the Language Understanding Intelligent Service (LUIS), which allows you to build custom language models so that your application can understand questions and statements from your users in a variety of formats.

    Knowledge

    Knowledge includes a variety of APIs - from customer recommendations to smart querying and information about the context of text. Many of these services take advantage of natural language processing. As of this writing, all of these services are in preview.

    Search

    The Search APIs allow you to retrieve Bing search results with a single web service call.

    You can use these APIs. To get started, you need an Azure account. You can get a free Azure trial at https://azure.microsoft.com/.

    Each API offers a free option that restricts the number and/or frequency of calls, but you can break through that boundary for a charge.  Because they are hosted in Azure, the paid services can scale out to meet increased demand.

    You call most of these APIs by passing JSON to and receiving JSON from a RESTful web service. Some of the more complex services offer configuration and setup beforehand.

    These APIs are capable of analyzing pictures, text, and speech because each service draws on the knowledge learned from parsing countless photos, documents, etc. beforehand.
     
    You can find documentation, sample code, and even a place to try out each API live in your browser at https://azure.microsoft.com/en-us/services/cognitive-services/

    A couple of fun applications of Cognitive Services are how-old.net (which guesses the ages of people in photographs) and what-dog.net (which identifies the breed of dog in a photo).

    Below is a screenshot from the Azure documentation page, listing the sets of services. But keep checking back, because this list grows and each set contains one or more services.

    List of Cognitive Services
     
    Sign up today and start building apps. It’s fun, it’s useful, and it’s free!

    Tuesday, December 26, 2017 10:25:00 AM (GMT Standard Time, UTC+00:00)
    # Tuesday, February 28, 2017

    Last week, Ed Charbeneau interviewed me for his Eat Sleep Dev podcast. The topic was Cognitive Services – a technology I’m passionate about.

    You can listen to that interview below.

    Tuesday, February 28, 2017 3:43:00 PM (GMT Standard Time, UTC+00:00)