# Thursday, October 31, 2019

GCast 65:

Artificial Intelligence, Whale Sharks, and WildBook

With help from Microsoft's AI for Earth, Wild Me has created WildBook to use artificial intelligence to track the location and migration of animals.

Thursday, October 31, 2019 1:51:29 PM (GMT Standard Time, UTC+00:00)
# Thursday, August 29, 2019

GCast 63:

Sentiment Analysis JavaScript Demo

In this video, I walk you through a JavaScript application that calls the Sentiment Analysis Cognitive Service.

Thursday, August 29, 2019 1:09:57 PM (GMT Daylight Time, UTC+01:00)
# Friday, August 23, 2019

GCast 62:

Sentiment Analysis Cognitive Service

This video explains the Sentiment Analysis service, which is part of the Text Analytics Cognitive Service.

Friday, August 23, 2019 4:47:19 AM (GMT Daylight Time, UTC+01:00)
# Friday, August 16, 2019

In the last article, I walked through the syntax of calling the Bing Spell Check service.

In this article, I will walk through a simple JavaScript application that calls this service.

If you want to follow along, this sample is part of my Cognitive Services demos, which you can find on GitHub at https://github.com/DavidGiard/CognitiveSvcsDemos.

This project is found in the "SpellCheckDemo" folder.

Here is the main web page:

Listing 1:

<html>
<head>
    <title>Spell Check Demo</title>
    <script src="scripts/jquery-1.10.2.min.js"></script>
    <script src="scripts/script.js"></script>
    <script src="scripts/getkey.js"></script>
    <link rel="stylesheet" href="css/site.css">
</head>
<body>
    <h1>Spell Check Demo</h1>
    <div>
        <textarea id="TextToCheck">Life ig buuutiful all the tyme</textarea>
    </div>
    <button id="SpellCheckButton">Check Spelling!</button>
    <div id="NewTextDiv"></div>
    <div id="OutputDiv"></div>
</body>
</html>
  

As you can see, the page consists of a text area containing some misspelled text, a button, and two empty divs.

The page looks like this when rendered in a browser:

scjs01-PageOnLoad
Fig. 1

When the user clicks the button, we want to call the Spell Check service, sending it the text in the text area.

We want to display details about each flagged token from the web service response in the OutputDiv div, and the corrected text in the NewTextDiv div.

Below is the screen after clicking the [Check Spelling] button.

scjs02-PageAfterClick

Fig. 2

We need a reference to the outputDiv, so we can easily write to it.

Listing 2:

var outputDiv = document.getElementById("OutputDiv");
  

Next, we bind code to the button's click event, as shown in Listing 3.

Listing 3:

var spellCheckButton = document.getElementById("SpellCheckButton");
spellCheckButton.onclick = function () {
    // Replace this with your Spell Check API key from Azure
    var subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxx";

    outputDiv.innerHTML = "Thinking...";

    var textToCheck = document.getElementById("TextToCheck").value;
    var webSvcUrl = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck/?text=" + encodeURIComponent(textToCheck);
    webSvcUrl = webSvcUrl + "&mode=proof&mkt=en-US";

    var httpReq = new XMLHttpRequest();
    httpReq.open("GET", webSvcUrl, true);
    httpReq.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey);
    httpReq.setRequestHeader("Content-Type", "application/json");
    httpReq.onload = onSpellCheckSuccess;
    httpReq.onerror = onSpellCheckError;
    httpReq.send(null);
};
  

This code gets the text from the text area and makes an asynchronous HTTP GET request to the Spell Check API, passing the API key in the header. When the API sends a response, this will call the onSpellCheckSuccess or onSpellCheckError function, depending on the success of the call.

Listing 4 shows the onSpellCheckSuccess function:

Listing 4:

function onSpellCheckSuccess(evt) {
    var req = evt.srcElement;
    var resp = req.response;
    var data = JSON.parse(resp);

    var flaggedTokens = data.flaggedTokens;
    if (flaggedTokens.length > 0) {
        var newText = document.getElementById("TextToCheck").value;
        var outputHtml = "";
        flaggedTokens.forEach(flaggedToken => {
            var token = flaggedToken.token;
            var tokenType = flaggedToken.type;
            var offset = flaggedToken.offset;
            var suggestions = flaggedToken.suggestions;
            outputHtml += "<div>";
            outputHtml += "<h3>Token: " + token + "</h3>";
            outputHtml += "Type: " + tokenType + "<br/>";
            outputHtml += "Offset: " + offset + "<br/>";
            outputHtml += "<div>Suggestions</div>";
            outputHtml += "<ul>";

            if (suggestions.length > 0) {
                suggestions.forEach(suggestion => {
                    outputHtml += "<li>" + suggestion.suggestion;
                    outputHtml += " (" + (suggestion.score * 100).toFixed(2) + "%)";
                });
                outputHtml += "</ul>";
                outputHtml += "</div>";

                newText = replaceTokenWithSuggestion(newText, token, offset, suggestions[0].suggestion);
            }
            else {
                outputHtml += "<ul><li>No suggestions for this token</ul>";
            }
        });

        newText = "<h2>New Text:</h2>" + newText;
        var newTextDiv = document.getElementById("NewTextDiv");
        newTextDiv.innerHTML = newText;

        outputHtml = "<h2>Details</h2>" + outputHtml;
        outputDiv.innerHTML = outputHtml;
    }
    else {
        outputDiv.innerHTML = "No errors found.";
    }
}
  

As you can see, we parse out the JSON object from the response and retrieve each flaggedToken from that object. For each flaggedToken, we output information, such as the original text (or token), the tokenType, and suggested replacements, along with the score of each replacement.

If an error occurs when calling the API service, the onSpellCheckError function is called, as shown in Listing 5.

Listing 5:

function onSpellCheckError(evt) { 
    outputDiv.innerHTML = "An error has occurred!!!"; 
};
  

Finally, we replace each token with the first suggestion, using the code in Listing 6.

Listing 6*:

function replaceTokenWithSuggestion(originalString, oldToken, offset, newWord) { 
    var textBeforeToken = originalString.substring(0, offset);

    var textAfterToken = ""; 
    if (originalString.length > textBeforeToken.length + oldToken.length) { 
        textAfterToken = originalString.substring(offset + oldToken.length, originalString.length); 
    }

    var newString = textBeforeToken + newWord + textAfterToken;

    return newString; 
 }
  

Here is the full JavaScript:

Listing 7:

window.onload = function () {

    var outputDiv = document.getElementById("OutputDiv");

    var spellCheckButton = document.getElementById("SpellCheckButton");
    spellCheckButton.onclick = function () {
        var subscriptionKey = getKey();
        var textToCheck = document.getElementById("TextToCheck").value;

        var webSvcUrl = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck/?text=" + encodeURIComponent(textToCheck);
        webSvcUrl = webSvcUrl + "&mode=proof&mkt=en-US";

        outputDiv.innerHTML = "Thinking...";

        var httpReq = new XMLHttpRequest();
        httpReq.open("GET", webSvcUrl, true);
        httpReq.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey);
        httpReq.setRequestHeader("Content-Type", "application/json");
        httpReq.onload = onSpellCheckSuccess;
        httpReq.onerror = onSpellCheckError;
        httpReq.send(null);
    };

    function onSpellCheckSuccess(evt) {
        var req = evt.srcElement;
        var resp = req.response;
        var data = JSON.parse(resp);

        var flaggedTokens = data.flaggedTokens;
        if (flaggedTokens.length > 0) {
            var newText = document.getElementById("TextToCheck").value;
            var outputHtml = "";
            flaggedTokens.forEach(flaggedToken => {
                var token = flaggedToken.token;
                var tokenType = flaggedToken.type;
                var offset = flaggedToken.offset;
                var suggestions = flaggedToken.suggestions;
                outputHtml += "<div>";
                outputHtml += "<h3>Token: " + token + "</h3>";
                outputHtml += "Type: " + tokenType + "<br/>";
                outputHtml += "Offset: " + offset + "<br/>";
                outputHtml += "<div>Suggestions</div>";
                outputHtml += "<ul>";

                if (suggestions.length > 0) {
                    suggestions.forEach(suggestion => {
                        outputHtml += "<li>" + suggestion.suggestion;
                        outputHtml += " (" + (suggestion.score * 100).toFixed(2) + "%)";
                    });
                    outputHtml += "</ul>";
                    outputHtml += "</div>";

                    newText = replaceTokenWithSuggestion(newText, token, offset, suggestions[0].suggestion);
                }
                else {
                    outputHtml += "<ul><li>No suggestions for this token</ul>";
                }
            });

            newText = "<h2>New Text:</h2>" + newText;
            var newTextDiv = document.getElementById("NewTextDiv");
            newTextDiv.innerHTML = newText;

            outputHtml = "<h2>Details</h2>" + outputHtml;
            outputDiv.innerHTML = outputHtml;
        }
        else {
            outputDiv.innerHTML = "No errors found.";
        }
    }

    function onSpellCheckError(evt) {
        outputDiv.innerHTML = "An error has occurred!!!";
    }

    function replaceTokenWithSuggestion(originalString, oldToken, offset, newWord) {
        var textBeforeToken = originalString.substring(0, offset);

        var textAfterToken = "";
        if (originalString.length > textBeforeToken.length + oldToken.length) {
            textAfterToken = originalString.substring(offset + oldToken.length, originalString.length);
        }

        var newString = textBeforeToken + newWord + textAfterToken;

        return newString;
    }

};
  

Hopefully, this sample gives you an idea of how to get started building your first app that uses the Bing Spell Check API.



* This code currently has a bug: it only works correctly if each suggestion is the same length as the token it replaces. I plan to fix it, but I'm publishing now because:

  1. It is not a fatal bug, and
  2. It is not relevant to the call to the API, which is the primary point of this article. (One possible fix is sketched below.)
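
If you want to fix it yourself, one possible approach (a rough sketch of my own, not code from the GitHub sample) is to track how much the text has grown or shrunk as earlier suggestions are applied, and shift each subsequent offset by that amount:

// Hypothetical helper (not in the sample): applies the top suggestion for every
// flagged token, adjusting each offset by how much earlier replacements have
// already grown or shrunk the text. Assumes tokens arrive in ascending offset order.
function applyAllSuggestions(originalText, flaggedTokens) {
    var delta = 0;
    var newText = originalText;
    flaggedTokens.forEach(function (flaggedToken) {
        if (flaggedToken.suggestions.length > 0) {
            var newWord = flaggedToken.suggestions[0].suggestion;
            newText = replaceTokenWithSuggestion(newText, flaggedToken.token, flaggedToken.offset + delta, newWord);
            delta += newWord.length - flaggedToken.token.length;
        }
    });
    return newText;
}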
Friday, August 16, 2019 9:00:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, August 14, 2019

In the last article, I showed how to create a Bing Spell Check service in Azure. Once you have created this service, you can now pass text to a web service to perform spell checking.

Given a text sample, the service checks the spelling of each token in the sample. A token is a word, or a pair of words that should be a single word, such as "arti cle", which is a misspelling of the word "article".

It returns an array of unrecognized tokens, along with suggested replacements for these misspelled tokens.

URL and querystring arguments

The URL for the web service is
https://api.cognitive.microsoft.com/bing/v7.0/spellcheck

You can add some optional querystring parameters to this URL:

mode
Set this to "proof" if you want to check for spelling, grammar, and punctuation errors
Set it to "spell" if you only want to check for spelling errors.

If you omit the "mode" querystring argument, it defaults to "proof".

mkt
Set this to the Market Code of the country/language/culture you want to test. This is in the format [Language Code]-[Country Code], such as "en-US" for United States English. A full list of Market Codes can be found here.

The "Proof" mode supports only en-US,  es-ES, and pt-BR Market Codes.

If you omit the mkt argument, the service will guess the market based on the text. Therefore, it is a good idea to include this value, even though it is optional.

Below is an example of a URL with some querystring values set.

https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-us

POST vs GET

You have the option to submit either an HTTP POST or an HTTP GET request to the URL. We will discuss the differences below.

If you use the GET verb, you pass the text to check in the querystring, as in the following example:

https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-us&text=Life+ig+buuutifull+all+the+tyme

With the GET method, the text is limited to 1,500 characters.

If you use the POST verb, the text is passed in the body of the request, as in the following example:

text=Life+ig+buuutifull+all+the+tyme

With the POST method, you can send text up to 10,000 characters long.
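
For example, a POST request from JavaScript might look something like the sketch below. This is my own illustration (the placeholder subscription key and sample text are not from any demo project); it sends the text form-encoded in the request body rather than in the querystring.

// Minimal sketch of a POST call to the Spell Check service.
var subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxx";
var webSvcUrl = "https://api.cognitive.microsoft.com/bing/v7.0/spellcheck?mode=proof&mkt=en-US";
var textToCheck = "Life ig buuutifull all the tyme";

var httpReq = new XMLHttpRequest();
httpReq.open("POST", webSvcUrl, true);
httpReq.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey);
// With POST, the text travels in the body, form-encoded, instead of in the querystring.
httpReq.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
httpReq.onload = function () {
    var data = JSON.parse(httpReq.response);
    console.log(data.flaggedTokens);
};
httpReq.send("text=" + encodeURIComponent(textToCheck));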

Results

If successful, the web service will return an HTTP 200 ("OK") response, along with the following data in JSON format in the body of the response:

_type: "SpellCheck"

An array of "flaggedTokens", representing spelling errors found

Each flaggedToken consists of the following information:

  • offset: The position of the offending token within the text
  • token: The token text
  • type: The reason this token is in this list (usually "UnknownToken")
  • suggestion: An array of suggested replacements for the offending token. Each suggestion consists of the following:
  • score: a value (0-1) indicating the likelihood that this suggestion is the appropriate replacement

Below is an example of a response:

{
   "_type": "SpellCheck",
   "flaggedTokens": [{
     "offset": 5,
     "token": "ig",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "is",
       "score": 0.8922398888897022
     }]
   }, {
     "offset": 8,
     "token": "buuutifull",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "beautiful",
       "score": 0.8922398888897022
     }]
   }, {
     "offset": 27,
     "token": "tyme",
     "type": "UnknownToken",
     "suggestions": [{
       "suggestion": "time",
       "score": 0.8922398888897022
     }]
   }]
 }
  

In this article, I showed how to call the Bing Spell Check service with either a GET or POST HTTP request.

Wednesday, August 14, 2019 8:53:00 AM (GMT Daylight Time, UTC+01:00)

The Bing Spell Check API allows you to call a simple web service to perform spell checking on your text.

Before you get started, you must log into a Microsoft Azure account and create a new Bing Spell Check Service. Here are the steps to do this:

In the Azure Portal, click the [Create a resource] button (Fig. 1); then, search for and select "Bing Spell Check", as shown in Fig. 2.

sc01-CreateResourceButton
Fig. 1

sc02-SearchForBingSpellCheck
Fig. 2

The "Bing Spell Check" (currently on version 7) page displays, which describes the service and provides links to documentation and information about the service, as shown in Fig. 3

sc03-BingSpellCheckLandingPage
Fig. 3

Click the [Create] button to open the "Create" blade, as shown in Fig. 4.

sc04-CreateSpellCheckBlade
Fig. 4

At the "Name" field, enter a unique name for your service.

At the "Subscription" dropdown, select the subscription in which to create the service. Most of you will have only one subscription.

At the "Pricing Tier" dropdown, select the free or paid tier, as shown in Fig. 5.

sc05-PricingTiers
Fig. 5

The number of calls is severely limited for the free tier, so it is most useful for testing and learning the service. You may create only one free Spell Check service per subscription.

At the "Resource Group" field, select a resource group to associate with this service or click the "Create new" link to associate it with a newly-created resource group. A resource group provides a way to group together related service, making it easier to manage them together.

Click the [Create] button to begin creating the service. This process takes only a few seconds.

Open the service and select the "Keys" blade, as shown in Fig. 6.

sc06-KeysBlade
Fig. 6

Either one of the keys listed on this page must be passed in the header of your web service call.

Save a copy of one of these keys. You will need it when I show you how to call the Bing Spell Check Service in tomorrow’s article.

Wednesday, August 14, 2019 1:46:16 AM (GMT Daylight Time, UTC+01:00)
# Thursday, August 1, 2019

GCast 59:

Cognitive Services Text Recognition service

Learn to extract text from an image using the new Text Recognition service.

Thursday, August 1, 2019 11:53:50 PM (GMT Daylight Time, UTC+01:00)
# Monday, June 24, 2019

Episode 569

John Alexander on ML.NET

John Alexander describes how .NET developers can use ML.NET to build and consume Machine Learning solutions.

Monday, June 24, 2019 9:01:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, June 12, 2019

In a previous article, I showed how to use the Microsoft Cognitive Services Computer Vision API to perform Optical Character Recognition (OCR) on a document containing a picture of text. We did so by making an HTTP POST to a REST service.

If you are developing with .NET languages, such as C#, Visual Basic, or F#, a NuGet package makes this call easier. Classes in this package abstract the REST call, so you can write less (and simpler) code; and strongly-typed objects allow you to make the call and parse the results more easily.


To get started, you will first need to create a Computer Vision service in Azure and retrieve the endpoint and key, as described here.

Then, you can create a new C# project in Visual Studio. I created a WPF application, which can be found and downloaded at my GitHub account. Look for the project named "OCR-DOTNETDemo". Fig. 1 shows how to create a new WPF project in Visual Studio.

od01-FileNewProject
Fig. 1

In the Solution Explorer, right-click the project and select "Manage NuGet Packages", as shown in Fig. 2.

od02-ManageNuGet
Fig. 2

Search for and install the "Microsoft.Azure.CognitiveServices.Vision.ComputerVision" package, as shown in Fig. 3.

od03-NuGet
Fig. 3

The important classes in this package are:

  • OcrResult
    A class representing the object returned from the OCR service. It consists of an array of OcrRegions, each of which contains an array of OcrLines, each of which contains an array of OcrWords. Each OcrWord has a text property, representing the text that is recognized. You can reconstruct all the text in an image by looping through each array.
  • ComputerVisionClient
    This class contains the RecognizePrintedTextInStreamAsync method, which abstracts the HTTP REST call to the OCR service.
  • ApiKeyServiceClientCredentials
    This class constructs credentials that will be passed in the header of the HTTP REST call.

Create a new class in the project named "OCRServices" and make its scope "internal" or "public".

Add the following "using" statements to the top of the class:

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using System.IO;
using System.Text;
using System.Threading.Tasks;
  


Add the following methods to this class:

Listing 1:

internal static async Task<OcrResult> UploadAndRecognizeImageAsync(string imageFilePath, OcrLanguages language)
{
    string key = "xxxxxxx";
    string endPoint = "https://xxxxx.api.cognitive.microsoft.com/";
    var credentials = new ApiKeyServiceClientCredentials(key);

    using (var client = new ComputerVisionClient(credentials) { Endpoint = endPoint })
    {
        using (Stream imageFileStream = File.OpenRead(imageFilePath))
        {
            OcrResult ocrResult = await client.RecognizePrintedTextInStreamAsync(false, imageFileStream, language);
            return ocrResult;
        }
    }
}

internal static async Task<string> FormatOcrResult(OcrResult ocrResult)
{
    var sb = new StringBuilder();
    foreach (OcrRegion region in ocrResult.Regions)
    {
        foreach (OcrLine line in region.Lines)
        {
            foreach (OcrWord word in line.Words)
            {
                sb.Append(word.Text);
                sb.Append(" ");
            }
            sb.Append("\r\n");
        }
        sb.Append("\r\n\r\n");
    }
    return sb.ToString();
}
  

The UploadAndRecognizeImageAsync method calls the HTTP REST OCR service (via the NuGet library abstractions) and returns a strongly-typed object representing the results of that call. Replace the key and the endPoint in this method with those associated with your Computer Vision service.

The FormatOcrResult method loops through each region, line, and word of the service's return object. It concatenates the text of each word, separating words by spaces, lines by a carriage return and line feed, and regions by a double carriage return / line feed.

Add a Button and a TextBlock to the MainWindow.xaml form.

In the click event of that button, add the following code:

Listing 2:

private async void GetText_Click(object sender, RoutedEventArgs e)
{
    string imagePath = @"xxxxxxx.jpg";
    OutputTextBlock.Text = "Thinking…";
    var language = OcrLanguages.En;
    OcrResult ocrResult = await OCRServices.UploadAndRecognizeImageAsync(imagePath, language);
    string resultText = await OCRServices.FormatOcrResult(ocrResult);
    OutputTextBlock.Text = resultText;
}
  


Replace xxxxxxx.jpg with the full path of an image file on disk that contains pictures of text.

You will need to add the following using statement to the top of MainWindow.xaml.cs.

using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
  

If you like, you can add code to allow users to retrieve an image and display that image on your form. This code is in the sample application from my GitHub repository, if you want to view it.

Running the form should look something like Fig. 4.

od04-RunningApp
Fig. 4

Wednesday, June 12, 2019 9:46:00 AM (GMT Daylight Time, UTC+01:00)
# Friday, June 7, 2019

The Microsoft Cognitive Services Computer Vision API contains functionality to infer a lot of information about a given image. One capability is to convert pictures of text into text, a process known as "Optical Character Recognition" or "OCR".

Performing OCR on an image is simple and inexpensive. It is done through a web service call; but first, you must set up the Computer Vision Service, as described in this article.

In that article, you were told to save two pieces of information about the service: The API Key and the URL. Here is where you will use them.

HTTP Endpoint

The OCR service is a web service. To call it, you send an HTTP POST request to an HTTP endpoint. The endpoint consists of the URL copied above, followed by "vision/v2.0/ocr", followed by some optional querystring parameters (which we will discuss later).

So, if you create your service in the EAST US Azure region, the copied URL will be

https://eastus.api.cognitive.microsoft.com/

and the HTTP endpoint for the OCR service will be

https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr

Querystring Parameters

The optional querystring parameters are

language:

The 2-character language code of the text you are recognizing. This helps the service more accurately and quickly match pictures of words to the words they represent. If you omit this parameter, the system will analyze the text and guess an appropriate language. Currently, the service supports 26 languages. The 2-character code of each supported language is listed in Appendix 1 at the bottom of this article.

detectOrientation

"true", if you want the service to adjust the orientation of the image before performing OCR. If you pass "false" or omitting this parameter, the service will assume the image is oriented correctly.

If you have an image with English text and you want the service to detect and adjust the image's orientation, the above URL becomes:

https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true

HTTP Headers

In the header of the HTTP request, you must add the following name/value pairs:

Ocp-Apim-Subscription-Key

The API key you copied above

Content-Type

The media type of the image you are passing to the service in the body of the HTTP request

Possible values are:

  • application/json
  • application/octet-stream
  • multipart/form-data

The value you pass must be consistent with the data in the body.

If you select "application/json", you must pass in the request body a URL pointing to the image on the public Internet.

If you select "application/json" or "application/octet-stream", you must pass the actual binary image in the request body.

Body

In the body of the HTTP request, you pass the image you want the service to analyze.

If you selected "application/json" as the Content-Type in the header, pass a URL within a JSON document, with the following format:

{"url":"image_url"}

where image_url is a URL pointing to the image you want to recognize.

Here is an example:

{"url":"https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png"}

If you selected "application/octet-stream" or "multipart/form-data" as the Content-Type in the header, pass the actual binary image in the body of the request.

The service has some restrictions on the images it can analyze.

It cannot analyze an image larger than 4MB.

The width and height of the image must be between 50 and 4,200 pixels.

The image must be one of the following formats: JPEG, PNG, GIF, BMP.
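
If you are calling the service from a browser, a quick client-side pre-check against these limits can save a wasted round trip. Below is a rough sketch of my own (not part of any sample in this article); it checks only size and format, since validating width and height would require loading the image first.

// Rough pre-check of a File object against the documented limits (size and format only).
function validateImageFile(file) {
    var maxBytes = 4 * 1024 * 1024; // 4MB
    var allowedTypes = ["image/jpeg", "image/png", "image/gif", "image/bmp"];
    if (file.size > maxBytes) {
        return "Image is larger than 4MB.";
    }
    if (allowedTypes.indexOf(file.type) === -1) {
        return "Image must be JPEG, PNG, GIF, or BMP.";
    }
    return null; // size and format look OK; the service still validates dimensions
}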

Sample call with Curl

Here is an example of a call to the service, using Curl:

curl -v -X POST "https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: f27c7436c3a64d91a177111a6b594537" --data-ascii "{'url' : 'https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png'}"

(NOTE: I modified the key, so it will not work. You will need to replace it with your own key if you want this to work.)

Response

If all goes well, you will receive an HTTP 200 (OK) response.

In the body of that response will be the results of the OCR in JSON format.

At the top level are the language, textAngle, and orientation properties.

Below that is an array of 0 or more text regions. Each region represents a block of text within the image.

Each region contains an array of 0 or more lines of text.

Each line contains an array of 0 or more words.

Each region, line, and word contains a bounding box, consisting of the left, top, width, and height of the word(s) within.

Here is a partial example of the JSON returned from a successful web service call:

{
    "language": "en",
    "textAngle": 0.0,
    "orientation": "Up",
    "regions": [
        {
            "boundingBox": "147,96,622,1095",
            "lines": [
                {
                    "boundingBox": "408,96,102,56",
                    "words": [
                        {
                            "boundingBox": "408,96,102,56",
                            "text": "Hey"
                        }
                    ]
                },
                {
                    "boundingBox": "282,171,350,45",
                    "words": [
                        {
                            "boundingBox": "282,171,164,45",
                            "text": "Diddle"
                        },
                        {
                            "boundingBox": "468,171,164,45",
                            "text": "Diddle"
                        }
                    ]
                },
                etc...
                 }
            ]
        }
    ]
}
  

The full JSON can be found in Appendix 2 below.
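
Because the response is just nested arrays, reconstructing the recognized text takes only three loops. Here is a short sketch (my own illustration, assuming the response has already been parsed into a variable named data):

// Rebuild the recognized text from a parsed OCR response.
function extractText(data) {
    var text = "";
    data.regions.forEach(function (region) {
        region.lines.forEach(function (line) {
            var words = line.words.map(function (word) { return word.text; });
            text += words.join(" ") + "\n"; // one output line per OCR line
        });
        text += "\n"; // blank line between regions
    });
    return text;
}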

Errors

If an error occurs, the response will not be HTTP 200. It will be an HTTP response code of 400 or greater, and additional error information will be in the body of the response.

Common errors include:

  • Images too large or too small
  • Image not found (It might require a password or be behind a firewall)
  • Invalid image format
  • Incorrect API key
  • Incorrect URL (It must match the API key. If you have multiple services, it’s easy to mix them up)
  • Miscellaneous spelling errors (e.g., not entering a valid language code or misspelling a header parameter)

In this article, I showed how to call the Cognitive Services OCR Computer Vision Service.

Appendix 1: Supported languages

zh-Hans (ChineseSimplified)
zh-Hant (ChineseTraditional)
cs (Czech)
da (Danish)
nl (Dutch)
en (English)
fi (Finnish)
fr (French)
de (German)
el (Greek)
hu (Hungarian)
it (Italian)
ja (Japanese)
ko (Korean)
nb (Norwegian)
pl (Polish)
pt (Portuguese)
ru (Russian)
es (Spanish)
sv (Swedish)
tr (Turkish)
ar (Arabic)
ro (Romanian)
sr-Cyrl (SerbianCyrillic)
sr-Latn (SerbianLatin)
sk (Slovak)

Appendix 2: JSON Response Example

{
    "language": "en",
    "textAngle": 0.0,
    "orientation": "Up",
    "regions": [
        {
            "boundingBox": "147,96,622,1095",
            "lines": [
                {
                    "boundingBox": "408,96,102,56",
                    "words": [
                        {
                            "boundingBox": "408,96,102,56",
                            "text": "Hey"
                        }
                    ]
                },
                {
                    "boundingBox": "282,171,350,45",
                    "words": [
                        {
                            "boundingBox": "282,171,164,45",
                            "text": "Diddle"
                        },
                        {
                            "boundingBox": "468,171,164,45",
                            "text": "Diddle"
                        }
                    ]
                },
                {
                    "boundingBox": "239,336,441,46",
                    "words": [
                        {
                            "boundingBox": "239,336,87,46",
                            "text": "Hey"
                        },
                        {
                            "boundingBox": "359,337,144,35",
                            "text": "diddle"
                        },
                        {
                            "boundingBox": "536,337,144,35",
                            "text": "diddle"
                        }
                    ]
                },
                {
                    "boundingBox": "169,394,576,35",
                    "words": [
                        {
                            "boundingBox": "169,394,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "279,402,73,27",
                            "text": "cat"
                        },
                        {
                            "boundingBox": "383,394,83,35",
                            "text": "and"
                        },
                        {
                            "boundingBox": "500,394,70,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "604,394,141,35",
                            "text": "fiddle"
                        }
                    ]
                },
                {
                    "boundingBox": "260,452,391,50",
                    "words": [
                        {
                            "boundingBox": "260,452,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "370,467,80,20",
                            "text": "cow"
                        },
                        {
                            "boundingBox": "473,452,178,50",
                            "text": "jumped"
                        }
                    ]
                },
                {
                    "boundingBox": "277,509,363,35",
                    "words": [
                        {
                            "boundingBox": "277,524,100,20",
                            "text": "over"
                        },
                        {
                            "boundingBox": "405,509,71,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "509,524,131,20",
                            "text": "moon."
                        }
                    ]
                },
                {
                    "boundingBox": "180,566,551,49",
                    "words": [
                        {
                            "boundingBox": "180,566,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "292,566,103,35",
                            "text": "little"
                        },
                        {
                            "boundingBox": "427,566,82,49",
                            "text": "dog"
                        },
                        {
                            "boundingBox": "546,566,185,49",
                            "text": "laughed"
                        }
                    ]
                },
                {
                    "boundingBox": "212,623,493,51",
                    "words": [
                        {
                            "boundingBox": "212,631,42,27",
                            "text": "to"
                        },
                        {
                            "boundingBox": "286,638,72,20",
                            "text": "see"
                        },
                        {
                            "boundingBox": "390,623,96,35",
                            "text": "such"
                        },
                        {
                            "boundingBox": "519,638,20,20",
                            "text": "a"
                        },
                        {
                            "boundingBox": "574,631,131,43",
                            "text": "sport."
                        }
                    ]
                },
                {
                    "boundingBox": "301,681,312,35",
                    "words": [
                        {
                            "boundingBox": "301,681,90,35",
                            "text": "And"
                        },
                        {
                            "boundingBox": "425,681,70,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "528,681,85,35",
                            "text": "dish"
                        }
                    ]
                },
                {
                    "boundingBox": "147,738,622,50",
                    "words": [
                        {
                            "boundingBox": "147,753,73,20",
                            "text": "ran"
                        },
                        {
                            "boundingBox": "255,753,114,30",
                            "text": "away"
                        },
                        {
                            "boundingBox": "401,738,86,35",
                            "text": "with"
                        },
                        {
                            "boundingBox": "519,738,71,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "622,753,147,35",
                            "text": "spoon."
                        }
                    ]
                },
                {
                    "boundingBox": "195,1179,364,12",
                    "words": [
                        {
                            "boundingBox": "195,1179,45,12",
                            "text": "Nursery"
                        },
                        {
                            "boundingBox": "242,1179,38,12",
                            "text": "Rhyme"
                        },
                        {
                            "boundingBox": "283,1179,36,9",
                            "text": "Charts"
                        },
                        {
                            "boundingBox": "322,1179,28,12",
                            "text": "from"
                        },
                        {
                            "boundingBox": "517,1179,11,10",
                            "text": "C"
                        },
                        {
                            "boundingBox": "531,1179,28,9",
                            "text": "2017"
                        }
                    ]
                },
                {
                    "boundingBox": "631,1179,90,12",
                    "words": [
                        {
                            "boundingBox": "631,1179,9,9",
                            "text": "P"
                        },
                        {
                            "boundingBox": "644,1182,6,6",
                            "text": "a"
                        },
                        {
                            "boundingBox": "655,1182,7,9",
                            "text": "g"
                        },
                        {
                            "boundingBox": "667,1182,7,6",
                            "text": "e"
                        },
                        {
                            "boundingBox": "690,1179,31,12",
                            "text": "7144"
                        }
                    ]
                }
            ]
        }
    ]
}
  
Friday, June 7, 2019 9:09:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, June 5, 2019

The Microsoft Cognitive Services Computer Vision API contains functionality to infer a lot of information about a given image.

As of this writing, the API is on version 2.0 and supports the following capabilities:

Analyze an Image

Get general information about an image, such as the objects found, what each object is and where it is located. It can even identify potentially pornographic images.

Analyze Faces

Find the location of each face in an image and determine information about each face, such as age, gender, and type of facial hair or glasses.

Optical Character Recognition (OCR)

Convert a picture of text into text

Recognize Celebrities

Recognize famous people from photos of their face

Recognize Landmarks

Recognize famous landmarks, such as the Statue of Liberty or Diamond Head Volcano.

Analyze Video

Retrieve keywords to describe a video at different points in time as it plays.

Generate a Thumbnail

Change the size and shape of an image, without cropping out the main subject.

Getting Started

To get started, you need to create a Computer Vision Service. To do this, navigate to the Azure Portal, log in, click the [Create a resource] button (Fig. 1), and enter "Computer Vision" in the Search box, as shown in Fig. 2.

cv01-CreateResource
Fig. 1

cv02-SearchForComputerVision
Fig. 2

A dialog displays, with information about the Computer Vision Service, as shown in Fig. 3.

cv03-ComputerVisionSplashPage
Fig. 3

Click the [Create] button to display the Create Computer Vision Service blade, as shown in Fig. 4.

cv04-NewSvc
Fig. 4

At the "Name" field, enter a name by which you can easily identify this service. This name must be unique among your services, but need not be globally unique.

At the "Subscription" field, select the Subscription with which you want to associate this service. Most of you will only have one subscription.

At the "Location" field, select the Azure Region in which to store this service. Consider where the users of this service will be, so you can reduce latency.

At the "Pricing tier" field, select "F0" to use this service for free or "S1" to incur a small charge for each call to the service. If you select the free service, you will be limited in the number and frequency of calls that can be made.

At the "Resource group" field, select a resource group in which to store your service or click "Create new" to store it in a newly-created resource group. A resource group is a logical container for Azure resources.

Click the [Create] button to create the Computer Vision service.

Usually, it takes less than a minute to create a Computer Vision Service. When Azure has created this service, you can navigate to it by its name or the name of the resource group.

Two pieces of information are critical when using the service: The Endpoint and the API keys.

The Endpoint can be found on the service's Overview blade, as shown in Fig. 5.

cv05-OverviewBlade
Fig. 5

The API Keys can be found on the service's "Keys" blade, as shown in Fig. 6. There are two keys in case one key is compromised: you can use the other key while the first is regenerated, minimizing downtime.

cv06-KeysBlade
Fig. 6

Copy the URL and one of the API keys. You will need them to call the web services. We will describe how to make specific calls in future articles.

Wednesday, June 5, 2019 4:46:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, March 4, 2019

Episode 553

Jennifer Marsman on AI for Earth

Jennifer Marsman describes how Microsoft's AI for Earth team is using data to make the world a better place.

Monday, March 4, 2019 9:07:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, January 24, 2019

GCast 32:

Handwriting OCR with Cognitive Services

See how to perform OCR on images with handwritten text, using Microsoft Cognitive Services. I walk through the API and show sample JavaScript code.

Thursday, January 24, 2019 8:21:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, January 17, 2019

GCast 31:

OCR with Cognitive Services

Cognitive Services can automatically detect text from pictures of text. This video shows how.

Thursday, January 17, 2019 8:17:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, January 10, 2019

GCast 30:

Creating Applications with the Analyze Image Cognitive Services API

Learn how to create C# and node applications using the "Analyze Image" service of the Microsoft Cognitive Services Vision API.

Thursday, January 10, 2019 7:28:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, January 3, 2019

GCast 29:

Introducing Cognitive Services and Computer Vision

Microsoft Cognitive Services allow you to take advantage of Machine Learning without all the complexities of Machine Learning. In this video, I introduce Cognitive Services by showing how to use Computer Vision to analyze an image, automatically detecting properties of that image.

Thursday, January 3, 2019 12:53:21 PM (GMT Standard Time, UTC+00:00)
# Thursday, December 27, 2018

GCast 28:

Natural Language Processing with LUIS

Learn how to use Microsoft Language Understanding Intelligent Service (LUIS) to build models that provide Natural Language Processing (NLP) for your application.

Thursday, December 27, 2018 9:53:00 AM (GMT Standard Time, UTC+00:00)
# Monday, July 30, 2018
Monday, July 30, 2018 11:10:00 AM (GMT Daylight Time, UTC+01:00)