Calling the "Recognize Text" Cognitive Service from a .NET Application

In a recent article, I introduced you to the "Recognize Text" API that returns the text in an image - process known as "Optical Character Recognition", or "OCR".

In this article, I will show how to call this API from a .NET application.

Recall that the "Recognize Text" API consists of two web service calls:

We call the "Recognize Text" web service and pass an image to begin the process.

We call the "Get Recognize Text Operation Result" web service to check the status of the processing and retrieive the resulting text, when the process is complete.

The sample .NET application

If you want to follow along, the code is available in the RecognizeTextDemo found in this GitHub repository.

To get started, you will need to create a Computer Vision key, as described here.

Creating this service gives you a URI endpoint to call as a web service, and an API key, which must be passed in the header of web service calls.

The App

To run the app, you will need to copy the key created above into the App.config file. Listing 1 shows a sample config file:

Listing 1:


   
      key="ComputerVisionKey" value="5070eab11e9430cea32254e3b50bfdd5" />

You will also need an image with some text in it. For this demo, we will use the image shown in Fig. 1.

Fig. 1

When you run the app, you will see the screen in Fig. 2.

Fig. 2

Press the [Get File] button and select the saved image, as shown in Fig. 3.

Fig. 3

Click the [Open] button. The Open File Dialog closes, the full path of the image is displays on the form, and the [Start OCR] button is enabled, as shown in Fig. 4.

Fig. 4

Click the [Start OCR] button to call a service that starts the OCR. If an error occurs, it is possible that you did not configure the key correctly or that you are not connected to the Internet.

When the service call returns, the URL of the "Get Text" service displays (beneath the "Location Address" label), and the [Get Text] button is enabled, as shown in Fig. 5.

Fig. 5

Click the [Get Text] button. This calls the Location Address service and displays the status. If the status is "Succeeded", it displays the text in the image, as shown in Fig. 6.

Fig. 6

## The code

Let's take a look at the code in this application. It is all written in C#. The relevant parts are the calls to the two web service: "Recognize Text" and "Get Recognize Text Operation Result". The first call kicks off the OCR job; the second call returns the status of the job and returns the text found, when complete.

The code is in the TextService static class.

This class has a constant: visionEndPoint, which is the base URL of the Computer Vision Cognitive Service you created above. The code in the repository is in Listing 2. You may need to modify the URL, if you created your service in a different region.

Listing 2:

const string visionEndPoint = "https://westus.api.cognitive.microsoft.com/";

### Recognize Text

The call to the "Recognize Text" API is in Listing 1:

Listing 3:

public static async Task<string> GetRecognizeTextOperationResultsFromFile(string imageLocation, string computerVisionKey)
{
    var cogSvcUrl = visionEndPoint + "vision/v2.0/recognizeText?mode=Printed";
    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
    HttpResponseMessage response;
    // Convert image to a Byte array
    byte[] byteData = null;
    using (FileStream fileStream = new FileStream(imageLocation, FileMode.Open, FileAccess.Read))
    {
        BinaryReader binaryReader = new BinaryReader(fileStream);
        byteData = binaryReader.ReadBytes((int)fileStream.Length);
    }

    // Call web service; pass image; wait for response
    using (ByteArrayContent content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(cogSvcUrl, content);
    }

    // Read results
    RecognizeTextResult results = null;
    if (response.IsSuccessStatusCode)
    {
        var data = await response.Content.ReadAsStringAsync();
        results = JsonConvert.DeserializeObject(data);
    }
    var headers = response.Headers;
    var locationHeaders = response.Headers.GetValues("Operation-Location");
    string locationAddress = "";
    IEnumerable<string> values;
    if (headers.TryGetValues("Operation-Location", out values))
    {
        locationAddress = values.First();
    }
    return locationAddress;
}

The first thing we do is construct the specific URL of this service call.

Then we use the System.Net.Http library to submit an HTTP POST request to this URL, passing in the image as an array of bytes in the body of the request. For more information on passing a binary file to a web service, see this article.

When the response returns, we check the headers for the "Operation-Location", which is the URL of the next web service to call. The URL contains a GUID that uniquely identifies this job. We save this for our next call.

Get Recognize Text Operation Result

After kicking of the OCR, we need to call a different service to check the status and get the results. The code in Listing 4 does this.

Listing 4:

public static async Task GetRecognizeTextOperationResults(string locationAddress, string computerVisionKey) 
 { 
    var client = new HttpClient(); 
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey); 
    var response = await client.GetAsync(locationAddress); 
    RecognizeTextResult results = null; 
    if (response.IsSuccessStatusCode) 
    { 
        var data = await response.Content.ReadAsStringAsync(); 
        results = JsonConvert.DeserializeObject(data); 
    } 
    return results; 
 }

This code is much simpler because it is an HTTP GET and we don't need to pass anything in the request body.

We simply submit an HTTP GET request and use the Newtonsoft.Json libary to convert the response to a string.

Here is the complete code in the TextService class:

Listing 5:

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using TextLib.Models;

namespace TextLib
{

    public static class TextService
    {
        const string visionEndPoint = "https://westus.api.cognitive.microsoft.com/";

public static async Task<string> GetRecognizeTextOperationResultsFromFile(string imageLocation, string computerVisionKey)
{
    var cogSvcUrl = visionEndPoint + "vision/v2.0/recognizeText?mode=Printed";
    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
    HttpResponseMessage response;
    // Convert image to a Byte array
    byte[] byteData = null;
    using (FileStream fileStream = new FileStream(imageLocation, FileMode.Open, FileAccess.Read))
    {
        BinaryReader binaryReader = new BinaryReader(fileStream);
        byteData = binaryReader.ReadBytes((int)fileStream.Length);
    }

    // Call web service; pass image; wait for response
    using (ByteArrayContent content = new ByteArrayContent(byteData))
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(cogSvcUrl, content);
    }

    // Read results
    RecognizeTextResult results = null;
    if (response.IsSuccessStatusCode)
    {
        var data = await response.Content.ReadAsStringAsync();
        results = JsonConvert.DeserializeObject(data);
    }
    var headers = response.Headers;
    var locationHeaders = response.Headers.GetValues("Operation-Location");
    string locationAddress = "";
    IEnumerable<string> values;
    if (headers.TryGetValues("Operation-Location", out values))
    {
        locationAddress = values.First();
    }
    return locationAddress;
}

        public static async Task GetRecognizeTextOperationResults(string locationAddress, string computerVisionKey)
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", computerVisionKey);
            var response = await client.GetAsync(locationAddress);
            RecognizeTextResult results = null;
            if (response.IsSuccessStatusCode)
            {
                var data = await response.Content.ReadAsStringAsync();
                results = JsonConvert.DeserializeObject(data);
            }
            return results;
        }

    }
}

The remaining code

There is other code in this application to do things like select the file from disk and loop through the JSON to concatenate all the text; but this code is very simple and (hopefully) self-documenting. You may choose other ways to get the file and handle the JSON in the response.

In this article, I've focused on the code to manage the Cognitive Services calls and responses to those calls in order to get the text from a picture of text.