In a recent article, I introduced you to the "Recognize Text" API, which returns the text in an image, a process known as "Optical Character Recognition" or "OCR".
In this article, I will show how to call this API from a .NET application.
Recall that the "Recognize Text" API consists of two web service calls:
1. We call the "Recognize Text" web service and pass an image to begin the process.
2. We call the "Get Recognize Text Operation Result" web service to check the status of the processing and, when the process is complete, retrieve the resulting text.
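The demo app makes the second call when you press a button, but the same two-step flow can also be driven by a loop. The following is only a sketch, not code from the repository: `WaitForCompletionAsync` and its parameters are names I made up, and the "Failed" status is an assumption about what the service returns when a job does not succeed.

```csharp
using System;
using System.Threading.Tasks;

public static class OcrFlowSketch
{
    // Calls getStatusAsync repeatedly until it reports a terminal status
    // ("Succeeded", or the assumed "Failed") or until maxAttempts is reached.
    // Returns the last status seen (null if maxAttempts is 0).
    public static async Task<string> WaitForCompletionAsync(
        Func<Task<string>> getStatusAsync, int maxAttempts, TimeSpan delay)
    {
        string status = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            status = await getStatusAsync();
            if (status == "Succeeded" || status == "Failed")
            {
                return status;
            }

            // Pause before asking again so we do not flood the service.
            await Task.Delay(delay);
        }

        return status;
    }
}
```

The status source is passed in as a delegate, so the loop can wrap whatever code makes the "Get Recognize Text Operation Result" call.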
## The sample .NET application
If you want to follow along, the code is available in the RecognizeTextDemo found in this GitHub repository.
To get started, you will need to create a Computer Vision key, as described here.
Creating this service gives you a URI endpoint to call as a web service, and an API key, which must be passed in the header of web service calls.
To run the app, you will need to copy the key created above into the App.config file. Listing 1 shows a sample config file:
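As a rough sketch only, an App.config that holds such a key usually has the shape below. The setting name `ComputerVisionKey` and the placeholder value are my assumptions; check the file in the repository for the name the app actually reads.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <!-- "ComputerVisionKey" is a hypothetical setting name, used here for illustration. -->
    <add key="ComputerVisionKey" value="YOUR_API_KEY_HERE" />
  </appSettings>
</configuration>
```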
You will also need an image with some text in it. For this demo, we will use the image shown in Fig. 1.
When you run the app, you will see the screen in Fig. 2.
Press the [Get File] button and select the saved image, as shown in Fig. 3.
Click the [Open] button. The Open File Dialog closes, the full path of the image is displayed on the form, and the [Start OCR] button is enabled, as shown in Fig. 4.
Click the [Start OCR] button to call a service that starts the OCR. If an error occurs, it is possible that you did not configure the key correctly or that you are not connected to the Internet.
When the service call returns, the URL of the "Get Text" service displays (beneath the "Location Address" label), and the [Get Text] button is enabled, as shown in Fig. 5.
Click the [Get Text] button. This calls the Location Address service and displays the status. If the status is "Succeeded", it displays the text in the image, as shown in Fig. 6.
## The code
Let's take a look at the code in this application. It is all written in C#. The relevant parts are the calls to the two web services: "Recognize Text" and "Get Recognize Text Operation Result". The first call kicks off the OCR job; the second call returns the status of the job and, when the job is complete, the text found.
The code is in the TextService static class.
This class has a constant, visionEndPoint, which is the base URL of the Computer Vision Cognitive Service you created above. The code in the repository is in Listing 2. You may need to modify the URL if you created your service in a different region.
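To make the idea concrete, here is a minimal sketch of such a constant and of how a call-specific URL might be built from it. The `westus` region, the `vision/v2.0/recognizeText` path, the `mode` query parameter and the `BuildRecognizeTextUrl` helper are all assumptions for illustration; the values in the repository may differ.

```csharp
using System;

public static class EndpointSketch
{
    // Assumed base URL. Replace "westus" with the region of your own service.
    private const string visionEndPoint = "https://westus.api.cognitive.microsoft.com/";

    // Builds the URL of the "Recognize Text" call from the base endpoint.
    // The path and the "mode" parameter reflect my understanding of the
    // v2.0 API and are not taken from the article's code.
    public static string BuildRecognizeTextUrl(string mode)
    {
        return visionEndPoint + "vision/v2.0/recognizeText?mode=" + Uri.EscapeDataString(mode);
    }
}
```

With the region confined to a single constant, moving to a service in another region means changing one line.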
### Recognize Text
The call to the "Recognize Text" API is in Listing 3:
The first thing we do is construct the specific URL of this service call.
Then we use the System.Net.Http library to submit an HTTP POST request to this URL, passing in the image as an array of bytes in the body of the request. For more information on passing a binary file to a web service, see this article.
When the response returns, we check the headers for the "Operation-Location", which is the URL of the next web service to call. The URL contains a GUID that uniquely identifies this job. We save this for our next call.
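The steps above can be sketched roughly as follows. This is not the repository's code: the class and method names are mine. The key goes in the `Ocp-Apim-Subscription-Key` header that Cognitive Services expects, and `application/octet-stream` is the content type I am assuming for a raw image body.

```csharp
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class RecognizeTextSketch
{
    // Posts the image bytes to the given URL and returns the value of the
    // "Operation-Location" response header, or null if it is missing.
    public static async Task<string> StartOcrAsync(
        HttpClient client, string url, string apiKey, byte[] imageBytes)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, url))
        {
            // The API key travels in a request header, not in the URL.
            request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);

            // The image is sent as raw bytes in the request body.
            request.Content = new ByteArrayContent(imageBytes);
            request.Content.Headers.ContentType =
                new MediaTypeHeaderValue("application/octet-stream");

            using (var response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return GetOperationLocation(response);
            }
        }
    }

    // Reads the URL of the follow-up call from the response headers.
    public static string GetOperationLocation(HttpResponseMessage response)
    {
        return response.Headers.TryGetValues("Operation-Location", out var values)
            ? values.FirstOrDefault()
            : null;
    }
}
```

A caller might load the image with `File.ReadAllBytes(path)` and keep the returned URL for the second call.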
### Get Recognize Text Operation Result
After kicking off the OCR, we need to call a different service to check the status and get the results. The code in Listing 4 does this.
This code is much simpler because it is an HTTP GET and we don't need to pass anything in the request body.
We simply submit an HTTP GET request and use the Newtonsoft.Json library to convert the response to a string.
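A hedged sketch of this second call might look like the following. The class and method names are hypothetical, and I am assuming the response body is a JSON object with a top-level `status` property, which matches the "Succeeded" status shown in the app.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class OperationResultSketch
{
    // Sends an HTTP GET to the URL saved from "Operation-Location"
    // and returns the raw JSON body of the response.
    public static async Task<string> GetResultJsonAsync(
        HttpClient client, string operationLocation, string apiKey)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, operationLocation))
        {
            // No request body is needed, only the key header.
            request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);

            using (var response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }

    // Pulls the job status out of the JSON; returns null if there is none.
    public static string ReadStatus(string json)
    {
        return (string)JObject.Parse(json)["status"];
    }
}
```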
Here is the complete code in the TextService class:
## The remaining code
There is other code in this application to do things like select the file from disk and loop through the JSON to concatenate all the text, but this code is very simple and (hopefully) self-documenting. You may choose other ways to get the file and handle the JSON in the response.
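As one possible way to handle the JSON, the sketch below joins the text of every recognized line. It assumes the response has the shape `recognitionResult.lines[].text`, which is my understanding of the v2.0 response and not something taken from the repository; `ExtractText` is a hypothetical name.

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

public static class TextExtractionSketch
{
    // Concatenates the "text" of each recognized line, one line per row.
    // Returns an empty string if the expected array is not present.
    public static string ExtractText(string json)
    {
        // Assumed response shape: { "recognitionResult": { "lines": [ { "text": "..." } ] } }
        var lines = JObject.Parse(json).SelectToken("recognitionResult.lines") as JArray;
        if (lines == null)
        {
            return string.Empty;
        }

        return string.Join("\n", lines.Select(line => (string)line["text"]));
    }
}
```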
In this article, I've focused on the code to manage the Cognitive Services calls and responses to those calls in order to get the text from a picture of text.