In a previous article, I showed how to use the Microsoft Cognitive Services Computer Vision API to perform Optical Character Recognition (OCR) on a document containing a picture of text. We did so by making an HTTP POST to a REST service.
If you are developing with .NET languages, such as C# Visual Basic, or F#, a NuGet Package makes this call easier. Classes in this package abstract the REST call, so can write less and simpler code; and strongly-typed objects allow you to make the call and parse the results more easily.
To get started, you will first need to create a Computer Vision service in Azure and retrieve the endpoint and key, as described here.
Then, you can create a new C# project in Visual Studio. I created a WPF application, which can be found and downloaded at my GitHub account. Look for the project named "OCR-DOTNETDemo". Fig. 1 shows how to create a new WPF project in Visual Studio.
In the Solution Explorer, right-click the project and select "Manage NuGet Packages", as shown in Fig. 2.
Search for and install the "Microsoft.Azure.CognitiveServices.Vision.ComputerVision", as shown in Fig. 3.
The important classes in this package are:
A class representing the object returned from the OCR service. It consists of an array of OcrRegions, each of which contains an array of OcrLines, each of which contains an array of OcrWords. Each OcrWord has a text property, representing the text that is recognized. You can reconstruct all the text in an image by looping through each array.
This class contains the RecognizePrintedTextInStreamAsync method, which abstracts the HTTP REST call to the OCR service.
This class constructs credentials that will be passed in the header of the HTTP REST call.
Create a new class in the project named "OCRServices" and make its scope "internal" or "public"
Add the following "using" statements to the top of the class:
Add the following methods to this class:
The UploadAndRecognizeImageAsync method calls the HTTP REST OCR service (via the NuGet library extractions) and returns a strongly-typed object representing the results of that call. Replace the key and the endPoint in this method with those associated with your Computer Vision service.
The FormatOcrResult method loops through each region, line, and word of the service's return object. It concatenates the text of each word, separating words by spaces, lines by a carriage return and line feed, and regions by a double carriage return / line feed.
Add a Button and a TextBlock to the MainWindow.xaml form.
In the click event of that button add the following code.
Replace xxxxxxx.jpg with the full path of an image file on disc that contains pictures of text.
You will need to add the following using statement to the top of MainWindow.xaml.cs.
If you like, you can add code to allow users to retrieve an image and display that image on your form. This code is in the sample application from my GitHub repository, if you want to view it.
Running the form should look something like Fig. 4.