In a previous article, I described the details of the OCR Service, which is part of the Microsoft Cognitive Services Computer Vision API.
To make this API useful, you need to write some code and build an application that calls this service.
In this article, I will show an example of a JavaScript application that calls the OCR web service.
If you want to follow along, you can find all the code in the "OCRDemo" project, included in this set of demos.
To use this demo project, you will first need to create a Computer Vision API service, as described here.
Read the project's read.me file, which explains the setup required to run the demo with your own account.
If you open index.html in the browser, you will see that it displays an image of a poem, along with some controls on the left:
- A dropdown list to change the poem image
- A dropdown list to select the language of the poem text
- A [Get Text] button that calls the web service
Fig. 1 shows index.html when it first loads:
Let's look at the JavaScript that runs when you click the [Get Text] button. You can find it in script.js.
```javascript
// Runs when the [Get Text] button is clicked.
$("#GetTextFromPictureButton").click(function () {
    // Show a placeholder while the web service call is in progress.
    var outputDiv = $("#OutputDiv");
    outputDiv.text("Thinking…");
    var url = $("#ImageUrlDropdown").val();
    var language = $("#LanguageDropdown").val();

    // getKey() is defined in getkey.js; show an error and stop if it fails.
    try {
        var computerVisionKey = getKey();
    } catch (err) {
        outputDiv.html(missingKeyErrorMsg);
        return;
    }

    // Region-specific endpoint; change "westcentralus" if your service is in another region.
    var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr";
    webSvcUrl = webSvcUrl + "?language=" + language;
    $.ajax({
        type: "POST",
        url: webSvcUrl,
        headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
        contentType: "application/json",
        data: '{ "Url": "' + url + '" }'
    }).done(function (data) {
        // Walk regions -> lines -> words: one <div> per line, an <hr /> after each region.
        outputDiv.text("");
        var regionsOfText = data.regions;
        for (var r = 0; r < regionsOfText.length; r++) {
            var linesOfText = data.regions[r].lines;
            for (var l = 0; l < linesOfText.length; l++) {
                var output = "";
                var thisLine = linesOfText[l];
                var words = thisLine.words;
                for (var w = 0; w < words.length; w++) {
                    var thisWord = words[w];
                    output += thisWord.text;
                    output += " ";
                }
                var newDiv = "<div>" + output + "</div>";
                outputDiv.append(newDiv);
            }
            outputDiv.append("<hr />");
        }
    }).fail(function (err) {
        // Called when the request fails, e.g. with an HTTP error status.
        $("#OutputDiv").text("ERROR!" + err.responseText);
    });
});
```
This code uses jQuery to simplify selecting elements, but raw JavaScript would work just as well.
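As a minimal sketch of what the call might look like without jQuery, assuming an environment with the standard fetch API (modern browsers, or Node 18 and later), the request and its error handling could be written as follows. The function name recognizeText and its optional fetchFn parameter (which lets the function be exercised with a stand-in for fetch) are hypothetical, not part of the OCRDemo project; the endpoint is the same West Central US URL used in the listing above.

```javascript
// Hypothetical sketch: the OCR call using fetch instead of $.ajax.
// fetchFn defaults to the global fetch; passing a stand-in allows offline testing.
async function recognizeText(imageUrl, language, key, fetchFn) {
    var doFetch = fetchFn || fetch;
    var response = await doFetch(
        "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=" +
            encodeURIComponent(language),
        {
            method: "POST",
            headers: {
                "Ocp-Apim-Subscription-Key": key,
                "Content-Type": "application/json"
            },
            // JSON.stringify produces a valid body even if the URL contains quotes.
            body: JSON.stringify({ Url: imageUrl })
        }
    );
    // jQuery calls .fail() for HTTP error statuses, but a fetch promise only
    // rejects on network errors, so check the status explicitly.
    if (!response.ok) {
        throw new Error("ERROR! " + (await response.text()));
    }
    // Resolves to the parsed JSON (regions -> lines -> words).
    return response.json();
}
```

In the click handler, this might be used as `recognizeText(url, language, computerVisionKey).then(function (data) { ... }).catch(function (err) { outputDiv.text(err.message); });`.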
On the page is an empty div with id="OutputDiv".
In the first two lines, we select this div and set its text to "Thinking…" while the web service is being called.
```javascript
var outputDiv = $("#OutputDiv");
outputDiv.text("Thinking…");
```
Next, we get the URL of the image containing the currently displayed poem and the selected language. These both come from the selected items of the two dropdowns.
```javascript
var url = $("#ImageUrlDropdown").val();
var language = $("#LanguageDropdown").val();
```
Then, we get the API key by calling the getKey() function, which is defined in the getkey.js file. You will need to update this file yourself with your own key, as described in the read.me. If getKey() throws an error, we display missingKeyErrorMsg in the output div and stop.
```javascript
try {
    var computerVisionKey = getKey();
} catch (err) {
    outputDiv.html(missingKeyErrorMsg);
    return;
}
```
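The contents of getkey.js aren't shown here; the read.me describes how to fill it in. As a rough sketch only, assuming getKey() throws when no key has been configured (which is what sends execution into the catch block above), such a file might look like this. The variable name COMPUTER_VISION_KEY and the placeholder string are hypothetical, not taken from the OCRDemo project.

```javascript
// getkey.js: hypothetical sketch; the real file in the OCRDemo project may differ.
// Replace the placeholder with the key from your own Computer Vision resource.
var COMPUTER_VISION_KEY = "YOUR-KEY-HERE";

function getKey() {
    // Throw if the placeholder was never replaced, so the caller's catch
    // block shows missingKeyErrorMsg instead of calling the service.
    if (!COMPUTER_VISION_KEY || COMPUTER_VISION_KEY === "YOUR-KEY-HERE") {
        throw new Error("Computer Vision API key has not been set in getkey.js");
    }
    return COMPUTER_VISION_KEY;
}
```

The catch block also covers the case where getkey.js isn't loaded at all: calling an undefined getKey() throws a ReferenceError, which is caught the same way.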
Now it's time to call the web service. My Computer Vision API service was created in the West Central US region, so I've hard-coded that region's URL. You may need to change this if you created your service in a different region.
I add a query string parameter to the URL to indicate the selected language.
Then, I call the web service by submitting an HTTP POST request to the web service URL, passing in the appropriate headers and constructing a JSON document to pass in the request body.
```javascript
var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr";
webSvcUrl = webSvcUrl + "?language=" + language;
$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
    contentType: "application/json",
    data: '{ "Url": "' + url + '" }'
```
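The request above builds its JSON body by string concatenation and appends the language without encoding. That works for the image URLs and language codes in the dropdowns, but an image URL containing a double quote or backslash would produce invalid JSON. A minimal sketch of a more defensive version follows; the helper name buildOcrRequest and its region parameter are hypothetical, and the "Url" property name is kept as in the article's request.

```javascript
// Hypothetical helper: builds the settings object passed to $.ajax,
// with the region as a parameter instead of a hard-coded URL.
function buildOcrRequest(region, language, imageUrl, key) {
    var endpoint = "https://" + region + ".api.cognitive.microsoft.com/vision/v2.0/ocr";
    return {
        type: "POST",
        // encodeURIComponent guards against characters that are special in a query string.
        url: endpoint + "?language=" + encodeURIComponent(language),
        headers: { "Ocp-Apim-Subscription-Key": key },
        contentType: "application/json",
        // JSON.stringify escapes quotes and backslashes that concatenation would not.
        data: JSON.stringify({ Url: imageUrl })
    };
}
```

It could then be used as `$.ajax(buildOcrRequest("westcentralus", language, url, computerVisionKey)).done(...)`, keeping the same `.done()` and `.fail()` handlers.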
Finally, I process the results when the HTTP response returns.
JavaScript is a dynamic language, so I don't need to create any classes to describe the structure of the JSON that is returned; I just need to know the name of each property.
The returned JSON contains an array of regions; each region contains an array of lines; and each line contains an array of words.
In this simple example, I loop through each word in each line in each region, concatenating the words of each line and wrapping the line in a div so that it appears on its own line.
Then, I append this HTML to the outputDiv and follow each region with a horizontal rule to mark where it ends.
```javascript
}).done(function (data) {
    // Clear the "Thinking…" placeholder.
    outputDiv.text("");
    var regionsOfText = data.regions;
    for (var r = 0; r < regionsOfText.length; r++) {
        var linesOfText = data.regions[r].lines;
        for (var l = 0; l < linesOfText.length; l++) {
            var output = "";
            var thisLine = linesOfText[l];
            var words = thisLine.words;
            // Join the words of this line, each followed by a space.
            for (var w = 0; w < words.length; w++) {
                var thisWord = words[w];
                output += thisWord.text;
                output += " ";
            }
            // One <div> per line of text.
            var newDiv = "<div>" + output + "</div>";
            outputDiv.append(newDiv);
        }
        // Horizontal rule after each region.
        outputDiv.append("<hr />");
    }
```
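I also catch errors that might occur. If the web service call fails, the .fail() callback displays "ERROR!" followed by the response text in the outputDiv, where the recognized text would have been.
```javascript
}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
});
```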
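The nested loops in the .done() handler can also be written as a few small functions, which separates walking the response from building the markup. The sketch below uses hypothetical helper names (extractRegions, escapeHtml, regionsToHtml) and adds one change that is not in the original code: it HTML-escapes the recognized text. Because the loop appends the text as HTML, a word such as "<b>" appearing in an image would otherwise be interpreted as markup. It also joins words with single spaces rather than leaving a trailing space on each line.

```javascript
// Walks regions -> lines -> words and returns, for each region,
// an array of line strings (words joined with single spaces).
function extractRegions(data) {
    return (data.regions || []).map(function (region) {
        return region.lines.map(function (line) {
            return line.words.map(function (word) { return word.text; }).join(" ");
        });
    });
}

// Escapes characters that are special in HTML so recognized text is shown literally.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Same layout as the loop above: a <div> per line and an <hr /> after each region.
function regionsToHtml(regions) {
    return regions.map(function (lines) {
        return lines.map(function (line) {
            return "<div>" + escapeHtml(line) + "</div>";
        }).join("") + "<hr />";
    }).join("");
}
```

Inside the `.done()` callback, the loops could then be replaced by `outputDiv.html(regionsToHtml(extractRegions(data)));`.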
Fig. 2 shows the results after a successful web service call.
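Try this yourself to see it in action. The process is very similar in other languages: send the same HTTP POST request with the subscription key header, then walk the regions, lines, and words in the JSON response.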