Optical Character Recognition with Project Oxford

Project Oxford offers a set of APIs to analyze the content of images. One of these APIs determines the words and punctuation contained in a picture, and you can use it with a single REST web service call.

To begin, you must register with Project Oxford at http://www.projectoxford.ai.

Then, get the key at https://www.projectoxford.ai/Subscription

Figure 1: Subscription key

To call the API we send a POST request to https://api.projectoxford.ai/vision/v1/ocr

If you like, you may add two optional query string parameters to the URL: language, which specifies the language of the text, and detectOrientation, which asks the service to determine automatically whether the text is tilted. If you omit these parameters, Oxford will make an effort to determine these values on its own. As you might guess, it is faster if you provide this information to Oxford.
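
As a rough sketch of how those optional parameters end up on the URL, the helper below appends them only when they are supplied. The function name buildOcrUrl and its options object are hypothetical, and the sketch assumes the orientation parameter is spelled detectOrientation and accepts true or false.

```javascript
// Hypothetical helper (not part of Project Oxford): builds the OCR URL,
// adding only the optional parameters the caller actually supplies.
function buildOcrUrl(options) {
    var baseUrl = "https://api.projectoxford.ai/vision/v1/ocr";
    var params = new URLSearchParams();

    if (options && options.language) {
        params.set("language", options.language);
    }
    // Assumption: the service expects the text "true" or "false" here.
    if (options && typeof options.detectOrientation === "boolean") {
        params.set("detectOrientation", String(options.detectOrientation));
    }

    var query = params.toString();
    return query ? baseUrl + "?" + query : baseUrl;
}

console.log(buildOcrUrl({ language: "en", detectOrientation: true }));
console.log(buildOcrUrl({}));
```

Calling the helper with an empty object returns the bare endpoint, which leaves the service to work out the values itself.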

In the header of the request, you must provide your key as in the following example:

Ocp-Apim-Subscription-Key:15e24a988a179f13a25aac4713aec800

Optionally, you can provide the content-type of the data you are sending. To send a URL, use

Content-Type: application/json

To send an image stream, you can set the Content-Type to application/octet-stream or multipart/form-data.

In the body of the POST request, you can send JSON that includes the URL of the image location. Here is an example:

{ "Url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png"}
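
To pull the header, content type, and body together, here is a minimal sketch that assembles the pieces of the request without sending it. The helper name buildOcrRequest and the placeholder key are hypothetical; the sketch assumes that a string argument is an image URL and that anything else is raw image data.

```javascript
// Hypothetical helper: assembles the settings for the POST request.
// "image" is either a URL string or binary image data (such as a Uint8Array).
function buildOcrRequest(subscriptionKey, image) {
    var headers = { "Ocp-Apim-Subscription-Key": subscriptionKey };
    var body;

    if (typeof image === "string") {
        // Image referenced by URL: send a small JSON document.
        headers["Content-Type"] = "application/json";
        body = JSON.stringify({ Url: image });
    } else {
        // Raw image bytes: send them as a binary stream.
        headers["Content-Type"] = "application/octet-stream";
        body = image;
    }

    return { method: "POST", headers: headers, body: body };
}

var request = buildOcrRequest(
    "YOUR_KEY_HERE", // placeholder, not a real key
    "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png"
);
console.log(request.headers["Content-Type"]);
console.log(request.body);
```

The object returned has the shape that the standard fetch function accepts as its second argument, so it could be passed along with the OCR URL in environments where fetch is available.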

This web service returns a JSON object containing an array of regions, each of which represents a block of text found in the image. Within each region is an array of lines and within each line is an array of words.

Region, line, and word objects contain a boundingBox object with coordinates of where to find the corresponding object within the image. Each word object contains the actual text detected, including any punctuation.
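
The nesting described above can be walked with three small loops. The following is only a sketch: sampleResult is a made-up object shaped like the description (with the boundingBox values left out for brevity), and extractLines is a hypothetical helper, not part of the API.

```javascript
// Hypothetical sample shaped like the response described above.
// boundingBox values are omitted for brevity.
var sampleResult = {
    regions: [
        { lines: [
            { words: [{ text: "Hello," }, { text: "world!" }] },
            { words: [{ text: "Second" }, { text: "line" }] }
        ] },
        { lines: [
            { words: [{ text: "Another" }, { text: "region" }] }
        ] }
    ]
};

// Returns one string per line of text, across every region.
function extractLines(result) {
    var lines = [];
    (result.regions || []).forEach(function (region) {
        (region.lines || []).forEach(function (line) {
            var words = (line.words || []).map(function (word) {
                return word.text;
            });
            lines.push(words.join(" "));
        });
    });
    return lines;
}

console.log(extractLines(sampleResult));
```

Falling back to an empty array at each level means a result with no regions simply produces an empty list rather than an error.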

The beauty of a REST web service is that you can call it from any language or platform that supports HTTP requests (which is pretty much all of them).

The following example uses JavaScript and jQuery to call this API. It assumes that you have a DIV tag on the page with id="OutputDiv" and that you have a reference to jQuery before this code.

var myKey = ""; // your subscription key
var url = "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png";

$.ajax({
    type: "POST",
    url: "https://api.projectoxford.ai/vision/v1/ocr?language=en",
    headers: { "Ocp-Apim-Subscription-Key": myKey },
    contentType: "application/json",
    data: '{ "Url": "' + url + '" }'
}).done(function (data) {
    var outputDiv = $("#OutputDiv");
    outputDiv.text("");
    // This example displays only the first region of text
    var linesOfText = data.regions[0].lines;

    // Loop through each line of text and create a DIV tag
    // containing each word, separated by a space
    // Append this newly-created DIV to OutputDiv
    for (var i = 0; i < linesOfText.length; i++) {
        var output = "";
        var thisLine = linesOfText[i];
        var words = thisLine.words;
        for (var j = 0; j < words.length; j++) {
            var thisWord = words[j];
            output += thisWord.text;
            output += " ";
        }
        // .text() escapes any markup characters found in the recognized text
        var newDiv = $("<div>").text(output);
        outputDiv.append(newDiv);
    }
}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
});

The call to the web service is made with the following lines:

$.ajax({
    type: "POST",
    url: "https://api.projectoxford.ai/vision/v1/ocr?language=en",
    headers: { "Ocp-Apim-Subscription-Key":myKey },
    contentType: "application/json",
    data: '{ "Url": "' + url + '" }'

which send a POST request and pass the URL as part of a JSON object in the request body.

This request is asynchronous, so the "done" function is called when it returns successfully.

}).done(function (data) {

The function tied to the "done" event parses through the returned JSON and displays it on the screen.

If an error occurs, we output a simple error message to the user in the "fail" function.

}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
});

Most of the code above is just formatting the output, so the REST call itself is quite simple. Project Oxford makes this type of analysis much easier for developers, regardless of their platform.

You can find this code at my GitHub repository.

In this article, you learned about the Project Oxford OCR API and how to call it from a JavaScript application.