As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze, image, sound, video, and language. One of these APIs is a REST web service that can determine the words and punctuation contained in a picture. This is accomplished by a simple REST web service call.
The Cognitive Services Optical Character Recognition (OCR) service is part of the Custom Vision API. It takes as input a picture of text and returns the words found in the image.
To get started, you will need an Azure account and a Cognitive Services Vision API key.
If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.
Once you have an Azure Account, follow the instructions in this article to generate a Cognitive Services Computer Vision key.
To use this API, you simply have to make a POST request to the following URL:
https://[location].api.cognitive.microsoft.com/vision/v1.0/ocr
where [location] is the Azure location where you created your API key (above).
Optionally, you can add the following 2 querystring parameters to the URL:
- Language: the 2-digit language abbreviation abbreviation. Use “en” for English. Currently, 25 languages are supported. If omitted, the service will attempt to auto-detect the language
- detectOrientation: Set this to “true” if you want to support upside-down or rotated images.
The HTTP header of the request should include the following:
Ocp-Apim-Subscription-Key.
The Cognitive Services Computer Vision key you generated above.
Content-Type
This tells the service how you will send the image. The options are:
- application/json
- application/octet-stream
- multipart/form-data
If the image is accessible via a public URL, set the Content-Type to application/json and send JSON in the body of the HTTP request in the following format
{"url":"imageurl"}
where imageurl is a public URL pointing to the image. For example, to perform OCR on an image of an Edgar Allen Poe poem, submit the following JSON:
{"url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png"}
If you plan to send the image itself to the web service, set the content type to either "application/octet-stream" or “multipart/form-data” and submit the binary image in the body of the HTTP request.
The full request looks something like:POST https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr HTTP/1.1
Content-Type: application/json
Host: westus.api.cognitive.microsoft.com
Content-Length: 62
Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
{ "url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png" }
For example, passing a URL with the following picture:
(found online at http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png)
returned the following data:
{
"textAngle": 0.0,
"orientation": "NotDetected",
"language": "en",
"regions": [
{
"boundingBox": "31,6,435,478",
"lines": [
{
"boundingBox": "114,6,352,23",
"words": [
{
"boundingBox": "114,6,24,22",
"text": "A"
},
{
"boundingBox": "144,6,93,23",
"text": "Dream"
},
{
"boundingBox": "245,6,95,23",
"text": "Within"
},
{
"boundingBox": "350,12,14,16",
"text": "a"
},
{
"boundingBox": "373,6,93,23",
"text": "Dream"
}
]
},
{
"boundingBox": "31,50,187,16",
"words": [
{
"boundingBox": "31,50,31,12",
"text": "Take"
},
{
"boundingBox": "66,50,23,12",
"text": "this"
},
{
"boundingBox": "93,50,24,12",
"text": "kiss"
},
{
"boundingBox": "121,54,33,12",
"text": "upon"
},
{
"boundingBox": "158,50,19,12",
"text": "the"
},
{
"boundingBox": "181,50,37,12",
"text": "brow!"
}
]
},
{
"boundingBox": "31,67,194,16",
"words": [
{
"boundingBox": "31,67,31,15",
"text": "And,"
},
{
"boundingBox": "67,67,12,12",
"text": "in"
},
{
"boundingBox": "82,67,46,16",
"text": "parting"
},
{
"boundingBox": "132,67,31,12",
"text": "from"
},
{
"boundingBox": "167,71,25,12",
"text": "you"
},
{
"boundingBox": "195,71,30,11",
"text": "now,"
}
]
},
{
"boundingBox": "31,85,159,12",
"words": [
{
"boundingBox": "31,85,32,12",
"text": "Thus"
},
{
"boundingBox": "67,85,35,12",
"text": "much"
},
{
"boundingBox": "107,86,16,11",
"text": "let"
},
{
"boundingBox": "126,89,20,8",
"text": "me"
},
{
"boundingBox": "150,89,40,8",
"text": "avow-"
}
]
},
{
"boundingBox": "31,102,193,16",
"words": [
{
"boundingBox": "31,103,26,11",
"text": "You"
},
{
"boundingBox": "61,106,19,8",
"text": "are"
},
{
"boundingBox": "84,104,21,10",
"text": "not"
},
{
"boundingBox": "109,106,44,12",
"text": "wrong,"
},
{
"boundingBox": "158,102,27,12",
"text": "who"
},
{
"boundingBox": "189,102,35,12",
"text": "deem"
}
]
},
{
"boundingBox": "31,120,214,16",
"words": [
{
"boundingBox": "31,120,29,12",
"text": "That"
},
{
"boundingBox": "64,124,21,12",
"text": "my"
},
{
"boundingBox": "89,121,29,15",
"text": "days"
},
{
"boundingBox": "122,120,30,12",
"text": "have"
},
{
"boundingBox": "156,121,30,11",
"text": "been"
},
{
"boundingBox": "191,124,7,8",
"text": "a"
},
{
"boundingBox": "202,121,43,14",
"text": "dream;"
}
]
},
{
"boundingBox": "31,138,175,16",
"words": [
{
"boundingBox": "31,139,22,11",
"text": "Yet"
},
{
"boundingBox": "57,138,11,12",
"text": "if"
},
{
"boundingBox": "70,138,31,16",
"text": "hope"
},
{
"boundingBox": "105,138,21,12",
"text": "has"
},
{
"boundingBox": "131,138,37,12",
"text": "flown"
},
{
"boundingBox": "172,142,34,12",
"text": "away"
}
]
},
{
"boundingBox": "31,155,140,16",
"words": [
{
"boundingBox": "31,156,13,11",
"text": "In"
},
{
"boundingBox": "48,159,8,8",
"text": "a"
},
{
"boundingBox": "59,155,37,16",
"text": "night,"
},
{
"boundingBox": "100,159,14,8",
"text": "or"
},
{
"boundingBox": "118,155,12,12",
"text": "in"
},
{
"boundingBox": "134,159,7,8",
"text": "a"
},
{
"boundingBox": "145,155,26,16",
"text": "day,"
}
]
},
{
"boundingBox": "31,173,144,15",
"words": [
{
"boundingBox": "31,174,13,11",
"text": "In"
},
{
"boundingBox": "48,177,8,8",
"text": "a"
},
{
"boundingBox": "59,173,43,15",
"text": "vision,"
},
{
"boundingBox": "107,177,13,8",
"text": "or"
},
{
"boundingBox": "124,173,12,12",
"text": "in"
},
{
"boundingBox": "140,177,35,11",
"text": "none,"
}
]
},
{
"boundingBox": "31,190,180,16",
"words": [
{
"boundingBox": "31,191,11,11",
"text": "Is"
},
{
"boundingBox": "47,190,8,12",
"text": "it"
},
{
"boundingBox": "59,190,58,12",
"text": "therefore"
},
{
"boundingBox": "121,190,19,12",
"text": "the"
},
{
"boundingBox": "145,191,23,11",
"text": "less"
},
{
"boundingBox": "173,191,38,15",
"text": "gone?"
}
]
},
{
"boundingBox": "31,208,150,12",
"words": [
{
"boundingBox": "31,208,20,12",
"text": "All"
},
{
"boundingBox": "55,208,24,12",
"text": "that"
},
{
"boundingBox": "83,212,19,8",
"text": "we"
},
{
"boundingBox": "107,212,19,8",
"text": "see"
},
{
"boundingBox": "131,212,13,8",
"text": "or"
},
{
"boundingBox": "148,212,33,8",
"text": "seem"
}
]
},
{
"boundingBox": "31,226,194,12",
"words": [
{
"boundingBox": "31,227,11,11",
"text": "Is"
},
{
"boundingBox": "46,226,21,12",
"text": "but"
},
{
"boundingBox": "71,230,7,8",
"text": "a"
},
{
"boundingBox": "82,226,40,12",
"text": "dream"
},
{
"boundingBox": "126,226,41,12",
"text": "within"
},
{
"boundingBox": "171,230,7,8",
"text": "a"
},
{
"boundingBox": "182,226,43,12",
"text": "dream."
}
]
},
{
"boundingBox": "31,261,133,12",
"words": [
{
"boundingBox": "31,262,5,11",
"text": "I"
},
{
"boundingBox": "41,261,33,12",
"text": "stand"
},
{
"boundingBox": "78,261,32,12",
"text": "amid"
},
{
"boundingBox": "114,261,19,12",
"text": "the"
},
{
"boundingBox": "137,265,27,8",
"text": "roar"
}
]
},
{
"boundingBox": "31,278,169,15",
"words": [
{
"boundingBox": "31,278,18,12",
"text": "Of"
},
{
"boundingBox": "52,282,7,8",
"text": "a"
},
{
"boundingBox": "63,278,95,12",
"text": "surf-tormented"
},
{
"boundingBox": "162,278,38,15",
"text": "shore,"
}
]
},
{
"boundingBox": "31,296,174,15",
"words": [
{
"boundingBox": "31,296,28,12",
"text": "And"
},
{
"boundingBox": "63,297,4,11",
"text": "I"
},
{
"boundingBox": "72,296,28,12",
"text": "hold"
},
{
"boundingBox": "104,296,41,12",
"text": "within"
},
{
"boundingBox": "149,300,20,11",
"text": "my"
},
{
"boundingBox": "173,296,32,12",
"text": "hand"
}
]
},
{
"boundingBox": "31,314,169,16",
"words": [
{
"boundingBox": "31,314,42,12",
"text": "Grains"
},
{
"boundingBox": "78,314,15,12",
"text": "of"
},
{
"boundingBox": "95,314,19,12",
"text": "the"
},
{
"boundingBox": "119,315,43,15",
"text": "golden"
},
{
"boundingBox": "167,314,33,12",
"text": "sand-"
}
]
},
{
"boundingBox": "31,331,189,16",
"words": [
{
"boundingBox": "31,332,31,11",
"text": "How"
},
{
"boundingBox": "66,331,28,12",
"text": "few!"
},
{
"boundingBox": "99,333,20,14",
"text": "yet"
},
{
"boundingBox": "123,331,27,12",
"text": "how"
},
{
"boundingBox": "154,331,28,16",
"text": "they"
},
{
"boundingBox": "186,335,34,12",
"text": "creep"
}
]
},
{
"boundingBox": "31,349,206,16",
"words": [
{
"boundingBox": "31,349,55,16",
"text": "Through"
},
{
"boundingBox": "90,353,20,11",
"text": "my"
},
{
"boundingBox": "115,349,44,16",
"text": "fingers"
},
{
"boundingBox": "163,351,12,10",
"text": "to"
},
{
"boundingBox": "179,349,20,12",
"text": "the"
},
{
"boundingBox": "203,350,34,15",
"text": "deep,"
}
]
},
{
"boundingBox": "31,366,182,16",
"words": [
{
"boundingBox": "31,366,39,12",
"text": "While"
},
{
"boundingBox": "74,367,5,11",
"text": "I"
},
{
"boundingBox": "83,370,39,12",
"text": "weep-"
},
{
"boundingBox": "126,366,36,12",
"text": "while"
},
{
"boundingBox": "166,367,5,11",
"text": "I"
},
{
"boundingBox": "175,367,38,15",
"text": "weep!"
}
]
},
{
"boundingBox": "31,384,147,16",
"words": [
{
"boundingBox": "31,385,11,11",
"text": "O"
},
{
"boundingBox": "47,384,31,12",
"text": "God!"
},
{
"boundingBox": "84,388,21,8",
"text": "can"
},
{
"boundingBox": "110,385,4,11",
"text": "I"
},
{
"boundingBox": "119,386,20,10",
"text": "not"
},
{
"boundingBox": "144,388,34,12",
"text": "grasp"
}
]
},
{
"boundingBox": "31,402,170,16",
"words": [
{
"boundingBox": "31,402,37,12",
"text": "Them"
},
{
"boundingBox": "72,402,29,12",
"text": "with"
},
{
"boundingBox": "105,406,7,8",
"text": "a"
},
{
"boundingBox": "116,402,42,16",
"text": "tighter"
},
{
"boundingBox": "162,403,39,15",
"text": "clasp?"
}
]
},
{
"boundingBox": "31,419,141,12",
"words": [
{
"boundingBox": "31,420,11,11",
"text": "O"
},
{
"boundingBox": "47,419,31,12",
"text": "God!"
},
{
"boundingBox": "84,423,21,8",
"text": "can"
},
{
"boundingBox": "110,420,4,11",
"text": "I"
},
{
"boundingBox": "119,421,20,10",
"text": "not"
},
{
"boundingBox": "144,423,28,8",
"text": "save"
}
]
},
{
"boundingBox": "31,437,179,16",
"words": [
{
"boundingBox": "31,438,26,11",
"text": "One"
},
{
"boundingBox": "62,437,31,12",
"text": "from"
},
{
"boundingBox": "97,437,19,12",
"text": "the"
},
{
"boundingBox": "120,437,45,16",
"text": "pitiless"
},
{
"boundingBox": "169,438,41,11",
"text": "wave?"
}
]
},
{
"boundingBox": "31,454,161,12",
"words": [
{
"boundingBox": "31,455,11,11",
"text": "Is"
},
{
"boundingBox": "47,454,15,12",
"text": "all"
},
{
"boundingBox": "66,454,25,12",
"text": "that"
},
{
"boundingBox": "94,458,19,8",
"text": "we"
},
{
"boundingBox": "118,458,19,8",
"text": "see"
},
{
"boundingBox": "142,458,13,8",
"text": "or"
},
{
"boundingBox": "159,458,33,8",
"text": "seem"
}
]
},
{
"boundingBox": "31,472,185,12",
"words": [
{
"boundingBox": "31,473,23,11",
"text": "But"
},
{
"boundingBox": "58,476,7,8",
"text": "a"
},
{
"boundingBox": "69,472,40,12",
"text": "dream"
},
{
"boundingBox": "113,472,41,12",
"text": "within"
},
{
"boundingBox": "158,476,7,8",
"text": "a"
},
{
"boundingBox": "169,472,47,12",
"text": "dream?"
}
]
}
]
}
]
}
Note that the image is split into an array of regions; each region contains an array of lines; and each line contains an array of words. This is done so that you can replace or block out one or more specific words, lines, or regions.
Below is a jQuery code snippet making a request to this service to perform OCR on images of text. You can download the full application at https://github.com/DavidGiard/CognitiveSvcsDemos.
var language = $("#LanguageDropdown").val();
var computerVisionKey = getKey() || "Copy your Subscription key here";
var webSvcUrl = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr";
webSvcUrl = webSvcUrl + "?language=" + language;
$.ajax({
type: "POST",
url: webSvcUrl,
headers: { "Ocp-Apim-Subscription-Key": computerVisionKey },
contentType: "application/json",
data: '{ "Url": "' + url + '" }'
}).done(function (data) {
outputDiv.text("");var regionsOfText = data.regions;
for (var h = 0; h < regionsOfText.length; h++) {
var linesOfText = data.regions[h].lines;
for (var i = 0; i < linesOfText.length; i++) {
var output = "";var thisLine = linesOfText[i];
var words = thisLine.words;
for (var j = 0; j < words.length; j++) {
var thisWord = words[j];
output += thisWord.text;
output += " ";}
var newDiv = "" + output + "";
outputDiv.append(newDiv);}
outputDiv.append("
");
}
}).fail(function (err) {
$("#OutputDiv").text("ERROR!" + err.responseText);
});
You can find the full documentation – including an in-browser testing tool - for this API here.
Sending requests to the Cognitive Services OCR API makes it simple to convert a picture of text into text.