The Microsoft Cognitive Services Computer Vision API can infer a great deal of information about a given image. One capability is converting pictures of text into text, a process known as "Optical Character Recognition" or "OCR".

Performing OCR on an image is simple and inexpensive. It is done through a web service call, but first you must set up the Computer Vision Service, as described in this article.

In that article, you were told to save two pieces of information about the service: The API Key and the URL. Here is where you will use them.

HTTP Endpoint

The OCR service is a web service. To call it, you send an HTTP POST request to an HTTP endpoint. The endpoint consists of the URL copied above, followed by "vision/v2.0/ocr", followed by some optional querystring parameters (which we will discuss later).

So, if you create your service in the EAST US Azure region, the copied URL will be

https://eastus.api.cognitive.microsoft.com/

and the HTTP endpoint for the OCR service will be

https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr

Querystring Parameters

The optional querystring parameters are

language

The language code of the text you are recognizing (usually 2 characters, such as "en"). This helps the service match pictures of words to the words they represent more accurately and quickly. If you omit this parameter, the system will analyze the text and guess an appropriate language. Currently, the service supports 26 languages. The code of each supported language is listed in Appendix 1 at the bottom of this article.

detectOrientation

"true", if you want the service to adjust the orientation of the image before performing OCR. If you pass "false" or omit this parameter, the service will assume the image is oriented correctly.

If you have an image with English text and you want the service to detect and adjust the image's orientation, the above URL becomes:

https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true
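
If you build this URL in code, a small helper keeps the pieces straight. Here is a minimal Python sketch; the function name build_ocr_endpoint and its arguments are my own choices for illustration, not part of the service.

```python
from urllib.parse import urlencode

# Path appended to the URL copied from the Azure portal (from this article).
OCR_PATH = "vision/v2.0/ocr"


def build_ocr_endpoint(base_url, language=None, detect_orientation=None):
    """Return the OCR endpoint, adding only the optional parameters supplied.

    build_ocr_endpoint is a hypothetical helper, not part of any SDK.
    """
    # Tolerate a base URL with or without a trailing slash.
    endpoint = base_url.rstrip("/") + "/" + OCR_PATH
    params = {}
    if language is not None:
        params["language"] = language
    if detect_orientation is not None:
        # The service expects the strings "true" or "false".
        params["detectOrientation"] = "true" if detect_orientation else "false"
    if params:
        endpoint += "?" + urlencode(params)
    return endpoint


if __name__ == "__main__":
    print(build_ocr_endpoint("https://eastus.api.cognitive.microsoft.com/",
                             language="en", detect_orientation=True))
```

Leaving both optional arguments as None produces the bare endpoint with no querystring.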

HTTP Headers

In the header of the HTTP request, you must add the following name/value pairs:

Ocp-Apim-Subscription-Key

The API key you copied above

Content-Type

The media type of the image you are passing to the service in the body of the HTTP request

Possible values are:

  • application/json
  • application/octet-stream
  • multipart/form-data

The value you pass must be consistent with the data in the body.

If you select "application/json", you must pass in the request body a URL pointing to the image on the public Internet.

If you select "application/octet-stream" or "multipart/form-data", you must pass the actual binary image in the request body.

Body

In the body of the HTTP request, you pass the image you want the service to analyze.

If you selected "application/json" as the Content-Type in the header, pass a URL within a JSON document, with the following format:

{"url":"image_url"}

where image_url is a URL pointing to the image you want to recognize.

Here is an example:

{"url":"https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png"}

If you selected "application/octet-stream" or "multipart/form-data" as the Content-Type in the header, pass the actual binary image in the body of the request.
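
To show how the Content-Type and the body must agree, here is a rough Python sketch that produces a matching pair for the two simplest cases: a public image URL or raw image bytes. The helper name build_headers_and_body is hypothetical, and the sketch deliberately leaves out "multipart/form-data".

```python
import json


def build_headers_and_body(api_key, image_url=None, image_bytes=None):
    """Return (headers, body) with a Content-Type that matches the body.

    Hypothetical helper for illustration; pass exactly one of
    image_url or image_bytes.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("Pass exactly one of image_url or image_bytes")

    headers = {"Ocp-Apim-Subscription-Key": api_key}
    if image_url is not None:
        # A URL travels inside a small JSON document: {"url": "..."}
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": image_url}).encode("utf-8")
    else:
        # The binary image itself is the body.
        headers["Content-Type"] = "application/octet-stream"
        body = image_bytes
    return headers, body
```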

The service has some restrictions on the images it can analyze.

It cannot analyze an image larger than 4MB.

The width and height of the image must be between 50 and 4,200 pixels.

The image must be one of the following formats: JPEG, PNG, GIF, BMP.
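
You may want to check these restrictions before spending a web service call. The following Python sketch is one way to do it, under a few assumptions of mine: "4MB" is treated as 4 * 1024 * 1024 bytes, the pixel limits are treated as inclusive, the format is guessed from the file's leading bytes, and the caller already knows the width and height (for example, from an image library).

```python
# Assumption: "4MB" means 4 * 1024 * 1024 bytes.
MAX_BYTES = 4 * 1024 * 1024
# Assumption: the pixel limits are inclusive.
MIN_PIXELS, MAX_PIXELS = 50, 4200

# Well-known leading bytes ("magic numbers") of the supported formats.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"BM": "BMP",
}


def detect_format(data):
    """Return the format name guessed from the leading bytes, or None."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def check_image(data, width, height):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("image is larger than 4MB")
    if detect_format(data) is None:
        problems.append("image is not JPEG, PNG, GIF, or BMP")
    for label, value in (("width", width), ("height", height)):
        if not MIN_PIXELS <= value <= MAX_PIXELS:
            problems.append(f"{label} must be between 50 and 4,200 pixels")
    return problems
```

A local check like this is only a convenience; the service remains the final judge of what it accepts.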

Sample call with Curl

Here is an example of a call to the service, using Curl:

curl -v -X POST "https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr?language=en&detectOrientation=true" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: f27c7436c3a64d91a177111a6b594537" --data-ascii "{\"url\": \"https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png\"}"

(NOTE: I modified the key, so it will not work. You will need to replace it with your own key if you want this to work.)
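
If you prefer code to the command line, roughly the same call can be sketched in Python using only the standard library. The function names below are hypothetical, "YOUR_API_KEY" is a placeholder you must replace, and nothing is sent until you call send_ocr_request.

```python
import json
import urllib.request


def build_ocr_request(endpoint, api_key, image_url):
    """Build (but do not send) a POST request equivalent to the Curl call."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
    )


def send_ocr_request(request):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_ocr_request(
        "https://eastus.api.cognitive.microsoft.com/vision/v2.0/ocr"
        "?language=en&detectOrientation=true",
        "YOUR_API_KEY",  # placeholder: replace with your own key
        "https://www.themeasuredmom.com/wp-content/uploads/2016/03/Slide11.png",
    )
    print(send_ocr_request(request))
```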

Response

If all goes well, you will receive an HTTP 200 (OK) response.

In the body of that response will be the results of the OCR in JSON format.

At the top level are the language, textAngle, and orientation.

Below that is an array of 0 or more text regions. Each region represents a block of text within the image.

Each region contains an array of 0 or more lines of text.

Each line contains an array of 0 or more words.

Each region, line, and word contains a bounding box: a comma-separated string giving the left, top, width, and height of the word(s) within.

Here is a partial example of the JSON returned from a successful web service call:

{
    "language": "en",
    "textAngle": 0.0,
    "orientation": "Up",
    "regions": [
        {
            "boundingBox": "147,96,622,1095",
            "lines": [
                {
                    "boundingBox": "408,96,102,56",
                    "words": [
                        {
                            "boundingBox": "408,96,102,56",
                            "text": "Hey"
                        }
                    ]
                },
                {
                    "boundingBox": "282,171,350,45",
                    "words": [
                        {
                            "boundingBox": "282,171,164,45",
                            "text": "Diddle"
                        },
                        {
                            "boundingBox": "468,171,164,45",
                            "text": "Diddle"
                        }
                    ]
                },
                ...
            ]
        }
    ]
}
  

The full JSON can be found in Appendix 2 below.
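
Once you have this JSON, getting back to plain text means walking regions, then lines, then words. Here is a minimal Python sketch of that walk, assuming the response has already been parsed into a dict (for example, with json.loads). The function names are my own, and the sample data is just the first two lines of the response above.

```python
def parse_bounding_box(box):
    """Turn a "left,top,width,height" string into a dict of integers."""
    left, top, width, height = (int(part) for part in box.split(","))
    return {"left": left, "top": top, "width": width, "height": height}


def extract_lines(result):
    """Return the text of every line, joining each line's words with spaces."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# The first two lines of the sample response, trimmed for brevity.
sample = {
    "language": "en",
    "regions": [{
        "boundingBox": "147,96,622,1095",
        "lines": [
            {"boundingBox": "408,96,102,56",
             "words": [{"boundingBox": "408,96,102,56", "text": "Hey"}]},
            {"boundingBox": "282,171,350,45",
             "words": [{"boundingBox": "282,171,164,45", "text": "Diddle"},
                       {"boundingBox": "468,171,164,45", "text": "Diddle"}]},
        ],
    }],
}

if __name__ == "__main__":
    print(extract_lines(sample))  # prints ['Hey', 'Diddle Diddle']
    print(parse_bounding_box("408,96,102,56"))
```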

Errors

If an error occurs, the response will not be HTTP 200. It will be an HTTP response code of 400 or greater. Additional error information will be in the body of the response.

Common errors include:

  • Images too large or too small
  • Image not found (It might require a password or be behind a firewall)
  • Invalid image format
  • Incorrect API key
  • Incorrect URL (It must match the API key. If you have multiple services, it’s easy to mix them up)
  • Miscellaneous spelling errors (e.g., not entering a valid language code or misspelling a header parameter)
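
When one of these errors happens, the quickest way to diagnose it is to look at both the status code and the response body. Here is a hedged Python sketch using the standard library's urllib; the function names are hypothetical, and because I make no assumption about the exact layout of the error body, it is returned as raw text.

```python
import urllib.error
import urllib.request


def describe_http_error(error):
    """Combine the status code and the raw response body into one message."""
    body = error.read().decode("utf-8", errors="replace")
    return f"HTTP {error.code}: {body}"


def call_service(request):
    """Return (response_bytes, None) on success or (None, message) on an HTTP error."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), None
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for error status codes such as 400 or 401.
        return None, describe_http_error(error)
```

Network failures that never produce an HTTP response (such as a DNS failure) raise urllib.error.URLError instead, which this sketch does not catch.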

In this article, I showed how to call the Cognitive Services OCR Computer Vision Service.

Appendix 1: Supported languages

zh-Hans (ChineseSimplified)
zh-Hant (ChineseTraditional)
cs (Czech)
da (Danish)
nl (Dutch)
en (English)
fi (Finnish)
fr (French)
de (German)
el (Greek)
hu (Hungarian)
it (Italian)
ja (Japanese)
ko (Korean)
nb (Norwegian)
pl (Polish)
pt (Portuguese)
ru (Russian)
es (Spanish)
sv (Swedish)
tr (Turkish)
ar (Arabic)
ro (Romanian)
sr-Cyrl (SerbianCyrillic)
sr-Latn (SerbianLatin)
sk (Slovak)

Appendix 2: JSON Response Example

{
    "language": "en",
    "textAngle": 0.0,
    "orientation": "Up",
    "regions": [
        {
            "boundingBox": "147,96,622,1095",
            "lines": [
                {
                    "boundingBox": "408,96,102,56",
                    "words": [
                        {
                            "boundingBox": "408,96,102,56",
                            "text": "Hey"
                        }
                    ]
                },
                {
                    "boundingBox": "282,171,350,45",
                    "words": [
                        {
                            "boundingBox": "282,171,164,45",
                            "text": "Diddle"
                        },
                        {
                            "boundingBox": "468,171,164,45",
                            "text": "Diddle"
                        }
                    ]
                },
                {
                    "boundingBox": "239,336,441,46",
                    "words": [
                        {
                            "boundingBox": "239,336,87,46",
                            "text": "Hey"
                        },
                        {
                            "boundingBox": "359,337,144,35",
                            "text": "diddle"
                        },
                        {
                            "boundingBox": "536,337,144,35",
                            "text": "diddle"
                        }
                    ]
                },
                {
                    "boundingBox": "169,394,576,35",
                    "words": [
                        {
                            "boundingBox": "169,394,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "279,402,73,27",
                            "text": "cat"
                        },
                        {
                            "boundingBox": "383,394,83,35",
                            "text": "and"
                        },
                        {
                            "boundingBox": "500,394,70,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "604,394,141,35",
                            "text": "fiddle"
                        }
                    ]
                },
                {
                    "boundingBox": "260,452,391,50",
                    "words": [
                        {
                            "boundingBox": "260,452,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "370,467,80,20",
                            "text": "cow"
                        },
                        {
                            "boundingBox": "473,452,178,50",
                            "text": "jumped"
                        }
                    ]
                },
                {
                    "boundingBox": "277,509,363,35",
                    "words": [
                        {
                            "boundingBox": "277,524,100,20",
                            "text": "over"
                        },
                        {
                            "boundingBox": "405,509,71,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "509,524,131,20",
                            "text": "moon."
                        }
                    ]
                },
                {
                    "boundingBox": "180,566,551,49",
                    "words": [
                        {
                            "boundingBox": "180,566,79,35",
                            "text": "The"
                        },
                        {
                            "boundingBox": "292,566,103,35",
                            "text": "little"
                        },
                        {
                            "boundingBox": "427,566,82,49",
                            "text": "dog"
                        },
                        {
                            "boundingBox": "546,566,185,49",
                            "text": "laughed"
                        }
                    ]
                },
                {
                    "boundingBox": "212,623,493,51",
                    "words": [
                        {
                            "boundingBox": "212,631,42,27",
                            "text": "to"
                        },
                        {
                            "boundingBox": "286,638,72,20",
                            "text": "see"
                        },
                        {
                            "boundingBox": "390,623,96,35",
                            "text": "such"
                        },
                        {
                            "boundingBox": "519,638,20,20",
                            "text": "a"
                        },
                        {
                            "boundingBox": "574,631,131,43",
                            "text": "sport."
                        }
                    ]
                },
                {
                    "boundingBox": "301,681,312,35",
                    "words": [
                        {
                            "boundingBox": "301,681,90,35",
                            "text": "And"
                        },
                        {
                            "boundingBox": "425,681,70,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "528,681,85,35",
                            "text": "dish"
                        }
                    ]
                },
                {
                    "boundingBox": "147,738,622,50",
                    "words": [
                        {
                            "boundingBox": "147,753,73,20",
                            "text": "ran"
                        },
                        {
                            "boundingBox": "255,753,114,30",
                            "text": "away"
                        },
                        {
                            "boundingBox": "401,738,86,35",
                            "text": "with"
                        },
                        {
                            "boundingBox": "519,738,71,35",
                            "text": "the"
                        },
                        {
                            "boundingBox": "622,753,147,35",
                            "text": "spoon."
                        }
                    ]
                },
                {
                    "boundingBox": "195,1179,364,12",
                    "words": [
                        {
                            "boundingBox": "195,1179,45,12",
                            "text": "Nursery"
                        },
                        {
                            "boundingBox": "242,1179,38,12",
                            "text": "Rhyme"
                        },
                        {
                            "boundingBox": "283,1179,36,9",
                            "text": "Charts"
                        },
                        {
                            "boundingBox": "322,1179,28,12",
                            "text": "from"
                        },
                        {
                            "boundingBox": "517,1179,11,10",
                            "text": "C"
                        },
                        {
                            "boundingBox": "531,1179,28,9",
                            "text": "2017"
                        }
                    ]
                },
                {
                    "boundingBox": "631,1179,90,12",
                    "words": [
                        {
                            "boundingBox": "631,1179,9,9",
                            "text": "P"
                        },
                        {
                            "boundingBox": "644,1182,6,6",
                            "text": "a"
                        },
                        {
                            "boundingBox": "655,1182,7,9",
                            "text": "g"
                        },
                        {
                            "boundingBox": "667,1182,7,6",
                            "text": "e"
                        },
                        {
                            "boundingBox": "690,1179,31,12",
                            "text": "7144"
                        }
                    ]
                }
            ]
        }
    ]
}