# Tuesday, 28 February 2017

Last week, Ed Charbeneau interviewed me for his Eat Sleep Code podcast. The topic was Cognitive Services – a technology I’m passionate about.

You can listen to that interview below.

Tuesday, 28 February 2017 15:43:00 (GMT Standard Time, UTC+00:00)
# Tuesday, 11 October 2016

Microsoft Cognitive Services provides a number of APIs to take advantage of Machine Learning. One of the simplest APIs to use is Sentiment Analysis.

Sentiment Analysis examines one or more text entries and determines whether each text reflects a positive or negative sentiment. It returns a number between 0 and 1: A higher number indicates a more positive sentiment, while a lower number indicates a more negative sentiment.

To use this service, POST a JSON message to the following URL: https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment

Unlike some other Cognitive Services URLs, this one takes no querystring parameters.

In the HTTP headers, pass two pieces of information: the Content-Type and the Ocp-Apim-Subscription-Key.

In the Content-Type header, pass "application/json".

In the Ocp-Apim-Subscription-Key header, pass your Text Analytics key, as in the following example:

Ocp-Apim-Subscription-Key:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

where xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is your key. You can find your key at https://www.projectoxford.ai/Subscription?popup=True

In the body, pass a JSON object that contains an array of documents. Each document contains 3 properties:

language - the language of the text you want to analyze, expressed as a language code. Supported languages include English ("en"), Spanish ("es"), French ("fr"), and Portuguese ("pt").

id - A string that uniquely identifies this document. Used to match the return value to the corresponding text.

text - the text to analyze

Below is a sample JSON body:

{
  "documents": [
    {
      "language": "en",
      "id": "text01",
      "text": "This is a great day."
    }
  ]
}
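
Putting the headers and body together, the full request looks something like the following (the key is a placeholder):

POST https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment HTTP/1.1
Content-Type: application/json
Host: westus.api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

{
  "documents": [
    { "language": "en", "id": "text01", "text": "This is a great day." }
  ]
}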

After you POST this request, you should expect a JSON response. If all goes well, you will receive an HTTP 200 response, and the returned JSON will include an array of documents (the same number that you passed in the request body). Each response document will contain:

id - matches the id of the corresponding document in the request body.

score - A value between 0 and 1. The higher the score, the more positive the sentiment of the text; The lower the score, the more negative the text sentiment.

You may also receive an array of errors. Each error contains the following properties:

id - matches the id of the corresponding document in the request body.

message - a detailed error message.

Below is a sample JSON response body:

{
  "documents": [
    {
      "score": 0.95412,
      "id": "text01"
    }
  ]
}

Here is a bit of code to call this API from JavaScript. I am using jQuery's Ajax method and displaying output in a div, like the following:

<div id="OutputDiv"></div> 

var subscriptionKey = "566375db01ad43dc8f62dcc8dc3e5c1f";
var textToAnalyze = "Life is beautiful";

var webSvcUrl = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment";

var outputDiv = $("#OutputDiv");
outputDiv.text("Thinking...");

$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: '{"documents": [ { "language": "en", "id": "text01", "text": "' + textToAnalyze + '" }]}'
}).done(function (data) {
    if (data.errors.length > 0) {
        outputDiv.html("Error: " + data.errors[0]);
    }
    else if (data.documents.length > 0) {
        var outputText = "";
        var score = data.documents[0].score;
        if (score > 0.5) {
            outputText = "That is a Positive thing to say!";
        }
        else {
            outputText = "That is a Negative thing to say!";
        }
        outputDiv.html(outputText);
    }
    else {
        outputDiv.text("No text to analyze.");
    }
}).fail(function (err) {
    $("#OutputDiv").text("ERROR! " + err.responseText);
});
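
The same call works from a .NET client. Below is a minimal sketch using HttpClient; the class and namespace names are my own, and the key value is a placeholder rather than a real key:

using System;
using System.Net.Http;
using System.Text;

namespace SentimentDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Placeholder values: substitute your own Text Analytics key and text
            string subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
            string textToAnalyze = "Life is beautiful";
            string webSvcUrl = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment";

            using (var client = new HttpClient())
            {
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

                // Build the same JSON body used in the JavaScript sample above
                string body = "{\"documents\": [ { \"language\": \"en\", \"id\": \"text01\", \"text\": \"" + textToAnalyze + "\" } ] }";
                var content = new StringContent(body, Encoding.UTF8, "application/json");

                // POST the request and write the raw JSON response to the console
                HttpResponseMessage response = client.PostAsync(webSvcUrl, content).Result;
                string json = response.Content.ReadAsStringAsync().Result;
                Console.WriteLine(json);
            }
        }
    }
}

Either way, the heavy lifting happens on Microsoft's servers; the client only builds a small JSON request and reads back the score.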

Tuesday, 11 October 2016 06:48:00 (GMT Daylight Time, UTC+01:00)
# Friday, 23 September 2016

Last month, I delivered a presentation on Cognitive Services at That Conference in Wisconsin Dells, WI. Carl Schweitzer of MS Dev Show interviewed me to discuss the features of these APIs. My interview starts at the 7:58 mark of the video below.

Friday, 23 September 2016 05:44:26 (GMT Daylight Time, UTC+01:00)
# Tuesday, 19 July 2016

Recently, I was interviewed by Matthew Groves for his Cross Cutting Concerns podcast. We talked about Machine Learning in general and Microsoft Cognitive Services in particular. You can listen to the interview here or in the embedded link below:

Or better yet, subscribe to this podcast!

Tuesday, 19 July 2016 07:31:00 (GMT Daylight Time, UTC+01:00)
# Sunday, 15 May 2016

In an earlier article, I described how to call the Cognitive Services API from JavaScript. The key parts of the code (using the jQuery library) are shown below:

var subscriptionKey = "cd529ca0a97f48b8a1f3bc36ecd73600";
var imageUrl = $("#imageUrlTextbox").val();
var webSvcUrl = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair,headPose,glasses";
$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: '{ "Url": "' + imageUrl + '" }'
}).done(function (data) {
        // ... 
        // Code to run when web service returns data...
        // ...
        }); 

Do you spot the problem with this code?

Although it works very well from a functional point of view, it has a security flaw: As with all Cognitive Services APIs, the Face API requires you to pass a key in the header of your HTTP request. Because there is no easy way to hide this key, it is easy for a hacker to steal your key. Using your key, they can make calls and charge them to your account - either using up your quota of free calls or (worse) charging your credit card for their calls.

One way to hide the key is to make the call to Cognitive Services from a custom web service. This web service can be a simple pass-through, but it allows you to hide the key on the server, where it is much more difficult for a hacker to find it.

We can use Web API to create this custom web service. With Web API, it is a simple matter to store the Subscription Key in a secure configuration file and retrieve it at run time. The relevant code goes in an ApiController class as shown below.

// POST: api/Face
public IEnumerable<Face> Post([FromBody]string imageUrl)
{
    return GetFaceData(imageUrl);
} 
 
public static IEnumerable<Face> GetFaceData(string imageUrl)
{
    var webSvcUrl = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair,headPose,glasses";
    string subscriptionKey = ConfigurationManager.AppSettings["SubscriptionKey"];
    if (subscriptionKey == null)
    {
        throw new ConfigurationErrorsException("Web Service is missing Subscription Key");
    }
    WebRequest Request = WebRequest.Create(webSvcUrl);
    Request.Method = "POST";
    Request.ContentType = "application/json";
    Request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
    using (var streamWriter = new StreamWriter(Request.GetRequestStream()))
    {
        string json = "{"
        + "\"url\": \"" + imageUrl + "\""
        + "}"; 
 
        streamWriter.Write(json);
        streamWriter.Flush();
        streamWriter.Close(); 
 
        var httpResponse = (HttpWebResponse)Request.GetResponse(); 
 
        string result = "";
        using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
        {
            result = streamReader.ReadToEnd();
        }
        httpResponse.Close(); 
 
        List<Face> faces = JsonConvert.DeserializeObject<List<Face>>(result);
        return faces;
    }
} 
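
The Subscription Key itself lives in the web application's configuration, where GetFaceData reads it via ConfigurationManager.AppSettings. A minimal sketch of the corresponding Web.config entry (the value is a placeholder for your own key):

<appSettings>
  <add key="SubscriptionKey" value="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" />
</appSettings>

Because this file stays on the server, the key is never downloaded to the browser.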

Of course, we could also use the .NET library to abstract the call to the Face API, but I wanted to keep this code as close to the original JavaScript code as possible.  This method returns a List of Face objects. The Face object maps to the JSON data returned by the Face API. You can see this class below.

public class Face
{
    public string FaceId { get; set; }
    public FaceRectangle FaceRectangle { get; set; }
    public FaceLandmarks FaceLandmarks { get; set; }
    public FaceAttributes FaceAttributes { get; set; }
}
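
The FaceRectangle, FaceLandmarks, and FaceAttributes classes are not shown here; they are simple classes whose properties mirror the JSON returned by the Face API. As a rough sketch (property names assumed from the API's JSON rather than copied from the original project), FaceRectangle might look like this:

public class FaceRectangle
{
    // Location and size, in pixels, of the rectangle outlining a detected face
    public int Top { get; set; }
    public int Left { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}

The other two classes follow the same pattern, with one property for each field returned by the API.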

Now, we can modify our JavaScript code to call our custom API, instead of calling Face API directly, and we can eliminate the need to pass the key from the client-side JavaScript, as shown below:

var imageUrl = $("#imageUrlTextbox").val();
webSvcUrl = "api/face";
$.ajax({
    type: "POST",
    url: webSvcUrl,
    data: JSON.stringify(imageUrl),
    contentType: "application/json;charset=utf-8"
}).done(function (data) { 
// ... 
// Code to run when web service returns data...
// ...
}); 

In this listing, we have removed the Subscription-Key from the header and we have changed the web service URL. This JavaScript is calling a web service in its own domain, so it is trusted. The default cross-origin request (CORS) settings of most web servers will keep JavaScript outside this domain from calling our custom web service.

This pattern works with any of the Cognitive Services APIs. In fact, it works with any API that requires the client to pass a secure key. We can abstract away the direct call to that API and hide the key on the server, making our code more secure.

Code

  • You can find the full code of the sample described above here.
  • You can find the code for the “unprotected” sample here.
Sunday, 15 May 2016 00:50:49 (GMT Daylight Time, UTC+01:00)
# Friday, 13 May 2016

Microsoft Cognitive Services (MCS) allows you to tap into the power of Machine Learning and perform sophisticated analysis of photographs, simply by calling a web service.

The Face API in MCS returns an array of all the faces found in a photo, along with information about each face, such as the location of the eyes, nose, and mouth; the age and gender of the person, and information about eyeglasses and facial hair.

You can sign up to use MCS for free at https://www.microsoft.com/cognitive-services/. Information specific to the Face API can be found at https://www.microsoft.com/cognitive-services/en-us/face-api. (fig 1)

FaceAPI02-SubscriptionKey 
Fig. 1

To use the Face API, click the [Get Started for Free] button. You will see a list of subscription keys. Scroll down to the "Face" section and click "Copy" next to one of the Face subscription keys to save it to your clipboard, or click "Show" to reveal the key.

FaceAPI01-APIpage 
Fig. 2

To call the Face API, send an HTTP POST request to https://api.projectoxford.ai/face/v1.0/detect

You may add optional querystring parameters to the above URL:

returnFaceId: If set to "true", the web service will return a GUID representing the face, so that you can make repeated inquiries about this face.

returnFaceLandmarks: If set to "true", the web service will return a "faceLandmarks" object containing a list of points identifying the location of the edges of the eyes, eyebrows, nose, and mouth.

returnFaceAttributes: A comma-delimited list of face attributes the web service should return. Allowable attributes are age (an estimated age, in years), gender, smile, facialHair, headPose, and glasses.

The service will always return a rectangle identifying the outline of the face. Adding more properties to return will, of course, slow down both the computation and the download of the data.
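
For example, a request that asks for the face ID, the facial landmarks, and several attributes uses a URL like the one in the code sample later in this post:

https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair,headPose,glasses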

You must pass the subscription key (described above) in the header of your HTTP request as in the following example:

Ocp-Apim-Subscription-Key:52b24a988a179f13a25aac4713aec800 

The photo itself will be in the body of the POST request. In the content-type header parameter, you can specify how you plan to send the photo to the server. If you plan to send a link to the URL of a photo, set the content-type to "application/json"; if you plan to send the photo as binary data, set the content-type to "application/octet-stream".

The body of the request contains the image. If you selected "application/json" as the content type, send the URL in the following JSON format:

{ "Url": "http://davidgiard.com/themes/Giard/images/logo.png"}

If successful, the web service will return (formatted as JSON) an array of "face" objects - one for each face detected in the photo. Each object will contain the top, left, height, and width values to define a rectangle outlining that face. If you declared that you want more data (e.g., Face ID, Face Landmarks, and Face Attributes), that data will also be returned in each face object.
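
A trimmed sketch of the JSON returned for a single face is shown below. The property names match those used in the code later in this post, but the values are illustrative only, and many more landmark points are returned than shown here:

[
  {
    "faceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "faceRectangle": { "top": 156, "left": 285, "width": 113, "height": 113 },
    "faceLandmarks": {
      "pupilLeft": { "x": 320.2, "y": 193.4 },
      "pupilRight": { "x": 373.1, "y": 190.8 },
      "mouthLeft": { "x": 324.5, "y": 245.6 },
      "mouthRight": { "x": 371.3, "y": 243.2 },
      "upperLipTop": { "x": 348.9, "y": 240.1 },
      "underLipBottom": { "x": 349.5, "y": 257.7 }
    },
    "faceAttributes": {
      "age": 34.2,
      "gender": "male",
      "smile": 0.88,
      "glasses": "NoGlasses"
    }
  }
]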

Below is an example of JavaScript / jQuery code to call this API.

var subscriptionKey = "Copy your Subscription key here"; 
 
var imageUrl = "http://davidgiard.com/themes/Giard/images/logo.png";
 
var webSvcUrl = "https://api.projectoxford.ai/face/v1.0/detect?returnFaceId=true&returnFaceLandmarks=true&returnFaceAttributes=age,gender,smile,facialHair,headPose,glasses";
 
var outputDiv = $("#OutputDiv");
outputDiv.text("Thinking..."); 
 
$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: '{ "Url": "' + imageUrl + '" }'
}).done(function (data) { 
 
    if (data.length > 0) {
        var firstFace = data[0];
        var faceId = firstFace.faceId;
        var faceRectangle = firstFace.faceRectangle;
        var faceWidth = faceRectangle.width;
        var faceHeight = faceRectangle.height;
        var faceLeft = faceRectangle.left;
        var faceTop = faceRectangle.top; 
 
        var faceLandmarks = firstFace.faceLandmarks;
        var faceAttributes = firstFace.faceAttributes; 
 
        var leftPupil = faceLandmarks.pupilLeft;
        var rightPupil = faceLandmarks.pupilRight;
        var mouth = faceLandmarks.mouthLeft;
        var nose = faceLandmarks.noseLeftAlarOutTip;
        var mouthTop = faceLandmarks.upperLipTop;
        var mouthBottom = faceLandmarks.underLipBottom;
        leftEyeWidth = faceLandmarks.eyebrowLeftInner.x - faceLandmarks.eyebrowLeftOuter.x;
        rightEyeWidth = faceLandmarks.eyebrowRightOuter.x - faceLandmarks.eyebrowRightInner.x;
        mouthWidth = faceLandmarks.mouthRight.x - faceLandmarks.mouthLeft.x; 
 
        var mouthLeft = faceLandmarks.mouthLeft;
        var mouthRight = faceLandmarks.mouthRight;
        var mouthTop = faceLandmarks.upperLipTop;
        var mouthBottom = faceLandmarks.underLipBottom; 
 
        var outputText = "";
        outputText += "Face ID: " + faceId + "<br>";
        outputText += "Top: " + faceTop + "<br>";
        outputText += "Left: " + faceLeft + "<br>";
        outputText += "Width: " + faceWidth + "<br>";
        outputText += "Height: " + faceHeight + "<br>";
        outputText += "Right Pupil: " + rightPupil.x + ", " + rightPupil.y + "<br>";
        outputText += "Left Pupil: " + leftPupil.x + ", " + leftPupil.y + "<br>";
        outputText += "Mouth: <br>";
        outputText += " -Left: " + mouthLeft.x + ", " + mouthLeft.y + "<br>";
        outputText += " -Right: " + mouthRight.x + ", " + mouthRight.y + "<br>";
        outputText += " -Top: " + mouthTop.x + ", " + mouthTop.y + "<br>";
        outputText += " -Bottom: " + mouthBottom.x + ", " + mouthBottom.y + "<br>";
        outputText += "Attributes:" + "<br>";
        outputText += "age: " + faceAttributes.age + "<br>";
        outputText += "gender: " + faceAttributes.gender + "<br>";
        outputText += "smile: " + (faceAttributes.smile || "n/a") + "<br>";
        outputText += "glasses: " + faceAttributes.glasses + "<br>";
        outputDiv.html(outputText); 
 
    }
    else {
        outputDiv.text("No faces detected."); 
 
    } 
 
}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
}); 
 
The service call is performed by the following lines: 
 
$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: '{ "Url": "' + imageUrl + '" }' 
 

This request is Asynchronous, so the "done" function is called when it returns successfully.

}).done(function (data) { 

The function tied to the "done" event parses through the returned JSON and displays it on the screen.

If an error occurs, we output a simple error message to the user in the "fail" function.

}).fail(function (err) {
          $("#OutputDiv").text("ERROR!" + err.responseText);
  }); 

The rest of the code above simply grabs the first face in the JSON array and drills down into properties of that face, displaying those properties in a DIV on the page.

For example, in the attached site.css stylesheet, I’ve defined 2 selectors - #Rectangle and .FaceLabel - that initially hide objects on the page via the display: none style. These selectors also set the position to “absolute”, allowing us to position the elements exactly where we want within a container. The z-index is increased, so that these items will appear on top of the face in the picture. The relevant CSS is shown below:

#Rectangle
{
    opacity: 0.3;
    z-index: 10;
    position: absolute;
    display: none;
  }
 
.FaceLabel{
    position: absolute;
    z-index: 20;
    display: none;
    font-size: 8px;
    padding: 0px;
    margin: 0px;
    background-color: white;
    color: black;
    padding: 1px;
  }

Our page contains elements that use these selectors to label the parts of the face that were identified. Initially, they will be invisible until we determine where to place them, using the information returned by the Face API.

<div id="PhotoDiv">
    <img id="ImageToAnalyze" src="images/CartoonFace.png">
    <div class="FaceLabel" id="LeftEyeDiv">LEFT</div>
    <div class="FaceLabel" id="RightEyeDiv">RIGHT</div>
    <div class="FaceLabel" id="NoseDiv">NOSE</div>
    <div class="FaceLabel" id="MouthDiv">MOUTH</div>
    <img src="images/Rectangle.png" id="Rectangle">
</div>

When the call to the Face API web service returns successfully, we drill down into the returned JSON to find out the outline of the face and the location of the eyes, nose, and mouth. Then, we make these objects visible (set the display style to “block”)  and place them above the corresponding facial feature (set the “top” and “left” styles). In the case of the Rectangle image, we also resize it to cover the face detected. The rectangle’s “opacity” style is 0.3, making it translucent enough to see the face behind it. Here is the JavaScript to accomplish this:

$("#Rectangle").css("top", faceTop);
$("#Rectangle").css("left", faceLeft);
$("#Rectangle").css("height", faceHeight);
$("#Rectangle").css("width", faceHeight);
$("#Rectangle").css("display", "block");
$("#LeftEyeDiv").css("top", leftPupil.y);
$("#LeftEyeDiv").css("left", leftPupil.x);
$("#LeftEyeDiv").css("display", "block");
$("#RightEyeDiv").css("top", rightPupil.y);
$("#RightEyeDiv").css("left", rightPupil.x);
$("#RightEyeDiv").css("display", "block");
$("#NoseDiv").css("top", nose.y);
$("#NoseDiv").css("left", noseHorizontalCenter);
$("#NoseDiv").css("display", "block");
$("#MouthDiv").css("top", mouthVerticalCenter);
$("#MouthDiv").css("left", mouthTop.x);
$("#MouthDiv").css("display", "block");

Below is the output of the web page analyzing a photo of my face:

FaceApi03-WebPage

As you can see, calling the Cognitive Services Face API, is a simple matter of making a call to a web service and reading the JSON data returned by that service.

You can find this code in my GitHub repository.

Friday, 13 May 2016 10:11:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 12 May 2016

Microsoft Cognitive Services is a set of APIs built on Machine Learning and exposed as REST Web Services. The Speech API offers a way to listen to speech and convert it into text.

GCast 16: Cognitive Services - Speech to Text

Thursday, 12 May 2016 13:10:38 (GMT Daylight Time, UTC+01:00)
# Thursday, 05 May 2016

Microsoft Cognitive Services is a set of APIs based on Machine Learning and exposed as REST web services. The Emotion API analyzes pictures of faces and determines the emotion shown in each face.

GCast 15: Cognitive Services - Emotion API

Thursday, 05 May 2016 15:35:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 28 April 2016

Microsoft Cognitive Services uses Machine Learning to generate intelligent thumbnails from pictures and exposes this functionality via a RESTful Web Service.

GCast 14: Cognitive Services - Thumbnail Generation

Thursday, 28 April 2016 12:12:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 21 April 2016

Microsoft Cognitive Services uses Machine Learning to recognize text within pictures and exposes this functionality via a RESTful Web Service.

Cognitive Services - Optical Character Recognition

Thursday, 21 April 2016 11:52:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 14 April 2016

Learn how to use the Microsoft Cognitive Services Face API to recognize faces and the properties of faces within an image. This API is based on Machine Learning.

Cognitive Services - Face API

Thursday, 14 April 2016 21:26:30 (GMT Daylight Time, UTC+01:00)
# Thursday, 07 April 2016

Microsoft Cognitive Services (formerly Project Oxford) provides a set of APIs for analyzing images, video, speech, and language. Learn how to get started and begin developing against these APIs.

Thursday, 07 April 2016 12:48:00 (GMT Daylight Time, UTC+01:00)
# Saturday, 02 April 2016

At the beginning of this semester, Microsoft hired a new Student Partner at Indiana University. A Microsoft Student Partner (MSP) is a college student responsible for helping promote the Microsoft platform on campus. Part of that responsibility involves hosting technical events.

This new MSP invited me to campus to co-present with him. He reserved a room, ordered pizza, invited students, and brought some giveaways. We decided to present on Project Oxford - a set of web services that use machine learning to analyze images, videos, and voice. A day before the presentation, Project Oxford was renamed to Microsoft Cognitive Services, so I had to rush to update my slides and re-test all my demos.

The event was a success. Students studying computer science and related fields attended, students who were curious about the technology attended, and one Informatics professor attended.

For me, it was a rewarding experience - in part because it was a chance to connect with students and to share a cool technology that Microsoft is offering.

But more importantly, I was excited to work with the new Indiana University MSP - Tim Giard.

Tim is my son and a junior majoring in Informatics at IU. This was our first chance to work together professionally and it was one of the highlights of my year.


Saturday, 02 April 2016 14:59:41 (GMT Daylight Time, UTC+01:00)
# Saturday, 19 March 2016

Project Oxford offers a set of APIs to analyze the content of images. One of these APIs is a REST web service that can determine the words and punctuation contained in a picture. This is accomplished by a simple REST web service call.

To begin, you must register with Project Oxford at http://www.projectoxford.ai.

Then, get the key at https://www.projectoxford.ai/Subscription

Thu04-ShowKey
Figure 1: Subscription key

To call the API we send a POST request to https://api.projectoxford.ai/vision/v1/ocr

If you like, you may add optional querystring parameters to the URL - language and detectOrientation - to tell the service the language of the text and whether it should automatically detect tilted text. If you omit these parameters, Oxford will make an effort to determine the values on its own; but, as you might guess, the call is faster if you provide this information.

In the header of the request, you must provide your key as in the following example:

Ocp-Apim-Subscription-Key:15e24a988a179f13a25aac4713aec800

Optionally, you can provide the content-type of the data you are sending. To send a URL, use

Content-Type: application/json

To send an image stream, you can set the Content-Type to application/octet-stream or multipart/form-data.

In the body of POST request, you can send JSON that includes the URL of the image location. Here is an example:

{ "Url": "http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png"}

This web service returns a JSON object containing an array of regions, each of which represents a block of text found in the image. Within each region is an array of lines, and within each line is an array of words.

Region, line, and word objects contain a boundingBox object with coordinates of where to find the corresponding object within the image. Each word object contains the actual text detected, including any punctuation.
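
To make this concrete, a heavily trimmed response might look something like the following (the structure matches the description above; the boundingBox values are illustrative):

{
  "language": "en",
  "regions": [
    {
      "boundingBox": "28,16,288,41",
      "lines": [
        {
          "boundingBox": "28,16,288,20",
          "words": [
            { "boundingBox": "28,16,63,20", "text": "HELLO" },
            { "boundingBox": "98,16,75,20", "text": "WORLD" }
          ]
        }
      ]
    }
  ]
}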

The beauty of a REST web service is that you can call it from any language or platform that supports HTTP requests (which is pretty much all of them).

The following example uses JavaScript and jQuery to call this API. It assumes that you have a DIV tag on the page with id="OutputDiv" and that you have a reference to jQuery before this code.

var myKey="<replace_with_your_subscription_key>";
var url="http://media.tumblr.com/tumblr_lrbhs0RY2o1qaaiuh.png";
$.ajax({
    type: "POST",
    url: "https://api.projectoxford.ai/vision/v1/ocr?language=en",
    headers: { "Ocp-Apim-Subscription-Key":myKey },
    contentType: "application/json",
    data: '{ "Url": "' + url + '" }'
}).done(function (data) {
    var outputDiv = $("#OutputDiv");
    outputDiv.text(""); 
 
    var linesOfText = data.regions[0].lines;
    // Loop through each line of text and create a DIV tag 
    // containing each word, separated by a space
    // Append this newly-created DIV to OutputDiv
    for (var i = 0; i < linesOfText.length; i++) {
        var output = "";
        var thisLine = linesOfText[i];
        var words = thisLine.words;
        for (var j = 0; j < words.length; j++) {
            var thisWord = words[j];
            output += thisWord.text;
            output += " ";
        }
        var newDiv = "<div>" + output + "</div>";
        outputDiv.append(newDiv);
    }
}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
}); 

The call to the web service is made with the following lines:

$.ajax({
    type: "POST",
    url: "https://api.projectoxford.ai/vision/v1/ocr?language=en",
    headers: { "Ocp-Apim-Subscription-Key":myKey },
    contentType: "application/json",
    data: '{ "Url": "' + url + '" }' 

which sends a POST request and passes the URL as part of a JSON object in the request Body.

This request is Asynchronous, so the "done" function is called when it returns successfully.

            }).done(function (data) {

The function tied to the "done" event parses through the returned JSON and displays it on the screen.

If an error occurs, we output a simple error message to the user in the "fail" function.

}).fail(function (err) {
    $("#OutputDiv").text("ERROR!" + err.responseText);
}); 

Most of the code above is just formatting the output, so the REST call itself is quite simple. Project Oxford makes this type of analysis much easier for developers, regardless of their platform.

You can find this code at my Github repository.

In this article, you learned about the Project Oxford OCR API and how to call it from a JavaScript application.

Saturday, 19 March 2016 17:22:45 (GMT Standard Time, UTC+00:00)
# Thursday, 17 March 2016

Speech recognition is a problem on which computer scientists have been working for years. Project Oxford applies the science of Machine Learning to this problem in order to recognize words spoken and determine their probable meaning based on context.

Project Oxford exposes a REST web service so that you can add speech recognition to your application.

Before you can use the Speech API, you must register at Project Oxford and retrieve the Speech API key.

SpeechKey
Figure 1: Speech API Key

The easiest way to use this API in a .NET application is to use the SpeechRecognition library. A NuGet package makes it easy to add this library to your application. In Visual Studio 2015, create a new WPF application (File | New | Project | Windows | WPF Application). Then, right-click the project in the Solution Explorer and select Manage NuGet Packages. Search for and add the "Microsoft.ProjectOxford.SpeechRecognition" package. Select the "x64" or "x86" version that corresponds with your version of Windows.

NuGet
Figure 2: NuGet dialog

Now, you can start using the library to call the Speech API.

Add the following using statement to the top of a class file:

using Microsoft.ProjectOxford.SpeechRecognition; 

Within the class, declare a private instance of the MicrophoneRecognitionClient class

MicrophoneRecognitionClient _microphoneRecognitionClient; 

To begin listening to speech, instantiate the MicrophoneRecognitionClient object by using the SpeechRecognitionServiceFactory.CreateMicrophoneClient method, passing in the Speech Recognition Mode, the language to listen for, and your Speech Subscription Key.

The Speech Recognition Mode is an enum that can be either ShortPhrase or LongDictation. These are optimized for shorter or longer voice messages, respectively. Below is an example of creating a new MicrophoneRecognitionClient instance:

var speechRecognitionMode = SpeechRecognitionMode.ShortPhrase;
string language = "en-us";
string subscriptionKey = ConfigurationManager.AppSettings["SpeechKey"].ToString(); 
 
_microphoneRecognitionClient
        = SpeechRecognitionServiceFactory.CreateMicrophoneClient
                        (
                        speechRecognitionMode,
                        language,
                        subscriptionKey
                        ); 

Now that you have a MicrophoneRecognitionClient object, wire up the OnPartialResponseReceived and the OnResponseReceived events to listen for speech and call the API to turn that speech into text.

_microphoneRecognitionClient.OnPartialResponseReceived += OnPartialResponseReceivedHandler;
_microphoneRecognitionClient.OnResponseReceived += OnMicShortPhraseResponseReceivedHandler;

The MicrophoneRecognitionClient object calls the web service frequently - often after every word - to interpret the words it has heard so far. When it makes this call, its OnPartialResponseReceived event fires.

The signature of OnPartialResponseReceivedHandler is:

void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)

and you can retrieve Oxford's text interpretation of the spoken words from e.PartialResult. Oxford may revise its interpretation of words spoken at the beginning of a sentence when it receives more of the sentence to provide some context.

After a significant pause, the MicrophoneRecognitionClient object will decide that the user has finished speaking. At this point, it fires the OnResponseReceived event, giving you a chance to clean up. The EndMicAndRecognition method of the MicrophoneRecognitionClient stops listening and severs the connection to the web service.

Here is some code that may be appropriate in the OnResponseReceived event handler:

_microphoneRecognitionClient.EndMicAndRecognition();
_microphoneRecognitionClient.Dispose();
_microphoneRecognitionClient = null; 

I have created a sample WPF app with a single window containing the following XAML:

<StackPanel Name="MainStackPanel" Orientation="Vertical" VerticalAlignment="Top">
    <Button Name="RecordButton" Width="250" Height="100" 
            FontSize="32" VerticalAlignment="Top" 
            Click="RecordButton_Click">
        Start!
    </Button>
    <TextBox Name="OutputTextbox" VerticalAlignment="Top" Width="600" 
        TextWrapping="Wrap" FontSize="18"></TextBox>
</StackPanel> 

The code-behind for this window is listed below. It includes some visual cues that the app is listening and displays the latest text returned from the Speech API.

using System;
using System.Configuration;
using System.Threading;
using System.Windows;
using System.Windows.Media;
using Microsoft.ProjectOxford.SpeechRecognition; 
 
namespace SpeechToTextDemo
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        AutoResetEvent _FinalResponseEvent;
        MicrophoneRecognitionClient _microphoneRecognitionClient; 
 
        public MainWindow()
        {
            InitializeComponent();
            RecordButton.Content = "Start\nRecording";
            _FinalResponseEvent = new AutoResetEvent(false);
            OutputTextbox.Background = Brushes.White;
            OutputTextbox.Foreground = Brushes.Black;
        } 
 
        private void RecordButton_Click(object sender, RoutedEventArgs e)
        {
            RecordButton.Content = "Listening...";
            RecordButton.IsEnabled = false;
            OutputTextbox.Background = Brushes.Green;
            OutputTextbox.Foreground = Brushes.White;
            ConvertTextToSpeech();
        } 
 
        /// <summary>
        /// Start listening. 
        /// </summary>
        private void ConvertTextToSpeech()
        {
            var speechRecognitionMode = SpeechRecognitionMode.ShortPhrase;
            string language = "en-us";
            string subscriptionKey = ConfigurationManager.AppSettings["SpeechKey"].ToString(); 
 
            _microphoneRecognitionClient
                    = SpeechRecognitionServiceFactory.CreateMicrophoneClient
                                    (
                                    speechRecognitionMode,
                                    language,
                                    subscriptionKey
                                    ); 
 
            _microphoneRecognitionClient.OnPartialResponseReceived += OnPartialResponseReceivedHandler;
            _microphoneRecognitionClient.OnResponseReceived += OnMicShortPhraseResponseReceivedHandler;
            _microphoneRecognitionClient.StartMicAndRecognition(); 
 
        } 
 
        void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
        {
            string result = e.PartialResult;
            Dispatcher.Invoke(() =>
            {
                OutputTextbox.Text = (e.PartialResult);
                OutputTextbox.Text += ("\n"); 
 
            });
        } 
 
        /// <summary>
        /// Speaker has finished speaking. Sever connection to server, stop listening, and clean up
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        void OnMicShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
        {
            Dispatcher.Invoke((Action)(() =>
            {
                _FinalResponseEvent.Set();
                _microphoneRecognitionClient.EndMicAndRecognition();
                _microphoneRecognitionClient.Dispose();
                _microphoneRecognitionClient = null;
                RecordButton.Content = "Start\nRecording";
                RecordButton.IsEnabled = true;
                OutputTextbox.Background = Brushes.White;
                OutputTextbox.Foreground = Brushes.Black; 
 
            }));
        }
    }
}

You can download this project from my GitHub repository.

In this article, you learned how to use the Project Oxford Speech Recognition .NET library to take advantage of the Oxford Speech API and add speech-to-text capabilities to your application.

Thursday, 17 March 2016 12:26:00 (GMT Standard Time, UTC+00:00)
# Wednesday, 16 March 2016

In the last article, we showed how to call the Project Oxford Emotions API via REST in order to determine the emotions of every person in a picture.

In this article, I will show you how to use a .NET library to call this API. A .NET library simplifies the process by abstracting away HTTP calls and providing strongly-typed objects with which to work in your .NET code.

As with the REST call, we begin by signing up for Project Oxford and getting the key for this API, which you can do at https://www.projectoxford.ai/Subscription?popup=True.

Em01-GetKey
Figure 1: Key

To use the .NET library, launch Visual Studio and create a new Universal Windows App (File | New | Project | Windows | Blank (Universal Windows))

Add the Emotions NuGet Package to your project (Right-click project | Manage NuGet Packages); then search for and install Microsoft.ProjectOxford.Emotion. This will add the appropriate references to your project.

In your code, add the following statements to the top of your class file.

using Microsoft.ProjectOxford.Emotion;
using Microsoft.ProjectOxford.Emotion.Contract; 

To use this library, we create an instance of the EmotionServiceClient class, passing in our key to the constructor.

var emotionServiceClient = new EmotionServiceClient(emotionApiKey);

The RecognizeAsync method of this class accepts the URL of an image and returns an array of Emotion objects.

Emotion[] emotionResult = await emotionServiceClient.RecognizeAsync(imageUrl); 

Each emotion object represents a single face detected in the picture and contains the following properties:

FaceRectangle: This indicates the location of the face

Scores: A set of values corresponding to each emotion (anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise), with a value indicating the confidence with which Oxford thinks the face matches this emotion. Confidence values are between 0 and 1; higher values indicate a higher confidence that this is the correct emotion.

The code below returns a string indicating the most likely emotion for every face in an image.

var sb = new StringBuilder();
var faceNumber = 0;
foreach (Emotion em in emotionResult)
{
    faceNumber++;
    var scores = em.Scores;
    var anger = scores.Anger;
    var contempt = scores.Contempt;
    var disgust = scores.Disgust;
    var fear = scores.Fear;
    var happiness = scores.Happiness;
    var neutral = scores.Neutral;
    var surprise = scores.Surprise;
    var sadness = scores.Sadness; 
 
    var emotionScoresList = new List<EmotionScore>();
    emotionScoresList.Add(new EmotionScore("anger", anger));
    emotionScoresList.Add(new EmotionScore("contempt", contempt));
    emotionScoresList.Add(new EmotionScore("disgust", disgust));
    emotionScoresList.Add(new EmotionScore("fear", fear));
    emotionScoresList.Add(new EmotionScore("happiness", happiness));
    emotionScoresList.Add(new EmotionScore("neutral", neutral));
    emotionScoresList.Add(new EmotionScore("surprise", surprise));
    emotionScoresList.Add(new EmotionScore("sadness", sadness)); 
 
    var maxEmotionScore = emotionScoresList.Max(e => e.EmotionValue);
    var likelyEmotion = emotionScoresList.First(e => e.EmotionValue == maxEmotionScore); 
 
    string likelyEmotionText = string.Format("Face {0} is {1:N2}% likely to be experiencing: {2}\n\n", 
        faceNumber, likelyEmotion.EmotionValue * 100, likelyEmotion.EmotionName.ToUpper());
    sb.Append(likelyEmotionText); 
 
}
var resultsText = sb.ToString(); 
 

This will return a string similar to the following:

Face 1 is 99.36% likely to be experiencing: NEUTRAL

Face 2 is 100.00% likely to be experiencing: HAPPINESS

Face 3 is 95.02% likely to be experiencing: SADNESS
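
Note that EmotionScore is not a class from the Project Oxford library; it is a small helper class used to pair an emotion name with its score. The class is not shown in this post, but a minimal sketch consistent with the usage above would be:

public class EmotionScore
{
    // Name of the emotion (e.g., "happiness") and the confidence score returned by the API
    public string EmotionName { get; set; }
    public float EmotionValue { get; set; }

    public EmotionScore(string emotionName, float emotionValue)
    {
        EmotionName = emotionName;
        EmotionValue = emotionValue;
    }
}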

You can download this Visual Studio 2015 Universal Windows App project from here.

Full documentation on the Emotion library is available here. You can find a more complete (although more complicated) demo of this library here.

In this article, you learned how to use the .NET libraries to call the Project Oxford Emotion API and detect emotion in the faces of an image.

Wednesday, 16 March 2016 13:11:00 (GMT Standard Time, UTC+00:00)
# Tuesday, 15 March 2016

It's difficult enough for humans to recognize emotions in the faces of other humans. Can a computer accomplish this task? It can if we train it to and if we give it enough examples of different faces with different emotions.

When we supply data to a computer with the objective of training that computer to recognize patterns and predict new data, we call that Machine Learning. And Microsoft has done a lot of Machine Learning with a lot of faces and a lot of data and they are exposing the results for you to use.

The Emotions API in Project Oxford looks at pictures of people and determines their emotions. Possible emotions returned are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Each emotion is assigned a confidence level between 0 and 1 - higher numbers indicate a higher confidence that this is the emotion expressed in the face. If a picture contains multiple faces, the emotion of each face is returned.

The API is a simple REST web service located at https://api.projectoxford.ai/emotion/v1.0/recognize. POST to this service with a header that includes:
Ocp-Apim-Subscription-Key:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

where xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is your key. You can find your key at https://www.projectoxford.ai/Subscription?popup=True

and a body that includes the following data:

{ "url": "http://xxxx.com/xxxx.jpg" }

where http://xxxx.com/xxxx.jpg is the URL of an image.
The full request looks something like:
POST https://api.projectoxford.ai/emotion/v1.0/recognize HTTP/1.1
Content-Type: application/json
Host: api.projectoxford.ai
Content-Length: 62
Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

{ "url": "http://xxxx.com/xxxx.jpg" }

This will return JSON data identifying each face in the image and a score indicating how confident this API is that the face is expressing each of 8 possible emotions. For example, passing a URL with a picture below of 3 attractive, smiling people

SpartaHack-068-X2[1] 
(found online at https://giard.smugmug.com/Tech-Community/SpartaHack-2016/i-4FPV9bf/0/X2/SpartaHack-068-X2.jpg)

returned the following data:

[
  {
    "faceRectangle": {
      "height": 113,
      "left": 285,
      "top": 156,
      "width": 113
    },
    "scores": {
      "anger": 1.97831262E-09,
      "contempt": 9.096525E-05,
      "disgust": 3.86221245E-07,
      "fear": 4.26409547E-10,
      "happiness": 0.998336,
      "neutral": 0.00156954059,
      "sadness": 8.370223E-09,
      "surprise": 3.06117772E-06
    }
  },
  {
    "faceRectangle": {
      "height": 108,
      "left": 831,
      "top": 169,
      "width": 108
    },
    "scores": {
      "anger": 2.63808062E-07,
      "contempt": 5.387114E-08,
      "disgust": 1.3360991E-06,
      "fear": 1.407629E-10,
      "happiness": 0.9999967,
      "neutral": 1.63170478E-06,
      "sadness": 2.52861843E-09,
      "surprise": 1.91028926E-09
    }
  },
  {
    "faceRectangle": {
      "height": 100,
      "left": 591,
      "top": 168,
      "width": 100
    },
    "scores": {
      "anger": 3.24157673E-10,
      "contempt": 4.90155344E-06,
      "disgust": 6.54665473E-06,
      "fear": 1.73284559E-06,
      "happiness": 0.9999156,
      "neutral": 6.42121E-05,
      "sadness": 7.02297257E-06,
      "surprise": 5.53670576E-09
    }
  }
]

A high value for the 3 happiness scores and the very low values for all the other scores suggest a very high degree of confidence that the people in this photo are happy.

Here is the request in the popular HTTP analysis tool Fiddler [http://www.telerik.com/fiddler]:

Request

Em01-Fiddler-Request

Response:

Em02-Fiddler-Response

Sending requests to Project Oxford REST API makes it simple to analyze the emotions of people in a photograph.

Tuesday, 15 March 2016 09:57:07 (GMT Standard Time, UTC+00:00)
# Monday, 14 March 2016

Generating a thumbnail image from a larger image sounds easy – just shrink the dimensions of the original, right? But it becomes more complicated if the thumbnail image is a different shape than the original. In this case, we will need to crop or distort the original image. Distorting the image tends to look very bad; and when we crop an image, we will need to ensure that the primary subject of the image remains in the generated thumbnail. To do this, we need to identify the primary subject of the image. That's easy enough for a human observer to do, but a difficult thing for a computer to do, which is necessary if we want to automate this process.

This is where machine learning can help. By analyzing many images, Machine Learning can figure out what parts of a picture are likely to be the main subject. Once this is known, it becomes a simpler matter to crop the picture in such a way that the main subject remains.

Project Oxford uses Machine Learning so that you don't have to. It exposes an API to create an intelligent thumbnail image from any picture.

You can see this in action at www.projectoxford.ai/demo/vision#Thumbnail.


Thu01-LiveDemo
Figure 1

With this live, in-browser demo, you can either select an image from the gallery and view the generated thumbnails; or provide your own image - either from your local computer or from a public URL. The page uses the Thumbnail API to create thumbnails of 6 different dimensions.

Thu02-LiveDemo-2
Figure 2

For your own application, you can either call the REST Web Service directly or (for a .NET application) use a custom library. The library simplifies development by abstracting away HTTP calls via strongly-typed objects.

To get started, you will need a free Project Oxford account and you will need to sign into projectoxford.ai with a Microsoft account.

For this API, you need a key. From the Computer Vision API page (Figure 3), click the [Try for free >] button; then, click the "Show" link under the Primary key of the "Computer Vision" section (Figure 4).

Thu03-ComputerVisionApiPage
Figure 3 

Thu04-ShowKey
Figure 4 

To use the SDK, add the Microsoft.ProjectOxford.Vision NuGet package to your project: Right-click on your project, select Manage NuGet Packages, search for "ProjectOxford.Vision", select the package from the list, and click the [Install] button, as shown in Figure 5

Thu05-NuGet
Figure 5

This adds a reference to Microsoft.ProjectOxford.Vision.dll, which contains classes that make it easier to call this API.

Add the following statement to the top of a class file to use this library.

using Microsoft.ProjectOxford.Vision;

Now, you can use the methods in the VisionServiceClient class to interact with the API.

Create a VisionServiceClient with the following code:

string subscriptionKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxx";
IVisionServiceClient visionClient = new VisionServiceClient(subscriptionKey);

where “xxxxxxxxxxxxxxxxxxxxxxxxxxx” is your subscription key.

Next, use the GetThumbnailAsync method to generate a thumbnail image. The following code creates a 200x100 thumbnail of a photo of a buoy in Stockholm, Sweden.

string originalPicture = @"https://giard.smugmug.com/Travel/Sweden-2015/i-ncF6hXw/0/L/IMG_1560-L.jpg";
int width = 200;
int height = 100;
bool smartCropping = true;
byte[] thumbnailResult = null;
thumbnailResult = visionClient.GetThumbnailAsync(originalPicture, width, height, smartCropping).Result;

The result is an array of bytes, but you can save the corresponding image to a file with the following code:

string folder = @"c:\test";
string thumbnaileFullPath = string.Format("{0}\\thumbnailResult_{1:yyyMMddhhmmss}.jpg", folder, DateTime.Now);
using (BinaryWriter binaryWrite = new BinaryWriter(new FileStream(thumbnaileFullPath, FileMode.Create, FileAccess.Write)))
{
    binaryWrite.Write(thumbnailResult);
}

Below is the full listing in a Console App to generate a thumbnail; then open both the original image and the saved thumbnail image for comparison.

using System;
using System.Diagnostics;
using System.IO;
using Microsoft.ProjectOxford.Vision;
 
namespace ThumbNailConsole
{
    class Program
    {
        static void Main(string[] args)
        {
 
            string subscriptionKey = "15e24a988f484591b17bcc4713aec800";
            IVisionServiceClient visionClient = new VisionServiceClient(subscriptionKey);
 
            string originalPicture = @"https://giard.smugmug.com/Travel/Sweden-2015/i-ncF6hXw/0/L/IMG_1560-L.jpg";
            int width = 200;
            int height = 100;
            bool smartCropping = true;
            byte[] thumbnailResult = null;
            thumbnailResult = visionClient.GetThumbnailAsync(originalPicture, width, height, smartCropping).Result;
 
            string folder = @"c:\test";
            string thumbnaileFullPath = string.Format("{0}\\thumbnailResult_{1:yyyMMddhhmmss}.jpg", folder, DateTime.Now);
            using (BinaryWriter binaryWrite = new BinaryWriter(new FileStream(thumbnaileFullPath, FileMode.Create, FileAccess.Write)))
            {
                binaryWrite.Write(thumbnailResult);
            }
 
            Process.Start(thumbnaileFullPath);
            Process.Start(originalPicture);
 
            Console.WriteLine("Done! Thumbnail is at {0}!", thumbnaileFullPath);
        }
    }
}

The result is shown in Figure 6 below.

Thu06-Output

One thing to note. The Thumbnail API is part of the Computer Vision API. As of this writing, the free version of the Computer Vision API is limited to 5,000 transactions per month. If you want more than that, you will need to upgrade to the Standard version, which charges $1.50 per 1000 transactions.

But this should be plenty for you to learn this API for free and build and test your applications until you need to put them into production.

The code above can be found on GitHub.

Monday, 14 March 2016 04:01:00 (GMT Standard Time, UTC+00:00)
# Sunday, 13 March 2016

Project Oxford is a set of APIs that take advantage of Machine Learning to provide developers with intelligent ways to analyze images, speech, and language.

These technologies require Machine Learning, which requires a lot of computing power and a lot of data. Most of us have neither, but Microsoft does and has used it to create the APIs in Project Oxford.

Project Oxford provides APIs to analyze pictures and voice and provide intelligent information about them.

There are three broad categories of services: Vision, Voice, and Language.

The Vision APIs analyze pictures and recognize objects in those pictures. For example, several Vision APIs are capable of recognizing faces in an image. One analyzes each face and deduces that person's emotion; another can compare 2 pictures and decide whether or not 2 photographs are of the same person; a third guesses the age of each person in a photo.

The Speech APIs can convert speech to text or text to speech. It can also recognize the voice of a given speaker (if you want to use that for authentication in your app, for example) and infer the intent of the speaker from his words and tone.

The Language APIs seem more of a grab bag to me. A spell checker is smart enough to recognize common proper names and homonyms.

All these APIs are currently in Preview but I've played with them and they appear very solid. Many of them even provide a confidence factor to let you know how confident you should be in the value returned. For example, 2 faces may represent the same person, but it helps to know how closely they match.

You can use these APIs. To get started, you need a Project Oxford account, but you can get one for free at projectoxford.ai.

Each API offers a free option that restricts the number and/or frequency of calls, but you can break through that boundary for a charge.

You can also find documentation, sample code, and even a place to try out each API live in your browser at projectoxford.ai.

You call each one by passing JSON to and receiving JSON from a RESTful web service, but some of them offer an SDK to make it easier to make that call from a .NET application.

You can see a couple of fun applications of Project Oxford at how-old.net (which guesses the ages of people in photographs) and what-dog.net (which identifies the breed of dog in a photo).

Sign up today and start building apps. It’s fun and it’s free!

Sunday, 13 March 2016 03:14:12 (GMT Standard Time, UTC+00:00)