# Tuesday, 07 February 2017

Managing Big Data takes a lot of processing power. Data often needs to be captured, scrubbed, merged, and queried, and each of these steps can take many hours of compute time. Often, though, these steps can be performed in parallel, reducing the elapsed time but increasing the number of computers required.

You could buy a bunch of computers, cluster them, and process your data on that cluster. But this is expensive, and those computers are likely to sit idle most of the time.

Cloud computing tends to be an ideal solution for most Big Data processing because you can rent the servers you need and pay for them only while they are running.

Microsoft Azure offers a full suite of Big Data tools. These tools are based on the popular Hadoop open source project and are collectively known as "HD Insight".

### HBase

HBase is a NoSQL data store that is optimized for big data. Unlike SQL Server and other relational databases, HBase does not enforce referential integrity, pre-defined schemas, or auto-generated keys; the developer must implement these features in the client application. Because the database doesn't need to do this work, inserting data tends to be much faster than in a relational database.

HBase also can be scaled to store petabytes of data.

### Storm

Apache Storm is a framework that allows you to build workflow engines against real-time data, which makes it ideal for scenarios such as collecting IoT data. A Storm topology consists of a Stream, which is a container that holds a Spout and one or more Bolts. A Spout is a component that accepts data into the Stream and hands it off to Bolts. Each Bolt takes in data; performs some discrete action, such as cleaning up the data or looking up values from IDs; and passes data on to one or more other Bolts. Data is passed as "Tuples", which are sets of name-value pairs formatted as JSON. You can write your code in C#, Java, or Python, and a Visual Studio template helps you create these components.
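For example (a hypothetical tuple; the field names are purely illustrative), a Spout collecting IoT telemetry might emit something like:

{
    "deviceId": "sensor-42",
    "temperature": 21.5,
    "readingTime": "2017-02-07T18:00:00Z"
}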

### Hive

Hive is a data warehouse. With it, you can query NoSQL data (such as HBase) and relational data (such as SQL Server). Hive ships with a query language - HiveQL - that is similar to SQL. Where HiveQL falls short, you can write user-defined functions to perform more complex calculations.
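For example (a hypothetical query; the table and column names are made up for illustration), a HiveQL statement reads almost exactly like SQL:

SELECT deviceId, AVG(temperature) AS avgTemperature
FROM SensorReadings
GROUP BY deviceId;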

### Spark

Spark is a fast, in-memory data processing engine that is well suited to interactive analysis and visualization. In Spark, you can write code in R, Python, or Scala. Jupyter notebooks are a popular interactive tool that allows you to create documents consisting of text and code, so that you can generate real-time reports. Jupyter notebooks support both Python and Scala. Spark also ships with a number of libraries that make it easier to connect to data, create graphs, and perform a number of other tasks.

### Clusters

Each of the services described above supports running in a cluster of servers. In a cluster, these servers process data in parallel, greatly reducing the amount of time required to process it. You can easily create a cluster in the Azure portal, or you can write a script in PowerShell or the CLI.

The ease of creating clusters is a big advantage of running HD Insight over deploying your own Hadoop servers and clustering them yourself. Of course, the other advantage is that you do not have to purchase and maintain servers that are only being used occasionally, which can be a big cost saving.

### Warning

One word of caution about using these services: you pay for each server in a cluster by the minute, and this can quickly add up. Typically, you don't need to keep your cluster running for very long to complete a task, so it is a good idea to shut it down when the work is finished. For this reason, it's a good idea to script the creation and deletion of your cluster, so that these tasks are easy to repeat.
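Below is a minimal sketch of such a script, assuming the AzureRM.HDInsight PowerShell module. The exact parameters (particularly the storage settings, which are omitted here) vary by module version, so treat this as an outline rather than a complete, working script.

# Sign in first (Login-AzureRmAccount) and select a subscription.
$creds = Get-Credential   # admin credentials for the cluster

# Create a small HBase cluster (storage configuration omitted for brevity)
New-AzureRmHDInsightCluster `
    -ClusterName "my-hdi-cluster" `
    -ResourceGroupName "my-resource-group" `
    -Location "North Europe" `
    -ClusterType HBase `
    -ClusterSizeInNodes 4 `
    -HttpCredential $creds

# ... run your jobs ...

# Delete the cluster when the work is done, so you stop paying for it
Remove-AzureRmHDInsightCluster `
    -ClusterName "my-hdi-cluster" `
    -ResourceGroupName "my-resource-group"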

Tuesday, 07 February 2017 18:08:01 (GMT Standard Time, UTC+00:00)
# Tuesday, 11 October 2016

Microsoft Cognitive Services provides a number of APIs to take advantage of Machine Learning. One of the simplest APIs to use is Sentiment Analysis.

Sentiment Analysis examines one or more text entries and determines whether each text reflects a positive or negative sentiment. It returns a number between 0 and 1: A higher number indicates a more positive sentiment, while a lower number indicates a more negative sentiment.

To use this service, POST a JSON message to the following URL: https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment

Unlike some other Cognitive Services URLs, this one takes no querystring parameters.

In the HTTP header, pass the following information: the Content-Type and the Ocp-Apim-Subscription-Key.

POST to this service with a header that includes:
Ocp-Apim-Subscription-Key:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

where xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is your key.

In the Content-Type, pass "application/json".

For the Ocp-Apim-Subscription-Key, include the Text Analytics key. You can find your key at https://www.projectoxford.ai/Subscription?popup=True
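Putting those pieces together, the raw request headers look roughly like this (with the key redacted):

POST /text/analytics/v2.0/sentiment HTTP/1.1
Host: westus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx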

In the body, pass a JSON object that contains an array of documents. Each document contains 3 properties:

language - the language of the text you want to analyze, passed as a language code. Valid values include "en" (English), "es" (Spanish), "fr" (French), and "pt" (Portuguese).

id - A string that uniquely identifies this document. Used to match the return value to the corresponding text.

text - the text to analyze

Below is a sample JSON body:

{
    "documents": [
        {
            "language": "en",
            "id": "text01",
            "text": "This is a great day."
        }
    ]
}

After you POST this to the URL, you should expect a JSON response. If all goes well, you will receive an HTTP 200 response, and the returned JSON will include an array of documents (the same number that you passed in the request body). Each response document will contain:

id - matching the id of the corresponding document in the request body.

score - A value between 0 and 1. The higher the score, the more positive the sentiment of the text; The lower the score, the more negative the text sentiment.

You may also receive an array of errors. Each error contains the following properties:

id - matching the id of the corresponding document in the request body.

message - a detailed error message.
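For example, a response reporting an error might look something like this (a hypothetical illustration; the actual message text will vary):

{
    "documents": [],
    "errors": [
        {
            "id": "text01",
            "message": "Supplied language not supported."
        }
    ]
}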

Below is a sample response JSON body:

{
    "documents": [
        {
            "score": 0.95412,
            "id": "text01"
        }
    ]
}

Here is a bit of code to call this API from JavaScript. I am using jQuery's Ajax method and displaying output in a div, like the following:

<div id="OutputDiv"></div> 

var subscriptionKey = "566375db01ad43dc8f62dcc8dc3e5c1f";   // your Text Analytics key
var textToAnalyze = "Life is beautiful";

var webSvcUrl = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment";

var outputDiv = $("#OutputDiv");
outputDiv.text("Thinking...");

// Build the request body with JSON.stringify so that any quotes in the text are escaped correctly
var requestBody = JSON.stringify({
    documents: [
        { language: "en", id: "text01", text: textToAnalyze }
    ]
});

$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: requestBody
}).done(function (data) {
    var outputText;
    if (data.errors && data.errors.length > 0) {
        // Each error object contains an id and a message
        outputDiv.html("Error: " + data.errors[0].message);
    }
    else if (data.documents && data.documents.length > 0) {
        // Scores above 0.5 lean positive; scores at or below 0.5 lean negative
        var score = data.documents[0].score;
        if (score > 0.5) {
            outputText = "That is a Positive thing to say!";
        }
        else {
            outputText = "That is a Negative thing to say!";
        }
        outputDiv.html(outputText);
    }
    else {
        outputDiv.text("No text to analyze.");
    }
}).fail(function (err) {
    $("#OutputDiv").text("ERROR! " + err.responseText);
});

Tuesday, 11 October 2016 06:48:00 (GMT Daylight Time, UTC+01:00)
# Saturday, 21 May 2016

Last month, I had the privilege of attending the AWS Summit in Chicago. It was a great experience for me because, although I do a lot of work with cloud computing, I have very little experience with the Amazon Web Services (AWS) platform.

The most interesting session I attended was about a service called "Aurora" (Amazon tends to give all of its services catchy names). This is a relational database that looks and acts almost exactly like MySQL but runs much faster. The official product page brags that Aurora is a "MySQL-compatible relational database with 5X performance"; the session I attended claimed they had found cases in which Aurora was 63 times faster than MySQL. The presenters didn't share the details of those cases, but even if typical results are only a fraction of that speed, it's still an impressive performance improvement.

Because Aurora is MySQL-compatible, you should be able to plug it into any application and use it just like MySQL. The SQL syntax is identical, and the management tools will be familiar to anyone used to managing MySQL.

Of course, the fact that Aurora is hosted on a cloud platform like AWS gives it the advantage of high availability and flexible scaling that cloud computing offers.

Since most of my cloud computing experience is with Microsoft Azure, I tend to use Azure as a reference point for the services I saw at this summit. I was drawn to Aurora in part because I'm not aware of the same offering in Microsoft Azure.

MySQL as a service is available on Azure, but it's offered and supported by ClearDB - a third party. If you want better performance or scalability on Azure than ClearDB offers, you will need to either switch to a different database or create a Virtual Machine and install MySQL on it, in which case you would be using Infrastructure as a Service instead of Software as a Service.

In many cases, this is a non-issue. If you are building a new application, you have the flexibility to choose your preferred database technology. MySQL and SQL Server have very similar languages; and, although I won't get into a debate here as to which is "better", it would be difficult to argue that SQL Server is significantly less reliable or enterprise-ready than MySQL.

But there are times when you don't have a choice of database technologies. For example, if you have a large legacy application that you want to migrate to Azure, it may be a daunting task to migrate every stored procedure and SQL statement to use T-SQL.  Or if you are using a framework that is specifically built on top of MySQL, it makes sense to use that database, rather than re-writing the entire data access layer. Luckily, some frameworks have alternative data access layers. For example, Project Nami is a data access layer for WordPress that uses SQL Server as a data store, rather than MySQL.

Although the various cloud computing companies follow one another and are likely to build a service when they see traction on their competitor's platform, I find it interesting to see these gaps in offerings.

Saturday, 21 May 2016 11:28:00 (GMT Daylight Time, UTC+01:00)
# Sunday, 13 March 2016

Project Oxford is a set of APIs that take advantage of Machine Learning to provide developers with capabilities such as computer vision, speech recognition, and language understanding.

These technologies require Machine Learning, which requires a lot of computing power and a lot of data. Most of us have neither, but Microsoft has both and has used them to create the APIs in Project Oxford.

Project Oxford provides APIs to analyze pictures and voice and provide intelligent information about them.

There are three broad categories of services: Vision, Speech, and Language.

The Vision APIs analyze pictures and recognize objects in those pictures. For example, several Vision APIs are capable of recognizing faces in an image. One analyzes each face and deduces that person's emotion; another can compare 2 pictures and decide whether or not the 2 photographs show the same person; a third guesses the age of each person in a photo.

The Speech APIs can convert speech to text or text to speech. They can also recognize the voice of a given speaker (if you want to use voice for authentication in your app, for example) and infer the intent of the speaker from his words and tone.

The Language APIs seem more of a grab bag to me. For example, a spell checker is smart enough to recognize common proper names and homonyms.

All these APIs are currently in Preview, but I've played with them and they appear very solid. Many of them even provide a confidence factor to let you know how much faith you should put in the value returned. For example, 2 faces may appear to represent the same person, but it helps to know how closely they match.

You can use these APIs today. To get started, you need a Project Oxford account, which you can get for free at projectoxford.ai.

Each API offers a free option that restricts the number and/or frequency of calls, but you can break through that boundary for a charge.

You can also find documentation, sample code, and even a place to try out each API live in your browser at projectoxford.ai.

You call each one by POSTing JSON to a RESTful web service and reading the JSON it returns, but some of the services offer an SDK that makes the call easier from a .NET application.
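As a rough sketch of that pattern (this assumes the Emotion API's recognize endpoint and its JSON body format; check the documentation at projectoxford.ai for the current details), a jQuery call looks much like the Sentiment Analysis example above:

// Hypothetical example: ask the Emotion API to analyze the faces in a publicly accessible image
var emotionUrl = "https://api.projectoxford.ai/emotion/v1.0/recognize";

$.ajax({
    type: "POST",
    url: emotionUrl,
    headers: { "Ocp-Apim-Subscription-Key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" },   // your Project Oxford key
    contentType: "application/json",
    data: JSON.stringify({ url: "http://example.com/photo.jpg" })
}).done(function (faces) {
    // The service returns one entry per detected face, each with a set of emotion scores
    console.log(faces);
});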

You can see a couple of fun applications of Project Oxford at how-old.net (which guesses the ages of people in photographs) and what-dog.net (which identifies the breed of dog in a photo).

Sign up today and start building apps. It’s fun and it’s free!

Sunday, 13 March 2016 03:14:12 (GMT Standard Time, UTC+00:00)
# Wednesday, 14 October 2015

I recently spoke with Data Scientist Richard Conway from Elasta Games, who described how his company does analysis of online games.

You can watch and listen to this interview below.

Wednesday, 14 October 2015 12:36:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 27 May 2010

Tonight, I attended Cloud Camp Detroit, which was built primarily around Open Spaces discussions and a panel discussion in front of an audience. Topics included the basics of cloud computing, specific cloud implementations, and issues such as security. The "eyes-front" presentations were limited to half a dozen lightning talks.

I had a chance to interact with a lot of people far more experienced than I am in this area. Many of them work outside the .NET world, so talking with them helps me see the software industry from a different perspective.

I filled in for a sick friend to deliver a presentation on Windows Azure. Below are the slides from my presentation. Thanks to Abe Pachikara of Microsoft for supplying the slides.

Thursday, 27 May 2010 05:04:45 (GMT Daylight Time, UTC+01:00)