# Thursday, 05 July 2018

GCast 5:

Azure Databricks

Learn how to create a free library and Jupyter notebooks hosted in Azure.

Thursday, 05 July 2018 09:43:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 28 June 2018

GCast 4:

Azure Notebooks

Learn how to create a free library and Jupyter notebooks hosted in Azure.

Thursday, 28 June 2018 07:45:00 (GMT Daylight Time, UTC+01:00)
# Saturday, 23 June 2018

On May 19, I delivered a presentation titled "How Cloud Computing Empowers a Data Scientist" at the Chicago AI & Data Science Conference.

I described ways that the cloud has accelerated the fields of data science, machine learning, and artificial intelligence; and I gave examples of Azure tools that facilitate development in these fields.

You can watch the video below or at https://youtu.be/H19IW6nykZo

Saturday, 23 June 2018 08:17:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 21 June 2018

GCast 3:

Creating and Deploying a Predictive Web Service with Azure ML Studio

Building on Episode 2, I show you how to create, publish, and call a predictive web service from a trained model built with Azure ML Studio.

Thursday, 21 June 2018 08:26:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 14 June 2018

GCast 2:

Azure Machine Learning Studio

Azure Machine Learning Studio is a graphical designer that allows you to quickly build Machine Learning solutions without writing a lot of code.

Thursday, 14 June 2018 02:47:24 (GMT Daylight Time, UTC+01:00)
# Friday, 25 May 2018

One of the challenges of working with data is what to do with missing data.

A missing value in a dataset can take many forms, including, but not limited to, the following:

  • No text or numbers between 2 column delimiters
  • An empty string ("")
  • A blank string (e.g., "   ")
  • The number 0
  • A special indicator, such as "NA" or "NONE"
  • An inconsistent data type, such as a number where a string is expected
  • A value that makes no sense in the context of the data.

The last one requires some domain knowledge about the data, so it is often difficult to spot.
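Several of the forms listed above can be brought under a single representation before any other cleanup. The sketch below assumes a small, made-up dataframe (the column names and the "NA" / "NONE" markers are illustrative) and converts each form to NaN, so that later steps only need to check for one kind of missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names and markers are illustrative.
df = pd.DataFrame({
    "city": ["Chicago", "", "   ", "NONE"],
    "area": ["120", "NA", "85", "  "],
})

# Treat empty and whitespace-only strings as missing.
df = df.replace(r"^\s*$", np.nan, regex=True)

# Treat the special indicators "NA" and "NONE" as missing.
df = df.replace(["NA", "NONE"], np.nan)

# Convert the numeric column; anything that cannot be parsed also becomes NaN.
df["area"] = pd.to_numeric(df["area"], errors="coerce")

# Count the missing values in each column.
print(df.isnull().sum())
```

Values that are well-formed but make no sense in context still need domain knowledge; no generic rule like this will catch them.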

There are two strategies for dealing with missing data:

  1. Delete or ignore the entire row
  2. Replace the missing value with a reasonable value.

If only a few rows contain missing data, it may be efficient to simply delete these rows.

But if many rows contain missing data, it probably makes sense to keep them as other columns may contain valuable information. In this case, we will want to replace the missing data with a reasonable value.

But what is a reasonable value?

Options include replacing the missing value with an average, such as the mean or median of the non-missing values. Of course, this is only valid for numeric data that is ordinal; that is, data in which higher numbers indicate a higher value and not simply a discrete category.

The Pandas library contains some simple functions for deleting rows and replacing values. The fillna function is the simplest way to replace missing values.

# Replace all missing values with 0
df = df.fillna(0)

# Replace all missing values with the string 'Missing'
df = df.fillna('Missing')
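The fillna function also accepts a dictionary, so each column can get its own replacement value. Here is a minimal sketch, assuming a made-up dataframe with a numeric 'area' column and a text 'city' column; it fills the numeric column with its median and the text column with a label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame({
    "area": [100.0, np.nan, 300.0, 200.0],
    "city": ["Chicago", "Toronto", None, "Detroit"],
})

# median() skips NaN by default, so this is the median of the non-missing values.
median_area = df["area"].median()

# Fill each column with a value that suits its type.
df = df.fillna({"area": median_area, "city": "Missing"})

print(df)
```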

You can delete invalid or missing rows by overwriting a dataset with a filtered version of that set, as in the following examples:

# Delete all rows with area = 0 
df = df[df.area != 0]

# Delete all rows with null area 
df = df[df.area.notnull()] 
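Pandas also provides the dropna function, which expresses the null filter more directly. The sketch below uses a made-up dataframe; the 'rooms' column is only there to show the difference between checking one column and checking all of them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "area": [100.0, np.nan, 0.0, 250.0],
    "rooms": [3, 2, np.nan, 4],
})

# Drop rows where 'area' is null; equivalent to df[df.area.notnull()].
by_column = df.dropna(subset=["area"])

# Drop rows that have a null in any column.
any_null = df.dropna()

print(len(by_column), len(any_null))
```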

But for values that are present but inappropriate (e.g., a 0 that stands in for a missing measurement), we can use the map function.

Below, we use the map function to replace any value of 0 in the 'area' column with the mean of the non-zero values in this column. We pass a function to map rather than a dictionary, because a dictionary would turn every value that is not one of its keys into NaN.

# Replace 0 area with the mean of the non-zero areas
mean_area = df.loc[df['area'] != 0, 'area'].mean()
# Values other than 0 are returned unchanged
df['area'] = df['area'].map(lambda area: mean_area if area == 0 else area)

In this article, we discussed ways to use Pandas to handle missing data in a dataframe.

Friday, 25 May 2018 00:44:34 (GMT Daylight Time, UTC+01:00)
# Friday, 18 May 2018

I was working with a dataset in Azure ML Studio and I needed to replace values in a column.

Reasons for replacing values include:

  • Replacing codes with a more readable word or words
  • Consistency when combining 2 sets of data
  • Converting to numeric values in order to assign values to discrete strings
  • Converting to numeric values in order to work with an algorithm that only accepts numeric values

There is no built-in shape to do this, but you can do so with a couple of lines of code.

I can demonstrate by creating a new ML studio experiment and dragging the "Automobile Price Data" sample dataset onto the experiment design surface, as shown in Fig. 1.

Fig. 1

If we click on this shape and select "Visualize" (Fig. 2), we can see the data in the dataset. Click the "drive-wheels" column to see details about the data in that column (Fig. 3).

Fig. 2

Fig. 3

You can see from the visualization that the "drive-wheels" column contains 3 distinct values: "fwd", "rwd", and "4wd".

Imagine I wanted to replace these with "FRONT", "REAR", and "FOUR", respectively. (Maybe to be consistent with a second dataset I plan to merge with this one.)

Drag an "Execute Python Script" shape to the Experiment and connect its input to the output of the data shape (Fig. 4).

Fig. 4

In the Properties of the "Execute Python Script" shape, replace the existing code with the following:

import pandas as pd
def azureml_main(dataframe1 = None):
    dataframe1['drive-wheels'] = dataframe1['drive-wheels'].map({'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'})
    return dataframe1,

The azureml_main function is required by ML Studio. As written here, it accepts one parameter: a dataframe, which we name "dataframe1".

The first line of the function maps the 3 existing drive-wheels values to 3 new values in every row and saves the result back to the dataframe. By returning that dataframe, we make this updated data the output of this shape, so it can be used by later steps in our experiment.
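It can be handy to try the function outside ML Studio before pasting it into the shape. The sketch below calls the same azureml_main with a small, made-up dataframe standing in for the sample dataset. The extra "awd" value is my own addition, included to show that map turns any value missing from the dictionary into NaN.

```python
import pandas as pd

def azureml_main(dataframe1=None):
    # Replace each known code with its more readable equivalent.
    dataframe1['drive-wheels'] = dataframe1['drive-wheels'].map(
        {'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'})
    # The trailing comma returns a one-element tuple.
    return dataframe1,

# Hypothetical stand-in for the "Automobile Price Data" sample.
sample = pd.DataFrame({'drive-wheels': ['fwd', 'rwd', '4wd', 'awd']})

# Unpack the single dataframe from the returned tuple.
result, = azureml_main(sample)
print(result['drive-wheels'].tolist())
```

If a column might contain values you have not listed, check for nulls after the mapping or add the missing keys to the dictionary.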

After running this experiment, we can click the script shape, select "Visualize", and see that each value in "drive-wheels" has been replaced, as shown in Fig. 5.

Fig. 5
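The same pattern covers the numeric conversions listed at the start of this article. Here is a minimal sketch, assuming a made-up set of codes (0, 1, 2) and a hypothetical new column named 'drive-wheels-code'; it keeps the original text column and adds a numeric one beside it.

```python
import pandas as pd

# Hypothetical codes; choose values that make sense for your algorithm.
DRIVE_WHEEL_CODES = {'fwd': 0, 'rwd': 1, '4wd': 2}

def azureml_main(dataframe1=None):
    # Keep the original column and add a numeric version beside it.
    dataframe1['drive-wheels-code'] = dataframe1['drive-wheels'].map(DRIVE_WHEEL_CODES)
    return dataframe1,

# Small stand-in dataframe for trying the function locally.
sample = pd.DataFrame({'drive-wheels': ['rwd', 'fwd', '4wd', 'fwd']})
encoded, = azureml_main(sample)
print(encoded)
```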

This article shows a simple way to replace values in a dataframe column with new values in Azure ML Studio.

Friday, 18 May 2018 22:15:08 (GMT Daylight Time, UTC+01:00)
# Thursday, 17 May 2018

Many Machine Learning solutions have the same steps in common. For example, you will need to retrieve data from one or more sources; you will want to split your data into training and testing subsets; and you will want to clean up your source data. I refer to these as "plumbing tasks" because they are common to so many projects; and spending time coding these tasks takes time away from working on your data and your solution.

Azure Machine Learning Studio can help.

Azure Machine Learning Studio or "ML Studio" is a graphical design tool for building machine learning solutions. It includes a design surface and a set of shapes to perform specific tasks.

To work with ML Studio, drag a shape onto the design surface, set some properties, and connect the inputs and/or outputs to other shapes to build the workflow of your solution. For example, you can drag an "Import Data" shape onto the design surface and set information about the data source and data type. This is "plumbing" code that you do not have to write.


If a shape for a desired task does not exist, there are shapes that allow you to write custom code. Supported languages are Python and R.

When you finish building and testing your solution, buttons at the bottom allow you to configure and deploy a web service, so that your model is accessible via a simple API. There is even a test page, allowing you to call this API from within your browser.
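Calling the deployed service from code looks roughly like the sketch below. This is an assumption-heavy example: the URL and key are placeholders, the column names are made up, and the request shape (an "Inputs" object with "ColumnNames" and "Values", plus a Bearer token) is my recollection of the sample code that ML Studio generates for a published service. Check your own service's API help page for the exact format. The code builds the request without sending it.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service's dashboard.
SERVICE_URL = "https://example.invalid/execute?api-version=2.0"  # hypothetical URL
API_KEY = "YOUR-API-KEY"  # hypothetical key

def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a scoring request for one or more rows."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the name of the service's input port.
            "input1": {"ColumnNames": column_names, "Values": rows}
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

request = build_scoring_request(
    SERVICE_URL, API_KEY,
    column_names=["drive-wheels", "horsepower"],  # made-up columns
    rows=[["fwd", "111"]],
)

# To send it, you would call: urllib.request.urlopen(request).read()
print(request.get_method(), request.full_url)
```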

You can get a free trial at https://studio.azureml.net/

There are limits to the free version. You cannot configure the size and number of instances on which it will run, and you are limited to 10 GB storage. If you cannot work within these restrictions, you can sign up for an Azure account and pay for the resources you use. Current pricing is available at this link.


If you are looking for a quick and simple way to build a machine learning solution, Azure Machine Learning Studio may be the tool for you.

Thursday, 17 May 2018 23:58:00 (GMT Daylight Time, UTC+01:00)
# Monday, 14 May 2018
Monday, 14 May 2018 09:33:00 (GMT Daylight Time, UTC+01:00)
# Tuesday, 08 May 2018

The DataFest concept was created back in 2011 by the American Statistical Association. Students are provided a large data set and are given 2 full days to report on some useful insights and/or visualizations about the data.

I attended the ASA DataFest at the University of Toronto May 1-2. The event was organized by UT Professor Nathan Taback, who served as host.

The students, all from U of T, worked in teams of 2-4 and presented their findings on the evening of the second day. This was a judged competition with prizes for the top 4 teams.

Students had no knowledge of the data set before it was made available to them when they showed up at the venue. Data was provided by the Indeed job search engine and included information on job postings in the United States, Canada, and Germany.

Following Dr. Taback's opening remarks, I delivered a presentation on Data Science tools in Azure, including demos of Machine Learning Studio and Azure Notebooks. Over half the teams ended up using these tools in their analysis.

In addition to presenting at the opening ceremonies, I served as a mentor during the DataFest and a judge at the end. Several professors and students donated their time as mentors during the event, and judges included professors and industry professionals. I also recruited local MVPs Atley Hunter and Vivek Patel, along with user group leader Ashraf Ghonaim, to serve as mentors and/or judges.

Almost 200 students attended, and 19 teams presented their findings on Day 2.

The winning team used Azure ML Studio to split users into low, medium, and high salary ranges and determine the factors required to move from one level to the next level above.

Microsoft donated prizes and money for food to the event (along with my time) and Azure credits for the students to use.

Tuesday, 08 May 2018 12:01:00 (GMT Daylight Time, UTC+01:00)
# Monday, 09 April 2018
Monday, 09 April 2018 09:35:00 (GMT Daylight Time, UTC+01:00)
# Monday, 22 January 2018
Monday, 22 January 2018 15:07:48 (GMT Standard Time, UTC+00:00)
# Wednesday, 27 December 2017

As I discussed in a previous article, Microsoft Cognitive Services includes a set of APIs that allow your applications to take advantage of Machine Learning in order to analyze images, sound, video, and language.

Your application uses Cognitive Services by calling one or more RESTful web services. These services require you to pass a key in the header of each HTTP call. You can generate this key from the Azure portal.

If you don't have an Azure account, you can get a free one at https://azure.microsoft.com/free/.

Once you have an Azure Account, navigate to the Azure Portal.

Figure 1

Here you can create a Cognitive Services API key. Click the [New] button in the top left of the portal (Figure 2).

Figure 2

It’s worth noting that the “New” button caption sometimes changes to “Create a Resource” (Figure 2a).

Figure 2a

From the flyout menu, select AI+Cognitive Services. A list of Cognitive Services displays. Select the service you want to call. For this demo, I will select Computer Vision API, as shown in Figure 3.

Figure 3

The blade for the selected service displays, as shown in Figure 4.

Figure 4

At the Name textbox, enter a name for this service account.

At the Subscription dropdown, select the Azure subscription to associate with this service.

At the Location dropdown, select the region in which you want to host this service. You should select a region close to those who will be consuming the service. Make note of the region you selected.

At the Pricing Tier dropdown, select the pricing tier you want to use. Currently, the choices are F0 (which is free, but limited to 20 calls per minute) and S1 (which is not free, but allows more calls). Click the View full pricing details link to see how much S1 will cost.

At the Resource Group field, select or create an Azure Resource Group. Resource Groups allow you to logically group different Azure resources, so you can manage them together.

Click the [Create] button to create the account. The creation typically takes less than a minute and a message displays when the service is created, as shown in Figure 5.

Figure 5

Click the [Go to resource] button to open a blade to configure the newly-created service. Alternatively, you can select "All Resources" on the left menu and search for your service by name. Either way, the service blade displays, as shown in Figure 6.

Figure 6

The important pieces of information in this blade are the Endpoint (on the Overview tab, Figure 7) and the Access Keys (on the Keys tab, as shown in Figure 8). Within this blade, you can also view log files and use other tools to help troubleshoot your service, and you can set authorization and other restrictions on your service.
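To show where the Endpoint and an Access Key end up, here is a minimal sketch of a Computer Vision request. It assumes a West US endpoint of the form shown on the Overview tab, a placeholder key, and a made-up image URL; the key travels in the Ocp-Apim-Subscription-Key header. The request is only built here, not sent, so substitute your own values before calling the service.

```python
import json
import urllib.parse
import urllib.request

# Example values; use the Endpoint and a key from your own service blade.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0"
ACCESS_KEY = "YOUR-ACCESS-KEY"  # placeholder

def build_analyze_request(endpoint, access_key, image_url):
    """Build (but do not send) a request asking for a description of an image."""
    query = urllib.parse.urlencode({"visualFeatures": "Description"})
    return urllib.request.Request(
        endpoint.rstrip("/") + "/analyze?" + query,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The access key is passed in this header on every call.
            "Ocp-Apim-Subscription-Key": access_key,
        },
        method="POST",
    )

request = build_analyze_request(
    ENDPOINT, ACCESS_KEY, "https://example.com/photo.jpg")  # made-up image URL

# To send it, you would call: urllib.request.urlopen(request).read()
print(request.full_url)
```

The region in the endpoint must match the region you chose when creating the service, which is why it is worth noting at creation time.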

Figure 7

Figure 8

The process is almost identical when you create a key for any other Cognitive Service. The only difference is that you will select a different service set in the AI+Cognitive Services flyout.

Wednesday, 27 December 2017 10:35:00 (GMT Standard Time, UTC+00:00)
# Tuesday, 26 December 2017

Microsoft Cognitive Services is a set of APIs that take advantage of Machine Learning to provide developers with an easy way to analyze images, speech, language, and more.

If you have worked with or studied Machine Learning, you know that you can accomplish a lot, but that it requires a lot of computing power, a lot of time, and a lot of data. Since most of us have a limited amount of each of these, we can take advantage of the fact that Microsoft has data, time, and the computing power of Azure. They have used this power to analyze large data sets and expose the results via a set of web services, collectively known as Cognitive Services.

The APIs of Cognitive Services are divided into 5 broad categories: Vision, Speech, Language, Knowledge, and Search.

Vision APIs

The Vision APIs provide information about a given photograph or video. For example, several Vision APIs are capable of recognizing faces in an image. One analyzes each face and deduces that person's emotion; another can compare 2 photographs and decide whether or not they show the same person; a third guesses the age of each person in a photo.

Speech APIs

The Speech APIs can convert speech to text or text to speech. They can also recognize the voice of a given speaker (you might use this to authenticate users, for example) and infer the intent of the speaker from their words and tone. The Translator Speech API supports translations between 10 different spoken languages.


Language APIs

The Language APIs include a variety of services. A spell checker is smart enough to recognize common proper names and homonyms. The Translator Text API can detect the language in which a text is written and translate that text into another language. The Text Analytics API analyzes a document for the sentiment expressed, returning a score based on how positive or negative the wording and tone of the document are. The most interesting API in this group is the Language Understanding Intelligent Service (LUIS), which allows you to build custom language models so that your application can understand questions and statements from your users in a variety of formats.


Knowledge APIs

The Knowledge APIs include a variety of services, from customer recommendations to smart querying and information about the context of text. Many of these services take advantage of natural language processing. As of this writing, all of these services are in preview.


Search APIs

The Search APIs allow you to retrieve Bing search results with a single web service call.

To start using these APIs, you need an Azure account. You can get a free Azure trial at https://azure.microsoft.com/.

Each API offers a free option that restricts the number and/or frequency of calls, but you can break through that boundary for a charge.  Because they are hosted in Azure, the paid services can scale out to meet increased demand.

You call most of these APIs by sending JSON to a RESTful web service and receiving JSON in response. Some of the more complex services require configuration and setup beforehand.
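As an illustration of that JSON exchange, the sketch below builds a request body in the shape I understand the Text Analytics sentiment operation to expect (a list of documents, each with an id, a language, and text) and reads the scores out of a response. The response shown is a made-up sample, not real API output, and the helper names are my own.

```python
import json

def build_sentiment_body(texts, language="en"):
    """Return the JSON request body for a list of texts."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": documents})

def scores_by_id(response_json):
    """Map each document id in a response to its sentiment score."""
    response = json.loads(response_json)
    return {doc["id"]: doc["score"] for doc in response["documents"]}

body = build_sentiment_body(
    ["I love this product.", "The service was terrible."])

# Illustrative response only; the scores are made up, not real API output.
sample_response = (
    '{"documents": [{"id": "1", "score": 0.95}, {"id": "2", "score": 0.05}],'
    ' "errors": []}'
)
scores = scores_by_id(sample_response)
print(scores)
```

Sending the body works the same way as for any other Cognitive Service: POST it to your endpoint with your access key in the request header.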

These APIs are capable of analyzing pictures, text, and speech because each service draws on the knowledge learned from parsing countless photos, documents, etc. beforehand.

You can find documentation, sample code, and even a place to try out each API live in your browser at https://azure.microsoft.com/en-us/services/cognitive-services/

A couple of fun applications of Cognitive Services are how-old.net (which guesses the ages of people in photographs) and what-dog.net (which identifies the breed of dog in a photo).

Below is a screenshot from the Azure documentation page, listing the sets of services. But keep checking back, because this list grows and each set contains one or more services.

List of Cognitive Services

Sign up today and start building apps. It’s fun, it’s useful, and it’s free!

Tuesday, 26 December 2017 10:25:00 (GMT Standard Time, UTC+00:00)
# Tuesday, 08 August 2017
Tuesday, 08 August 2017 05:21:28 (GMT Daylight Time, UTC+01:00)
# Monday, 10 April 2017
# Monday, 13 March 2017
Monday, 13 March 2017 12:17:00 (GMT Standard Time, UTC+00:00)