# Friday, 18 May 2018

I was working with a dataset in Azure ML Studio and I needed to replace values in a column.

Reasons for replacing values include:

  • Replacing codes with a more readable word or words
  • Consistency when combining 2 sets of data
  • Converting to numeric values in order to assign a number to each discrete string value
  • Converting to numeric values in order to work with an algorithm that only accepts numeric values
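For the last two reasons in the list, the same pandas `map` call used later in this post can turn discrete strings into numbers. Below is a minimal sketch that runs locally with pandas. The integer codes, the helper name `encode_drive_wheels`, and the sample rows are all illustrative assumptions and do not come from the dataset.

```python
import pandas as pd

# Hypothetical integer codes; the numbers are arbitrary labels, not magnitudes
DRIVE_WHEEL_CODES = {'fwd': 0, 'rwd': 1, '4wd': 2}

def encode_drive_wheels(df):
    """Return a copy of df with 'drive-wheels' replaced by integer codes."""
    out = df.copy()
    # map() looks up each value in the dict; values missing from it become NaN
    out['drive-wheels'] = out['drive-wheels'].map(DRIVE_WHEEL_CODES)
    return out

# Small made-up frame for illustration (not the actual sample dataset)
sample = pd.DataFrame({'drive-wheels': ['fwd', 'rwd', '4wd', 'fwd']})
encoded = encode_drive_wheels(sample)
print(encoded['drive-wheels'].tolist())  # [0, 1, 2, 0]
```

Integer codes can suggest an order that doesn't exist (4wd is not "greater" than fwd). For algorithms that are sensitive to this, a common alternative is one indicator column per value, such as the output of `pd.get_dummies`.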

There is no built-in shape for this, but a few lines of Python code will do it.

I can demonstrate this by creating a new ML Studio experiment and dragging the "Automobile Price Data" sample dataset onto the experiment design surface, as shown in Fig. 1.

Fig. 1: The "Automobile Price Data" sample dataset on the design surface

If we click on this shape and select "Visualize" (Fig. 2), we can see the data in the dataset. Click the "drive-wheels" column to see details about the data in that column (Fig. 3).

Fig. 2: Dataset menu

Fig. 3: Visualized data, before replacement

The visualization shows that the "drive-wheels" column contains 3 distinct values: "fwd", "rwd", and "4wd".
Suppose I want to replace these with "FRONT", "REAR", and "FOUR", respectively (for example, to be consistent with a second dataset I plan to merge with this one).
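Before writing the script, it can be worth confirming that the mapping covers every distinct value in the column, because pandas `Series.map` turns any value with no entry in the dictionary into a missing value (NaN). A minimal sketch: the helper name `unmapped_values` is hypothetical, and the value 'awd' is a made-up example that is not claimed to appear in the real dataset.

```python
import pandas as pd

# The mapping used in this post: existing code -> new label
DRIVE_WHEEL_MAP = {'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'}

def unmapped_values(series, mapping):
    """Return the sorted distinct non-null values in series with no key in mapping."""
    return sorted(set(series.dropna().unique()) - set(mapping))

# Made-up rows for illustration; 'awd' is deliberately absent from the mapping
sample = pd.Series(['fwd', 'rwd', '4wd', 'awd'])
print(unmapped_values(sample, DRIVE_WHEEL_MAP))  # ['awd']
```

An empty list means every value in the column will receive a new label.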

Drag an "Execute Python Script" shape onto the experiment and connect its input to the output of the dataset shape (Fig. 4).

Fig. 4: The dataset and "Execute Python Script" shapes, connected

In the Properties of the "Execute Python Script" shape, replace the existing script with the following:

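```python
import pandas as pd

# Entry point that ML Studio calls, passing the data from the first input port
def azureml_main(dataframe1 = None):
    # Replace each drive-wheels code with its new label.
    # Note: map() turns any value not listed in the dict into NaN.
    dataframe1['drive-wheels'] = dataframe1['drive-wheels'].map({'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'})
    # ML Studio expects a sequence of dataframes, hence the trailing comma (a one-element tuple)
    return dataframe1,
```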

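ML Studio requires the azureml_main function because it is the entry point the shape calls. The function can accept up to two dataframes, one for each dataset input port. Here we use only the first, which we name "dataframe1".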
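The entry point can also take a second dataframe from the shape's second dataset input port. Below is a sketch of the merge scenario mentioned earlier. It assumes a hypothetical second dataset whose "drive-wheels" column already uses FRONT/REAR/FOUR and that has the same columns. It also assumes that an unconnected second port arrives as None. Appending rows with `pd.concat` is just one possible way to combine the two datasets.

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Assumption: dataframe2 is a hypothetical second dataset that already
    # uses 'FRONT', 'REAR' and 'FOUR' and has the same columns as dataframe1.
    mapping = {'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'}
    dataframe1['drive-wheels'] = dataframe1['drive-wheels'].map(mapping)
    if dataframe2 is None:
        # Assumed: second port not connected, so behave like the single-input script
        return dataframe1,
    # Append the rows of both datasets now that the labels agree
    combined = pd.concat([dataframe1, dataframe2], ignore_index=True)
    return combined,

# Local check with made-up frames (not the real datasets)
left = pd.DataFrame({'drive-wheels': ['fwd', '4wd']})
right = pd.DataFrame({'drive-wheels': ['REAR']})
result, = azureml_main(left, right)
print(result['drive-wheels'].tolist())  # ['FRONT', 'FOUR', 'REAR']
```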

The first line of the function maps each of the 3 existing drive-wheels values to its new value, row by row, and writes the results back to the "drive-wheels" column of the dataframe. Returning that dataframe makes the updated data the output of this shape, so later steps in the experiment can use it.
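Because azureml_main is an ordinary Python function, it can be tried locally with pandas before being pasted into ML Studio. The sketch below calls the same function on a few made-up rows. It includes a hypothetical unmapped value 'awd' to show one side effect of `map`: values missing from the dictionary become NaN. By contrast, `Series.replace` with the same dictionary leaves them unchanged.

```python
import pandas as pd

def azureml_main(dataframe1 = None):
    # Same logic as the script above
    dataframe1['drive-wheels'] = dataframe1['drive-wheels'].map({'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'})
    return dataframe1,

mapping = {'fwd': 'FRONT', 'rwd': 'REAR', '4wd': 'FOUR'}

# Made-up rows; 'awd' is deliberately missing from the mapping
rows = pd.DataFrame({'drive-wheels': ['fwd', 'rwd', '4wd', 'awd']})

# Unpack the one-element tuple the function returns (pass a copy to keep rows intact)
mapped, = azureml_main(rows.copy())
print(mapped['drive-wheels'].tolist())  # ['FRONT', 'REAR', 'FOUR', nan]

# replace() only swaps the values it finds in the dict and keeps the rest
replaced = rows['drive-wheels'].replace(mapping)
print(replaced.tolist())  # ['FRONT', 'REAR', 'FOUR', 'awd']
```

Which behavior is preferable depends on whether an unexpected code should surface as a missing value or pass through untouched.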

After running the experiment, we can click the script shape, visualize its output, and see that every value in "drive-wheels" has been replaced, as shown in Fig. 5.
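The same check can also be done in code, for example locally or in a later Execute Python Script shape. The sketch below assumes the column should contain only the three new labels. The helper name `check_drive_wheels` and the sample rows are hypothetical.

```python
import pandas as pd

EXPECTED = {'FRONT', 'REAR', 'FOUR'}

def check_drive_wheels(df):
    """Return row counts per label, raising if any value is missing or unexpected."""
    col = df['drive-wheels']
    missing = int(col.isna().sum())
    if missing:
        # NaN here usually means a value had no entry in the mapping
        raise ValueError(f"{missing} row(s) have no drive-wheels value")
    unexpected = set(col.unique()) - EXPECTED
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    # Number of rows for each label
    return col.value_counts().to_dict()

# Made-up output rows for illustration
after = pd.DataFrame({'drive-wheels': ['FRONT', 'REAR', 'FRONT', 'FOUR']})
counts = check_drive_wheels(after)
print(counts)
```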

Fig. 5: Visualized data, after replacement

This article shows a simple way to replace values in a dataframe column with new values in Azure ML Studio.
