Step-By-Step: Getting Started with Azure Machine Learning

Artificial Intelligence (AI) study and use is on the rise.  Tools to enable AI are becoming more readily available, simpler to use and easier to implement.  What's more, the definition of AI itself has been broken down into ingredients that, when combined in a recipe (or process), can provide multiple desired outcomes.  One of the more important ingredients used in most recipes is Machine Learning.

Machine Learning in essence is a way of teaching computers to provide more accurate predictions on provided data. These predictions can also make apps and devices smarter by providing recommendations as an outcome to the data.

In the pursuit of making roads safer, Toyota Canada has been capturing data from mechanics in all of its 300 dealerships on the vehicles they repair. In the past, the repair data was extracted manually from Toyota Canada's service application and stored in on-premises databases to be analysed later.  While parts of the analytics process were automated, it took over 6 months to work through the reams of data and produce a part replacement recommendation.  Toyota Canada wanted to reduce the processing time and so approached Microsoft to collaborate in a Machine Learning Hackfest to come up with a solution.

While we are unable to detail the exact process undertaken by Toyota Canada and Microsoft during the Hackfest itself, this post walks through the steps of a similar exercise to give a better understanding of the Machine Learning process. The steps detailed below set up a price prediction for specific vehicles.

Let's get started.

Step 1: Accessing Machine Learning Studio

To begin this exercise, navigate to https://studio.azureml.net and select Sign up here. Next, choose between the free and paid options to complete this exercise.

NOTE: If you have previously completed a Machine Learning experiment, select Sign In and simply enter your credentials.


You are ready to begin the exercise once you are able to access the Microsoft Azure Machine Learning Studio.

Step 2: Getting the Data to Analyze

Next you'll need to acquire data to analyze.  Machine Learning Studio includes many sample datasets to choose from, or you can import your own dataset from almost any source.  In keeping with the automotive theme, this exercise uses the sample dataset Automobile price data (Raw), which is included in your workspace.  It contains entries for individual automobiles, including make, model, technical specifications and price.

NOTE: All data used in this exercise is fictitious and does not represent the current automotive market.

Let's now capture the dataset for this experiment.

  1. Click +NEW located at the bottom of the Machine Learning Studio window to create a new experiment
  2. Select EXPERIMENT, then Blank Experiment
  3. Name the experiment Automotive Price Prediction Exercise by selecting and replacing the text found at the top
  4. In the Search box located in the top left-hand side, enter automobile to find the dataset labeled Automobile price data (Raw)
  5. Drag the dataset to the experiment canvas
    NOTE: Click the output port at the bottom of the automobile dataset, and then select Visualize to see what the automotive dataset looks like

Step 3: Preparation of the Data

Preprocessing the dataset is needed to ensure missing values are addressed prior to running the prediction exercise. As noted in the newly added automotive dataset, the normalized-losses column is missing many values and will have to be excluded to provide a better prediction.

  1. In the Search box located in the top left-hand side, enter select columns and locate the Select Columns in Dataset module
  2. Drag the module to the newly created experiment canvas
    NOTE: This module allows for the selection of columns of data to be included or excluded in this exercise
  3. Connect the output port of the Automobile price data (Raw) dataset to the input port of the Select Columns in Dataset module
  4. Select the Select Columns in Dataset module
  5. Click Launch column selector in the Properties pane
  6. Click With rules located on the left
  7. Under Begin With, click All columns. This directs Select Columns in Dataset to pass through all the columns (except those columns we're about to exclude).
  8. From the drop-downs, select Exclude and column names, and then click inside the text box. A list of columns is displayed. Select normalized-losses, and it's added to the text box.
  9. Click the check mark to close the column selector
    NOTE: The properties pane for Select Columns in Dataset now shows that all columns from the dataset will pass through except normalized-losses
  10. Drag the Clean Missing Data module to the experiment canvas and connect it to the Select Columns in Dataset module
  11. In the Properties pane, select Remove entire row under Cleaning mode
    NOTE: This directs Clean Missing Data to clean the data by removing rows that have any missing values
  12. Double-click the module and type the comment Remove missing value rows
  13. Click RUN at the bottom of the page
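
For readers who find code easier to follow than a canvas, the two modules in this step can be approximated in a few lines of plain Python. This is only a minimal sketch, not what Machine Learning Studio runs internally: the toy rows are invented, the "?" marker for a missing value is an assumption, and the helper names exclude_columns and remove_rows_with_missing are hypothetical.

```python
# Toy rows invented for illustration. "?" marks a missing value, which is an
# assumption of this sketch about how gaps are encoded.
MISSING = "?"

rows = [
    {"make": "toyota", "normalized-losses": "?", "horsepower": "60", "price": "6000"},
    {"make": "toyota", "normalized-losses": "70", "horsepower": "?", "price": "7000"},
    {"make": "audi", "normalized-losses": "150", "horsepower": "100", "price": "14000"},
]


def exclude_columns(rows, excluded):
    """Pass every column through except the excluded ones."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


def remove_rows_with_missing(rows, missing=MISSING):
    """Drop any row that has at least one missing value."""
    return [row for row in rows if missing not in row.values()]


# Same order as on the canvas: exclude the sparse column, then clean the rows.
selected = exclude_columns(rows, {"normalized-losses"})
cleaned = remove_rows_with_missing(selected)
print(len(rows), "rows before cleaning,", len(cleaned), "after")
```

The order matters here: the first toy row is only missing normalized-losses, so excluding that column first keeps the row from being thrown away by the cleaning pass.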

Step 4: Defining Features

Machine learning features are individual measurable properties that are of interest. In the automobile price dataset, each row represents one car, and each column is a feature of that vehicle. Experimentation and knowledge about the problem you want to solve are needed to find a good set of features to create a predictive model.

This experiment will build a model that uses a subset of the features in the automotive dataset.  These features include:
make, body-style, wheel-base, engine-size, horsepower, peak-rpm, highway-mpg, price

  1. Find and drag another Select Columns in Dataset module to the experiment canvas
  2. Connect the left output port of the Clean Missing Data module to the input of the Select Columns in Dataset module
  3. Double-click the module and type Select features for prediction
  4. Click Launch column selector in the Properties pane
  5. Click With rules
  6. Click No columns under Begin With
  7. Select Include and column names in the filter row
  8. In the text box, select the column names listed at the start of this step (make through price)
  9. Click the check mark button to confirm the selection
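
The "No columns, then Include" rule used above can be sketched in plain Python as follows. This is an illustration only: the sample row is invented, the unused-column key is hypothetical, and include_columns is a helper name made up for this sketch.

```python
# The feature list from this step, ending with the value to predict.
FEATURES = ["make", "body-style", "wheel-base", "engine-size",
            "horsepower", "peak-rpm", "highway-mpg", "price"]


def include_columns(rows, included):
    """Start from no columns and keep only the named ones, in the given order."""
    return [{name: row[name] for name in included} for row in rows]


# One invented row carrying an extra, hypothetical column that is not a feature.
sample = [{
    "make": "toyota", "body-style": "sedan", "wheel-base": 95.0,
    "engine-size": 90, "horsepower": 60, "peak-rpm": 5000,
    "highway-mpg": 35, "price": 6000, "unused-column": "ignored",
}]

features_only = include_columns(sample, FEATURES)
print(list(features_only[0]))
```

Listing the columns to keep, rather than the ones to drop, means a column added to the source later is not silently fed to the model.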

Step 5: Selecting and Applying a Learning Algorithm

With the appropriate data now prepared, training and testing of a predictive model can commence. The data will be used to train the model and then to test it, so that the price predictions can be reviewed.  For this experiment a regression machine learning algorithm will be used.

Regression is used to predict a number, which will come in handy when predicting pricing. More specifically, this experiment will use the simple linear regression model. The same dataset is used for both training and testing the model.  This is done by splitting the data into separate training and testing datasets.

  1. Find, select and drag the Split Data module to the experiment canvas
  2. Connect the Split Data module to the last Select Columns in Dataset module
  3. Click the Split Data module
  4. In the Properties pane to the right of the canvas, find Fraction of rows in the first output dataset and set it to 0.75
  5. Run the experiment
  6. Expand the Machine Learning category in the module palette to the left of the canvas to select the learning algorithm
  7. Expand Initialize Model
    NOTE: This displays several categories of modules that can be used to initialize machine learning algorithms
  8. Select the Linear Regression module under the Regression category and drag it to the experiment canvas
  9. Find and drag the Train Model module to the experiment canvas
  10. Connect the output of the Linear Regression module to the left input of the Train Model module, and connect the training data output (left port) of the Split Data module to the right input of the Train Model module
    NOTE: Please pay attention to the port utilized as the experiment will not work if connected incorrectly
  11. Click the Train Model module
  12. Select Launch column selector in the Properties pane
  13. Select the price column and move it to the Selected columns list (This is the value that the experiment is going to predict)
  14. Click the check mark button to confirm the selection
  15. Run the experiment
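
To make the split-then-train idea concrete, here is a minimal plain-Python sketch. It is not the Linear Regression module itself: it fits a line to a single feature with ordinary least squares, whereas the experiment above trains on several features. The data points are invented so that they lie exactly on a known line, and the shuffle, the seed and the helper names split_rows and fit_line are all assumptions of this sketch.

```python
import random

# Invented (engine-size, price) pairs lying exactly on price = 100 * size + 500,
# so the fitted line is easy to check by eye.
data = [(size, 100.0 * size + 500.0) for size in range(60, 220, 8)]


def split_rows(rows, fraction, seed=0):
    """Shuffle the rows and return (first, second), with `fraction` of them in the first part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 0.75 mirrors the fraction set on the Split Data module.
train, test = split_rows(data, 0.75)
slope, intercept = fit_line(train)
print(f"{len(train)} training rows, {len(test)} test rows")
print(f"price = {slope:.1f} * engine-size + {intercept:.1f}")
```

Only the training part is passed to fit_line; the remaining rows are held back so the model can later be checked against prices it has never seen.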

Step 6: Predict New Automobile Pricing

The experiment can now score the remaining 25 percent of the data to show how well the model performs after being trained on the other 75 percent.

  1. Find and drag the Score Model module to the experiment canvas
  2. Connect the output of the Train Model module to the left input port of Score Model
  3. Connect the test data output (right port) of the Split Data module to the right input port of Score Model
  4. Run the experiment
  5. Click the output port of Score Model and select Visualize to view the output from the Score Model module
    NOTE:  The output shows the predicted values for price and the known values from the test data
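
Scoring amounts to running the trained model over the held-out rows and placing each prediction next to the known value. The sketch below shows that idea in plain Python under stated assumptions: the model is a hypothetical fitted line given as (slope, intercept), the held-out rows are invented, and the mean absolute error at the end is an extra summary added by this sketch rather than something the steps above compute.

```python
def predict(model, x):
    """Apply a fitted line, given as (slope, intercept), to one input value."""
    slope, intercept = model
    return slope * x + intercept


def score(model, test_pairs):
    """Return (known, predicted) pairs for the held-out rows."""
    return [(y, predict(model, x)) for x, y in test_pairs]


def mean_absolute_error(scored):
    """Average size of the gap between known and predicted values."""
    return sum(abs(known - predicted) for known, predicted in scored) / len(scored)


# Hypothetical model and held-out rows of (engine-size, known price); all values invented.
model = (100.0, 500.0)
held_out = [(90, 9700.0), (120, 12300.0), (150, 15500.0)]

scored = score(model, held_out)
for known, predicted in scored:
    print(f"known={known:.0f} predicted={predicted:.0f}")
print("mean absolute error:", mean_absolute_error(scored))
```

A single number such as this makes it easier to compare two variants of an experiment than scanning the predicted and known columns row by row.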

Congratulations, you have now completed your first machine learning experiment.  Next steps would be to try to improve the prediction and then deploy it as a predictive web service. Experiment further by adding multiple machine learning algorithms, modifying the properties of the Linear Regression algorithm or trying a different algorithm altogether.