Supervised learning learns to map inputs to outputs by minimising the difference between its predictions and the provided correct outputs/answers (ground truth).

  • the learning phase is called the training phase
  • the dataset used is the training set
  • the trained agent function is called a model/hypothesis

When learning is done, the model can predict the output for new/unseen data

  • this phase is called the testing/evaluation phase
  • the test set can be used to measure the performance of the model
  • performance on unseen data measures generalisation of the model
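The two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the 75/25 split, the helper names (`train_test_split`, `fit_line`, `mean_absolute_error`) and the choice of a least-squares line as the model are all made up for this example and are not prescribed by these notes.

```python
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(training_set):
    """Training phase: fit y = slope * x + intercept by ordinary least squares."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in training_set)
             / sum((x - mean_x) ** 2 for x, _ in training_set))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, dataset):
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(model(x) - y) for x, y in dataset) / len(dataset)

# Toy data (assumption): inputs 0..19 with noiseless outputs 2x + 1
data = [(x, 2 * x + 1) for x in range(20)]
training_set, test_set = train_test_split(data)

model = fit_line(training_set)                    # training phase
test_error = mean_absolute_error(model, test_set) # testing/evaluation phase
print("examples in training set:", len(training_set))
print("examples in test set:", len(test_set))
print("error on unseen test data:", test_error)
```

The test set is never shown to `fit_line`, so the error computed on it estimates how well the model generalises.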

[Figure: supervised learning pipeline, with labels Training Set, Hypothesis Class, Learning Algorithm, Hypothesis, Test Set, Performance Measure, Evaluation Phase, Test Phase]

Summary

Given a training set generated by an unknown function $f$, find a function $h$ that closely approximates $f$.

Tasks

Classification: A type of supervised learning where the goal is to predict a discrete label or category based on input features

  • output: categorical value

Tumour classification

A tumour can be classified as benign or malignant.

Regression: A type of supervised learning where the goal is to predict a continuous numerical value based on input features

  • output: real number

Housing price prediction

Given historical housing prices, future prices can be projected.

Dataset

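The dataset is represented as a set of $N$ pairs

$$D = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(N)}, y^{(N)})\}$$

where

  • $x^{(i)}$ is the $i$-th input (features) and $y^{(i)}$ is its corresponding output (label)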

There is an assumed underlying true relationship between input features $x$ and labels $y$:

$$y = f(x) + \epsilon$$

where

  • $f$ is the true but unknown function that generates the label from the input features
  • $\epsilon$ is the error term, which accounts for randomness or imperfections in the data generation process
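To make the relationship between labels, the true function and the error term concrete, the sketch below generates a small synthetic dataset. It is only an illustration: the particular true function (a line), the Gaussian noise model and the names `true_function` and `generate_dataset` are assumptions chosen for this example. In a real problem the true function is unknown and only the pairs are observed.

```python
import random

def true_function(x):
    # Hypothetical ground-truth function, chosen only for illustration;
    # in practice it is unknown to the learner.
    return 3.0 * x + 2.0

def generate_dataset(n, noise_std=0.5, seed=0):
    """Return n pairs (x, y) where y = true_function(x) + Gaussian noise."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)       # input feature
        eps = rng.gauss(0.0, noise_std)  # error term
        dataset.append((x, true_function(x) + eps))
    return dataset

dataset = generate_dataset(5)
for x, y in dataset:
    print(f"x = {x:.2f}  y = {y:.2f}  noiseless value = {true_function(x):.2f}")
```

Setting `noise_std=0.0` removes the error term, so every label then lies exactly on the true function.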

Hypothesis Class

A hypothesis class $\mathcal{H}$ is the set of all possible models (hypotheses) $h$, each mapping inputs to outputs, that a learning algorithm can learn.

Goal

Find a hypothesis/model $h$ in the hypothesis class that best approximates the true function $f$.

Learning Algorithm

A learning algorithm takes in a training set $D$, consisting of pairs $(x^{(i)}, y^{(i)})$, and seeks to find a model/hypothesis $h$ that approximates the true relationship between inputs and outputs.
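As a rough illustration of how a hypothesis class and a learning algorithm fit together, the sketch below searches a small, finite hypothesis class and returns the member with the lowest error on the training set. The class (lines through the origin with a handful of candidate slopes), the squared-error criterion, the toy data and all function names are assumptions made for this example. Practical learning algorithms usually search much larger classes with more efficient methods.

```python
def make_linear_hypothesis(w):
    """Return the hypothesis that predicts w * x for an input x."""
    return lambda x: w * x

# Hypothetical hypothesis class: slopes 0.0, 0.5, ..., 5.0
candidate_slopes = [0.5 * k for k in range(11)]

def training_error(h, training_set):
    """Mean squared difference between predictions and ground-truth outputs."""
    return sum((h(x) - y) ** 2 for x, y in training_set) / len(training_set)

def learn(training_set, slopes):
    """Learning algorithm: pick the slope whose hypothesis minimises training error."""
    return min(
        slopes,
        key=lambda w: training_error(make_linear_hypothesis(w), training_set),
    )

# Toy training set (assumption): outputs are roughly twice the inputs
training_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

best_slope = learn(training_set, candidate_slopes)
h = make_linear_hypothesis(best_slope)
print("chosen slope:", best_slope)
print("prediction for x = 5:", h(5.0))
```

Because the search is restricted to the candidate slopes, the learned hypothesis can only be as good as the best member of the class.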

Performance Measure

Performance is generally measured on a test set.

Regression: Error

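The performance measure can be computed using the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

where $\hat{y}^{(i)} = h(x^{(i)})$ is the model's prediction for the $i$-th example and $N$ is the number of examples in the test set.

It can also be computed using the mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|\hat{y}^{(i)} - y^{(i)}\right|$$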
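The two regression measures can be sketched as plain Python functions. This assumes that the first error measure is the mean squared error. The function names and the toy predictions are made up for illustration.

```python
def mean_squared_error(y_pred, y_true):
    """Average of the squared differences between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def mean_absolute_error(y_pred, y_true):
    """Average of the absolute differences between predictions and true values."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

# Toy test-set labels and model predictions (assumption)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

mse = mean_squared_error(y_pred, y_true)
mae = mean_absolute_error(y_pred, y_true)
print("MSE:", mse)  # differences -1, 0, 1, 3 give (1 + 0 + 1 + 9) / 4 = 2.75
print("MAE:", mae)  # differences -1, 0, 1, 3 give (1 + 0 + 1 + 3) / 4 = 1.25
```

Squaring penalises the single large miss (3) far more than the small ones, which is why the MSE is noticeably larger than the MAE here.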

Classification: Correctness

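Accuracy can be used as a performance measure. It is the number of correct predictions divided by the total number of predictions:

$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$$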

A confusion matrix can also be used:

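|                    | Actual Positive                | Actual Negative               |
|--------------------|--------------------------------|-------------------------------|
| Predicted Positive | True Positive                  | False Positive (Type I error) |
| Predicted Negative | False Negative (Type II error) | True Negative                 |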

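Precision is the fraction of predicted positive instances that are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ and $FP$ are the numbers of true positives and false positives. It should be maximised when false positives (Type I errors) are very costly.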

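Recall is the fraction of actual positive instances that are correctly predicted as positive:

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $TP$ and $FN$ are the numbers of true positives and false negatives. It should be maximised when false negatives (Type II errors) are very costly.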

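The $F_1$ score combines the two metrics as their harmonic mean:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$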
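The classification measures above can be computed from the four confusion-matrix counts, as in the sketch below. The function names, the toy labels, the use of `1` as the positive class and the convention of returning `0.0` when a denominator is zero are all assumptions made for this example. The combined score is taken to be the standard F1 score.

```python
def confusion_counts(y_pred, y_true, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for p, t in zip(y_pred, y_true):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        elif t == positive:
            fn += 1  # predicted negative, actually positive (Type II error)
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_pred, y_true, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_pred, y_true, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: report 0.0 when a ratio is undefined
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (assumption): 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_pred, y_true)
metrics = classification_metrics(y_pred, y_true)
print("TP, FP, FN, TN:", counts)
print(metrics)
```

In this toy case there is one false positive and one false negative, so precision and recall coincide. Changing a single prediction is enough to make them diverge.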

Types