Supervised learning learns to map inputs to outputs by minimising the difference between its predictions and the provided correct outputs/answers (ground truth).

the learning phase is called the training phase
the dataset used is the training set
the trained agent function is called a model/hypothesis

When learning is done, the model can predict the output for new/unseen data

this phase is called testing/evaluation phase
the test set can be used to measure the performance of the model
performance on unseen data measures generalisation of the model
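The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a definitive recipe: the helper names (`train_test_split`, `train`, `evaluate`), the toy data, and the choice of a line through the origin as the model are all assumptions made for the example.

```python
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs and split them into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(training_set):
    """Training phase: fit h(x) = w * x by least squares (a line through the origin)."""
    w = sum(x * y for x, y in training_set) / sum(x * x for x, _ in training_set)
    return lambda x: w * x

def evaluate(model, test_set):
    """Test phase: mean squared error of the model on data it has not seen."""
    return sum((model(x) - y) ** 2 for x, y in test_set) / len(test_set)

# Toy data, assumed for illustration: y = 2x with no noise.
data = [(float(x), 2.0 * x) for x in range(1, 21)]
training_set, test_set = train_test_split(data)

model = train(training_set)              # training phase
test_error = evaluate(model, test_set)   # testing/evaluation phase
print(f"test error: {test_error:.6f}")
```

A low error on the held-out test set suggests the model generalises; a low error on the training set alone would not show that.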

[Figure: diagram with the labels Training Set, Learning Algorithm, Hypothesis Class, Hypothesis, Test Set, Performance Measure, Test Phase, Evaluation Phase]

Summary

Given a training set generated by an unknown function $f$, find a function $h$ that closely approximates $f$.

Tasks

Classification: A type of supervised learning where the goal is to predict a discrete label or category based on input features

output: categorical value

Tumour classification

A tumour can be classified as benign or malignant.

Regression: A type of supervised learning where the goal is to predict a continuous numerical value based on input features

output: real number

Housing price prediction

Given historical housing prices, future prices can be projected.

Dataset

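The dataset is represented as a set of $n$ input-output pairs

$$D = \{(x^{(1)}, y^{(1)}), \dots, (x^{(n)}, y^{(n)})\}$$

where

$x^{(i)}$ is the input corresponding to the output $y^{(i)}$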

There is an assumed underlying true relationship between input features $x$ and labels $y$:

$$y = f(x) + \epsilon$$

where

$f$ is the true but unknown function that generates the label from the input features
$\epsilon$ is the error term, which accounts for randomness or imperfections in the data generation process
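To make the data-generating assumption concrete, here is a small sketch that builds a synthetic dataset from a known function plus noise. The particular `true_function`, the Gaussian noise, and the input range are assumptions chosen for illustration; in a real problem the true function is unknown and only the pairs are observed.

```python
import random

def true_function(x):
    """Hypothetical stand-in for the unknown true function."""
    return 3.0 * x + 1.0

def generate_dataset(n, noise_std=0.5, seed=0):
    """Return n pairs (x, y) where y is the true function value plus an error term."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        error = rng.gauss(0.0, noise_std)  # randomness in the data generation process
        dataset.append((x, true_function(x) + error))
    return dataset

dataset = generate_dataset(5)
for x, y in dataset:
    print(f"x = {x:.2f}, y = {y:.2f}")
```

Setting `noise_std=0.0` recovers the noiseless case, where every label equals the true function value exactly.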

Hypothesis Class

A hypothesis class is the set of all candidate functions (hypotheses/models) mapping inputs to outputs that a learning algorithm can learn.

Goal

Find a hypothesis/model $h$ from the hypothesis class that best approximates the true function $f$.
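As a rough illustration, a hypothesis class can be written down explicitly when it is small. The sketch below assumes a deliberately tiny class (lines through the origin with a slope taken from a fixed grid) and picks the member with the lowest squared error on a made-up training set. The grid, the data, and the function names are hypothetical.

```python
def make_hypothesis(w):
    """One member of the class: h_w(x) = w * x."""
    return lambda x: w * x

# Assumed hypothesis class: seven candidate slopes.
hypothesis_class = {w: make_hypothesis(w) for w in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]}

# Made-up training pairs, roughly following y = 2x.
training_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def training_error(h, data):
    """Mean squared difference between predictions and provided outputs."""
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)

# Goal: the hypothesis in the class that best fits the data.
best_w = min(hypothesis_class, key=lambda w: training_error(hypothesis_class[w], training_set))
print(best_w)
```

A richer class (for example, polynomials) can fit more relationships but is also harder to search, which is why the choice of class matters.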

Learning Algorithm

A learning algorithm takes in a training set $D_{\text{train}}$, consisting of pairs $(x^{(i)}, y^{(i)})$, and seeks a model/hypothesis $h$ that approximates the true relationship between inputs and outputs.
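One concrete learning algorithm, shown here only as an example, is ordinary least squares for a one-dimensional linear model. The function name `fit_linear` and the sample data are assumptions; the notes do not prescribe a particular algorithm.

```python
def fit_linear(training_set):
    """Ordinary least squares for h(x) = w * x + b on a list of (x, y) pairs.

    Assumes the inputs are not all identical (otherwise the slope is undefined).
    """
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in training_set)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return lambda x: w * x + b

# Made-up training pairs lying exactly on y = 2x + 1.
training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
h = fit_linear(training_set)
print(h(4.0))
```

The function takes a training set in and hands a hypothesis back, which is the input-output shape of a learning algorithm described above.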

Performance Measure

Generally done using a test set

Regression: Error

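The performance of a regression model can be measured by its error on the test set. A common choice is the mean squared error (MSE):

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

where $\hat{y}^{(i)} = h(x^{(i)})$ is the prediction of the model $h$ for the $i$-th example and $N$ is the number of examples.

It can also be measured using the mean absolute error (MAE):

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|\hat{y}^{(i)} - y^{(i)}\right|$$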
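Both regression errors are short to compute by hand. The sketch below assumes the squared-error measure is the mean squared error; the function names and sample values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalised more heavily."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]   # ground truth from a test set (made up)
y_pred = [2.0, 5.0, 9.0]   # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(f"MSE = {mse:.4f}, MAE = {mae:.4f}")
```

Because the squared error grows quadratically, a single large miss affects MSE much more than MAE, which is one reason to look at both.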

Classification: Correctness

Accuracy can be used as a performance measure: it is the number of correct predictions divided by the total number of predictions.
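Accuracy reduces to counting matches. A minimal sketch, with hypothetical tumour labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = ["benign", "malignant", "benign", "benign"]
y_pred = ["benign", "benign", "benign", "malignant"]

acc = accuracy(y_true, y_pred)
print(acc)
```

Accuracy alone can mislead when one class is rare, since always predicting the common class already scores highly.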

A confusion matrix can also be used:

	Actual Positive	Actual Negative
Predicted Positive	True Positive	False Positive (Type I error)
Predicted Negative	False Negative (Type II error)	True Negative
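The four cells of the confusion matrix can be tallied directly from predictions and ground truth. In this sketch the labels are encoded as 1 (positive) and 0 (negative), which is an assumption for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative (Type I error)
        elif t == positive:
            fn += 1          # predicted negative, actually positive (Type II error)
        else:
            tn += 1          # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
```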

Precision is the fraction of predicted positive instances that are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

It should be maximised when false positives are very costly (Type I error).

Recall is the fraction of actual positive instances that the model correctly identifies:

$$\text{Recall} = \frac{TP}{TP + FN}$$

It should be maximised when false negatives are very costly (Type II error).

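The $F_1$ score combines the two metrics as their harmonic mean:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$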
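Precision, recall, and their combination can be computed from confusion-matrix counts. The sketch assumes the combined score is the F1 score (the harmonic mean of precision and recall) and returns 0.0 when a denominator is zero, which is a convention chosen here rather than something the notes specify.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Made-up counts: 3 true positives, 1 false positive, 3 false negatives.
p = precision(tp=3, fp=1)
r = recall(tp=3, fn=3)
f1 = f1_score(p, r)
print(f"precision = {p:.2f}, recall = {r:.2f}, F1 = {f1:.2f}")
```

The harmonic mean stays low unless both precision and recall are high, so a model cannot score well by maximising only one of them.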
