Perceptron

The perceptron takes a weighted sum of its inputs and passes it through a sign function to generate an output.

Given an input vector $x \in \mathbb{R}^d$ of dimension $d$, the perceptron model is defined as:

$$\hat{y} = \text{sign}\left(\sum_{i=1}^{d} w_i x_i\right) = \text{sign}(w^\top x)$$

(a bias term can be absorbed into $w$ by appending a constant $1$ feature to $x$)

where $\text{sign}(\cdot)$ is the sign function:

$$\text{sign}(z) = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

Learning Algorithm

  • Initialise weights $w$ (e.g. to zeros)
  • Loop (until convergence/max steps)
    • For each instance $(x^{(i)}, y^{(i)})$, classify $\hat{y}^{(i)} = \text{sign}(w^\top x^{(i)})$
    • Select a misclassified instance ($\hat{y}^{(i)} \ne y^{(i)}$)
    • Update weights: $w \leftarrow w + \eta \, y^{(i)} x^{(i)}$ (sketched in code below)
      • $\eta$ is the learning rate

If the data is not linearly separable, the algorithm will not converge.
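
A minimal sketch of this loop in NumPy, assuming labels in $\{-1, +1\}$ and a bias folded in as a constant feature (names and defaults are illustrative):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_steps=1000):
    """Perceptron learning on data X (n x d) with labels y in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append constant-1 bias feature
    w = np.zeros(Xb.shape[1])                      # initialise weights
    for _ in range(max_steps):
        preds = np.where(Xb @ w >= 0, 1, -1)       # classify every instance
        wrong = np.flatnonzero(preds != y)         # misclassified instances
        if wrong.size == 0:                        # converged: all correct
            return w
        i = wrong[0]                               # select a misclassified instance
        w += eta * y[i] * Xb[i]                    # update: w <- w + eta * y * x
    return w                                       # may not converge if not separable

# Example: learn the linearly separable OR function (0/1 labels mapped to -1/+1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, 1])
w = train_perceptron(X, y)
print(np.where(np.hstack([X, np.ones((4, 1))]) @ w >= 0, 1, -1))  # [-1  1  1  1]
```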

Why does the learning algorithm work?

Consider the two ways an instance can be misclassified:

Case 1: Positive predicted as negative

The current calculation obtained is:

$$w^\top x < 0$$

What we require is:

$$w^\top x \ge 0$$

Thus, to “fix” our model, we have to increase our $w^\top x$. This can be done by adding $\eta x$ to $w$:

$$(w + \eta x)^\top x = w^\top x + \eta \|x\|^2 \ge w^\top x$$

Case 2: Negative predicted as positive

The current calculation obtained is:

$$w^\top x \ge 0$$

What we require is:

$$w^\top x < 0$$

Thus, to “fix” our model, we have to reduce our $w^\top x$. This can be done by subtracting $\eta x$ from $w$:

$$(w - \eta x)^\top x = w^\top x - \eta \|x\|^2 \le w^\top x$$

Both cases are captured by the single update rule $w \leftarrow w + \eta y x$, since $y = +1$ in Case 1 and $y = -1$ in Case 2.
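
As a quick numeric illustration of Case 1 (all values hypothetical):

```python
import numpy as np

eta = 0.5
w = np.array([0.2, -1.0])   # current weights (illustrative values)
x = np.array([1.0, 1.0])    # instance with true label y = +1

print(w @ x)                # -0.8 < 0, so predicted negative: Case 1
w = w + eta * x             # add eta * x to w to increase w.x
print(w @ x)                # 0.2 >= 0, now classified as positive
```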

Neuron

Neuron

A generalised version of the perceptron - the building block of neural networks.

Sign function

$$\text{sign}(z) = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

This function is seen in the perceptron model.

Sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This function is used to convert a linear regression model into a logistic regression model: it squashes the regression output into $(0, 1)$ so it can be read as a class probability, turning the regression into a classifier.

tanh

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$

ReLU

$$\text{ReLU}(z) = \max(0, z)$$

Leaky ReLU

$$\text{LeakyReLU}(z) = \max(\alpha z, z) \quad \text{for a small } \alpha \text{ (e.g. } 0.01\text{)}$$

Maxout

$$\text{maxout}(x) = \max\left(w_1^\top x + b_1, \dots, w_k^\top x + b_k\right)$$

ELU

$$\text{ELU}(z) = \begin{cases} z & \text{if } z \ge 0 \\ \alpha (e^{z} - 1) & \text{otherwise} \end{cases}$$
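
A sketch of these activations in NumPy (the Maxout helper takes several learned linear functions rather than transforming a scalar, so its weights below are placeholders):

```python
import numpy as np

def sign(z):
    return np.where(z >= 0, 1.0, -1.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z >= 0, z, alpha * z)

def elu(z, alpha=1.0):
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1))

def maxout(x, Ws, bs):
    """Max over k learned linear functions of the input x."""
    return np.max(np.stack([W @ x + b for W, b in zip(Ws, bs)]), axis=0)

z = np.linspace(-2, 2, 5)   # [-2, -1, 0, 1, 2]
print(relu(z))              # [0.  0.  0.  1.  2.]
print(leaky_relu(z))        # [-0.02 -0.01  0.    1.    2.  ]
print(np.tanh(z))           # NumPy already provides tanh
```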

Neural network

[Diagram: a neural network. Each neuron computes a weighted sum of its inputs plus a bias, applies an activation function, and passes its output to the neurons of the next layer; the final layer produces the outputs.]

Single-Layer

We can use a single layer of neurons to simulate simple Boolean functions.

For example, given an OR function, we have the following inputs:

x1  x2  OR
0   0   0
0   1   1
1   0   1
1   1   1

We can derive the relevant weights by considering the model:

$$\hat{y} = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Thus, we can get the following inequalities from the inputs:

$$b < 0, \qquad w_2 + b \ge 0, \qquad w_1 + b \ge 0, \qquad w_1 + w_2 + b \ge 0$$

We can then derive a set of weights that satisfies these criteria, e.g. $w_1 = w_2 = 1$ and $b = -0.5$.
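
Checking that choice directly (a tiny sketch; the threshold-at-zero convention follows the model above):

```python
w1, w2, b = 1.0, 1.0, -0.5   # weights satisfying the OR inequalities
for x1 in (0, 1):
    for x2 in (0, 1):
        out = 1 if w1 * x1 + w2 * x2 + b >= 0 else 0
        print(x1, x2, out)    # reproduces the OR truth table
```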

Multi-Layer

However, some Boolean functions are not linearly separable, like XNOR.

We can then model these functions by using multiple layers of neurons - for example:

  • XNOR: a hidden layer computes NOR($x_1, x_2$) and AND($x_1, x_2$), and the output neuron takes the OR of the two hidden outputs (see the sketch below)

[Diagram: a two-layer network where AND and NOR hidden neurons feed an OR output neuron.]
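
A minimal sketch of this two-layer construction with hand-picked weights; the thresholds are illustrative, not unique:

```python
def step(z):
    return 1 if z >= 0 else 0

def xnor(x1, x2):
    h_and = step(x1 + x2 - 1.5)        # hidden neuron: AND(x1, x2)
    h_nor = step(-x1 - x2 + 0.5)       # hidden neuron: NOR(x1, x2)
    return step(h_and + h_nor - 0.5)   # output neuron: OR of the two

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))    # 1 when x1 == x2, else 0
```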

Neural network vs Logistic/linear regression model

Logistic/linear regression relies on manual feature engineering to capture complex patterns, while a multi-layer neural network learns its own feature representations through its hidden layers and non-linear activations.

For example, the XNOR model can have hidden layers to simulate the NOR and AND layers, while feature engineering would be needed to capture this pattern in a logistic regression model (e.g. a new feature $x_{1}x_{2}$).

Forward Propagation

Forward propagation

Process in a neural network where the input data is passed through the network’s layers to generate an output.

Forward propagation is used to make predictions.

[Diagram: forward propagation through the network's layers, applying the weight matrices $W^{[1]}, \dots, W^{[L]}$ in turn.]

Matrix multiplication can be used to get the outputs here, for example, imagining the model above (with no other layers):

$$\hat{y} = g\left(W^{[1]} x + b^{[1]}\right)$$

where each row of $W^{[1]}$ holds one neuron's weights and $g$ is the activation function.
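
As a sketch in NumPy, with illustrative shapes (3 neurons over a 2-dimensional input):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer with 3 neurons over a 2-dimensional input (illustrative values)
W = np.array([[0.5, -0.2],
              [0.1,  0.8],
              [-0.3, 0.4]])      # each row: one neuron's weights
b = np.array([0.0, 0.1, -0.1])  # one bias per neuron

x = np.array([1.0, 2.0])        # input vector
a = sigmoid(W @ x + b)          # matrix multiply, add bias, apply activation
print(a.shape)                  # (3,) - one output per neuron
```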

Multi-class classification

Multi-class classification can be used for neural networks by giving the output layer one neuron per class.

[Diagram: an output layer with one neuron per class, giving predictions for class 1, class 2, ..., class $c$, followed by a softmax activation.]

Given a vector $z = (z_1, \dots, z_c)$, the softmax function computes the output for each class $i$ as:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{c} e^{z_j}}$$

where the outputs are positive and sum to $1$, so they can be interpreted as class probabilities.
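
A sketch of softmax in NumPy; subtracting the maximum before exponentiating leaves the result unchanged (softmax is shift-invariant) but avoids overflow:

```python
import numpy as np

def softmax(z):
    shifted = z - np.max(z)       # stabilise: softmax is shift-invariant
    exps = np.exp(shifted)
    return exps / np.sum(exps)    # normalise so outputs sum to 1

z = np.array([2.0, 1.0, 0.1])     # raw scores for classes 1..c
p = softmax(z)
print(p, p.sum())                 # class probabilities that sum to 1
```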