Perceptron
The perceptron computes a weighted sum of its inputs and passes it through a sign function to generate an output.
Given an input vector $\mathbf{x} = (x_1, \dots, x_d)$ and a weight vector $\mathbf{w} = (w_1, \dots, w_d)$, the output is $\hat{y} = \text{sign}(\mathbf{w}^\top \mathbf{x})$,
where $\text{sign}(z) = +1$ if $z \ge 0$ and $-1$ otherwise.
Learning Algorithm
- Initialise weights $\mathbf{w} \leftarrow \mathbf{0}$ (or small random values)
- Loop (until convergence/max steps):
    - For each instance $(\mathbf{x}_i, y_i)$, classify $\hat{y}_i = \text{sign}(\mathbf{w}^\top \mathbf{x}_i)$
    - Select a misclassified instance (one with $\hat{y}_i \neq y_i$)
    - Update weights: $\mathbf{w} \leftarrow \mathbf{w} + \eta\, y_i\, \mathbf{x}_i$, where $\eta$ is the learning rate
If the data is not linearly separable, the algorithm will not converge.
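A minimal NumPy sketch of this learning loop (the toy dataset, learning rate, and step limit are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_steps=1000):
    """Perceptron learning: X has a leading bias column, y contains +1/-1 labels."""
    w = np.zeros(X.shape[1])          # initialise weights
    for _ in range(max_steps):        # loop until convergence/max steps
        preds = np.sign(X @ w)        # classify every instance
        preds[preds == 0] = -1        # treat 0 as the negative class
        wrong = np.where(preds != y)[0]
        if len(wrong) == 0:           # converged: no misclassified instances left
            return w
        i = wrong[0]                  # select a misclassified instance
        w = w + lr * y[i] * X[i]      # update weights towards the correct side
    return w

# Toy linearly separable data: bias input of 1, then two features.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])           # OR-like labelling with +1/-1
print(train_perceptron(X, y))
```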
Why does the learning algorithm work?
Consider the two ways a misclassification can occur:
Case 1: Positive predicted as negative
The current calculation obtained is $\mathbf{w}^\top \mathbf{x} < 0$.
What we require is $\mathbf{w}^\top \mathbf{x} \ge 0$.
Thus, to "fix" our model, we have to increase $\mathbf{w}^\top \mathbf{x}$. The update $\mathbf{w} \leftarrow \mathbf{w} + \eta\,\mathbf{x}$ (since $y = +1$) does exactly this, because $(\mathbf{w} + \eta\,\mathbf{x})^\top \mathbf{x} = \mathbf{w}^\top \mathbf{x} + \eta\,\|\mathbf{x}\|^2 > \mathbf{w}^\top \mathbf{x}$.
Case 2: Negative predicted as positive
The current calculation obtained is $\mathbf{w}^\top \mathbf{x} \ge 0$.
What we require is $\mathbf{w}^\top \mathbf{x} < 0$.
Thus, to "fix" our model, we have to decrease $\mathbf{w}^\top \mathbf{x}$. The update $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\mathbf{x}$ (since $y = -1$) decreases it by $\eta\,\|\mathbf{x}\|^2$.
Neuron
Neuron
A generalised version of the perceptron - the building block of neural networks.
Sign function
This function is seen in the perceptron model.
Sigmoid function
This function squashes its input into $(0, 1)$; it is what converts a linear regression model into a logistic regression classifier, turning regression values into class probabilities.
tanh
ReLU
Leaky ReLU
Maxout
ELU
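A minimal NumPy sketch of the activations listed above (the leaky-ReLU/ELU slopes and the maxout pieces are illustrative defaults):

```python
import numpy as np

def sign(z):               return np.where(z >= 0, 1.0, -1.0)     # perceptron activation
def sigmoid(z):            return 1.0 / (1.0 + np.exp(-z))        # squashes to (0, 1)
def tanh(z):               return np.tanh(z)                      # squashes to (-1, 1)
def relu(z):               return np.maximum(0.0, z)              # zero for negative inputs
def leaky_relu(z, a=0.01): return np.where(z > 0, z, a * z)       # small slope for negatives
def elu(z, a=1.0):         return np.where(z > 0, z, a * (np.exp(z) - 1.0))
def maxout(zs):            return np.max(zs, axis=0)              # max over several linear pieces

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sign, sigmoid, tanh, relu, leaky_relu, elu):
    print(f.__name__, f(z))
print("maxout", maxout(np.stack([z, -z])))
```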
Neural network
Single-Layer
We can use a single layer of neurons to simulate simple boolean functions.
For example, given an OR function, we have the following truth table:
x1 | x2 | OR |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
We can derive the relevant weights by considering the model: output $1$ if $w_0 + w_1 x_1 + w_2 x_2 \ge 0$, and $0$ otherwise.
Thus, we can get the following inequalities from the inputs:
- $(0, 0) \to 0$: $w_0 < 0$
- $(0, 1) \to 1$: $w_0 + w_2 \ge 0$
- $(1, 0) \to 1$: $w_0 + w_1 \ge 0$
- $(1, 1) \to 1$: $w_0 + w_1 + w_2 \ge 0$
We can then derive a set of weights that satisfies these criteria, e.g. $w_0 = -0.5$, $w_1 = w_2 = 1$.
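A quick sketch checking that the weights above reproduce the OR truth table:

```python
w0, w1, w2 = -0.5, 1.0, 1.0                      # one weight set satisfying the inequalities
for x1, x2, target in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]:
    out = int(w0 + w1 * x1 + w2 * x2 >= 0)       # threshold neuron: fire if the sum is non-negative
    print(x1, x2, out, out == target)
```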
Multi-Layer
However, some boolean functions are not linearly separable, like XNOR.
We can then model these functions by using multiple layers of neurons. For example:
XNOR$(x_1, x_2)$ = OR(NOR$(x_1, x_2)$, AND$(x_1, x_2)$): a hidden layer computes NOR and AND, and the output neuron ORs them together.
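A small sketch of that decomposition, writing each gate as a threshold neuron (the specific weights are illustrative choices):

```python
def neuron(x1, x2, w0, w1, w2):
    """Single threshold neuron: output 1 if w0 + w1*x1 + w2*x2 >= 0, else 0."""
    return int(w0 + w1 * x1 + w2 * x2 >= 0)

def xnor(x1, x2):
    h_and = neuron(x1, x2, -1.5, 1, 1)        # hidden neuron 1: AND
    h_nor = neuron(x1, x2, 0.5, -1, -1)       # hidden neuron 2: NOR
    return neuron(h_and, h_nor, -0.5, 1, 1)   # output neuron: OR of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))           # prints the XNOR truth table
```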
Neural network vs Logistic/linear regression model
Logistic/linear regression relies on manual feature engineering to capture complex patterns, while a multi-layer neural network learns its own feature representations through its hidden layers and non-linear activations.
The XNOR model can have hidden layers to simulate the NOR and AND layers, while feature engineering would be needed to capture this pattern in a logistic/linear regression model (e.g. a new feature $xy$ built from $x$ and $y$).
Forward Propagation
Forward propagation
Process in a neural network where the input data is passed through the network’s layers to generate an output.
Forward propagation is used to make predictions.
Matrix multiplication can be used to compute the outputs. For example, for the model above (with no other layers), stacking each neuron's weights as the rows of a matrix $W$ gives the layer output $\mathbf{h} = g(W\mathbf{x} + \mathbf{b})$, where $g$ is the activation function applied element-wise.
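A sketch of the same forward pass written with per-layer matrices, using the hand-set XNOR weights from above and a step activation standing in for $g$:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(float)            # threshold activation

# Layer weights as matrices: one row per neuron, plus a bias vector.
W1 = np.array([[ 1.0,  1.0],                 # hidden neuron 1: AND
               [-1.0, -1.0]])                # hidden neuron 2: NOR
b1 = np.array([-1.5, 0.5])
W2 = np.array([[1.0, 1.0]])                  # output neuron: OR of the hidden units
b2 = np.array([-0.5])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
H = step(X @ W1.T + b1)                       # hidden layer: matrix multiply, then activation
Y = step(H @ W2.T + b2)                       # output layer
print(Y.ravel())                              # XNOR truth table: [1, 0, 0, 1]
```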
Multi-class classification
Given a vector of outputs $\hat{\mathbf{y}} = (\hat{y}_1, \dots, \hat{y}_K)$, where each output corresponds to one class, the predicted class is the one with the largest output.
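Assuming a softmax output layer (an assumption, not stated explicitly above), a minimal sketch of converting the output scores into class probabilities and a prediction:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([1.2, 0.3, 2.5])       # illustrative output-layer scores, one per class
probs = softmax(scores)
print(probs, probs.sum())                # class probabilities, summing to 1
print("predicted class:", int(np.argmax(probs)))
```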
Gradient Descent
Chain rule
Given a composition of functions, we can compute the derivative with the chain rule. For example, $f(x) = g(h(x))$ can be differentiated with respect to $x$ as $\frac{df}{dx} = \frac{dg}{dh} \cdot \frac{dh}{dx}$; e.g. for $f(x) = (3x + 1)^2$, take $h(x) = 3x + 1$ and $g(h) = h^2$, giving $\frac{df}{dx} = 2h \cdot 3 = 6(3x + 1)$.
Multiple input
Given a function with multiple inputs, e.g. $z = f(x, y)$ where $x$ and $y$ both depend on $t$, we can still get the derivative of $z$ with respect to $t$ by applying the chain rule along each input path and then summing the results:
$\frac{dz}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$
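A quick numerical check of this multi-input chain rule on an illustrative function, $z = x \cdot y$ with $x = t^2$ and $y = \sin t$:

```python
import numpy as np

t = 1.3
x, y = t**2, np.sin(t)                    # both inputs depend on t
dz_dt = y * 2 * t + x * np.cos(t)         # dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)

eps = 1e-6
z = lambda t: (t**2) * np.sin(t)
numeric = (z(t + eps) - z(t - eps)) / (2 * eps)   # central finite difference
print(dz_dt, numeric)                     # the two values should agree closely
```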
Gradient computation
Single neuron
Given a single neuron, we can
- generate the predicted value for a given data point: $\hat{y} = g(\mathbf{w}^\top \mathbf{x})$
- define the loss function, e.g. squared error $L = \frac{1}{2}(\hat{y} - y)^2$
- differentiate using the chain rule
For example, given the sigmoid activation function $g(z) = \frac{1}{1 + e^{-z}}$, whose derivative is $g'(z) = g(z)(1 - g(z))$, we can then compute the gradient of the loss function:
$\frac{\partial L}{\partial w_j} = (\hat{y} - y)\, \hat{y}(1 - \hat{y})\, x_j$
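A sketch of those three steps for one data point, assuming the sigmoid activation and squared-error loss used in the example above, with a finite-difference check of the chain-rule gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, -1.0])            # one data point (first entry acts as a bias input)
y = 1.0                                   # its label
w = np.array([0.1, -0.2, 0.3])

def loss(w):
    y_hat = sigmoid(w @ x)                # 1) predicted value for the data point
    return 0.5 * (y_hat - y) ** 2         # 2) squared-error loss

# 3) differentiate with the chain rule: dL/dw = (y_hat - y) * y_hat * (1 - y_hat) * x
y_hat = sigmoid(w @ x)
grad = (y_hat - y) * y_hat * (1 - y_hat) * x

# Finite-difference check of each partial derivative.
eps = 1e-6
numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])
print(grad)
print(numeric)                            # should closely match the chain-rule gradient
```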
Backpropagation
Used to compute the gradient of the loss function with respect to each weight.
Thus, to find the gradient of the loss with respect to a weight $w$ deep in the network, we chain the derivatives through every intermediary result between $w$ and the loss, e.g. $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h} \cdot \frac{\partial h}{\partial w}$ for a weight feeding a hidden value $h$.
A forward pass gets us all the intermediary results as seen above, and a backward pass gets us all the intermediary derivatives of the loss with respect to each intermediary result.
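A minimal sketch of one forward pass and one backward pass for a tiny one-hidden-layer network (sigmoid activations and squared-error loss are assumptions carried over from above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])                 # one input
y = 1.0                                   # its target
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

# Forward pass: keep every intermediary result.
z1 = W1 @ x + b1
h = sigmoid(z1)
z2 = W2 @ h + b2
y_hat = sigmoid(z2)
L = 0.5 * (y_hat - y) ** 2

# Backward pass: intermediary derivatives of the loss, from the output back towards the input.
dL_dyhat = (y_hat - y)
dL_dz2 = dL_dyhat * y_hat * (1 - y_hat)
dL_dW2 = np.outer(dL_dz2, h)              # gradient for the output-layer weights
dL_dh = W2.T @ dL_dz2
dL_dz1 = dL_dh * h * (1 - h)
dL_dW1 = np.outer(dL_dz1, x)              # gradient for the hidden-layer weights

print(L, dL_dW2.shape, dL_dW1.shape)
```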
Issues
Overfitting
Dropout
Dropout prevents overfitting by randomly setting some neuron outputs to 0 during training. This stops the network from memorising patterns that are specific to the training data.
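A minimal sketch of (inverted) dropout applied to one layer's outputs; the keep probability is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_keep=0.8, training=True):
    """Randomly zero activations during training; rescale so the expected value is unchanged."""
    if not training:
        return activations                       # no dropout when predicting
    mask = rng.random(activations.shape) < p_keep
    return activations * mask / p_keep           # "inverted" dropout: rescale the kept units

h = np.array([0.2, 0.9, 0.5, 0.7, 0.1])
print(dropout(h))                                # some outputs zeroed, the rest scaled up
print(dropout(h, training=False))                # unchanged at prediction time
```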
Early Stopping
While training, stop when the validation loss reaches its minimum (i.e. when it stops improving), even if the training loss is still decreasing.
Vanishing/Exploding Gradient
Vanishing gradient
Small gradients are multiplied repeatedly across layers until they shrink towards zero, so the early layers stop learning.
Solution
Change the activation function (e.g. use ReLU instead of sigmoid/tanh, whose gradients saturate).
Exploding gradient
Large gradients are multiplied repeatedly across layers until they overflow or make training unstable.
Solution
Clip gradients to keep them within a fixed range.
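A minimal sketch of clipping, either element-wise to a range or by rescaling the gradient norm (the thresholds are illustrative):

```python
import numpy as np

def clip_by_value(grad, limit=1.0):
    return np.clip(grad, -limit, limit)          # clamp each component into [-limit, limit]

def clip_by_norm(grad, max_norm=5.0):
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)  # rescale, keep direction

g = np.array([0.3, -7.0, 12.0])                  # an "exploding" gradient
print(clip_by_value(g))
print(clip_by_norm(g))
```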