A convolutional neural network takes a -dimensional input.

2D input

Given a 2D input , a sliding window is multiplied by the kernel , to get the feature map.

Without any hidden pixels, and stride = , if is and is , the feature map would be of size .

Kernel/vectorFeature map2x24x43x3

Padding

Adding hidden pixels to control the output shape.

Stride

Amount of movement of the sliding window.

Pooling

Downsamples feature maps

Aggregation methods:

  • Max-Pool
  • Average-Pool
  • Sum-Pool