Convolution
- The convolution kernel is a small matrix (3x3, 5x5)
- Extracts features from input images
- Kernel function in image processing
- Ridge detection, Sharpen, Box blur
- Padding
- Adds zero values around the boundary of the input image
- Stride
- Step size by which the convolution kernel's sliding window moves across the input
- Sliding Window
- MAC
- Multiply-Accumulate operation
- IM2COL
- Transforms n-dimensional data into a 2D matrix
- Enables a more efficient operation (see the sketch after this list)
- GEMM
- General Matrix to Matrix Multiplication
- Pooling
- Resize the feature map
- Reduces the resolution of the feature map
- Max pooling, Average pooling
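A minimal NumPy sketch of these ideas: a naive convolution that zero-pads the input, slides a window with a given stride, and does a multiply-accumulate (MAC) per position, followed by an im2col unrolling so the same convolution becomes a single GEMM. The 8x8 input, the 3x3 sharpen kernel, and the helper names are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive convolution: slide the kernel and do a multiply-accumulate (MAC) per position."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant")  # zero values on the boundary
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]  # sliding window
            out[i, j] = np.sum(window * kernel)                         # MAC
    return out

def im2col(image, kh, kw, stride=1):
    """Unroll every kernel-sized window into one row of a 2D matrix."""
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    cols = np.zeros((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = image[i*stride:i*stride+kh, j*stride:j*stride+kw].ravel()
    return cols, out_h, out_w

image = np.random.rand(8, 8)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])  # classic sharpen kernel

direct = conv2d(image, sharpen, stride=1, padding=1)       # "same" spatial size: 8x8
padded = np.pad(image, 1, mode="constant")
cols, oh, ow = im2col(padded, 3, 3, stride=1)
gemm = (cols @ sharpen.ravel()).reshape(oh, ow)            # the convolution as one GEMM
print(np.allclose(direct, gemm))                           # True
```

Frameworks favor the im2col + GEMM route because one large matrix multiplication maps well onto optimized BLAS/GPU kernels.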
Fully Connected Layer
- Flattens the 2D feature maps into a 1D vector
- Every output neuron has a weight for each feature-map pixel
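A small sketch of the flatten + fully connected step, assuming a 4x4 feature map and 10 output neurons (both arbitrary choices):

```python
import numpy as np

np.random.seed(0)
feature_map = np.random.rand(4, 4)        # 2D feature map from the conv/pooling stages
x = feature_map.reshape(-1)               # flatten 2D -> 1D (16 values)
W = np.random.randn(10, x.size) * 0.01    # one weight per (output neuron, input pixel) pair
b = np.zeros(10)
logits = W @ x + b                        # fully connected layer output (10 scores)
print(logits.shape)                       # (10,)
```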
Activation
- sigmoid
- tanh
- ReLU
- LeakyReLU
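For reference, the four activations as plain NumPy functions; the 0.01 negative slope for LeakyReLU is a common default assumed here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):            # alpha = slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```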
Shallow CNN
- Shallow Neural Network
- Backpropagation in Shallow Neural Network
- Max pooling backpropagation
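A rough sketch of max pooling backpropagation: the forward pass remembers which element won each window, and the backward pass routes the upstream gradient only to that position (every other position gets zero gradient). Shapes and helper names are assumptions for illustration.

```python
import numpy as np

def max_pool_forward(x, size=2, stride=2):
    """2x2 max pooling; remember which element was the max in each window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    mask = np.zeros_like(x, dtype=bool)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
            r, c = np.unravel_index(window.argmax(), window.shape)
            mask[i*stride + r, j*stride + c] = True   # winning position
    return out, mask

def max_pool_backward(dout, mask, size=2, stride=2):
    """Route each upstream gradient only to the position that produced the max."""
    dx = np.zeros(mask.shape)
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            window_mask = mask[i*stride:i*stride+size, j*stride:j*stride+size]
            dx[i*stride:i*stride+size, j*stride:j*stride+size] += window_mask * dout[i, j]
    return dx

x = np.random.rand(4, 4)
out, mask = max_pool_forward(x)
dx = max_pool_backward(np.ones_like(out), mask)
print(dx)   # exactly one 1 per 2x2 window, at the max position
```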
Training
- Feed-forward & backward passes → weights are updated to fit the training data
- Gradient descent method
- Optimizer
- The goal of gradient descent is usually to minimize the loss function of a machine learning problem (see the optimizer sketch after this list)
- SGD (Stochastic Gradient Descent)
- Updates weights using the gradient from a single data sample; MBGD (mini-batch gradient descent) uses a mini-batch instead
- Momentum
- A velocity term keeps the update moving in the direction of the weight's previous gradients
- AdaGrad
- Accumulates the sum of squared gradients
- The sum only grows, so the effective learning rate shrinks → very slow convergence
- RMS-prop
- Uses an exponential moving average of squared gradients instead of the full sum
- Adam
- RMS-prop + Momentum
- SGD is better than Adam?
- Adaptive methods (Adam, RMS-prop) often generalize worse than non-adaptive methods (SGD, Momentum)
- Overfitting
- When the model is trained on the training dataset, it becomes fitted to the training data domain
- On the training dataset, the loss is very low
- But when the model predicts on a new, unseen dataset, performance is poor
- Underfitting
- The model did not fit the training dataset well
- Too little data, inappropriate hyperparameters, …
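A toy comparison of the optimizer update rules above, applied to a single parameter with loss L(w) = (w - 3)^2; the learning rate and beta values are common defaults assumed here, not taken from these notes:

```python
import numpy as np

def grad(w):                    # dL/dw for L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

lr, beta1, beta2, eps, steps = 0.1, 0.9, 0.999, 1e-8, 200

w_sgd = w_mom = w_ada = w_rms = w_adam = 0.0
v_mom = 0.0                     # Momentum: velocity term
g2_ada = 0.0                    # AdaGrad: running sum of squared gradients
g2_rms = 0.0                    # RMSprop: exponential moving average of squared gradients
m_adam = v_adam = 0.0           # Adam: Momentum-style and RMSprop-style estimates

for t in range(1, steps + 1):
    w_sgd -= lr * grad(w_sgd)                                   # SGD

    v_mom = 0.9 * v_mom - lr * grad(w_mom)                      # Momentum keeps the previous direction
    w_mom += v_mom

    g = grad(w_ada)
    g2_ada += g * g                                             # AdaGrad accumulates forever,
    w_ada -= lr * g / (np.sqrt(g2_ada) + eps)                   # so the step shrinks over time

    g = grad(w_rms)
    g2_rms = beta2 * g2_rms + (1 - beta2) * g * g               # RMSprop: moving average instead
    w_rms -= lr * g / (np.sqrt(g2_rms) + eps)

    g = grad(w_adam)
    m_adam = beta1 * m_adam + (1 - beta1) * g                   # Adam = Momentum + RMSprop
    v_adam = beta2 * v_adam + (1 - beta2) * g * g
    m_hat = m_adam / (1 - beta1 ** t)                           # bias correction
    v_hat = v_adam / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w_sgd, w_mom, w_ada, w_rms, w_adam)   # all move toward the optimum w = 3; AdaGrad is the slowest here
```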
Regularization
- Overfitted models tend to have large weight values → add a regularization term to the loss function that penalizes large weights
- Keeps the model's weights small
- Adds a penalty to the error function.
- L1 Regularization
- Penalty term is the sum of absolute values of weights.
- L2 Regularization
- Penalty term is the sum of squared values of weights.
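A small sketch of how the L1/L2 penalty enters the loss and its gradient, using a toy linear regression; the regularization strength `lam` and the data shapes are assumptions:

```python
import numpy as np

def loss_and_grad(w, X, y, lam, reg="l2"):
    """MSE loss for a linear model plus an L1 or L2 penalty on the weights."""
    err = X @ w - y
    loss = 0.5 * np.mean(err ** 2)
    grad = X.T @ err / len(y)
    if reg == "l2":
        loss += lam * np.sum(w ** 2)        # penalty: sum of squared weights
        grad += 2 * lam * w
    elif reg == "l1":
        loss += lam * np.sum(np.abs(w))     # penalty: sum of absolute weights
        grad += lam * np.sign(w)
    return loss, grad

np.random.seed(0)
X, y = np.random.randn(50, 5), np.random.randn(50)
w = np.random.randn(5)
for _ in range(500):
    _, g = loss_and_grad(w, X, y, lam=0.1, reg="l2")
    w -= 0.1 * g
print(w)   # the penalty keeps the learned weights small
```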
Dropout
- Randomly drops (zeroes out) some neurons during the training process
- Prevents some weights from becoming biased and growing large
- Dropout is not used in test (inference) mode
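A minimal inverted-dropout sketch, assuming a drop probability of 0.5; activations are rescaled during training so nothing needs to change at inference:

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero activations during training, do nothing at inference."""
    if not training:
        return x                                   # dropout is disabled at test time
    mask = (np.random.rand(*x.shape) > p_drop)     # keep each unit with probability 1 - p_drop
    return x * mask / (1.0 - p_drop)               # rescale so the expected value is unchanged

x = np.ones((2, 4))
print(dropout(x, training=True))    # some entries zeroed, survivors scaled by 2
print(dropout(x, training=False))   # unchanged at inference
```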
Batch normalization
- Normalizes the input batch using its mean and variance, producing a stable distribution (mean: 0, std: 1)
- Makes the input values stable before the activation function
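A sketch of the batch-norm forward pass for a fully connected layer (per-feature normalization over the batch); `gamma` and `beta` are the learnable scale/shift, and `eps` is a small constant for numerical stability:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature of the batch to mean 0 / std 1, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # stable distribution before the activation
    return gamma * x_hat + beta               # learnable scale (gamma) and shift (beta)

x = np.random.randn(32, 4) * 10 + 5           # a batch of 32 samples, 4 features
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1 per feature
```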