
Convolutional Neural Networks

Tags
DeepLearning
Created
Jun 11, 2023 02:49 PM
Last Updated
Jul 30, 2023 09:49 AM
 
 

Convolution

  • Convolution kernel: a small matrix (3x3, 5x5)
  • Extracts features from the input image
  • Kernel function in image processing
    • Ridge detection, Sharpen, Box blur
  • Padding
    • Add zero values around the boundary of the input image
  • Stride
    • Step size by which the convolution kernel slides over the input
  • Sliding Window
    • MAC
      • Multiply-Accumulate operation (see the sketch after this list)
  • IM2COL
    • Transforms n-dimensional data into a 2D matrix
    • Makes the convolution a more efficient matrix operation (the sketch after this list shows the im2col + GEMM path)
  • GEMM
    • General Matrix-Matrix Multiplication
  • Pooling
    • Resize the feature map
    • Reduce the resolution of feature map
    • Max pooling, Average pooling
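
A minimal NumPy sketch of the ideas above, assuming a single-channel image; the function names conv2d and im2col are just illustrative. The sliding-window loop and the im2col + GEMM path produce the same feature map.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Sliding-window convolution: each output pixel is one multiply-accumulate (MAC)."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * kernel)      # MAC over the window
    return out

def im2col(image, kh, kw, stride=1):
    """Unroll every sliding window into one column of a 2D matrix."""
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    cols = np.zeros((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * out_w + j] = window.ravel()
    return cols, out_h, out_w

image = np.random.rand(6, 6)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)  # classic sharpen kernel

naive = conv2d(image, sharpen, stride=1, padding=1)      # zero padding keeps the 6x6 size

cols, out_h, out_w = im2col(np.pad(image, 1), 3, 3)      # im2col on the padded image
gemm = (sharpen.ravel() @ cols).reshape(out_h, out_w)    # whole convolution as one GEMM

print(naive.shape, np.allclose(naive, gemm))             # (6, 6) True
```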
 

Fully Connected Layer

  • Flatten the 2D feature maps into a 1D vector
  • Every weight connects an output unit to one feature-map pixel
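
A minimal NumPy sketch of the flatten + fully connected step; the shapes here (an 8x4x4 feature map, 10 output units) are arbitrary examples.

```python
import numpy as np

feature_map = np.random.rand(8, 4, 4)     # 8 channels of 4x4 feature maps
x = feature_map.reshape(-1)               # flatten to a 128-dim vector

W = np.random.rand(10, x.size) * 0.01     # one weight per (output unit, input pixel) pair
b = np.zeros(10)
logits = W @ x + b                        # fully connected layer output
print(logits.shape)                       # (10,)
```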
 

Activation

  • sigmoid
  • tanh
  • ReLU
  • LeakyReLU
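
The four activations above, written out in NumPy for reference (alpha = 0.01 for LeakyReLU is just a common default, not stated in this post).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes into (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)              # zero for negatives, identity for positives

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope keeps negative gradients alive

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x))
```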
 

Shallow CNN

  • Shallow Neural Network
  • Backpropagation in Shallow Neural Network
  • Max pooling backpropagation
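
A minimal sketch of max-pooling backpropagation, assuming a 2x2 window with stride 2: the upstream gradient is routed only to the position that held the maximum in each window; all other positions get zero.

```python
import numpy as np

def max_pool_backward(fmap, grad_out, size=2, stride=2):
    grad_in = np.zeros_like(fmap)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            mask = (window == window.max())            # 1 at the max position
            grad_in[i*stride:i*stride+size,
                    j*stride:j*stride+size] += mask * grad_out[i, j]
    return grad_in

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 2., 2.],
                 [3., 0., 1., 4.]])
grad_out = np.ones((2, 2))                             # pretend upstream gradient
print(max_pool_backward(fmap, grad_out))
```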
 

Training

  • Feed-forward & backward passes → weights are updated to fit the training data
  • Gradient descent method (see the update-rule sketch after this list)
  • Optimizer
    • The goal of gradient descent is usually to minimize the loss function of a machine learning problem
    • SGD (Stochastic Gradient Descent)
      • Updates weights from the gradient of a single sample; mini-batch gradient descent (MSGD) updates per mini-batch
    • Momentum
      • A velocity term keeps the update moving along the weight's previous gradient direction
    • AdaGrad
      • Scales the learning rate by the accumulated sum of squared gradients
      • The sum only grows, so the effective learning rate shrinks and convergence becomes very slow…
    • RMS-prop
      • Uses an exponential moving average of squared gradients instead of the full sum
    • Adam
      • RMS-prop + Momentum
    • SGD is better than Adam?
      • Adaptive methods (Adam, RMS-prop) can generalize worse than non-adaptive methods (SGD, Momentum)
  • Overfitting
    • When the model is trained on the training dataset, it fits the training data domain
    • On the training dataset, the loss is very low
    • But when the model predicts on a new, unseen dataset, performance is poor
  • Underfitting
    • The model does not fit the training dataset well
    • Causes: too little data, inappropriate hyperparameters, …
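
Minimal update-rule sketches for the optimizers above (plain NumPy on a single parameter array; the hyperparameter values are common defaults, not taken from this post).

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    return w - lr * grad                        # vanilla SGD: step against the gradient

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v + grad                         # velocity keeps the previous direction
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                # Momentum-style first moment
    v = b2 * v + (1 - b2) * grad ** 2           # RMS-prop-style second moment
    m_hat = m / (1 - b1 ** t)                   # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.3])
w, m, v = adam_step(w, grad, np.zeros_like(w), np.zeros_like(w), t=1)
print(w)
```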
 

Regularization

  • Overfitted models tend to have large weight values → add a regularization term to the loss function to penalize large weights
  • Keeps the model's weights small
  • Adds a penalty to the error function.
  • L1 Regularization
    • Penalty term is the sum of absolute values of weights.
  • L2 Regularization
    • Penalty term is the sum of squared values of weights.
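
A minimal sketch of adding L1 / L2 penalties to a loss; the lambda values below are arbitrary examples.

```python
import numpy as np

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    l1_penalty = l1 * np.sum(np.abs(weights))   # L1: sum of absolute weight values
    l2_penalty = l2 * np.sum(weights ** 2)      # L2: sum of squared weight values
    return data_loss + l1_penalty + l2_penalty

weights = np.array([0.5, -2.0, 3.0])
print(regularized_loss(data_loss=0.7, weights=weights, l2=1e-3))
```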
 

Dropout

  • Randomly drop some units (weights) during the training process
  • Prevents some weights from becoming biased toward large values
  • Dropout is not used in test (inference) mode
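
A minimal sketch of dropout in the common inverted form (rescale during training so nothing changes at inference); the drop probability is an arbitrary example.

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    if not training:
        return x                               # dropout is disabled at test time
    mask = np.random.rand(*x.shape) > p        # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)                # rescale to preserve the expected value

x = np.ones((2, 4))
print(dropout(x, p=0.5, training=True))
print(dropout(x, training=False))
```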
 

Batch normalization

  • Normalizes the input batch using its mean and variance so the output has a stable distribution (mean 0, std 1)
  • Provides stable input values before the activation function
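
A minimal batch-norm sketch for training mode only (at inference, running statistics would be used instead of the batch statistics).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # normalized to mean 0, std 1
    return gamma * x_hat + beta                # learnable scale and shift

x = np.random.rand(32, 4) * 10 + 3             # a batch of 32 samples, 4 features
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```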