Object Detection
- Object classification
- What is the object in an image
- Classifier → output: class_score[K, N]
- K: The number of predicted objects
- N: The number of classes
- Object localization
- What and where is the “single” object in an image
- Regressor → output: box[K, 4]
- K: The number of predicted objects
- 4: The bounding-box coordinates (x, y, w, h)
- Object detection
- What and where are the “multiple” objects in an image
- How to detect multiple objects? The sliding-window method
- Multiple object localization
- Computation cost is very high. (Inefficient)
- OverFeat paper
- Classifier → output: class_score[K, N]
- Regressor → output: bounding box offset[K, 4]
- Network architecture of Classification and Object Detection
- Final layer shape is different
- Classification: [1, 1, class_num]
- Object detection: [H, W, class_num + box_offset + confidence] (one stage; see the shape sketch after this list)
- One stage detection & Two stage detection
- One-stage detection
- SSD, YOLO, …
- Backbone
- Feature extractor
- Extract features from input image
- The deeper the layer, the more abstract the feature maps
- Neck
- Merge the different resolution feature maps
- Concatenate/add different scale feature maps
- Dense Prediction
- Predict score of object and bounding box
- Regression layers
- Two-stage detection
- Faster-RCNN, …
- More accurate than one-stage detectors, but with a higher computation cost
- 1st forward
- Get the object candidate regions
- 2nd forward
- Classify the object in region proposals
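A minimal sketch of the output-shape difference mentioned above, assuming an anchor-based one-stage head; the grid size, anchor count, and class count below are arbitrary examples, not values from a specific model:

```python
import numpy as np

# One-stage dense-prediction head: for every grid cell, A anchors x
# (4 box offsets + 1 objectness/confidence + C class scores).
H, W = 13, 13        # grid size (final feature-map resolution)
A = 3                # anchors per grid cell (assumed)
C = 80               # number of classes (assumed)

head_output = np.zeros((H, W, A * (4 + 1 + C)))
print(head_output.shape)          # (13, 13, 255) for A=3, C=80

# A classification network, by contrast, collapses the spatial resolution:
cls_output = np.zeros((1, 1, C))  # [1, 1, class_num]
print(cls_output.shape)
```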
How to find the object
- Grid
- the head layer’s final feature map
- If feature map size is [13, 13], the grid size is [13, 13]
- Predict objects in each grid cell
- Anchor
- A detector that predicts a single bounding box
- Predicts one object per anchor (see the decoding sketch after this list)
- Pre-defined bounding box shape
- In a grid cell, there are several anchors
- Bounding box
- objectness score
- Object or not
- Class score
- cat or dog or car …
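A rough decoding sketch for the grid/anchor idea above, assuming a YOLO-style raw output of shape [grid_h, grid_w, anchors, 5 + classes]; the shapes and the 0.5 threshold are illustrative, not from a specific implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw head output: 13x13 grid, 3 anchors per cell, 5 + C values per anchor
H, W, A, C = 13, 13, 3, 20
raw = np.random.randn(H, W, A, 5 + C)

# Each anchor in each grid cell predicts one object:
#   [tx, ty, tw, th, objectness, class scores...]
objectness = sigmoid(raw[..., 4])          # object or not, per anchor
class_id   = raw[..., 5:].argmax(axis=-1)  # most likely class, per anchor

# Keep anchors whose objectness passes a threshold (0.5 is an arbitrary choice here)
ys, xs, anchors = np.where(objectness > 0.5)
for y, x, a in zip(ys, xs, anchors):
    print(f"grid cell ({y},{x}), anchor {a}: class {class_id[y, x, a]}")
```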
Loss function
- Softmax (turns class scores into probabilities)
- Cross-entropy loss (for classification)
- MSE loss, MAE loss (for bounding-box regression)
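A simplified sketch of how these pieces combine into a detection loss (not any particular paper's exact formulation): cross-entropy on the class scores plus MSE on the box offsets; all numbers below are made up for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(logits, target_class):
    # softmax + negative log-likelihood of the ground-truth class
    return -np.log(softmax(logits)[target_class] + 1e-9)

def mse(pred_box, gt_box):
    return np.mean((np.asarray(pred_box) - np.asarray(gt_box)) ** 2)

cls_logits = np.array([1.2, 0.3, -0.5])        # raw class scores for one prediction
loss = cross_entropy(cls_logits, target_class=0) \
     + mse([0.48, 0.52, 0.22, 0.31], [0.50, 0.50, 0.20, 0.30])
print(loss)
```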
Terms
- IOU
- Intersection over Union
- A metric of how well the predicted bounding box overlaps the GT box
- e.g., IOU > 0.5 → positive box; otherwise, negative box
- NMS
- Non-Maximum Suppression
- Filters the predicted boxes, keeping only the best ones based on IOU and confidence score (see the IOU/NMS sketch after this list)
- Data annotation
- Draw a bounding box on each object and label its class
- Object detection dataset
- One image, one GT
- GT: Ground Truth, the annotated object information (bounding box, class, etc.)
- Training & Evaluation
- Training set: used when training the model
- Evaluation set: used when evaluating the trained model
- Why evaluate the trained model?
- Training on the training set fits the model to the training-set domain
- The evaluation set measures how well the model generalizes and guards against “overfitting”
- Tools
- labelImg
- Yolo_Label
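A minimal sketch of IOU and greedy NMS for axis-aligned boxes in [x1, y1, x2, y2] format; the box coordinates, scores, and thresholds are illustrative:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]]
scores = [0.9, 0.8, 0.75]
print(nms(boxes, scores))   # [0, 2] -- the second box is suppressed
```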
One-stage object detectors
- YOLO
- You Only Look Once
- Joseph Redmon (2015)
- YOLOv1, YOLOv2, v3, …, v5, YOLOR
- Backbone
- GoogLeNet (YOLOv1), Darknet-19 (YOLOv2), Darknet-53 (YOLOv3)
- SSD
- Single Shot MultiBox Detector
- Wei Liu (2015)
- Backbone
- VGG-16
Two-stage object detectors
- RCNN
- Selective Search: rule-based region proposals
- ~2,000 region proposals
- Proposals resized to 227×227
- Bounding box: regression
- Class: SVM classifier
- Fast R-CNN
- ROI pooling (see the pooling sketch after this list)
- Softmax classifier instead of the SVM classifier
- Bounding box: regression
- Faster R-CNN: Region Proposal Network (RPN) instead of Selective Search
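A very simplified, single-channel sketch of ROI max pooling (real Fast R-CNN pools every channel and handles bin quantization more carefully); the function name and shapes here are made up for illustration:

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=2):
    """Split the ROI into output_size x output_size bins and max-pool each bin,
    so ROIs of different sizes all become a fixed-size feature.

    feature_map: [H, W] array (single channel for clarity)
    roi: (x1, y1, x2, y2) in feature-map coordinates
    """
    x1, y1, x2, y2 = roi
    pooled = np.zeros((output_size, output_size))
    ys = np.linspace(y1, y2, output_size + 1).astype(int)
    xs = np.linspace(x1, x2, output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            region = feature_map[ys[i]:max(ys[i + 1], ys[i] + 1),
                                 xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[i, j] = region.max()
    return pooled

fm = np.arange(64, dtype=float).reshape(8, 8)
print(roi_max_pool(fm, roi=(1, 1, 6, 6)))   # fixed 2x2 output regardless of ROI size
```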
Evaluation metric
- Precision
- The ratio of true positives (TP) to the total number of predicted positives (total predictions)
- Recall
- The ratio of true positives (TP) to the total number of ground-truth positives
- Terms
- TP (True Positive): a correct prediction of a GT object
- FP (False Positive): an incorrect prediction (no matching GT)
- FN (False Negative): a GT object that is not predicted
- TN (True Negative): correctly predicting nothing where there is no GT
- F1 score
- The harmonic mean of precision and recall (see the sketch at the end of this section)
- high recall + high precision
- the class is perfectly handled by the model
- low recall + high precision
- the model can’t detect the class well but is highly trustable when it does
- high recall + low precision
- the class is well detected, but the model also includes points of other classes in it
- low recall + low precision
- the class is poorly handled by the model
- PR curve
- Draw the precision-recall curve by sweeping the object confidence threshold
- mAP
- Mean Average Precision: the area under the PR curve, averaged over classes
- Confusion matrix
- Measurement of true positives and false positives between classes
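A small sketch computing precision, recall, and their harmonic mean (F1) from detection counts; the counts below are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall    = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 7 correct detections, 3 false alarms, 2 missed GT objects
print(precision_recall_f1(tp=7, fp=3, fn=2))   # (0.7, 0.777..., 0.736...)
```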
YOLOv3
- Architecture
- Config file
- hyperparameters
- Model architecture
- Data augmentation
- Input resolution
- Annotation labeling format
- YOLO labeling format
- class_index x_center y_center width height
- x, y, w, h are normalized values in [0, 1] (see the conversion sketch below)
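A small sketch converting a pixel-coordinate box to the YOLO label format described above; the helper name and example numbers are made up for illustration:

```python
def to_yolo_label(class_index, box_xyxy, img_w, img_h):
    """Convert a pixel-coordinate box (x1, y1, x2, y2) to a YOLO label line:
    class_index x_center y_center width height, all normalized to [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width    = (x2 - x1) / img_w
    height   = (y2 - y1) / img_h
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. a 200x100 box with top-left corner at (120, 80) in a 640x480 image
print(to_yolo_label(0, (120, 80, 320, 180), img_w=640, img_h=480))
# -> "0 0.343750 0.270833 0.312500 0.208333"
```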