devskim blog

Object Detection

sections

Machine Learning & Deep Learning

Tags

DeepLearning

Created

Jun 11, 2023 03:20 PM

Last Updated

Jul 30, 2023 09:49 AM

Object Detection

Object classification

What is the object in an image
Classifier → output: class_score[K, N]

K: The number of predicted objects
N: The number of classes

Object localization

What and where is the “single” object in an image
Regressor → output: [N + 4] (N class scores plus 4 bounding box values)

K: The number of predicted objects (1 here, a single object)
N: The number of classes

Object detection

What and where are the “multiple” objects in an image
How to detect multiple objects? Sliding window method

Multiple object localization
Computation cost is very high. (Inefficient)
OverFeat paper

Classifier → output: class_score[K, N]
Regressor → output: bounding box offset[K, 4]
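
The sliding window idea can be sketched as follows. This is a minimal illustration, not the OverFeat implementation: the helper name `sliding_windows`, the 128×128 window, and the stride of 32 are all assumed example values. Each window would be passed to the classifier and regressor, which is why the computation cost grows quickly.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x1, y1, x2, y2) for every window that fits inside the image.

    Hypothetical helper for illustration only.
    """
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Example values (assumptions): 640x480 image, 128x128 window, stride 32.
windows = list(sliding_windows(640, 480, 128, 128, 32))
print(len(windows))  # number of crops the classifier would have to process
```

Even this small setting produces a few hundred crops per image for a single window size, and a real detector would repeat this over several scales.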

Network architecture of Classification and Object Detection

Final layer shape is different
Classification: [1, 1, class_num]
Object detection: [H, W, class_num + box_offset + confidence] (one stage)
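
As a rough check of the shapes above, the sketch below computes the final layer shape of a one-stage detector. The function name `detection_output_shape` and the `num_anchors` parameter are illustrative assumptions; with one anchor per cell it reduces to [H, W, class_num + box_offset + confidence].

```python
def detection_output_shape(grid_h, grid_w, num_classes, num_anchors=1):
    """Return (H, W, channels) of a one-stage detection head.

    Hypothetical helper: each anchor predicts class scores,
    4 box offsets, and 1 confidence (objectness) value.
    """
    box_offset = 4   # x, y, w, h
    confidence = 1   # objectness
    channels = num_anchors * (num_classes + box_offset + confidence)
    return (grid_h, grid_w, channels)


# Example values (assumptions): 13x13 grid, 80 classes, 3 anchors per cell.
print(detection_output_shape(13, 13, 80, num_anchors=3))
```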

One stage detection & Two stage detection

One-stage detection

SSD, YOLO, …
Backbone

Feature extractor
Extract features from input image
The deeper the layer, the more abstract the feature maps

Neck

Merge the different resolution feature maps
Concatenate/add different scale feature maps
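
One way to picture the merge step is to upsample the deeper (lower-resolution) feature map and concatenate it with a shallower one along the channel axis. The sketch below uses plain nested lists in [C][H][W] layout and nearest-neighbour 2x upsampling; both choices, and the helper names, are assumptions made for illustration.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def concat_channels(a, b):
    """Concatenate two [C][H][W] maps along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


deep = [[[1, 2], [3, 4]]]                      # 1 channel, 2x2
shallow = [[[0] * 4 for _ in range(4)]] * 2    # 2 channels, 4x4
merged = concat_channels(upsample2x(deep), shallow)
print(len(merged), len(merged[0]), len(merged[0][0]))  # channels, H, W
```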

Dense Prediction

Predict score of object and bounding box
Regression layers

Two-stage detection

Faster-RCNN, …
Generally more accurate than one-stage detectors, but at a higher computation cost
1st forward

Get the object candidate regions

2nd forward

Classify the object in region proposals

How to find the object

Grid

The head layer’s final feature map
If feature map size is [13, 13], the grid size is [13, 13]
Predict objects in each grid cell

Anchor

A detector that predicts a single bounding box
Predict one object per anchor
Pre-defined bounding box shape
In a grid cell, there are several anchors
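
A minimal sketch of how an object center could be mapped to a grid cell, and how many boxes the detector predicts in total. It assumes a [13, 13] grid, normalized center coordinates in [0, 1], and 3 anchors per cell (an example value); `responsible_cell` and `total_predictions` are hypothetical helper names.

```python
def responsible_cell(x_center, y_center, grid_w=13, grid_h=13):
    """Return (row, col) of the grid cell containing a normalized center."""
    # min() keeps a center lying exactly on the right/bottom edge inside the grid.
    col = min(int(x_center * grid_w), grid_w - 1)
    row = min(int(y_center * grid_h), grid_h - 1)
    return row, col


def total_predictions(grid_w, grid_h, anchors_per_cell):
    """One predicted box per anchor, several anchors per grid cell."""
    return grid_w * grid_h * anchors_per_cell


print(responsible_cell(0.5, 0.5))     # center of the image
print(total_predictions(13, 13, 3))   # boxes predicted on a 13x13 grid
```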

Bounding box

objectness score

Object or not

Class score

cat or dog or car …

Loss function

Softmax

Cross-entropy loss

MSE loss, MAE loss
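
The functions listed above can be written in a few lines of plain Python. This is a sketch for intuition only: in a typical detector, softmax plus cross-entropy would apply to class scores and MSE or MAE to box regression, but which loss is attached to which output is an assumption here, not something these notes specify.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


def mse(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


probs = softmax([2.0, 1.0, 0.1])
print(cross_entropy(probs, 0))
print(mse([1.0, 2.0], [1.0, 4.0]), mae([1.0, 2.0], [1.0, 4.0]))
```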

Terms

IOU

Intersection over union
A metric of how well the predicted bounding box matches the GT box
ex) IOU > 0.5: positive box. Otherwise, negative box
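
A minimal IOU sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The box format and the function name `iou` are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to 0 when boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


gt_box = (0, 0, 10, 10)
pred_box = (5, 5, 15, 15)
score = iou(gt_box, pred_box)
print("positive" if score > 0.5 else "negative", score)
```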

NMS

Non-Maximum Suppression
Filtering the best predicted boxes using IOU and confidence score
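
The sketch below shows a greedy form of NMS, which is one common variant: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The (x1, y1, x2, y2) box format, the 0.5 threshold, and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining confidence
        keep.append(best)
        # Drop boxes that overlap the kept box more than the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In practice NMS is usually applied per class, so boxes of different classes do not suppress each other.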

Data annotation

Draw a bounding box on each object and label its class
Object detection dataset
One image, one GT

GT: Ground truth, the annotated object information (bounding box, class, etc.)

Training & Evaluation

Training set: used when training the model
Evaluation set: used when evaluating the trained model
Why evaluate the trained model?

A model trained on the training set is fitted to the training set domain
The evaluation set measures how well the model generalizes and helps detect “overfitting”
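
A simple way to obtain the two sets is to shuffle the annotated images once and hold out a fraction for evaluation. The 20% ratio, the fixed seed, and the helper name `split_dataset` below are assumptions for illustration.

```python
import random


def split_dataset(items, eval_ratio=0.2, seed=0):
    """Shuffle items reproducibly and return (training_set, evaluation_set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_eval = int(len(shuffled) * eval_ratio)
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical file names, used only to show the call.
image_names = [f"img_{i:03d}.jpg" for i in range(10)]
train_set, eval_set = split_dataset(image_names)
print(len(train_set), len(eval_set))
```

The evaluation images must not be used for training; otherwise the evaluation score no longer says anything about generalization.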

Tools

labelImg
Yolo_Label

One-stage object detectors

YOLO

You Only Look Once
Joseph Redmon (2015)
YOLOv1, YOLOv2, v3, …, v5, YOLOR
Backbone

GoogLeNet-inspired network (YOLOv1), Darknet-19 (YOLOv2), Darknet-53 (YOLOv3)

SSD

Single Shot MultiBox Detector
Wei Liu (2015)
Backbone

VGG-16

Two-stage object detectors

R-CNN

Selective Search: rule-based region proposals
About 2,000 region proposals per image
Proposals are resized to 227×227
Bounding box: regression
Class: SVM classifier

Fast R-CNN / Faster R-CNN

ROI pooling
Softmax classifier instead of an SVM classifier
Bounding box: regression
Faster R-CNN: Region Proposal Network (RPN) instead of Selective Search
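
ROI pooling turns a region of any size into a fixed-size output by max-pooling over a grid of bins. The sketch below works on a single-channel feature map stored as a nested list and assumes the ROI is given in feature-map cell indices (x1, y1, x2, y2) with exclusive end; the bin rounding used here is one possible choice, and `roi_max_pool` is a hypothetical helper name.

```python
import math


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the ROI of a 2D feature map into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row range of bin i (bins may overlap when sizes do not divide evenly).
        r0 = y1 + math.floor(i * roi_h / out_h)
        r1 = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * roi_w / out_w)
            c1 = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
print(roi_max_pool(fmap, (0, 0, 4, 4)))   # whole map pooled to 2x2
print(roi_max_pool(fmap, (1, 1, 4, 4)))   # a 3x3 region pooled to 2x2
```

Because every proposal ends up with the same output size, the following fully connected layers can be shared across proposals.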

Evaluation metric

Precision

The ratio of true positives (TP, correct predictions) to the total number of predicted positives (all predictions): TP / (TP + FP)

Recall

The ratio of true positives (correct predictions) to the total number of ground-truth positives (e.g., the total number of cars): TP / (TP + FN)

Terms

TP (True Positive): a correct prediction
FP (False Positive): an incorrect prediction
FN (False Negative): a GT object that was not predicted
TN (True Negative): a non-GT region that was correctly not predicted

F1 score

Harmonic mean of precision and recall
high recall + high precision

the class is perfectly handled by the model

low recall + high precision

the model can’t detect the class well but is highly trustable when it does

high recall + low precision

the class is well detected but the model also includes points of other classes in it

low recall + low precision

the class is poorly handled by the model
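
Precision, recall, and F1 follow directly from the TP, FP, and FN counts. The sketch below uses made-up counts; the function name is a hypothetical helper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up example: 8 correct boxes, 2 wrong boxes, 8 missed objects.
# This is the "low recall + high precision" case.
print(precision_recall_f1(tp=8, fp=2, fn=8))
```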

PR curve

Plot precision against recall while varying the object confidence threshold
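
The points of a PR curve can be produced by sweeping the confidence threshold. The sketch assumes every detection has already been marked as TP or FP (for example by an IOU test against the GT boxes). Reporting precision as 1.0 when no detection passes the threshold is a convention chosen here, and `pr_curve` is a hypothetical helper name.

```python
def pr_curve(confidences, is_true_positive, num_gt, thresholds):
    """Return (threshold, precision, recall) for each confidence threshold."""
    points = []
    for t in thresholds:
        # Keep only detections whose confidence passes the threshold.
        kept = [flag for c, flag in zip(confidences, is_true_positive) if c >= t]
        tp = sum(kept)
        precision = tp / len(kept) if kept else 1.0   # convention when nothing is kept
        recall = tp / num_gt if num_gt else 0.0
        points.append((t, precision, recall))
    return points


confidences = [0.9, 0.8, 0.7, 0.6]
is_tp = [True, False, True, False]
for point in pr_curve(confidences, is_tp, num_gt=2, thresholds=[0.85, 0.65, 0.5]):
    print(point)
```

Lowering the threshold keeps more boxes, so recall can only go up while precision usually goes down.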

mAP

Mean average precision: AP is the area under the PR curve of one class, and mAP is the mean of AP over all classes
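
A sketch of how the PR curve area could be computed for one class and then averaged over classes. It assumes detections are already marked as TP or FP and uses all-point interpolation (precision made non-increasing before integrating); benchmarks differ in their exact interpolation rule, so treat this as one possible convention. The function names are hypothetical.

```python
def average_precision(confidences, is_true_positive, num_gt):
    """Area under the PR curve of one class (all-point interpolation)."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:                      # walk detections from high to low confidence
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p      # rectangle under the curve
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap_a = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
ap_b = average_precision([0.9, 0.8], [True, True], num_gt=2)
print(mean_average_precision([ap_a, ap_b]))
```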

Confusion matrix

Measurement of true positives and false positives between classes
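
A confusion matrix can be built by counting (true class, predicted class) pairs. The sketch assumes each prediction has already been matched to a GT object (for example by IOU), which is not shown here; rows are true classes and columns are predicted classes, and the class indices are made up.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """matrix[t][p] = number of objects of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


# Made-up labels: 0 = cat, 1 = dog, 2 = car.
true_labels = [0, 0, 1, 1, 2]
pred_labels = [0, 1, 1, 1, 0]
for row in confusion_matrix(true_labels, pred_labels, num_classes=3):
    print(row)
```

Diagonal entries are the true positives of each class; off-diagonal entries show which classes are confused with each other.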

YOLOv3

Architecture

[Figure: YOLOv3 architecture]

Configuration file

Hyperparameters
Model architecture
Data augmentation
Input resolution

Annotation labeling format

YOLO labeling format

Class_index x_center y_center width height
x_center, y_center, width, and height are normalized to [0, 1] by the image width and height
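
A small sketch of converting between a pixel-coordinate box and a YOLO label line. It assumes the pixel box is given as (x1, y1, x2, y2) corners; the helper names and the six-decimal formatting are illustrative choices.

```python
def to_yolo_label(class_index, box, image_w, image_h):
    """Convert a pixel (x1, y1, x2, y2) box to a YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / image_w
    y_center = (y1 + y2) / 2 / image_h
    width = (x2 - x1) / image_w
    height = (y2 - y1) / image_h
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_label(line, image_w, image_h):
    """Convert a YOLO label line back to (class_index, (x1, y1, x2, y2)) in pixels."""
    parts = line.split()
    class_index = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    x1 = (x_center - width / 2) * image_w
    y1 = (y_center - height / 2) * image_h
    x2 = (x_center + width / 2) * image_w
    y2 = (y_center + height / 2) * image_h
    return class_index, (x1, y1, x2, y2)


label = to_yolo_label(0, (100, 200, 300, 400), image_w=640, image_h=480)
print(label)
print(from_yolo_label(label, image_w=640, image_h=480))
```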
