devskim blog

Object Detection

Tags
DeepLearning
Created
Jun 11, 2023 03:20 PM
Last Updated
Jul 30, 2023 09:49 AM
 
 

Object Detection

  • Object classification
    • What is the object in an image
    • Classifier → output: class_score[K, N]
      • K: The number of predicted objects
      • N: The number of classes
  • Object localization
    • What and where is the “single” object in an image
    • Regressor → output: [N + 4]
      • N: The number of classes
      • 4: The bounding box coordinates
  • Object detection
    • What and where are the “multiple” objects in an image
    • How to detect multiple objects? Sliding window method
      • Multiple object localization
      • Computation cost is very high (inefficient)
      • OverFeat paper
    • Classifier → output: class_score[K, N]
    • Regressor → output: bounding box offset[K, 4]
  • Network architecture of Classification and Object Detection
    • Final layer shape is different
    • Classification: [1, 1, class_num]
    • Object detection: [H, W, class_num + box_offset + confidence] (one stage)
  • One stage detection & Two stage detection
    • One-stage detection
      • SSD, YOLO, …
      • Backbone
        • Feature extractor
        • Extract features from input image
        • The deeper the layer, the more abstract the feature maps
      • Neck
        • Merge feature maps of different resolutions
        • Concatenate/add feature maps of different scales
      • Dense Prediction
        • Predict the object score and the bounding box
        • Regression layers
    • Two-stage detection
      • Faster R-CNN, …
      • Typically more accurate than one-stage, but with a higher computation cost
      • 1st forward
        • Get the object candidate regions
      • 2nd forward
        • Classify the object in region proposals
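To make the difference in final-layer shape concrete, here is a minimal NumPy sketch. The 13×13 grid, the 80 classes, and the single prediction per cell are assumed values for illustration only; they are not taken from a specific model.

```python
import numpy as np

# Assumed values for illustration only.
H, W = 13, 13        # spatial size of the final feature map
class_num = 80       # number of classes
box_offset = 4       # x, y, w, h offsets
confidence = 1       # objectness score

# Classification head: one score vector for the whole image.
cls_out = np.zeros((1, 1, class_num))

# One-stage detection head: one prediction vector per grid cell.
det_out = np.zeros((H, W, class_num + box_offset + confidence))

print(cls_out.shape)  # (1, 1, 80)
print(det_out.shape)  # (13, 13, 85)
```

The classifier collapses the spatial dimensions to a single vector, while the one-stage detector keeps them so that each location can report its own class scores, box offsets, and confidence.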
       

How to find the object

  • Grid
    • The final feature map of the head layer
    • If the feature map size is [13, 13], the grid size is [13, 13]
    • Predict objects in each grid cell
  • Anchor
    • A detector that predicts a single bounding box
    • Predict one object per anchor
    • Pre-defined bounding box shape
    • In a grid cell, there are several anchors
  • Bounding box
  • Objectness score
    • Object or not
  • Class score
    • cat or dog or car …
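The grid and anchor ideas above can be sketched as follows. This is a rough illustration, not a specific detector: the 3 anchors per cell, the 80 classes, the channel ordering, and the helper name `responsible_cell` are all assumptions.

```python
import numpy as np

def responsible_cell(x_center, y_center, grid_size=13):
    """Return (row, col) of the grid cell containing a normalized box center."""
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return row, col

# Hypothetical head output: 13x13 grid, 3 anchors per cell, 80 classes.
grid_size, num_anchors, num_classes = 13, 3, 80
raw = np.zeros((grid_size, grid_size, num_anchors * (4 + 1 + num_classes)))

# Split the channel axis so each anchor gets its own
# [bounding box (4), objectness (1), class scores (num_classes)] vector.
pred = raw.reshape(grid_size, grid_size, num_anchors, 4 + 1 + num_classes)
box = pred[..., :4]
objectness = pred[..., 4]
class_scores = pred[..., 5:]

print(responsible_cell(0.5, 0.5))  # (6, 6)
print(pred.shape)                  # (13, 13, 3, 85)
```

With these assumed sizes, every grid cell carries three anchors, and each anchor predicts one box, one objectness score, and one set of class scores.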
 

Loss function

  • Softmax
  • Cross-entropy loss
  • MSE loss, MAE loss
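The notes only list the names of these functions, so the sketch below is a plain NumPy illustration of each one. Pairing softmax and cross-entropy with the class score, and MSE/MAE with the bounding box values, is a common convention and an assumption here; the sample numbers are made up.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(probs, target_index):
    """Negative log-probability of the correct class."""
    return -np.log(probs[target_index] + 1e-12)

def mse(pred, target):
    """Mean squared error."""
    return np.mean((pred - target) ** 2)

def mae(pred, target):
    """Mean absolute error."""
    return np.mean(np.abs(pred - target))

# Made-up class scores for 3 classes; class 0 is the correct one.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
cls_loss = cross_entropy(probs, target_index=0)

# Made-up predicted and ground truth boxes (x, y, w, h).
pred_box = np.array([0.50, 0.50, 0.20, 0.30])
gt_box = np.array([0.55, 0.45, 0.25, 0.30])

print(cls_loss, mse(pred_box, gt_box), mae(pred_box, gt_box))
```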
 

Terms

  • IOU
    • Intersection over Union
    • A metric of how well a predicted bounding box matches the GT box
    • e.g., IOU > 0.5: positive box. Otherwise: negative box
  • NMS
    • Non-Maximum Suppression
    • Keep only the best predicted boxes, filtering by IOU and confidence score
  • Data annotation
    • Draw a bounding box on the object and label its class
    • Object detection dataset
    • One image, one GT
      • GT: Ground Truth, the annotated object information (bounding box, class, etc.)
    • Training & Evaluation
      • Training set: used when training the model
      • Evaluation set: used when evaluating the trained model
      • Why evaluate the trained model?
        • The model is trained on the training set, so it is fitted to the training set domain
        • The evaluation set checks how well the model generalizes and helps detect “overfitting”
    • Tools
      • labelImg
      • Yolo_Label
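The IOU and NMS terms defined above can be sketched in a few lines. This is a minimal greedy version under stated assumptions: boxes are given as [x1, y1, x2, y2] corners, the 0.5 threshold is only a default, and the sample boxes and scores are made up.

```python
import numpy as np

def iou(box_a, box_b):
    """IOU of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = list(np.argsort(scores)[::-1])  # indices sorted by descending score
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Made-up example: the first two boxes overlap heavily, the third is far away.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box has an IOU of about 0.68 with the first, so it is suppressed; the third does not overlap and survives.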
 

One-stage object detectors

  • YOLO
    • You Only Look Once
    • Joseph Redmon (2015)
    • YOLOv1, YOLOv2, v3, …, v5, YOLOR
    • Backbone
      • GoogLeNet-inspired network (YOLOv1), Darknet-19 (YOLOv2), Darknet-53 (YOLOv3)
  • SSD
    • Single Shot MultiBox Detector
    • Wei Liu (2015)
    • Backbone
      • VGG-16
 

Two-stage object detectors

  • R-CNN
    • Selective Search, rule-based region proposals
    • About 2000 region proposals
    • Proposals are resized to 227×227
    • Bounding box: regression
    • Class: SVM classifier
  • Fast R-CNN
    • ROI pooling
    • Softmax classifier instead of an SVM classifier
    • Bounding box: regression
    • Region proposals still come from Selective Search; Faster R-CNN replaces it with a Region Proposal Network (RPN)
 

Evaluation metric

  • Precision
    • The ratio of true positives (TP, correct predictions) to the total number of predicted positives (all predictions): TP / (TP + FP)
  • Recall
    • The ratio of true positives (correct predictions) to the total number of ground truth positives (e.g., the total number of cars): TP / (TP + FN)
  • Terms
    • TP (True Positive): a correct prediction
    • FP (False Positive): a wrong prediction
    • FN (False Negative): a GT object that was not predicted
    • TN (True Negative): nothing predicted where there is no GT
  • F1 score
    • The harmonic mean of precision and recall
    • high recall + high precision
      • the class is perfectly handled by the model
    • low recall + high precision
      • the model can’t detect the class well but is highly trustable when it does
    • high recall + low precision
      • the class is well detected but the model also includes points of other classes in it
    • low recall + low precision
      • the class is poorly handled by the model
  • PR curve
    • Plot precision against recall while varying the object confidence threshold
  • mAP
    • Mean Average Precision: AP is the area under the PR curve for one class, and mAP is the mean of AP over all classes
  • Confusion matrix
    • Measurement of true positives and false positives between classes
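The metrics above can be computed as in the sketch below. The counts and the detection list are hypothetical, and the rectangle-sum used for the PR curve area is only one of several conventions; benchmark tools often interpolate the curve instead.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return precision, recall, f1

def average_precision(is_tp, num_gt):
    """Area under the PR curve for one class.

    is_tp: TP/FP flag per detection, sorted by descending confidence.
    num_gt: number of ground truth objects of this class.
    """
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap

# Hypothetical counts: 8 correct detections, 2 wrong detections, 2 missed cars.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Hypothetical per-class detection lists; mAP is the mean of the per-class APs.
ap_car = average_precision([True, True, False, True], num_gt=4)
ap_dog = average_precision([True, False], num_gt=1)
mean_ap = (ap_car + ap_dog) / 2
print(p, r, f1, ap_car, ap_dog, mean_ap)
```

Lowering the confidence threshold adds more detections to the end of the sorted list, which is how each new point on the PR curve is produced.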
 

YOLOv3

  • Architecture
    • [Image: YOLOv3 architecture]
  • Configuration file
    • Hyperparameters
    • Model architecture
    • Data augmentation
    • Input resolution
  • Annotation labeling format
    • YOLO labeling format
      • Class_index x_center y_center width height
      • x, y, w, h are normalized to [0, 1] by the image width and height
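As a small worked example of the YOLO labeling format, the sketch below converts one label line to pixel corner coordinates and back. The function names and the 640×480 image size are hypothetical, chosen only for illustration.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert 'class x_center y_center width height' (normalized) to pixel corners."""
    parts = line.split()
    class_index = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_index, (x1, y1, x2, y2)

def pixel_box_to_yolo(class_index, box, img_w, img_h):
    """Convert pixel corners (x1, y1, x2, y2) to a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_index} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

cls, box = yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(cls, box)  # 0 (240.0, 120.0, 400.0, 360.0)
print(pixel_box_to_yolo(cls, box, img_w=640, img_h=480))
# 0 0.500000 0.500000 0.250000 0.500000
```

Because the values are normalized, the same label line stays valid when the image is resized.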
 