Jeonghun (James) Lee: Custom Object Detection SSD / Faster RCNN 실행 및 분석 (3차분석)

11/14/2019

Custom Object Detection SSD / Faster RCNN 실행 및 분석 (3차분석)

1. Tensorflow 및 Custom Object Detection 위한 준비

Object Detection을 위한 준비를 위해서 아래와 같이 설치를 진행한다.

Tensorflow 설치를 진행
필요 Python Package / 필요 Package 설치진행
Model을 Download하여 진행

NVIDIA Docker 및 SSD Traning 2차분석
https://ahyuo79.blogspot.com/2019/10/docker-tensorflow.html

NVIDIA Docker 및 Tensorflow 기본 사용법
https://ahyuo79.blogspot.com/2019/10/nvidia-docker.html

IOU 기능

https://ahyuo79.blogspot.com/2019/09/iou-intersection-over-union.html

Tensorflow Model
https://github.com/tensorflow/models

Tensorflow Model Branch 확인
Tensorflow의 Version에 Model source의 branch 변경하여 download
https://github.com/tensorflow/models/branches

1.1 Tensorflow 직접설치 및 설정

Tensorflow Object Detection를 사용하기 위해서는 아래와 같이 먼저 Tensorflow를 설치하고, 이후에 Object Detection Model을 Download와 관련 Package 설치한다.

Custom Object Detection 설치가이드
  https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html

1.2 General Tensorflow Docker 이용

Tensorflow Docker 기반으로 아래의 Model version을 Download하여 하나의 Image로 생성후 이를 진행하자.
이때 주의해야한 것은 Tags의 정보와 Tensorflow의 Version 일 것 같다.

Docker의 Tag 의미
  https://www.tensorflow.org/install/docker?hl=ko

Tensorflow Docker
Tensorflow version 과 상위의 model version을 같이 맞추도록하자
  https://hub.docker.com/r/tensorflow/tensorflow
  https://hub.docker.com/r/tensorflow/tensorflow/tags

1.3 NVIDIA Tensorflow Docker 이용

기존에 NVIDIA Tensorflow Docker를 설치하였던 것으로 이용 Object Detection을 사용가능.

NVIDIA Tensorflow Docker
  https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow/tags

이전에 NVIDIA SSD Docker 관련분석 참조
  https://ahyuo79.blogspot.com/2019/10/docker-tensorflow.html

2. Custom Data SET 구성

우선 다들 개와 고양이 사진으로 기본적으로 Custom DATA SET를 만들어 테스트를 진행하기에 나도 역시 쉽게 할수 있는 방법으로 시작

개와 고양이 사진 구하기 (DATASET)

$ cd ~/works/custom
$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download
$ python google_images_download/google_images_download.py --keywords "dogs" --size medium --output_directory ~/works/custom/data/
$ python google_images_download/google_images_download.py --keywords "cats" --size medium --output_directory ~/works/custom/data/

google image download 구할 수 있는 이미지들은 현재 제한적이며, 최대 100개까지 download가 가능하다.
옵션에서 limit를 100이상을 늘려도 한번에 100개이상의 image를 구할 수 없다.

google_image_download
https://google-images-download.readthedocs.io/en/latest/installation.html

google_image_download argument
https://google-images-download.readthedocs.io/en/latest/arguments.html

Image 정리 및 구성

$ cd ~/works/custom/data/
$ mkdir images        // Image들을 한곳정리  
$ mkdir annotation    // LableImg의 XML 저장장소 
$ mv ./dogs/*.jpg images/
$ mv ./cats/*.jpg images/

Annotation (LabelImg 사용, PascalVOC저장 )

$ cd ~/works/custom/labelImg    // labelImg 이미 이전에 설치됨
$ cat data/predefined_classes.txt   // Default Class 확인(개,고양이 있음), 만약 이름이 없다면, 새로생성 
dog
person
cat
tv
car
meatballs
marinara sauce
tomato soup
chicken noodle soup
french onion soup
chicken breast
ribs
pulled pork
hamburger

$ python3 labelImg.py   ~/works/custom/data/images    // images 안에 같이 xml 저장

주의사항
lableImg 실행 후 XML저장위치를 반드시 Change Save Dir ~/works/custom/data/annotation 설정
상위 정의 된 class의 순서가 달라도 상관 없지만 상위 이름과 label_map.pbtxt의 이름만 동일하면 된다.

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#annotating-images

Label_map 정의

$ cd ~/works/custom/data
$ vi label_map.pbtxt
item {
    id: 1
    name: 'cat'
}

item {
    id: 2
    name: 'dog'
}

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#creating-label-map

1.1 TF Record File 생성

다른블로그 혹은 Tensorflow 예제 사이트를 보면 XML->CSV 후 변환 CSV->TFRecord 로 변환하도록 하는데,
다른 소스들을 간단히 분석해보면 TF Record 작업은 거의 비슷한데, 왜 두번을 해야하는지 이해를 못해 아래와 같이 직접 변경시도

TF Record 만드는 법

현재 이방식으로 진행을 하지 않음
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#creating-tensorflow-records

TF_RECORD 생성

tf_record는 반드시 Tensorflow가 설치된 상태에서 실행가능

 root@3aac229c45c3:/workdir/models/research# pip install lxml
## 이전처럼 --data-dir path 주의 
root@3aac229c45c3:/workdir/models/research# python create_pascal_tf_record.py \
 --data_dir=/data \
 --annotations_dir=/data/annotation \
 --label_map_path=/data/label_map.pbtxt \
 --output_path=/data/pascal.record

Lablelimg TF Record 생성방법
https://ahyuo79.blogspot.com/2019/11/coco-set-annotation-tools.html

2. Custom Training/Evolution

Custom Model을 두개를 이용하여 테스트를 해보고 비교

2.1 Pre-trained Model Download

SSD (Single Shot MultiBox Detector)는 Feature extractor 용으로 별도의 Network를 구성해서 사용하고 있는데, 그 부분을 Download하여 기본구성을 갖춘다.

check 기본구성

$ cd ~/works/custom/check
$ mkdir -p models/configs
$ mkdir -p models/resnet_v1_50_2016_08_28
$ mkdir -p train_resnet                  //SSD-Resnet50   의  Checkpoint directory (Training 후 생성됨)
$ mkdir -p train_inception               //SSD-Inceptionv2 의  checkpoint directory (Training 후 생성됨)
$ mkdir -p fasterrcnn_train_resnet       //Faster RCNN-Resnet50 의 checkpoint directory (Training 후 생성됨)

Resnet 50 Download

$ cd ~/works/custom/check/models
$ wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
$ tar -xzf resnet_v1_50_2016_08_28.tar.gz
$ mv resnet_v1_50.ckpt resnet_v1_50_2016_08_28/model.ckpt

InceptionV2 Download

$ cd ~/works/custom/check/models
$ wget http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_11_06_2017.tar.gz
$ tar -xzf ssd_inception_v2_coco_11_06_2017.tar.gz

Pre-Trained model 정보
https://github.com/tensorflow/models/tree/master/research/slim

check 의 model 구성

$ cd ~/works/custom/check/models
$ tree 
.
├── configs                          // Pipeline Config 저장장소 (Resnet , Inceptionv2 ) 
├── resnet_v1_50_2016_08_28          // Resnet 50 (Pre-trained Model)
│   └── model.ckpt                   // checkpoint   
├── resnet_v1_50_2016_08_28.tar.gz
├── ssd_inception_v2_coco_11_06_2017    // Inception V2 (Pre-trained Model)
│   ├── frozen_inference_graph.pb          // Inception Pb file 
│   ├── graph.pbtxt                        // Inception Graph 구성 
│   ├── model.ckpt.data-00000-of-00001     // checkpoint
│   ├── model.ckpt.index
│   └── model.ckpt.meta
└── ssd_inception_v2_coco_11_06_2017.tar.gz

2.2 SSD / Faster RCNN Pipeline 설정

SSD의 경우 feature extractor로 Resnet 50 와 Inception V2 로 사용가능하며, 다른 Network로도 구성가능하다.
그리고, Pipleline의 Field들은 *.proto 에 선언이 되어있어야 동작이 가능한 것 같다.
나중에 시간이 된다면 면밀히 다시 봐야할 것 같다.

Docker Container 실행

$ docker run --gpus all --rm -it \
--shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8888:8888 -p 6006:6006  \
-v /home/jhlee/works/custom/data:/data \
-v /home/jhlee/works/custom/check:/checkpoints \
--ipc=host \
--name nvidia_ssd \
nvidia_ssd

SSD-Resnet 50 Pipeline 설정변경

root@f46c490016e0:/workdir/models/research# cp configs/ssd320_full_1gpus.config  /checkpoints/models/configs
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/ssd320_full_1gpus.config 

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: true
    num_classes: 2    # label 갯수 (Cat/Dog) 
    box_coder {
      faster_rcnn_box_coder {   
        y_scale: 10.0              
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5    ## 테스트시, output_dict['detection_scores']가 0.5 이상인것만 
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
...
    image_resizer {            # 
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
....

    feature_extractor {
      type: 'ssd_resnet50_v1_fpn'  # SSD의 feature extractor를 resnet 50 사용 
      fpn {
        min_level: 3
        max_level: 7
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.0004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {                    ## post process 설정확인 
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100    ## Class당 100개설정         output_dict['detection_classes'] 
        max_total_detections: 100        ## Max detection 100개 설정  output_dict['num_detections']
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "/checkpoints/models/resnet_v1_50_2016_08_28/model.ckpt"
  fine_tune_checkpoint_type: "classification"
  batch_size: 2            # OUT OF MEMORY 문제로 32->2 변경, GPU Memory가 많다면 그대로  
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 100         # steps 100000 -> 1000  (간단히 테스트용으로 변경, 실제 Training은 원래대로 )
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
....


train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # train TF Record 
  } 
  label_map_path: "/data/label_map.pbtxt" # label_map.pbtxt
}

eval_config: {
  #metrics_set: "coco_detection_metrics"
  #use_moving_averages: false
  num_examples: 8000   # eval 하지 않을 것이므로, 그대로 유지 
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # 현재 eval을 위한 tfrecord가 별도로 없음(Training과 동일하게 설정) 
  }
  label_map_path: "/data/label_map.pbtxt" # 설정만 변경 추후 
  shuffle: false
  num_readers: 1
}

Faster RCNN-Resnet 50 Pipeline 설정변경

root@f46c490016e0:/workdir/models/research# cp ./object_detection/samples/configs/faster_rcnn_resnet50_coco.config  /checkpoints/models/configs
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/faster_rcnn_resnet50_coco.config
model {
  faster_rcnn {
    num_classes: 2     # label 갯수 90->2 (Cat/Dog) 
    image_resizer {                 #  
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'     ## Resnet 50 사용확인 
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }

....
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6       ## IOU threhold 도 조절가능  
        max_detections_per_class: 100      ## 이전과 동일하게 Post Processing으로 Class당 Max 100개 
        max_total_detections: 300          ## 이전과 다르게 MAX 300 설정됨 
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}


train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/checkpoints/models/resnet_v1_50_2016_08_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 300          ### 전체 Step 수 200000->300 (임시테스트를 위해 변경)
  data_augmentation_options { 
    random_horizontal_flip {
    }
  }
}

....  
###  상위 SSD와 동일 

train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # train TF Record 
  } 
  label_map_path: "/data/label_map.pbtxt" # label_map.pbtxt
}

eval_config: {
  num_examples: 8000                                  ## evalution 
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # 설정만 변경 추후 eval을 사용할 경우 다시 변경 
  }
  label_map_path: "/data/label_map.pbtxt" # 설정만 변경 추후 
  shuffle: false
  num_readers: 1
}

Faster RCNN Precision FP32로 변경해서 실행해야하며, 현재 optimaizer 부분이 문제가 있다.
일단 Training은 되지만 관련부분을 자세히 볼 필요가 있다.

SSD Inception v2 Pipeline 설정변경

root@f46c490016e0:/workdir/models/research# cp ./object_detection/samples/configs/ssd_inception_v2_coco.config  /checkpoints/models/configs 
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/ssd_inception_v2_coco.config 

model {
  ssd {
    num_classes: 2   ## Lable Number , label_map.pbtxt 참조 
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5   ## 테스트시, output_dict['detection_scores']가 0.5 이상인것만 
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }

..........

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }

..........
    feature_extractor {
      type: 'ssd_inception_v2'    # SSD의 feature_extractor를 Inception_v2로 사용 
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6                 ## IOU Threshold 
        max_detections_per_class: 100  ## Class당 100개설정         output_dict['detection_classes'] 
        max_total_detections: 100      ## Max detection 100개 설정  output_dict['num_detections']
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 6     ## 24 -> 6  나의 경우 GPU 성능문제로 변경 
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/checkpoints/ssd_inception_v2_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 1000      ## 20000 -> 1000   랩탑에서 조금만 테스트하기 위해 변경 
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"          ## Train Record
  }
  label_map_path: "/data/label_map.pbtxt"      ## Train Labelmap
}

eval_config: {
  num_examples: 8000                                  ## evalution 
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"                 ## evalution의 test record
  }
  label_map_path: "/data/label_map.pbtxt"             ## evalution의 label map 
  shuffle: false
  num_readers: 1
}

세부 분석은 이전의 SSD 분석참조

eval_config 의 num_example
https://github.com/tensorflow/models/issues/5059
https://stackoverflow.com/questions/47086630/what-does-num-examples-2000-mean-in-tensorflow-object-detection-config-file

2.3 SSD / Faster RCNN Training

NVIDIA에서는 쉽게 Training 할 수 있도록 Shell Script로 쉽게 설정하였다. SSD의 경우 Precision을 FP16으로 사용하고 있지만,
Faster RCNN은 FP16으로 하면 에러가 발생하므로 주의해야한다.
간단히 Shell Script 내부를 보면 ./object_detection/model_main.py를 이용하여 실행하므로 이것으로 직접 실행해도 무방하다

Training Shell Script 수정 및 기본분석

root@1bfb89078878:/workdir/models/research# vi ./examples/SSD320_FP16_1GPU.sh     //Pipeline Config 부분 확인 및 수정 
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
### Pipeline 추가하고 Resnet50 or InceptionV2 중 선택사용 
## SSD-Resnet 50 Pipleline  (FP16지원, 기본설정 )
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

## SSD-Inception v2 Pipeline  (FP16지원, 추가설정)
#PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd_inception_v2_coco.config"

## Fastter RCNN-Resnet 50 Pipleline (FP32로만 사용)
#PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/faster_rcnn_resnet50_coco.config"

#FP16 PRESCISON MODE로 설정 (FP32로 설정시 주석처리) 
export TF_ENABLE_AUTO_MIXED_PRECISION=1


TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

time python -u ./object_detection/model_main.py \
       --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
       --model_dir=${CKPT_DIR} \
       --alsologtostder \
       "${@:3}"

상위에서 본인 이 사용하고 싶은 Pipeline 을 정하고 아래와 같이 실행

SSD-Resnet 50 Training

root@f46c490016e0:/workdir/models/research#  bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/train_resnet /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path
..........

Training 결과 인 checkpoint는 이곳에 저장: /checkpoints/train_resnet

Fast-RCNN-Resnet 50 Training

root@f46c490016e0:/workdir/models/research#  bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/fasterrcnn_train_resnet /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path
..........

Training 결과 인 checkpoint는 이곳에 저장: /checkpoints/fasterrcnn_train_resnet

SSD-Inception V2 Training

root@1bfb89078878:/workdir/models/research# bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/train_inception /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path

Training 결과 인 checkpoint는 이곳에 저장: /checkpoints/train_inception

Training 후 생성된 CheckPoint File 확인 (e.g SSD-Resnet50)

root@f46c490016e0:/workdir/models/research# ls /checkpoints/train_resnet/
checkpoint                                   graph.pbtxt                       model.ckpt-0.index                  model.ckpt-300.data-00001-of-00002
eval                                         model.ckpt-0.data-00000-of-00002  model.ckpt-0.meta                   model.ckpt-300.index
events.out.tfevents.1574317170.7bdf29dc41cb  model.ckpt-0.data-00001-of-00002  model.ckpt-300.data-00000-of-00002  model.ckpt-300.meta

TF_ENABLE_AUTO_MIXED_PRECISION 관련내용
https://medium.com/tensorflow/automatic-mixed-precision-in-tensorflow-for-faster-ai-training-on-nvidia-gpus-6033234b2540

2.4 SSD validation/evaluation

Training 중 일부를 사용한다고 하며, Training 중 검증을 하기 위해서 사용한다고 하는데, 정확한 설정과 관련부분을 이해 해야 할 것 같다.

Shell script 수정

root@f46c490016e0:/workdir/models/research# vi examples/SSD320_evaluate.sh  //아래와 같이 pipeline 설정 
CHECKPINT_DIR=$1

TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

## Resnet or Inception 선택 
python object_detection/model_main.py --checkpoint_dir $CHECKPINT_DIR --model_dir /results --run_once --pipeline_config_path /checkpoints/models/configs/ssd320_full_1gpus.config

# python object_detection/model_main.py --checkpoint_dir $CHECKPINT_DIR --model_dir /results --run_once --pipeline_config_path /checkpoints/models/configs/ssd_inception_v2_coco.config

validation 실행

root@f46c490016e0:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints/train_resnet 
or 
root@f46c490016e0:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints/train_inception

상위 결과를 Tensorboard로 확인하고자 하면, 아래의 위치로 변경해서 확인

root@f46c490016e0:/workdir/models/research# ls /results/eval/    // /result/eval Tensorboard Log 생성 
events.out.tfevents.1574322912.74244b7e90c7

2.5 Training 과 Validation 기본분석

Training 과 Validation 명령어는 아래의 명령어로 동일하며, 현재 생각으로는 Training 만 해도 Validation도 같이 동작되는 것으로 생각이 된다.
그리고, pipeline config에 이미 관련 옵션을 설정을 했기 때문에 validation도 진행을 하는 것으로 생각하며,

이유는 Training 만 돌려도 Tensorboard의 Validation Log까지 나오는 것으로 봐도 그렇다.

이전의 Validation 전용 명령어는 --run_once를 넣어 eval-only 한번 돌리는 것 뿐인 것 같다.

root@f46c490016e0:/workdir/models/research# python object_detection/model_main.py -h

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Binary to run train and evaluation on object detection model.
flags:

object_detection/model_main.py:
  --[no]allow_xla: Enable XLA compilation
    (default: 'false')
  --checkpoint_dir: Path to directory holding a checkpoint.  If `checkpoint_dir` is provided, this binary operates in eval-only mode, writing
    resulting metrics to `model_dir`.
  --eval_count: How many times the evaluation should be run
    (default: '1')
    (an integer)
  --[no]eval_training_data: If training data should be evaluated for this job. Note that one call only use this in eval-only mode, and
    `checkpoint_dir` must be supplied.
    (default: 'false')
  --hparams_overrides: Hyperparameter overrides, represented as a string containing comma-separated hparam_name=value pairs.
  --model_dir: Path to output model directory where event and checkpoint files will be written.
  --num_train_steps: Number of train steps.
    (an integer)
  --pipeline_config_path: Path to pipeline config file.
  --[no]run_once: If running in eval-only mode, whether to run just one round of eval vs running continuously (default).
    (default: 'false')
  --sample_1_of_n_eval_examples: Will sample one of every n eval input examples, where n is provided.
    (default: '1')
    (an integer)
  --sample_1_of_n_eval_on_train_examples: Will sample one of every n train input examples for evaluation, where n is provided. This is only used if
    `eval_training_data` is True.
    (default: '5')
    (an integer)

root@f46c490016e0:/workdir/models/research# vi python object_detection/model_main.py 
..........
  if FLAGS.checkpoint_dir:
    if FLAGS.eval_training_data:    ## 기본이 FALSE
      name = 'training_data'
      input_fn = eval_on_train_input_fn   
    else:
      name = 'validation_data'     ## name은 이것으로 설정 
      # The first eval input will be evaluated.
      input_fn = eval_input_fns[0]
    if FLAGS.run_once:             ## validation 할 경우 이곳만 실행 
      estimator.evaluate(input_fn,
                         steps=None,
                         checkpoint_path=tf.train.latest_checkpoint(
                             FLAGS.checkpoint_dir))
    else:                          ##  Training 할 경우 이곳 실행 
      model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
                                train_steps, name)  
.........

이외에도 간단한 training 하는 명령어가 존재하며, 그것을 사용해도 상관 없다.

3. Inference (chpt -> pb)

Training 이 종료가 되면 아래와 같이 최종 Inference를 위해서 pb파일로 변경
파이프라인의 step의 숫자에 따라 checkpoint 파일명은 달라지므로, 본인의 설정에 따라 아래 명령도 변경

SSD-Resnet 50 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/ssd320_full_1gpus.config \
    --trained_checkpoint_prefix  /checkpoints/train_resnet/model.ckpt-100 \
    --output_directory /checkpoints/train_resnet/inference_graph_100

SSD-Inception V2 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/ssd_inception_v2_coco.config \
    --trained_checkpoint_prefix  /checkpoints/train_inception/model.ckpt-100 \
    --output_directory /checkpoints/train_inception/inference_graph_100

Faster RCNN-Resnet 50 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/faster_rcnn_resnet50_coco.config \
    --trained_checkpoint_prefix  /checkpoints/fasterrcnn_train_resnet/model.ckpt-100 \
    --output_directory /checkpoints/fasterrcnn_train_resnet/inference_graph_100

4. Tensorboard 로 확인

Training or Validation 이 종료된 후 Tensorflow의 Log를 분석
Training에 관련된 부분만 분석

Tensorboard

root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/train_resnet  // SSD-Resnet50
or 
root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/train_inception     //SSD-Inceptionv2  
or
root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/fasterrcnn_train_resnet     //Faster RCNN-Resnet50

Tensorboard Browser 연결

http://localhost:6006/

5. Object Detection TEST

jupyter를 이용하여 상위에서 만들어진 pb파일을 이용하여 Test Image를 준비하고 관련 소스를 수정하여 최종 테스트를 진행하자

root@f46c490016e0:/workdir/models/research# jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root

Jupyter 연결

http://localhost:8888/

object_detection/object_detection_tutorial.ipynb 를 실행하여 검증

5.1 object_detection_tutorial.ipynb 수정사항

현재 inference 한 pb파일을 가지고 object_detection/object_detection_tutorial.ipynb 에서 소스를 수정하여 가볍게 테스트가 가능하다.

Download 미실행하며, Variables 의 수정

MODEL_NAME = '/checkpoints/train_resnet/inference_graph_100'
PATH_TO_LABELS = os.path.join('/data', 'label_map.pbtxt')
PATH_TO_TEST_IMAGES_DIR = '/data/test_images'

기존의 소스는 Download를 진행하여 Pre-trained 된 모델을 바로 이용하는 것이지만, 이를 우리가 inference한 것으로 변경하고
TEST Image하여 테스트를 진행하자

GPU Memory 문제발생시 추가

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

Allocator (GPU_0_bfc) ran out of memory trying to allocate
https://eehoeskrap.tistory.com/360

5.2 object_detection_tutorial.ipynb 기본소스 이해

이 소스의 중요 포인트는 run_inference_for_single_image 이며 이곳에서 나온 출력 값을 test 이미지에 적용하여 테스트해보는 것이다.

아래의 key in에 있는 정보들은 반드시 상위 정의된 pipeline config와 연동이 되며, 이 부분을 알아두도록하자. (SSD기준)

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])       ### 현재 Pipeline에서 100으로 정의해서 항상 100개를 찾음 
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)                                ### 내가 정의한 label_map.pbtxt 기준으로 100개를 찾음 
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]          ### bbox의 
      output_dict['detection_scores'] = output_dict['detection_scores'][0]        ### 100개의 각각의 Confidence를 알수 있지만, 화면에 표시되는 것은 Threshold값이 넘은 것들 
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,                                          ## image output
      output_dict['detection_boxes'],      ## bbox의  정보배열 100개  (4개의 정보)  ymin/ymax/xmin/ymax = box * height/ box * width
      output_dict['detection_classes'],    ## class  정보배열 100개   (1,2 )
      output_dict['detection_scores'],     ## confidence 정보배열 100개 (0.5 이상만표시)
      category_index,                                    ## 상위 내가 정의한 label_map.pbtxt 정보 
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)                                  ## line의 두께설정 
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.title(image_path)

 # print("num_detections",output_dict['num_detections'])         ### Training이 되어 Max 100개 를 찾음 (상위 SSD Pipeline 부분 참조)
 # print("detection_boxes",output_dict['detection_boxes'])       ### 찾은 100개 배열 의 box의 위치 
 # print("detection_classes",output_dict['detection_classes'])   ### 찾은 100개 배열 의 class 1 or 2 (현재 1,2만 선언)
 # print("detection_scores",output_dict['detection_scores'])     ### 찾은 100개 배열 의 confidence 이며 pipeline의 threshold 값 이상인 것만 화면 표시 

  for i,v in enumerate (output_dict['detection_scores']):   ### i : index  v: list의 member 
      if v > 0.5:                                           ### 100의 중에 0.5가 넘는 것만 표시 
        print("  - class-name:", category_index.get(output_dict['detection_classes'][i]).get('name') )   ### category_index는 상위 정의된 lable_map.pbtxt 적용하여 이름을 출력          
        print("  - confidence: ",v * 100 )                  ### percent로 변경

6. 결론

SSD / Faster RCNN은 기본적으로 잘동작하고 있지만, 나의 랩탑에서 간단한 테스트는 가능하지만,

STEPS를 늘려 최종 테스트를 하는것은 힘들어서 Server에서 돌렸다.
(Laptop에서 문제가 발생하는 것은 거의 GPU Memory관련 문제였음)
Laptop에서는 GPU Memory를 항상 봐야하며, 한계가 있으며, Server 다르게 동작하므로 주의하도록 하자.

그리고, Transfer Learning 과 Fine Tuning은 개인적으로 지인의 일때문에, 한 달간 진행했지만, 좀 더 하면 금방익숙해 질거라고 본다.

나중에 기회가 되면 다시한번해보지만, 너무 어렵게 생각할 필요 없다.