NVIDIA's TensorFlow deep learning examples are currently provided on GitHub; see each of the sites below.
- GitHub: NVIDIA DeepLearning SSD example
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Detection/SSD
With this source you can easily build the NVIDIA TensorFlow Object Detection SSD Docker image and test it.
- GitHub: other DeepLearning examples
https://github.com/NVIDIA/DeepLearningExamples
- Other reference sites
NVIDIA's introduction to mixed-precision training for each framework
https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
TensorFlow guide on the TensorFlow site
https://www.tensorflow.org/tutorials?hl=ko
1.1 NVIDIA SSD Docker Quick Guide
Following the README.md and running the steps below, you can easily train an SSD object detection model with Docker (based on the COCO dataset).
- Quick Guide 1. Clone the repository
$ git clone https://github.com/NVIDIA/DeepLearningExamples
$ cd DeepLearningExamples/TensorFlow/Detection/SSD
- Quick Guide 2. Build the SSD320 v1.2 TensorFlow NGC container.
$ docker build . -t nvidia_ssd
Running the above builds a new Docker image from the Dockerfile.
Alternatively, you can create an image directly from a running container with docker commit.
- Quick Guide 3. Download and preprocess the dataset. (COCO 2017)
$ ./download_all.sh nvidia_ssd /home/jhlee/works/ssd/data /home/jhlee/works/ssd/check
- Quick Guide 4. Launch the NGC container to run training/inference.
$ nvidia-docker run --rm -it \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /home/jhlee/works/ssd/data:/data/coco2017_tfrecords \
  -v /home/jhlee/works/ssd/check:/checkpoints \
  --ipc=host \
  nvidia_ssd
- Quick Guide 5. Start training.
root@c7550d6b2c59:/workdir/models/research# bash ./examples/SSD320_FP16_1GPU.sh /checkpoints
- Quick Guide 6. Start validation/evaluation.
root@c7550d6b2c59:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints
2. Running and Analyzing the SSD Object Detection Example
2.1 Running and Analyzing Quick Guide 1-2
This step installs the required packages on top of nvcr.io/nvidia/tensorflow:19.05-py3 and builds a new Docker image.
- Run the following directly on the host
$ cd ~/works
$ mkdir ssd
$ cd ssd
$ mkdir data
$ mkdir check
$ git clone https://github.com/NVIDIA/DeepLearningExamples
$ cd DeepLearningExamples/TensorFlow/Detection/SSD
$ ls
configs  Dockerfile  download_all.sh  examples  img  models  NOTICE  README.md  requirements.txt
- Build the Docker image
$ docker build . -t nvidia_ssd    // builds the image from the Dockerfile
- Analyzing the Dockerfile above
$ pwd
/home/jhlee/works/ssd/DeepLearningExamples/TensorFlow/Detection/SSD
$ cat Dockerfile
FROM nvcr.io/nvidia/tensorflow:19.05-py3 as base

FROM base as sha
RUN mkdir /sha
RUN cat `cat HEAD | cut -d' ' -f2` > /sha/repo_sha

FROM base as final
WORKDIR /workdir
RUN PROTOC_VERSION=3.0.0 && \
    PROTOC_ZIP=protoc-${PROTOC_VERSION}-linux-x86_64.zip && \
    curl -OL https://github.com/google/protobuf/releases/download/v$PROTOC_VERSION/$PROTOC_ZIP && \
    unzip -o $PROTOC_ZIP -d /usr/local bin/protoc && \
    rm -f $PROTOC_ZIP
COPY requirements.txt .
RUN pip install Cython
RUN pip install -r requirements.txt
WORKDIR models/research/
COPY models/research/ .
RUN protoc object_detection/protos/*.proto --python_out=.
ENV PYTHONPATH="/workdir/models/research/:/workdir/models/research/slim/:$PYTHONPATH"
COPY examples/ examples
COPY configs/ configs/
COPY download_all.sh download_all.sh
COPY --from=sha /sha .
- Google Protocol Buffer
This topic is explained in detail here (thanks to the author).
https://bcho.tistory.com/1182
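The protoc step in the Dockerfile compiles the Object Detection API's .proto files into Python modules. As a quick sanity check, a minimal sketch like this can be run inside the nvidia_ssd container from /workdir/models/research (the module names are the ones protoc generates under object_detection/protos):

# If the protoc step in the Dockerfile ran correctly, the generated *_pb2 modules import cleanly.
from object_detection.protos import pipeline_pb2   # generated from pipeline.proto
from object_detection.protos import ssd_pb2        # generated from ssd.proto

# Instantiating a config message proves the generated code is usable.
config = pipeline_pb2.TrainEvalPipelineConfig()
print(type(config).__name__)    # -> TrainEvalPipelineConfig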
- Dockerfile
Also published as a book; an easy way to learn how to write and use a Dockerfile.
http://pyrasis.com/docker.html
The files at the current working location above
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Detection/SSD
- Check the generated Docker image
$ docker images
REPOSITORY                  TAG                             IMAGE ID       CREATED         SIZE
nvidia_ssd                  latest                          ab529215f717   5 minutes ago   6.97GB
none                        none                            a6bc644c75ed   6 minutes ago   6.96GB   // intermediate image created while building nvidia_ssd
nvcr.io/nvidia/tensorflow   19.08-py3                       be978d32a5c3   8 weeks ago     7.35GB
nvcr.io/nvidia/cuda         10.1-cudnn7-devel-ubuntu18.04   0ead98c22e04   8 weeks ago     3.67GB
nvidia/cuda                 9.0-devel                       2a64416134d8   8 weeks ago     2.03GB
nvcr.io/nvidia/cuda         10.1-devel-ubuntu18.04          946e78c7b298   8 weeks ago     2.83GB
nvidia/cuda                 10.1-base                       a5f5d3b655ca   8 weeks ago     106MB
nvcr.io/nvidia/tensorflow   19.05-py3                       01c8c4b0d7ff   5 months ago    6.96GB
Building nvidia_ssd requires the <none> intermediate image and nvcr.io/nvidia/tensorflow:19.05-py3.
2.2 Running and Analyzing Quick Guide 3
Quick Guide 3 downloads the COCO dataset and builds TFRecord files from it.
- COCO dataset download and TFRecord creation (download_all.sh)
Its main roles are downloading the COCO dataset and generating the TFRecords from it.
- /data/coco2017_tfrecords : where the COCO data and the TFRecords are stored
- /checkpoints : TensorFlow checkpoint files; covered in more detail later
Run on the host
$ ./download_all.sh nvidia_ssd /home/jhlee/works/ssd/data /home/jhlee/works/ssd/check
$ cat ./download_all.sh
// Basic analysis: much like before, but it launches the container below and runs a shell script inside it;
// this is the part that handles the dataset download.

if [ -z $1 ]; then echo "Docker container name is missing" && exit 1; fi

## 1st ARG : CONTAINER NAME
## 2nd ARG : BASE PATH /data/coco2017_tfrecords
## 3rd ARG : BASE PATH /checkpoints
CONTAINER=$1
COCO_DIR=${2:-"/data/coco2017_tfrecords"}
CHECKPOINT_DIR=${3:-"/checkpoints"}

mkdir -p $COCO_DIR
chmod 777 $COCO_DIR

# Download backbone checkpoint
mkdir -p $CHECKPOINT_DIR
chmod 777 $CHECKPOINT_DIR
cd $CHECKPOINT_DIR
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -xzf resnet_v1_50_2016_08_28.tar.gz
mkdir -p resnet_v1_50
mv resnet_v1_50.ckpt resnet_v1_50/model.ckpt

## Works with nvidia-docker or docker; the command below starts the container and immediately runs the bash script inside it.
## download_and_preprocess_mscoco.sh inside the container downloads COCO 2017 and then creates the TFRecords.
nvidia-docker run --rm -it -u 123 -v $COCO_DIR:/data/coco2017_tfrecords $CONTAINER bash -c '
  # Create TFRecords
  bash /workdir/models/research/object_detection/dataset_tools/download_and_preprocess_mscoco.sh \
    /data/coco2017_tfrecords'
- Analyzing download_and_preprocess_mscoco.sh
- Downloads COCO 2017 (including the annotations)
- Generates the TFRecords from the dataset
To analyze the shell script, simply launch the container as follows and look inside.
$ nvidia-docker run --rm -it -u 123 -v $HOME/works/ssd/data:/data/coco2017_tfrecords nvidia_ssd

================
== TensorFlow ==
================
NVIDIA Release 19.05 (build 6390160)
TensorFlow Version 1.13.1
.....

I have no name!@a4891a3ac177:/workdir/models/research$ cat object_detection/dataset_tools/download_and_preprocess_mscoco.sh
#!/bin/bash
set -e

if [ -z "$1" ]; then
  echo "usage download_and_preprocess_mscoco.sh [data dir]"
  exit
fi

if [ "$(uname)" == "Darwin" ]; then
  UNZIP="tar -xf"
else
  UNZIP="unzip -nq"
fi

# Create the output directories.
OUTPUT_DIR="${1%/}"
SCRATCH_DIR="${OUTPUT_DIR}/raw-data"
mkdir -p "${OUTPUT_DIR}"
mkdir -p "${SCRATCH_DIR}"
CURRENT_DIR=$(pwd)

# Helper function to download and unpack a .zip file.
function download_and_unzip() {
  local BASE_URL=${1}
  local FILENAME=${2}

  if [ ! -f ${FILENAME} ]; then
    echo "Downloading ${FILENAME} to $(pwd)"
    wget -nd -c "${BASE_URL}/${FILENAME}"
  else
    echo "Skipping download of ${FILENAME}"
  fi
  echo "Unzipping ${FILENAME}"
  ${UNZIP} ${FILENAME}
}

cd ${SCRATCH_DIR}

## Literally downloads the COCO set; there are a lot of images.
## (The dataset itself deserves a closer look.)
# Download the images.
BASE_IMAGE_URL="http://images.cocodataset.org/zips"

TRAIN_IMAGE_FILE="train2017.zip"
download_and_unzip ${BASE_IMAGE_URL} ${TRAIN_IMAGE_FILE}
TRAIN_IMAGE_DIR="${SCRATCH_DIR}/train2017"

VAL_IMAGE_FILE="val2017.zip"
download_and_unzip ${BASE_IMAGE_URL} ${VAL_IMAGE_FILE}
VAL_IMAGE_DIR="${SCRATCH_DIR}/val2017"

TEST_IMAGE_FILE="test2017.zip"
download_and_unzip ${BASE_IMAGE_URL} ${TEST_IMAGE_FILE}
TEST_IMAGE_DIR="${SCRATCH_DIR}/test2017"

## Downloads the annotations; there are quite a few kinds, another reason to study the dataset layout.
# Download the annotations.
BASE_INSTANCES_URL="http://images.cocodataset.org/annotations"
INSTANCES_FILE="annotations_trainval2017.zip"
download_and_unzip ${BASE_INSTANCES_URL} ${INSTANCES_FILE}

#
# Training and validation only use instances_train2017.json / instances_val2017.json from the annotations
#
TRAIN_ANNOTATIONS_FILE="${SCRATCH_DIR}/annotations/instances_train2017.json"
VAL_ANNOTATIONS_FILE="${SCRATCH_DIR}/annotations/instances_val2017.json"

# Download the test image info.
BASE_IMAGE_INFO_URL="http://images.cocodataset.org/annotations"
IMAGE_INFO_FILE="image_info_test2017.zip"
download_and_unzip ${BASE_IMAGE_INFO_URL} ${IMAGE_INFO_FILE}

#
# Testing uses image_info_test-dev2017.json from the annotations
#
TESTDEV_ANNOTATIONS_FILE="${SCRATCH_DIR}/annotations/image_info_test-dev2017.json"

# Build TFRecords of the image data.
cd "${CURRENT_DIR}"
python object_detection/dataset_tools/create_coco_tf_record.py \
  --logtostderr \
  --include_masks \
  --train_image_dir="${TRAIN_IMAGE_DIR}" \
  --val_image_dir="${VAL_IMAGE_DIR}" \
  --test_image_dir="${TEST_IMAGE_DIR}" \
  --train_annotations_file="${TRAIN_ANNOTATIONS_FILE}" \
  --val_annotations_file="${VAL_ANNOTATIONS_FILE}" \
  --testdev_annotations_file="${TESTDEV_ANNOTATIONS_FILE}" \
  --output_dir="${OUTPUT_DIR}"
Above, dataset_tools/create_coco_tf_record.py is used to generate the TFRecord files.
If you switch to a different dataset, refer to dataset_tools.
Preparing Inputs (shows how other datasets are set up)
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/preparing_inputs.md
root@c7550d6b2c59:/workdir/models/research# ls object_detection/dataset_tools/
__init__.py create_kitti_tf_record.py create_pascal_tf_record.py create_pycocotools_package.sh oid_hierarchical_labels_expansion_test.py tf_record_creation_util.py
create_coco_tf_record.py create_kitti_tf_record_test.py create_pascal_tf_record_test.py download_and_preprocess_mscoco.sh oid_tfrecord_creation.py tf_record_creation_util_test.py
create_coco_tf_record_test.py create_oid_tf_record.py create_pet_tf_record.py oid_hierarchical_labels_expansion.py oid_tfrecord_creation_test.py
## The source below shows that COCO annotations are in JSON format
root@c7550d6b2c59:/workdir/models/research# cat object_detection/dataset_tools/create_coco_tf_record.py
r"""Convert raw COCO dataset to TFRecord for object_detection.
Please note that this tool creates sharded output files.
Example usage:
python create_coco_tf_record.py --logtostderr \
--train_image_dir="${TRAIN_IMAGE_DIR}" \
--val_image_dir="${VAL_IMAGE_DIR}" \
--test_image_dir="${TEST_IMAGE_DIR}" \
--train_annotations_file="${TRAIN_ANNOTATIONS_FILE}" \
--val_annotations_file="${VAL_ANNOTATIONS_FILE}" \
--testdev_annotations_file="${TESTDEV_ANNOTATIONS_FILE}" \
--output_dir="${OUTPUT_DIR}"
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import hashlib
import io
import json
import os
import contextlib2
import numpy as np
import PIL.Image
from pycocotools import mask
import tensorflow as tf
from object_detection.dataset_tools import tf_record_creation_util
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util
flags = tf.app.flags
tf.flags.DEFINE_boolean('include_masks', False,
'Whether to include instance segmentations masks '
'(PNG encoded) in the result. default: False.')
tf.flags.DEFINE_string('train_image_dir', '',
'Training image directory.')
tf.flags.DEFINE_string('val_image_dir', '',
'Validation image directory.')
tf.flags.DEFINE_string('test_image_dir', '',
'Test image directory.')
tf.flags.DEFINE_string('train_annotations_file', '',
'Training annotations JSON file.')
tf.flags.DEFINE_string('val_annotations_file', '',
'Validation annotations JSON file.')
tf.flags.DEFINE_string('testdev_annotations_file', '',
'Test-dev annotations JSON file.')
tf.flags.DEFINE_string('output_dir', '/tmp/', 'Output data directory.')
FLAGS = flags.FLAGS
tf.logging.set_verbosity(tf.logging.INFO)
def create_tf_example(image,
annotations_list,
image_dir,
category_index,
include_masks=False):
"""Converts image and annotations to a tf.Example proto.
Args:
image: dict with keys:
[u'license', u'file_name', u'coco_url', u'height', u'width',
u'date_captured', u'flickr_url', u'id']
annotations_list:
list of dicts with keys:
[u'segmentation', u'area', u'iscrowd', u'image_id',
u'bbox', u'category_id', u'id']
Notice that bounding box coordinates in the official COCO dataset are
given as [x, y, width, height] tuples using absolute coordinates where
x, y represent the top-left (0-indexed) corner. This function converts
to the format expected by the Tensorflow Object Detection API (which is
[ymin, xmin, ymax, xmax] with coordinates normalized relative
to image size).
image_dir: directory containing the image files.
category_index: a dict containing COCO category information keyed
by the 'id' field of each category. See the
label_map_util.create_category_index function.
include_masks: Whether to include instance segmentations masks
(PNG encoded) in the result. default: False.
Returns:
example: The converted tf.Example
num_annotations_skipped: Number of (invalid) annotations that were ignored.
Raises:
ValueError: if the image pointed to by data['filename'] is not a valid JPEG
"""
image_height = image['height']
image_width = image['width']
filename = image['file_name']
image_id = image['id']
full_path = os.path.join(image_dir, filename)
with tf.gfile.GFile(full_path, 'rb') as fid:
encoded_jpg = fid.read()
encoded_jpg_io = io.BytesIO(encoded_jpg)
image = PIL.Image.open(encoded_jpg_io)
key = hashlib.sha256(encoded_jpg).hexdigest()
xmin = []
xmax = []
ymin = []
ymax = []
is_crowd = []
category_names = []
category_ids = []
area = []
encoded_mask_png = []
num_annotations_skipped = 0
for object_annotations in annotations_list:
(x, y, width, height) = tuple(object_annotations['bbox'])
if width <= 0 or height <= 0:
num_annotations_skipped += 1
continue
if x + width > image_width or y + height > image_height:
num_annotations_skipped += 1
continue
xmin.append(float(x) / image_width)
xmax.append(float(x + width) / image_width)
ymin.append(float(y) / image_height)
ymax.append(float(y + height) / image_height)
is_crowd.append(object_annotations['iscrowd'])
category_id = int(object_annotations['category_id'])
category_ids.append(category_id)
category_names.append(category_index[category_id]['name'].encode('utf8'))
area.append(object_annotations['area'])
if include_masks:
run_len_encoding = mask.frPyObjects(object_annotations['segmentation'],
image_height, image_width)
binary_mask = mask.decode(run_len_encoding)
if not object_annotations['iscrowd']:
binary_mask = np.amax(binary_mask, axis=2)
pil_image = PIL.Image.fromarray(binary_mask)
output_io = io.BytesIO()
pil_image.save(output_io, format='PNG')
encoded_mask_png.append(output_io.getvalue())
feature_dict = {
'image/height':
dataset_util.int64_feature(image_height),
'image/width':
dataset_util.int64_feature(image_width),
'image/filename':
dataset_util.bytes_feature(filename.encode('utf8')),
'image/source_id':
dataset_util.bytes_feature(str(image_id).encode('utf8')),
'image/key/sha256':
dataset_util.bytes_feature(key.encode('utf8')),
'image/encoded':
dataset_util.bytes_feature(encoded_jpg),
'image/format':
dataset_util.bytes_feature('jpeg'.encode('utf8')),
'image/object/bbox/xmin':
dataset_util.float_list_feature(xmin),
'image/object/bbox/xmax':
dataset_util.float_list_feature(xmax),
'image/object/bbox/ymin':
dataset_util.float_list_feature(ymin),
'image/object/bbox/ymax':
dataset_util.float_list_feature(ymax),
'image/object/class/text':
dataset_util.bytes_list_feature(category_names),
'image/object/is_crowd':
dataset_util.int64_list_feature(is_crowd),
'image/object/area':
dataset_util.float_list_feature(area),
}
if include_masks:
feature_dict['image/object/mask'] = (
dataset_util.bytes_list_feature(encoded_mask_png))
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
return key, example, num_annotations_skipped
def _create_tf_record_from_coco_annotations(
annotations_file, image_dir, output_path, include_masks, num_shards):
"""Loads COCO annotation json files and converts to tf.Record format.
Args:
annotations_file: JSON file containing bounding box annotations.
image_dir: Directory containing the image files.
output_path: Path to output tf.Record file.
include_masks: Whether to include instance segmentations masks
(PNG encoded) in the result. default: False.
num_shards: number of output file shards.
"""
with contextlib2.ExitStack() as tf_record_close_stack, \
tf.gfile.GFile(annotations_file, 'r') as fid:
output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
tf_record_close_stack, output_path, num_shards)
groundtruth_data = json.load(fid)
images = groundtruth_data['images']
category_index = label_map_util.create_category_index(
groundtruth_data['categories'])
annotations_index = {}
if 'annotations' in groundtruth_data:
tf.logging.info(
'Found groundtruth annotations. Building annotations index.')
for annotation in groundtruth_data['annotations']:
image_id = annotation['image_id']
if image_id not in annotations_index:
annotations_index[image_id] = []
annotations_index[image_id].append(annotation)
missing_annotation_count = 0
for image in images:
image_id = image['id']
if image_id not in annotations_index:
missing_annotation_count += 1
annotations_index[image_id] = []
tf.logging.info('%d images are missing annotations.',
missing_annotation_count)
total_num_annotations_skipped = 0
for idx, image in enumerate(images):
if idx % 100 == 0:
tf.logging.info('On image %d of %d', idx, len(images))
annotations_list = annotations_index[image['id']]
_, tf_example, num_annotations_skipped = create_tf_example(
image, annotations_list, image_dir, category_index, include_masks)
total_num_annotations_skipped += num_annotations_skipped
shard_idx = idx % num_shards
output_tfrecords[shard_idx].write(tf_example.SerializeToString())
tf.logging.info('Finished writing, skipped %d annotations.',
total_num_annotations_skipped)
def main(_):
assert FLAGS.train_image_dir, '`train_image_dir` missing.'
assert FLAGS.val_image_dir, '`val_image_dir` missing.'
assert FLAGS.test_image_dir, '`test_image_dir` missing.'
assert FLAGS.train_annotations_file, '`train_annotations_file` missing.'
assert FLAGS.val_annotations_file, '`val_annotations_file` missing.'
assert FLAGS.testdev_annotations_file, '`testdev_annotations_file` missing.'
if not tf.gfile.IsDirectory(FLAGS.output_dir):
tf.gfile.MakeDirs(FLAGS.output_dir)
train_output_path = os.path.join(FLAGS.output_dir, 'coco_train.record')
val_output_path = os.path.join(FLAGS.output_dir, 'coco_val.record')
testdev_output_path = os.path.join(FLAGS.output_dir, 'coco_testdev.record')
_create_tf_record_from_coco_annotations(
FLAGS.train_annotations_file,
FLAGS.train_image_dir,
train_output_path,
FLAGS.include_masks,
num_shards=100)
_create_tf_record_from_coco_annotations(
FLAGS.val_annotations_file,
FLAGS.val_image_dir,
val_output_path,
FLAGS.include_masks,
num_shards=10)
_create_tf_record_from_coco_annotations(
FLAGS.testdev_annotations_file,
FLAGS.test_image_dir,
testdev_output_path,
FLAGS.include_masks,
num_shards=100)
if __name__ == '__main__':
tf.app.run()
- Verify the generated TFRecords
- coco_train.record : set in the pipeline config
- coco_val.record : set in the pipeline config
- coco_testdev.record : not yet confirmed whether it is actually used
root@4b038f3383f2:/workdir/models/research# ls /data/coco2017_tfrecords/
annotation    raw-data
coco_train.record-00000-of-00100    ...    coco_train.record-00099-of-00100      (100 train shards)
coco_val.record-00000-of-00010      ...    coco_val.record-00009-of-00010        (10 val shards)
coco_testdev.record-00000-of-00100  ...    coco_testdev.record-00099-of-00100    (100 test-dev shards)
TFRecord and tf.Example
https://www.tensorflow.org/tutorials/load_data/tfrecord
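To see what actually ends up inside the generated shards, a short TF 1.x sketch along these lines can decode a few tf.Example records (the shard filename is just an example; point it at any shard that exists under /data/coco2017_tfrecords):

import tensorflow as tf

# Example shard path; adjust to a shard that exists on your machine.
record_path = '/data/coco2017_tfrecords/coco_val.record-00000-of-00010'

for i, record in enumerate(tf.python_io.tf_record_iterator(record_path)):
    example = tf.train.Example.FromString(record)
    feat = example.features.feature
    filename = feat['image/filename'].bytes_list.value[0].decode('utf8')
    num_boxes = len(feat['image/object/bbox/xmin'].float_list.value)
    height = feat['image/height'].int64_list.value[0]
    width = feat['image/width'].int64_list.value[0]
    print('%s  %dx%d  boxes=%d' % (filename, width, height, num_boxes))
    if i >= 4:          # only peek at the first few records
        break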
- COCO 2017 dataset files used above
http://images.cocodataset.org/annotations/annotations_trainval2017.zip
http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/zips/test2017.zip
- Recheck the COCO dataset details
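Since pycocotools is already available in the container (create_coco_tf_record.py imports it), a quick way to poke at the downloaded annotations is a sketch like this (the path assumes the raw-data layout created by download_and_preprocess_mscoco.sh):

from pycocotools.coco import COCO

# Annotation file from the raw-data layout above.
ann_file = '/data/coco2017_tfrecords/raw-data/annotations/instances_val2017.json'
coco = COCO(ann_file)

print('images      :', len(coco.getImgIds()))
print('categories  :', len(coco.getCatIds()))
print('annotations :', len(coco.getAnnIds()))

# Look at one image entry and its annotations.
img_id = coco.getImgIds()[0]
print(coco.loadImgs(img_id)[0]['file_name'])
print(len(coco.loadAnns(coco.getAnnIds(imgIds=img_id))), 'boxes for this image')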
2.3 Running and Analyzing Quick Guide 4-5
First launch the Docker container as shown below, then run training with the TensorFlow training shell script.
In my case there is only one GPU, so the training part is very slow.
nvidia-docker looks like it will be phased out eventually; the plain docker command below also works, but the NVIDIA container toolkit must be installed.
See the earlier post for the details.
https://ahyuo79.blogspot.com/2019/10/nvidia-docker.html
- Run with the installed nvidia-docker (version 2)
$ nvidia-docker run --rm -it \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p 8888:8888 -p 6006:6006 \
  -v /home/jhlee/works/ssd/data:/data/coco2017_tfrecords \
  -v /home/jhlee/works/ssd/check:/checkpoints \
  --ipc=host \
  --name nvidia_ssd \
  nvidia_ssd
- Run with plain docker instead (when nvidia-docker2 is not used)
- Adds TensorBoard/Jupyter port mappings
- Sets a name so the container is easy to find
$ docker run --gpus all --rm -it \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p 8888:8888 -p 6006:6006 \
  -v /home/jhlee/works/ssd/data:/data/coco2017_tfrecords \
  -v /home/jhlee/works/ssd/check:/checkpoints \
  --ipc=host \
  --name nvidia_ssd \
  nvidia_ssd
- Running and analyzing the training step
root@c7550d6b2c59:/workdir/models/research# bash ./examples/SSD320_FP16_1GPU.sh /checkpoints
root@c7550d6b2c59:/workdir/models/research# cat examples/SSD320_FP16_1GPU.sh
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

export TF_ENABLE_AUTO_MIXED_PRECISION=1

TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

time python -u ./object_detection/model_main.py \
  --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
  --model_dir=${CKPT_DIR} \
  --alsologtostder \
  "${@:3}"
- Configuring the Object Detection Training Pipeline
To browse more config files, look in object_detection/samples/configs.
The pre-trained model here is resnet_v1_50, so it is worth knowing how it is used.
root@a79a83fc99f6:/workdir/models/research# cat configs/ssd320_full_1gpus.config
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: true
    num_classes: 90        ## number of labels; equals the number of classes in object_detection/data/mscoco_label_map.pbtxt
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5        ## only matches above 50% are kept; easy to see later with object_detection/object_detection_tutorial.ipynb
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {        ## this appears to be the network input shape
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 256
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.0004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            decay: 0.997,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'ssd_resnet50_v1_fpn'        # resnet 50 is used as the feature extractor; this can be changed
      fpn {
        min_level: 3
        max_level: 7
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.0004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100        ### max detections per class
        max_total_detections: 100            ### overall max detections; same as output_dict['num_detections'] in object_detection/object_detection_tutorial.ipynb
      }
      score_converter: SIGMOID
    }
  }
}

#
# The model below is Google's pre-trained model.
# When SSD uses a pre-trained model only as its internal feature extractor, fine_tune_checkpoint_type is "classification";
# for Faster R-CNN it is "detection".
#
train_config: {
  fine_tune_checkpoint: "/checkpoints/resnet_v1_50/model.ckpt"   ## where the pre-trained model was downloaded above
  fine_tune_checkpoint_type: "classification"                    # said to differ depending on the model
  batch_size: 32          ## may cause GPU out-of-memory; adjust to your GPU memory or switch to CPU mode
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 100000       ## set to 100,000 steps (can be overridden with object_detection/model_main.py --num_train_steps)
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .02000000000000000000
          total_steps: 100000
          warmup_learning_rate: .00866640000000000000
          warmup_steps: 8000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100        ## probably should match max_total_detections; see output_dict['detection_boxes'] in object_detection_tutorial.ipynb
  unpad_groundtruth_tensors: false
}

#
# Training Setting
#
# input_path:            //TF Record
#   coco_train.record-00000-of-00100
#   coco_train.record-00001-of-00100
#   .....
# label_map_path:
#   mscoco_label_map.pbtxt
#
train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/coco2017_tfrecords/*train*"               ## location of the TFRecord files
  }
  label_map_path: "object_detection/data/mscoco_label_map.pbtxt" ## label map location
}

#
# Eval Setting
#
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  num_examples: 8000      ## number of examples used for eval
  ##max_evals: 10         ## max number of eval runs (can be set with object_detection/model_main.py --eval_count); not in the original config
}

#
# Eval Setting
#
# input_path:            //TF Record
#   coco_val.record-00000-of-00010
#   coco_val.record-00001-of-00010
#
# label_map_path:
#   mscoco_label_map.pbtxt
#
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/coco2017_tfrecords/*val*"                 ## location of the TFRecord files
  }
  label_map_path: "object_detection/data/mscoco_label_map.pbtxt" ## shows the label info
  shuffle: false
  num_readers: 1
}
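If you want to tweak values such as batch_size or num_steps without editing the config by hand, the Object Detection API ships a config_util helper. A rough sketch (function names from object_detection.utils.config_util in this TF 1.x release, so verify the exact signatures; the /checkpoints/custom_config output path is just an example):

from object_detection.utils import config_util

CONFIG = 'configs/ssd320_full_1gpus.config'

# Parse the pipeline config into its model/train/eval parts.
configs = config_util.get_configs_from_pipeline_file(CONFIG)

# Example tweaks for a small single-GPU machine (values are illustrative).
configs['train_config'].batch_size = 8
configs['train_config'].num_steps = 20000

# Re-assemble and write the modified pipeline config to a new directory.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, '/checkpoints/custom_config')
print(configs['train_config'])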
- Pre-trained Models
- Configuring the Object Detection Training Pipeline
https://medium.com/coinmonks/modelling-transfer-learning-using-tensorflows-object-detection-model-on-mac-692c8609be40
https://devtalk.nvidia.com/default/topic/1049371/tensorrt/how-to-visualize-tf-trt-graphs-with-tensorboard-/
2.4 Running and Analyzing Quick Guide 6
This runs validation/evaluation after training; it seems to serve as a correction step, but to pin down its exact role I would need to understand TensorFlow and the dataset structure better.
root@c7550d6b2c59:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints
root@c7550d6b2c59:/workdir/models/research# cat examples/SSD320_evaluate.sh
CHECKPINT_DIR=$1

TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

python object_detection/model_main.py --checkpoint_dir $CHECKPINT_DIR --model_dir /results --run_once --pipeline_config_path configs/ssd320_full_1gpus.config
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md
2.5 Checking the Final Generated Checkpoint Files
What the NVIDIA Docker workflow ultimately produces are checkpoint files; the PB file you have to create yourself, and inference should then be done from that.
The checkpoints currently consist of model.ckpt-0 and model.ckpt-100000, as shown below.
root@c7550d6b2c59:/workdir/models/research# ls /checkpoints/
checkpoint                                    model.ckpt-0.data-00000-of-00002   model.ckpt-100000.data-00000-of-00002   resnet_v1_50
eval                                          model.ckpt-0.data-00001-of-00002   model.ckpt-100000.data-00001-of-00002   resnet_v1_50_2016_08_28.tar.gz
events.out.tfevents.1572262719.c7550d6b2c59   model.ckpt-0.index                 model.ckpt-100000.index
graph.pbtxt                                   model.ckpt-0.meta                  model.ckpt-100000.meta
- Basic layout
- model.ckpt-0 : created as soon as training starts (step 0)
- model.ckpt-100000 : matches the number of steps in the pipeline config; keeps increasing if training is resumed from here
- graph.pbtxt : useful for inspecting the network structure
Understanding checkpoints
https://eehoeskrap.tistory.com/343
https://eehoeskrap.tistory.com/370
https://eehoeskrap.tistory.com/344
https://gusrb.tistory.com/21
http://jaynewho.com/post/8
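A quick way to see what a checkpoint actually contains is to list its variables. A small sketch with the TF 1.x checkpoint utilities (paths are the /checkpoints mount used above; variable names depend on the training graph, though global_step is normally present in Estimator checkpoints):

import tensorflow as tf

CKPT_DIR = '/checkpoints'

# Resolve the newest checkpoint prefix recorded in the 'checkpoint' file.
latest = tf.train.latest_checkpoint(CKPT_DIR)     # e.g. /checkpoints/model.ckpt-100000
print('latest checkpoint:', latest)

# List the first few variable names and shapes stored in the checkpoint.
for name, shape in tf.train.list_variables(latest)[:10]:
    print(name, shape)

# Load a single variable's value if needed.
print('global_step =', tf.train.load_variable(latest, 'global_step'))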
2.6 Converting the Checkpoint to PB Format
For inference, convert to a PB file as follows.
- How to use export_inference_graph.py
- input_type
- image_tensor
- encoded_image_string_tensor
- tf_example
TFRecord and tf.Example
https://www.tensorflow.org/tutorials/load_data/tfrecord
root@c7550d6b2c59:/workdir/models/research# python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path configs/ssd320_full_1gpus.config \
--trained_checkpoint_prefix /checkpoints/model.ckpt-100000 \
--output_directory /checkpoints/inference_graph_100000
root@c7550d6b2c59:/workdir/models/research# ls /checkpoints/inference_graph_100000/
checkpoint frozen_inference_graph.pb model.ckpt.data-00000-of-00001 model.ckpt.index model.ckpt.meta pipeline.config saved_model
// check the newly created PB file
root@c7550d6b2c59:/workdir/models/research# ls /checkpoints/inference_graph_100000/saved_model/
saved_model.pb variables
Exporting a trained model for inference
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
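Once frozen_inference_graph.pb exists, a minimal TF 1.x sketch like the one below can load it and run detection on a single image. The tensor names (image_tensor, detection_boxes, detection_scores, detection_classes, num_detections) are the ones the Object Detection API export normally produces, so verify them against your own graph:

import numpy as np
import PIL.Image
import tensorflow as tf

PB_PATH = '/checkpoints/inference_graph_100000/frozen_inference_graph.pb'
IMAGE = 'object_detection/test_images/image1.jpg'

# Load the frozen graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile(PB_PATH, 'rb') as f:
    graph_def.ParseFromString(f.read())
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

image = np.expand_dims(np.array(PIL.Image.open(IMAGE)), axis=0)   # [1, H, W, 3] uint8

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0',
         'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image})

# Print detections above a 0.5 score threshold.
for i in range(int(num[0])):
    if scores[0][i] >= 0.5:
        print('class=%d score=%.2f box=%s' % (classes[0][i], scores[0][i], boxes[0][i]))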
2.7 Testing with TensorBoard
- Checking the checkpoint directory structure on the host
$ cd ~/works/ssd/check      // inspect the checkpoint structure on the host
$ tree
.
├── checkpoint                                        // model.ckpt-100000 and the checkpoint paths
├── eval                                              // TensorBoard logs from training; the part generated during validation/evaluation is under /results/eval
│   └── events.out.tfevents.1572359812.c7550d6b2c59   // TensorBoard log file
├── events.out.tfevents.1572262719.c7550d6b2c59       // TensorBoard log file
├── graph.pbtxt                                       // network structure information
├── inference_graph_100000                            // PB files generated from model.ckpt-100000
│   ├── checkpoint
│   ├── frozen_inference_graph.pb
│   ├── model.ckpt.data-00000-of-00001
│   ├── model.ckpt.index
│   ├── model.ckpt.meta
│   ├── pipeline.config
│   └── saved_model
│       ├── saved_model.pb
│       └── variables
├── model.ckpt-0.data-00000-of-00002                  // checkpoint-0
├── model.ckpt-0.data-00001-of-00002
├── model.ckpt-0.index
├── model.ckpt-0.meta
├── model.ckpt-100000.data-00000-of-00002             // checkpoint-100000
├── model.ckpt-100000.data-00001-of-00002
├── model.ckpt-100000.index
├── model.ckpt-100000.meta
├── resnet_v1_50                                      // pre-trained model (used as the SSD feature extractor)
│   └── model.ckpt
├── resnet_v1_50_2016_08_28.tar.gz                    // pre-trained model
├── resnet_v1_50_2016_08_28.tar.gz.1
└── resnet_v1_50_2016_08_28.tar.gz.2
- Pre-trained Models
resnet_v1_50_2016_08_28.tar.gz
https://github.com/tensorflow/models/tree/master/research/slim
- Run TensorBoard inside the Docker container
root@c7550d6b2c59:/workdir/models/research# tensorboard --logdir=/checkpoints
- Connect to TensorBoard from a browser
- TensorBoard -> Scalars
Source related to the Images tab
root@c7550d6b2c59:/workdir/models/research# vi ./object_detection/utils/visualization_utils.py
........
def draw_side_by_side_evaluation_image(eval_dict,
                                       category_index,
                                       max_boxes_to_draw=20,
                                       min_score_thresh=0.2,
                                       use_normalized_coordinates=True):
  """Creates a side-by-side image with detections and groundtruth.

  Bounding boxes (and instance masks, if available) are visualized on both
  subimages.

  Args:
    eval_dict: The evaluation dictionary returned by
      eval_util.result_dict_for_batched_example() or
      eval_util.result_dict_for_single_example().
    category_index: A category index (dictionary) produced from a labelmap.
    max_boxes_to_draw: The maximum number of boxes to draw for detections.
    min_score_thresh: The minimum score threshold for showing detections.
    use_normalized_coordinates: Whether to assume boxes and kepoints are in
      normalized coordinates (as opposed to absolute coordiantes).
      Default is True.

  Returns:
    A list of [1, H, 2 * W, C] uint8 tensor. The subimage on the left
      corresponds to detections, while the subimage on the right corresponds to
      groundtruth.
  """
........
class EvalMetricOpsVisualization(object):
  ....
  def get_estimator_eval_metric_ops(self, eval_dict):          ## called from model_lib.py, see below
    if self._max_examples_to_draw == 0:
      return {}
    images = self.images_from_evaluation_dict(eval_dict)

    def get_images():
      """Returns a list of images, padded to self._max_images_to_draw."""
      images = self._images
      while len(images) < self._max_examples_to_draw:
        images.append(np.array(0, dtype=np.uint8))
      self.clear()
      return images

    def image_summary_or_default_string(summary_name, image):  ## the image summary is created here
      """Returns image summaries for non-padded elements."""
      return tf.cond(
          tf.equal(tf.size(tf.shape(image)), 4),
          lambda: tf.summary.image(summary_name, image),       ## TensorBoard image
          lambda: tf.constant(''))

    update_op = tf.py_func(self.add_images, [[images[0]]], [])
    image_tensors = tf.py_func(
        get_images, [], [tf.uint8] * self._max_examples_to_draw)
    eval_metric_ops = {}
    for i, image in enumerate(image_tensors):
      summary_name = self._summary_name_prefix + '/' + str(i)
      value_op = image_summary_or_default_string(summary_name, image)   ## TensorBoard image generated here
      eval_metric_ops[summary_name] = (value_op, update_op)
    return eval_metric_ops
.....
class VisualizeSingleFrameDetections(EvalMetricOpsVisualization):  ## VisualizeSingleFrameDetections extends EvalMetricOpsVisualization
  """Class responsible for single-frame object detection visualizations."""

  def __init__(self,
               category_index,
               max_examples_to_draw=5,
               max_boxes_to_draw=20,
               min_score_thresh=0.2,
               use_normalized_coordinates=True,
               summary_name_prefix='Detections_Left_Groundtruth_Right'):
    super(VisualizeSingleFrameDetections, self).__init__(
        category_index=category_index,
        max_examples_to_draw=max_examples_to_draw,
        max_boxes_to_draw=max_boxes_to_draw,
        min_score_thresh=min_score_thresh,
        use_normalized_coordinates=use_normalized_coordinates,
        summary_name_prefix=summary_name_prefix)

  def images_from_evaluation_dict(self, eval_dict):
    return draw_side_by_side_evaluation_image(
        eval_dict, self._category_index, self._max_boxes_to_draw,
        self._min_score_thresh, self._use_normalized_coordinates)
...........
root@c7550d6b2c59:/workdir/models/research# vi ./object_detection/model_lib.py
....
    if mode == tf.estimator.ModeKeys.EVAL:              ## EVAL mode
      .........
      eval_dict = eval_util.result_dict_for_batched_example(   ## image information
          eval_images,
          features[inputs.HASH_KEY],
          detections,
          groundtruth,
          class_agnostic=class_agnostic,
          scale_to_absolute=True,
          original_image_spatial_shapes=original_image_spatial_shapes,
          true_image_shapes=true_image_shapes)

      if class_agnostic:
        category_index = label_map_util.create_class_agnostic_category_index()
      else:
        category_index = label_map_util.create_category_index_from_labelmap(
            eval_input_config.label_map_path)
      vis_metric_ops = None
      if not use_tpu and use_original_images:
        eval_metric_op_vis = vis_utils.VisualizeSingleFrameDetections(
            category_index,
            max_examples_to_draw=eval_config.num_visualizations,
            max_boxes_to_draw=eval_config.max_num_boxes_to_visualize,
            min_score_thresh=eval_config.min_score_threshold,
            use_normalized_coordinates=False)
        vis_metric_ops = eval_metric_op_vis.get_estimator_eval_metric_ops(   ## the images are saved here, see above
            eval_dict)
....
Related notes
tf.estimator.ModeKeys.TRAIN
tf.estimator.ModeKeys.EVAL
tf.estimator.ModeKeys.PREDICT
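model_main.py drives training and evaluation through tf.estimator.Estimator, and the mode value above decides which branch of the model_fn runs. A stripped-down sketch of that pattern (not the Object Detection API's actual model_fn, just the general shape):

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features['x'], 10)            # toy model
    predictions = {'classes': tf.argmax(logits, axis=1)}

    if mode == tf.estimator.ModeKeys.PREDICT:              # inference only
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.EVAL:                 # evaluation: loss + metrics
        metrics = {'accuracy': tf.metrics.accuracy(labels, predictions['classes'])}
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

    # TRAIN: loss + train_op
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)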
The sites below explain this very well.
https://bcho.tistory.com/1196
https://www.tensorflow.org/tensorboard/image_summaries
- TensorBoard -> Images
https://www.tensorflow.org/tensorboard/image_summaries
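The Images tab simply shows tf.summary.image events written to the log directory. A tiny TF 1.x sketch of writing one by hand (the log directory below just reuses the mounted /checkpoints path and is otherwise arbitrary):

import numpy as np
import tensorflow as tf

logdir = '/checkpoints/manual_summary'        # any directory later passed to tensorboard --logdir

# A dummy [batch, height, width, channels] uint8 image batch.
images = np.random.randint(0, 255, size=(1, 320, 320, 3), dtype=np.uint8)

image_ph = tf.placeholder(tf.uint8, shape=(None, 320, 320, 3))
summary_op = tf.summary.image('my_images', image_ph, max_outputs=3)

with tf.Session() as sess:
    writer = tf.summary.FileWriter(logdir, sess.graph)
    writer.add_summary(sess.run(summary_op, feed_dict={image_ph: images}), global_step=0)
    writer.close()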
- TensorBoard -> Graphs
TensorBoard background and hands-on usage
https://itnext.io/how-to-use-tensorboard-5d82f8654496
https://pythonkim.tistory.com/39
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_pets.md
2.8 Preparing for Object Detection
In the terminal of the Docker container launched above, start Jupyter and run the Jupyter test.
- Prepare the test images
root@c7550d6b2c59:/workdir/models/research# cp /data/coco2017_tfrecords/raw-data/test2017/000000000001.jpg object_detection/test_images/image1.jpg
root@c7550d6b2c59:/workdir/models/research# cp /data/coco2017_tfrecords/raw-data/test2017/000000517810.jpg object_detection/test_images/image2.jpg

or

root@5208474af96a:/workdir/models/research# cat object_detection/test_images/image_info.txt
// download image1.jpg and image2.jpg from the sites below and copy them
Image provenance:
image1.jpg: https://commons.wikimedia.org/wiki/File:Baegle_dwa.jpg
image2.jpg: Michael Miley,
  https://www.flickr.com/photos/mike_miley/4678754542/in/photolist-88rQHL-88oBVp-88oC2B-88rS6J-88rSqm-88oBLv-88oBC4

root@c7550d6b2c59:/workdir/models/research# cp /data/coco2017_tfrecords/raw-data/image1.jpg object_detection/test_images/image1.jpg   // downloaded from the site above
root@c7550d6b2c59:/workdir/models/research# cp /data/coco2017_tfrecords/raw-data/image2.jpg object_detection/test_images/image2.jpg
- Run object_detection/object_detection_tutorial.ipynb with Jupyter
root@c7550d6b2c59:/workdir/models/research# jupyter notebook    // fails with an error
root@c7550d6b2c59:/workdir/models/research# jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root
TensorFlow Jupyter Notebook
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_notebook.md
- After starting the Jupyter notebook, check it in the browser
- Errors that occur when starting the Jupyter notebook
https://github.com/kaczmarj/neurodocker/issues/82
http://melonicedlatte.com/web/2018/05/22/134429.html
2.9 Checking Basic Object Detection
- Open a separate terminal into the container
$ docker exec -it nvidia_ssd /bin/bash      // Jupyter is already running in the container above, so use a separate terminal

root@5208474af96a:/workdir/models/research# python object_detection/model_main.py --help

root@5208474af96a:/workdir/models/research# ls object_detection/object_detection_tutorial.ipynb    // notebook tested with Jupyter
object_detection/object_detection_tutorial.ipynb

root@5208474af96a:/workdir/models/research# ls object_detection/ssd_mobilenet_v1_coco_2017_11_17   // model used by the notebook above
frozen_inference_graph.pb

root@5208474af96a:/workdir/models/research# ls object_detection/data    // pbtxt files used by the notebook above
ava_label_map_v2.1.pbtxt            mscoco_complete_label_map.pbtxt      oid_object_detection_challenge_500_label_map.pbtxt
face_label_map.pbtxt                mscoco_label_map.pbtxt               pascal_label_map.pbtxt
fgvc_2854_classes_label_map.pbtxt   mscoco_minival_ids.txt               pet_label_map.pbtxt
kitti_label_map.pbtxt               oid_bbox_trainable_label_map.pbtxt
- object_detection_tutorial.ipynb
Model used: ssd_mobilenet_v1_coco_2017_11_17.tar.gz
https://medium.com/@yuu.ishikawa/how-to-show-signatures-of-tensorflow-saved-model-5ac56cf1960f
http://solarisailab.com/archives/2387
A quick walkthrough: the notebook uses the model above (PB file) and a pbtxt label map to run detection on the images in test_images.
- Issues with object_detection_tutorial.ipynb
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
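If allow_growth alone is not enough on a small GPU, another commonly used option is to cap the fraction of GPU memory TensorFlow may allocate (a sketch; the 0.5 value is arbitrary):

import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow use at most ~50% of the GPU memory instead of pre-allocating all of it.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
session = tf.Session(config=config)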
Error in the Jupyter console
E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
GPU memory issue
https://github.com/tensorflow/tensorflow/issues/24828
https://lsjsj92.tistory.com/363
https://devtalk.nvidia.com/default/topic/1051380/cudnn/could-not-create-cudnn-handle-cudnn_status_internal_error/
TensorFlow 2.0 GPU out-of-memory symptoms
https://inpages.tistory.com/155
failed to allocate 2.62G (2811428864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
https://stackoverflow.com/questions/39465503/cuda-error-out-of-memory-in-tensorflow
- Check NVIDIA GPU memory usage
$ watch -n 0.1 nvidia-smi
3. Current Status
On my laptop I cannot see the example's inference output even after adding the code above, but it works fine on a more powerful server.
It is unfortunate, and I really feel the limits of my laptop (especially its GPU RAM).
The related reference sites are listed below; I consulted too many to summarize, so I only list the links.
Object Detection installation and testing
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
Collected training material
https://www.slideshare.net/fermat39/polyp-detection-withtensorflowobjectdetectionapi
https://www.kdnuggets.com/2019/03/object-detection-luminoth.html
TensorFlow training and usage
https://yongyong-e.tistory.com/24
http://solarisailab.com/archives/2422
https://hwauni.tistory.com/entry/API-Object-Detection-API%EB%A5%BC-%EC%9D%B4%EC%9A%A9%ED%95%9C-%EC%98%A4%EB%B8%8C%EC%A0%9D%ED%8A%B8-%EC%9D%B8%EC%8B%9D%ED%95%98%EA%B8%B0-Part-1-%EC%84%A4%EC%A0%95-%ED%8E%8C
https://cloud.google.com/solutions/creating-object-detection-application-tensorflow?hl=ko
TensorFlow Object Detection (to be split into a separate post later)
https://you359.github.io/tensorflow%20models/Tensorflow-Object-Detection-API/
https://you359.github.io/tensorflow%20models/Tensorflow-Object-Detection-API-Installation/
https://you359.github.io/tensorflow%20models/Tensorflow-Object-Detection-API-Training/
TensorFlow Object Detection, related links
https://yongyong-e.tistory.com/31?category=836820
https://yongyong-e.tistory.com/32?category=836820
https://yongyong-e.tistory.com/35?category=836820 **
https://towardsdatascience.com/creating-your-own-object-detector-ad69dda69c85 **
https://gilberttanner.com/blog/live-object-detection
Tensorflow Object Detection API Training
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html
https://towardsdatascience.com/custom-object-detection-using-tensorflow-from-scratch-e61da2e10087
https://becominghuman.ai/tensorflow-object-detection-api-tutorial-training-and-evaluating-custom-object-detector-ed2594afcf73
https://medium.com/pylessons/tensorflow-step-by-step-custom-object-detection-tutorial-d7ae840a74e2
Tensorflow Object Detection API
https://github.com/tensorflow/models/tree/master/research/object_detection
https://github.com/tensorflow/models/tree/master/research/object_detection/g3doc
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
- Notes on confusing bits from analyzing the shell scripts
As always, open-source shell scripts are well written but change frequently, which makes them confusing.
${1:-none}
https://stackoverflow.com/questions/38260927/what-does-this-line-build-target-1-none-means-in-shell-scripting
${@:2}
https://unix.stackexchange.com/questions/92978/what-does-this-2-mean-in-shell-scripting