We previously analyzed some TensorRT Python sources, but since many different Python sources exist, let's first walk through the complete set of TensorRT Python samples that NVIDIA provides, as below.
Basic TensorRT release information:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/index.html
1.1 TensorRT Python Checklist
- Based on a JetPack 4.2.1 installation
- TensorRT Version Check
$ dpkg -l | grep TensorRT
ii  graphsurgeon-tf          5.1.6-1+cuda10.0    arm64  GraphSurgeon for TensorRT package
ii  libnvinfer-dev           5.1.6-1+cuda10.0    arm64  TensorRT development libraries and headers
ii  libnvinfer-samples       5.1.6-1+cuda10.0    all    TensorRT samples and documentation
ii  libnvinfer5              5.1.6-1+cuda10.0    arm64  TensorRT runtime libraries
ii  python-libnvinfer        5.1.6-1+cuda10.0    arm64  Python bindings for TensorRT
ii  python-libnvinfer-dev    5.1.6-1+cuda10.0    arm64  Python development package for TensorRT
ii  python3-libnvinfer       5.1.6-1+cuda10.0    arm64  Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev   5.1.6-1+cuda10.0    arm64  Python 3 development package for TensorRT
ii  tensorrt                 5.1.6.1-1+cuda10.0  arm64  Meta package of TensorRT
ii  uff-converter-tf         5.1.6-1+cuda10.0    arm64  UFF converter for TensorRT package
https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#installing
- TensorRT Python basic structure and required package installation
- pyCUDA installation (it was installed only for Python 2, so install it for Python 3 as well)
- TensorRT UFF/Caffe/ONNX parser source analysis
- Analysis of constructing a TensorRT network directly
The source structure is similar to before, so this is easy to follow if you understood the previous post:
https://ahyuo79.blogspot.com/2019/08/tensorrt-5-python.html
- Check the TensorFlow installation (a quick import check follows at the end of this section)
$ pip3 list    // installed for Python 3 only
..
tensorboard            1.14.0
tensorflow-estimator   1.14.0
tensorflow-gpu         1.14.0+nv19.7
...
$ pip list     // not installed for Python 2
For the TensorFlow installation steps, see section 2.2 (IPlugin SSD feature check) at the site below:
https://ahyuo79.blogspot.com/2019/08/ds-sdk-40-test4-iplugin-sample.html
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
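Before moving on, a quick sanity check (not from the original post) confirms that the Python bindings actually import; per the dpkg and pip listings above, the versions should report 5.1.6.x and 1.14.0+nv19.7:
$ python3 -c "import tensorrt as trt; print(trt.__version__)"
$ python3 -c "import pycuda.autoinit; print('pycuda OK')"
$ python3 -c "import tensorflow as tf; print(tf.__version__)"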
1.2 TensorRT Python Sample Source Layout
Let's look over all of the TensorRT Python samples provided by NVIDIA.
- Full listing of the TensorRT Python sample sources
$ cd /usr/src/tensorrt/samples/python
$ tree -t
.
├── common.py // used by almost every sample via import common
├── end_to_end_tensorflow_mnist
│ ├── model.py
│ ├── README.md
│ ├── requirements.txt
│ └── sample.py
├── engine_refit_mnist
│ ├── model.py
│ ├── README.md
│ ├── requirements.txt
│ └── sample.py
├── fc_plugin_caffe_mnist
│ ├── CMakeLists.txt
│ ├── __init__.py
│ ├── README.md
│ ├── requirements.txt
│ ├── sample.py
│ └── plugin
│ ├── FullyConnected.h
│ └── pyFullyConnected.cpp
├── int8_caffe_mnist
│ ├── calibrator.py
│ ├── README.md
│ ├── requirements.txt
│ └── sample.py
├── network_api_pytorch_mnist // already covered in a previous post
│ ├── model.py
│ ├── README.md
│ ├── requirements.txt
│ └── sample.py
├── uff_custom_plugin // UFF custom plugin example (important)
│ ├── CMakeLists.txt
│ ├── __init__.py
│ ├── lenet5.py
│ ├── README.md
│ ├── requirements.txt
│ ├── sample.py
│ └── plugin
│ ├── clipKernel.cu
│ ├── clipKernel.h
│ ├── customClipPlugin.cpp
│ └── customClipPlugin.h
├── uff_ssd
│ ├── CMakeLists.txt
│ ├── detect_objects.py
│ ├── README.md
│ ├── requirements.txt
│ ├── voc_evaluation.py
│ ├── images
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── image_details.txt
│ ├── plugin
│ │ └── FlattenConcat.cpp
│ └── utils
│ ├── boxes.py
│ ├── coco.py
│ ├── engine.py
│ ├── inference.py
│ ├── __init__.py
│ ├── mAP.py
│ ├── model.py
│ ├── paths.py
│ └── voc.py
├── yolov3_onnx
│ ├── coco_labels.txt
│ ├── data_processing.py
│ ├── onnx_to_tensorrt.py
│ ├── README.md
│ ├── requirements.txt
│ └── yolov3_to_onnx.py
├── common.pyc
└── introductory_parser_samples // already covered in a previous post
├── caffe_resnet50.py
├── onnx_resnet50.py
├── README.md
├── requirements.txt
├── sample_uff.engine
└── uff_resnet50.py
1.3 common.py Source
$ cat common.py
import os
import argparse
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

try:
    # Sometimes python2 does not understand FileNotFoundError
    FileNotFoundError
except NameError:
    FileNotFoundError = IOError

## Multiplying by 1 << 30 (2^30) converts the value to gigabytes
def GiB(val):
    return val * 1 << 30

## The default data location is /usr/src/tensorrt/data; it changes according to subfolder and find_files
def find_sample_data(description="Runs a TensorRT Python sample", subfolder="", find_files=[]):
    '''
    Parses sample arguments.
    Args:
        description (str): Description of the sample.
        subfolder (str): The subfolder containing data relevant to this sample
        find_files (str): A list of filenames to find. Each filename will be replaced with an absolute path.
    Returns:
        str: Path of data directory.
    Raises:
        FileNotFoundError
    '''
    # Standard command-line arguments for all samples.
    kDEFAULT_DATA_ROOT = os.path.join(os.sep, "usr", "src", "tensorrt", "data")
    parser = argparse.ArgumentParser(description=description, formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-d", "--datadir", help="Location of the TensorRT sample data directory.", default=kDEFAULT_DATA_ROOT)
    args, unknown_args = parser.parse_known_args()

    # If data directory is not specified, use the default.
    data_root = args.datadir
    # If the subfolder exists, append it to the path, otherwise use the provided path as-is.
    subfolder_path = os.path.join(data_root, subfolder)
    data_path = subfolder_path
    if not os.path.exists(subfolder_path):
        print("WARNING: " + subfolder_path + " does not exist. Trying " + data_root + " instead.")
        data_path = data_root

    # Make sure data directory exists.
    if not (os.path.exists(data_path)):
        raise FileNotFoundError(data_path + " does not exist. Please provide the correct data path with the -d option.")

    # Find all requested files.
    for index, f in enumerate(find_files):
        find_files[index] = os.path.abspath(os.path.join(data_path, f))
        if not os.path.exists(find_files[index]):
            raise FileNotFoundError(find_files[index] + " does not exist. Please provide the correct data path with the -d option.")

    return data_path, find_files

## Used by allocate_buffers below; pairs the host (CPU) host_mem with the device (GPU) device_mem
## Accessed later as inputs[0].host, and so on
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

## For GPU inference, separate input/output buffers are allocated on the host (CPU) and device (GPU)
# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        ## Host (CPU) and device (GPU) memory are allocated differently
        ## Accessed later as inputs[0].host, and so on
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        ## Append a HostDeviceMem to the inputs/outputs lists declared empty above
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

## The host (CPU) / device (GPU) split also exists at inference time:
## inputs are copied CPU->GPU, inference runs on the GPU, and the results are copied back GPU->CPU
## Finally the host (CPU) output buffers are returned so the results can be used on the Linux side
# This function is generalized for multiple inputs/outputs.
# inputs and outputs are expected to be lists of HostDeviceMem objects.
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
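Putting these helpers together, a typical sample drives them in the following order (a minimal sketch, not itself part of common.py; engine is assumed to be an already-built ICudaEngine and input_data a NumPy array matching the input binding):

import numpy as np
import common

# engine: an already-built trt.ICudaEngine (see build_engine in sample.py below)
inputs, outputs, bindings, stream = common.allocate_buffers(engine)
with engine.create_execution_context() as context:
    # Copy the flattened input into the pagelocked host buffer ...
    np.copyto(inputs[0].host, input_data.ravel())
    # ... then run the HtoD copy -> execute -> DtoH copy -> synchronize pipeline
    [output] = common.do_inference(context, bindings=bindings, inputs=inputs,
                                   outputs=outputs, stream=stream)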
Python Sample Section
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#python_samples_section
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics
2. Running the TensorFlow MNIST Sample
This is the runnable sample that builds a model with TensorFlow Keras and then tests it.
$ cd /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist
$ cat requirements.txt // already installed for Python 3
numpy
Pillow
pycuda
tensorflow
$ sudo mkdir models // permission issue
$ sudo python3 model.py // permission issue when downloading
.........
2019-09-05 14:23:43.819912: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
60000/60000 [==============================] - 8s 136us/sample - loss: 0.2010 - acc: 0.9414 // how loss and acc are computed is explained in the note after this log
Epoch 2/5
60000/60000 [==============================] - 7s 121us/sample - loss: 0.0803 - acc: 0.9754 // acc is the accuracy; loss is the training loss
Epoch 3/5
60000/60000 [==============================] - 7s 119us/sample - loss: 0.0523 - acc: 0.9838
Epoch 4/5
60000/60000 [==============================] - 7s 118us/sample - loss: 0.0361 - acc: 0.9887
Epoch 5/5
60000/60000 [==============================] - 7s 117us/sample - loss: 0.0291 - acc: 0.9907 // as the 5 training epochs progress, acc increases and loss decreases
10000/10000 [==============================] - 1s 73us/sample - loss: 0.0600 - acc: 0.9812 // result of the final evaluation on the test set
W0905 14:24:21.071296 547892088848 deprecation_wrapper.py:119] From model.py:78: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
.........
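On the loss/acc question in the log comments: model.py (section 2.1 below) compiles the model with loss='sparse_categorical_crossentropy' and metrics=['accuracy'], so loss is the mean negative log-probability that the softmax assigns to the correct digit, and acc is simply the fraction of correctly classified samples. A hand-rolled NumPy equivalent (an illustrative sketch only, not from the sample):

import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    # probs: (N, 10) softmax outputs; labels: (N,) integer digit labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(probs, labels):
    return np.mean(np.argmax(probs, axis=1) == labels)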
$ find / -name convert_to_uff.py 2> /dev/null
/usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py
/usr/lib/python2.7/dist-packages/uff/bin/convert_to_uff.py
$ sudo python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py models/lenet5.pb
.........
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "input_1"
op: "Placeholder"
attr {
key: "dtype"
value {
type: DT_FLOAT
}
}
attr {
key: "shape"
value {
shape {
dim {
size: -1
}
dim {
size: 28
}
dim {
size: 28
}
dim {
size: 1
}
}
}
}
]
=========================================
=== Automatically deduced output nodes ===
[name: "dense_1/Softmax"
op: "Softmax"
input: "dense_1/BiasAdd"
attr {
key: "T"
value {
type: DT_FLOAT
}
}
]
==========================================
Using output node dense_1/Softmax
Converting to UFF graph
DEBUG: convert reshape to flatten node
No. nodes: 13
UFF Output written to models/lenet5.uff
$ ls models/ // confirm the UFF file was generated (PB -> UFF)
lenet5.pb lenet5.uff
// Test Case: the randomly selected case; Prediction: the value obtained by inference (they match)
$ sudo python3 sample.py // -d /usr/src/tensorrt/data
Test Case: 1
Prediction: 1
$ ls /usr/src/tensorrt/data/mnist/
0.pgm 3.pgm 6.pgm 9.pgm LegacyCalibrationTable lenet5_mnist_frozen.pb mnistapi.wts mnist_lenet.caffemodel mnist.prototxt
1.pgm 4.pgm 7.pgm batches lenet5_custom_pool.uff lenet5.uff mnist.caffemodel mnist_mean.binaryproto
2.pgm 5.pgm 8.pgm deploy.prototxt lenet5_custom_pool.uff.txt lenet5.uff.txt mnistgie.wts mnist.onnx
UFF Utility
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/uff/uff.html
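For reference, the same PB-to-UFF conversion can be done in-process with the uff package (a sketch using uff.from_tensorflow_frozen_model; the output node name is the one deduced in the converter log above):

import uff

# Serialize models/lenet5.pb to models/lenet5.uff, like convert_to_uff.py above
uff.from_tensorflow_frozen_model("models/lenet5.pb",
                                 output_nodes=["dense_1/Softmax"],
                                 output_filename="models/lenet5.uff")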
2.1 TensorFlow MNIST Model Source Analysis
The base source builds a model and its configured network with TensorFlow Keras; it downloads the TensorFlow MNIST dataset, runs training and testing, and finally saves the model to a PB file.
$ cat model.py
import tensorflow as tf
import numpy as np

## Downloads mnist.npz from Google, then adds one dimension sized by the train/test counts
def process_dataset():
    ## Fetch the mnist.npz data from Google, then divide the values by 255 before storing
    # Import the data
    (x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    ## Reshape using the train/test counts to add one dimension (4-D)
    # Reshape the data
    NUM_TRAIN = 60000
    NUM_TEST = 10000
    x_train = np.reshape(x_train, (NUM_TRAIN, 28, 28, 1))
    x_test = np.reshape(x_test, (NUM_TEST, 28, 28, 1))
    return x_train, y_train, x_test, y_test

## Creates and configures the model's network (adding each layer)
def create_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=[28,28, 1]))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(512, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

## Takes the model and a filename, freezes the graph, and saves it as lenet5.pb
def save(model, filename):
    # First freeze the graph and remove training nodes.
    output_names = model.output.op.name
    sess = tf.keras.backend.get_session()
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), [output_names])
    frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)
    # Save the model
    with open(filename, "wb") as ofile:
        ofile.write(frozen_graph.SerializeToString())

def main():
    ## Download the dataset and reshape the train/test arrays (function above)
    x_train, y_train, x_test, y_test = process_dataset()
    ## Build the model from the layers defined above (function above)
    model = create_model()
    ## Train for 5 epochs total, with verbose=1 showing a progress bar per epoch
    # Train the model on the data
    model.fit(x_train, y_train, epochs = 5, verbose = 1)
    ## Evaluate the trained model: x_test is the input, y_test the expected output
    # Evaluate the model on test data
    model.evaluate(x_test, y_test)
    ## Save the trained/tested model to models/lenet5.pb
    save(model, filename="models/lenet5.pb")

if __name__ == '__main__':
    main()
- Basic terminology
step: one update of the weights and biases counts as 1 step
batch size: the number of data samples used in one step (a small worked example follows the link below)
https://m.blog.naver.com/PostView.nhn?blogId=wideeyed&logNo=221333529176&proxyReferer=https%3A%2F%2Fwww.google.com%2F
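For example (a small illustrative calculation, not from the original post): model.fit in model.py is called without batch_size, so Keras uses its default of 32, giving 60000 / 32 = 1875 weight updates (steps) per epoch.

NUM_TRAIN = 60000        # training samples, as in model.py
batch_size = 32          # Keras default when model.fit gets no batch_size
steps_per_epoch = NUM_TRAIN // batch_size
print(steps_per_epoch)   # 1875 weight updates per epoch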
- TensorFlow's MNIST dataset
- tf.keras.datasets.mnist.load_data
- Understanding TensorFlow Keras models
- tf.keras.models.Sequential
- model.fit (verbose: 0 is silent, 1 shows a progress bar)
- model.evaluate
- Understanding NumPy's reshape / ravel (see the short example below)
https://rfriend.tistory.com/349
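A quick illustration (assumed shapes only) of the reshape/ravel pair used in model.py and sample.py:

import numpy as np

x = np.zeros((60000, 28, 28))            # raw MNIST images
x4 = np.reshape(x, (60000, 28, 28, 1))   # add a channel axis, as in model.py
flat = x[0].ravel()                      # flatten one image to 1-D, as in sample.py
print(x4.shape, flat.shape)              # (60000, 28, 28, 1) (784,)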
2.2 TensorRT sample.py Source Analysis
This reads the model converted to UFF from the PB file generated by the training/testing above, builds a TensorRT engine from it, and runs it, comparing the randomly chosen TEST CASE against the PREDICTION obtained by inference.
$ cat sample.py
# This sample uses a UFF MNIST model to create a TensorRT Inference Engine
from random import randint
from PIL import Image
import numpy as np
import pycuda.driver as cuda
# This import causes pycuda to automatically manage CUDA context creation and cleanup.
import pycuda.autoinit
import tensorrt as trt

import sys, os
sys.path.insert(1, os.path.join(sys.path[0], ".."))
## common.py above
import common

# You can set the logger severity higher to suppress messages (or lower to display more messages).
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

## ModelData class defining the initial settings (only the class-level values are used)
class ModelData(object):
    MODEL_FILE = "lenet5.uff"
    INPUT_NAME = "input_1"
    INPUT_SHAPE = (1, 28, 28)
    OUTPUT_NAME = "dense_1/Softmax"

## Function that builds the engine (the network is defined through the UFF file)
## The UFF parser registers the input input_1 (1, 28, 28); compare create_model in model.py above
## The UFF parser registers the output dense_1/Softmax
def build_engine(model_file):
    # For more information on TRT basics, refer to the introductory samples.
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = common.GiB(1)
        # Parse the Uff Network
        parser.register_input(ModelData.INPUT_NAME, ModelData.INPUT_SHAPE)
        parser.register_output(ModelData.OUTPUT_NAME)
        parser.parse(model_file, network)
        # Build and return an engine.
        return builder.build_cuda_engine(network)

## Puts an image into the CPU input buffer to prepare for inference; a test case 0~9.pgm is chosen at random
## pagelocked_buffer=inputs[0].host, i.e. this receives the host (CPU) input buffer
## The image of the randomly chosen test case is read, flattened to 1-D,
## normalized as 1.0 - img/255, and finally copied into the host input buffer
## The randomly chosen test-case number is returned unchanged
# Loads a test case into the provided pagelocked_buffer.
def load_normalized_test_case(data_path, pagelocked_buffer, case_num=randint(0, 9)):
    test_case_path = os.path.join(data_path, str(case_num) + ".pgm")
    # Flatten the image into a 1D array, normalize, and copy to pagelocked memory.
    img = np.array(Image.open(test_case_path)).ravel()
    np.copyto(pagelocked_buffer, 1.0 - img / 255.0)
    return case_num

## Main function; read it step by step
def main():
    ## Sets data_path to /usr/src/tensorrt/data/mnist/
    data_path, _ = common.find_sample_data(description="Runs an MNIST network using a UFF model file", subfolder="mnist")
    ## If MODEL_PATH is defined, use it;
    ## otherwise use os.path.dirname(__file__) to locate the current directory and its models subdirectory
    model_path = os.environ.get("MODEL_PATH") or os.path.join(os.path.dirname(__file__), "models")
    ## The model file defined above: lenet5.uff
    model_file = os.path.join(model_path, ModelData.MODEL_FILE)
    ## Call the function above to create the TensorRT (CUDA) engine
    with build_engine(model_file) as engine:
        ## Set up the host (CPU) and device (GPU) input/output buffers
        # Build an engine, allocate buffers and create a stream.
        # For more information on buffer allocation, refer to the introductory samples.
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        ## Create the execution context: the engine moves from the build state to a ready-to-run state
        with engine.create_execution_context() as context:
            ## Function above: load image data into the host (CPU) input buffer and pick a test case
            case_num = load_normalized_test_case(data_path, pagelocked_buffer=inputs[0].host)
            ## The engine is built and ready, so run inference (the GPU does the work)
            # For more information on performing inference, refer to the introductory samples.
            # The common.do_inference function will return a list of outputs - we only have one in this case.
            [output] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
            ## Find the largest value in the inferred output and take its index
            pred = np.argmax(output)
            ## Compare the randomly chosen test case with the value obtained from inference
            print("Test Case: " + str(case_num))
            print("Prediction: " + str(pred))

if __name__ == '__main__':
    main()
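One practical note (not part of the sample): build_engine re-parses the UFF file and rebuilds the engine on every run. With the standard TensorRT 5.x Python API the engine can instead be serialized once and reloaded later, roughly as follows:

# Serialize the built engine to disk ...
with build_engine(model_file) as engine, open("lenet5.engine", "wb") as f:
    f.write(engine.serialize())

# ... and later deserialize it without re-parsing the UFF model
with trt.Runtime(TRT_LOGGER) as runtime, open("lenet5.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())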