Jeonghun (James) Lee: EVM-Jetson TX2


6/21/2019

DeepStream SDK 1.5 Jetson TX2 TEST (error occurred, not compatible)

1. DeepStream SDK 1.5 TEST (Jetson TX2)

I installed DeepStream 1.5 on JetPack 4.2 and ran a test, but it failed; the details are recorded below.
NVIDIA officially states that DeepStream 1.5 does not work on JetPack 4.2, so I will test again once a newer DeepStream SDK for the Jetson TX2 is released.

Download and installation

$ scp ./DeepStream_SDK_on_Jetson_1.5_pre-release.tbz2 nvidia@192.168.55.1:~

$ ssh -X nvidia@192.168.55.1   // connected to the Jetson over USB

$ cd ~  // Jetson TX2 

$ tar xpvf DeepStream_SDK_on_Jetson_1.5_pre-release.tbz2

$ sudo tar xpvf deepstream_sdk_on_jetson.tbz2 -C /

$ sudo tar xpvf deepstream_sdk_on_jetson_models.tbz2 -C /


$ nvgstiva-app -c ${HOME}/configs/PGIE-FP16-CarType-CarMake-CarColor.txt
libEGL warning: DRI3: failed to query the version
libEGL warning: DRI2: failed to authenticate

(gst-plugin-scanner:14538): GStreamer-WARNING **: 13:59:42.872: Failed to load plugin '/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvcaffegie.so': libnvparsers.so.4: cannot open shared object file: No such file or directory
** ERROR: : parse_config_file failed
** ERROR: : Failed to parse config file '/home/jetsontx2/configs/PGIE-FP16-CarType-CarMake-CarColor.txt'
Quitting

(nvgstiva-app:14537): GStreamer-CRITICAL **: 13:59:43.237: gst_element_get_static_pad: assertion 'GST_IS_ELEMENT (element)' failed

(nvgstiva-app:14537): GStreamer-CRITICAL **: 13:59:43.238: gst_pad_send_event: assertion 'GST_IS_PAD (pad)' failed

(nvgstiva-app:14537): GStreamer-CRITICAL **: 13:59:43.338: gst_element_set_state: assertion 'GST_IS_ELEMENT (element)' failed

(nvgstiva-app:14537): GStreamer-CRITICAL **: 13:59:44.237: gst_object_unref: assertion 'object != NULL' failed
App run failed

DeepStream 1.5 is not compatible with JetPack 4.2; it works only on JetPack 3.2

DeepStream 1.5 is reported to be incompatible with JetPack 4.2.
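
The GStreamer warning above points at a shared library (libnvparsers.so.4) that the loader could not open. As a minimal sketch, assuming nothing beyond the library name taken from that warning, the same check can be done from Python before launching the app. The helper name `missing_shared_libs` is my own and is not part of any NVIDIA tool.

```python
import ctypes


def missing_shared_libs(names):
    """Return the library names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError when the library cannot be loaded
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libnvparsers.so.4 is the library named in the GStreamer warning above.
    for lib in missing_shared_libs(["libnvparsers.so.4"]):
        print("cannot load:", lib)
```

If the library is reported as missing, the installed TensorRT version most likely does not ship that soname, which matches the incompatibility described here.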

https://devtalk.nvidia.com/default/topic/1048781/jetson-tx2/jetpack-4-2-compatibility-with-deepstream-1-5/

Errors related to DeepStream 1.5 and JetPack 4.2
https://devtalk.nvidia.com/default/topic/1037472/deepstream-sdk-on-jetson/installing-deepstream-v1-5-on-jetson-tx2-with-jetpack-3-2/

DeepStream User Manual
https://usermanual.wiki/Pdf/DeepStreamUserGuide.1049160376/html

6/07/2019

TensorRT 5.0 and DeepStream features

1. TensorRT and DeepStream

DeepStream is NVIDIA's extension of GStreamer; think of it as GStreamer with TensorRT functionality built in.

It was developed by adding separate GStreamer plugins so that AI inference can run inside GStreamer.

Once DeepStream is fully supported, users only need to build and run a pipeline just as with GStreamer, which makes it a very attractive feature.

For the Jetson TX2, a DeepStream release for JetPack 4.2 (TensorRT 5.0) is not yet available; the available release targets JetPack 3.2 (TensorRT 4.0).
The TensorRT 5.0-based version is offered only for the Jetson AGX Xavier.

Jetson TX2: supported up to DeepStream 1.5 (currently)
Jetson AGX Xavier: DeepStream 3.0 supported (added recently)

DeepStream SDK Download
https://developer.nvidia.com/embedded/deepstream-on-jetson-downloads

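1.1 Jetson TX2 vs. Jetson AGX Xavier

The core of DeepStream is the role played by TensorRT; the rest is almost identical to GStreamer, so it does not need much attention.

The feature to watch most closely is the DLA (Deep Learning Accelerator), which is exposed through TensorRT.
TensorRT on the Jetson TX2 also offers DLA options (in giexec or trtexec), but as the comparison below shows,

the Jetson TX2 has no DLA hardware, whereas the Jetson AGX Xavier does.

The next feature to look at on the Jetson AGX Xavier is the Vision Accelerator, which presumably provides hardware support for the kind of work done by OpenCV or VisionWorks,

but I have not used it yet, so I cannot say more.

It is unclear to me whether this is an engine separate from what the existing CUDA-based VisionWorks/OpenCV use.

If I get a chance to use a Jetson AGX Xavier later, I will try it and write this up again.
The GPUs of the two boards differ considerably, and so does the tensor-processing capability.

Jetson TX2 vs. Jetson AGX Xavier comparison

As the comparison below shows, the performance gap is overwhelming.

GPU performance: architecture change, clock, and Tensor Cores (for AI inference)
DL Accelerator: for AI inference, used through TensorRT
Vision Accelerator: image processing on video input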

Feature	Jetson™ TX2	Jetson™ AGX Xavier
GPU	256 Core Pascal @ 1.3GHz	512 Core Volta @ 1.37GHz 64 Tensor Cores
DL Accelerator	-	(2x) NVDLA
Vision Accelerator	-	(2x) 7-way VLIW Processor
CPU	6 core Denver and A57 @ 2GHz (2x) 2MB L2	8 core Carmel ARM CPU @ 2.26GHz (4x) 2MB L2 + 4MB L3
Memory	8GB 128 bit LPDDR4 58.4 GB/s	16GB 256-bit LPDDR4x @ 2133MHz 137 GB/s
Storage	32GB eMMC	32GB eMMC
Video Encode	(2x) 4K @30 HEVC	(4x) 4Kp60 / (8x) 4Kp30 HEVC
Video Decode	(2x) 4K @30 12 bit support	(2x) 8Kp30 / (6x) 4Kp60 12 bit support
Camera	12 lanes MIPI CSI-2 D-PHY 1.2 30Gbps	16 lanes MIPI CSI-2 | 8 lanes SLVS-EC D-PHY 40Gbps / C-PHY 109Gbps
PCI Express	5 lanes PCIe Gen 2 1x4 + 1x1 | 2x1 + 1x4	16 lanes PCIe Gen 4 1x8 + 1x4 + 1x2 + 2x1
Mechanical	50mm x 87mm 400 pin connector	100mm x 87mm 699 pin connector
Power	7.5W / 15W	10W / 15W / 30W

Jetson board comparison
http://connecttech.com/xavier-tx2-comparison/

DeepStream SDK overview
http://on-demand.gputechconf.com/gtc-cn/2018/pdf/CH8307.pdf

Introduction to the NVIDIA Tesla T4
The document above mentions the NVIDIA T4; since it runs with NGC (NVIDIA GPU Cloud), it appears to be a device for x86-based systems rather than ARM.
https://www.nvidia.com/ko-kr/data-center/tesla-t4/

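2. DeepStream features

Looking at the overall structure, it connects to IP cameras or security equipment, receives video over RTSP, and inserts TensorRT (the inference engine) into the middle of a GStreamer pipeline, mainly adding video analysis, detection, and tracking.

The DeepStream software architecture is not much different from the description above, and the structure below makes it easy to understand.
The key point seems to be how to insert TRT (TensorRT) into the GStreamer pipeline efficiently.

Personally, I expect the following to be potential problems:

Total latency: overall latency growing as the pipeline gets longer
Latency between GStreamer elements: each element in the pipeline has a different latency, which raises buffer-handling issues
TensorRT stage: this stage will face buffer issues and needs to run fast, and I am curious how the hardware supports it.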
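
To make the latency concerns above concrete, here is a rough sketch of how I would reason about them. The stage names and the millisecond values are hypothetical (nothing here was measured on the TX2); the point is only that per-frame latency adds up over the stages, while the sustainable frame rate is bounded by the slowest stage when stages run concurrently.

```python
def pipeline_latency_ms(stage_latencies_ms):
    """End-to-end latency of one frame: the sum of every stage's latency."""
    return sum(stage_latencies_ms.values())


def max_throughput_fps(stage_latencies_ms):
    """With stages running concurrently, the slowest stage bounds the frame rate."""
    slowest = max(stage_latencies_ms.values())
    return 1000.0 / slowest


# Hypothetical per-stage numbers, for illustration only (not measured).
stages = {
    "rtsp_receive": 20.0,
    "decode": 10.0,
    "convert": 5.0,
    "inference": 40.0,
    "render": 8.0,
}

print("total latency (ms):", pipeline_latency_ms(stages))
print("max frame rate (fps):", max_throughput_fps(stages))
```

In this model the inference stage dominates both numbers, which is why the way TensorRT is buffered inside the pipeline matters so much.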

NVIDIA Deep Stream SDK
https://developer.nvidia.com/deepstream-sdk

DeepStream SDK 3.0 on NVIDIA Tesla T4

The article below shows that it works quite conveniently, but it is doubtful whether the same functionality will be offered on Jetson. (Currently, SDK 3.0 is available for the Jetson AGX Xavier.)

https://devblogs.nvidia.com/intelligent-video-analytics-deepstream-sdk-3-0/

DeepStream SDK 2.0

Introduces the DeepStream features on x86 servers.
https://devblogs.nvidia.com/accelerate-video-analytics-deepstream-2/?ncid=so-int-dmsk20ntllh-43648

DeepStream-based GitHub repository

https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/yolo

6/04/2019

TensorRT 5.0 Multimedia-Sample

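1. TensorRT 5.0 and multimedia control

Let's test the Multimedia API together with TensorRT and go through the related samples.
This functionality is almost the same as the DeepStream part to be described later, so it is worth understanding how it works.

First, set the Jetson TX2 to maximum performance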

$ sudo jetson_clocks   // sudo nvpmodel -m 0

$ sudo jetson_clocks --show
[sudo] password for jetsontx2: 
SOC family:tegra186  Machine:quill
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0 
cpu1: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0 
cpu2: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c6=0 c7=0 
cpu3: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0 
cpu4: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0 
cpu5: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0 
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN
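
The `jetson_clocks --show` output above can also be checked programmatically. The sketch below parses a few lines copied from that output and reports which units are running at their maximum frequency. The function names and the regular expression are my own; the output format is assumed to stay as shown above.

```python
import re

# Lines copied from the jetson_clocks --show output above.
SAMPLE = """\
cpu0: Online=1 Governor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200 IdleStates: C1=0 c7=0
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1
"""

LINE_RE = re.compile(
    r"^(cpu\d+|GPU|EMC)\b.*?MinFreq=(\d+) MaxFreq=(\d+) CurrentFreq=(\d+)"
)


def parse_clocks(text):
    """Map each unit (cpu0, GPU, EMC, ...) to its (min, max, current) frequency."""
    clocks = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            clocks[match.group(1)] = tuple(int(g) for g in match.groups()[1:])
    return clocks


def running_at_max(clocks):
    """Names of the units whose current frequency equals their maximum."""
    return sorted(name for name, (_, fmax, cur) in clocks.items() if cur == fmax)


print(running_at_max(parse_clocks(SAMPLE)))
```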

$ cat /usr/bin/jetson_clocks   // script; to be analyzed in detail later
....
 do_hotplug
 do_clusterswitch
 do_cpu
 do_gpu
 do_emc
 do_fan
 do_nvpmodel
......

NVIDIA Multimedia API
https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/index.html

1.1 Sample Backend Test

I expected JetPack 4.2 to ship the same sample video as JetPack 3.3, but the sample video has changed.

The run takes a long time, so be patient and wait for it to finish.

 
$ cd  /usr/src/tegra_multimedia_api
$ ls
argus  data  include  LEGAL  LICENSE  Makefile  README  samples  tools

$ cd samples     // almost identical to JetPack 3.3
00_video_decode  02_video_dec_cuda  04_video_dec_trt  06_jpeg_decode    08_video_dec_drm        10_camera_recording  13_multi_camera       backend  frontend  v4l2cuda
01_video_encode  03_video_cuda_enc  05_jpeg_encode    07_video_convert  09_camera_jpeg_capture  12_camera_v4l2_cuda  14_multivideo_decode  common   Rules.mk

$ cd backend 

// As with JetPack 3.3, connect HDMI before running the test

$ ./backend 1 ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 \
    --trt-deployfile ../../data/Model/GoogleNet_one_class/GoogleNet_modified_oneClass_halfHD.prototxt \
    --trt-modelfile ../../data/Model/GoogleNet_one_class/GoogleNet_modified_oneClass_halfHD.caffemodel \
    --trt-proc-interval 1 -fps 10

// the --trt-forcefp32 0 option has been removed; it now runs in fp16

Structure of the backend sample

GStreamer example (TensorRT not used)

As shown below, four H.264 channels are taken as input and decoded, passed through the Video Image Compositor (VIC), processed with CUDA,
and rendered with OpenGL on X11. (Think of it as an ordinary GStreamer pipeline.)

GStreamer example (TensorRT used)

This is the structure of the sample actually run above (backend); it adds TensorRT (GIE) to detect cars.
The VIC mainly handles image conversion (RGB-to-YUV conversion or scaling, that is, changing the image format) and converts the data to the input format expected by TensorRT (GIE, the inference engine).
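
To illustrate the kind of conversion the VIC performs in hardware, here is a per-pixel RGB-to-YUV sketch in plain Python. It uses the BT.601 weights, which is an assumption on my part: the color matrix and pixel layout actually used by the sample are not stated in this post.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to YUV using BT.601 weights (assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


# A gray pixel carries no color information, so both chroma values are ~0.
print(rgb_to_yuv(128, 128, 128))
```

Doing this for every pixel of a 1080p frame on the CPU would be slow, which is the reason a dedicated block such as the VIC is placed in front of the inference engine.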

Terminology

TensorRT (previously known as GPU Inference Engine (GIE))
Video Image Compositor (VIC)

VIC(Video Image Compositor)

https://developer.ridgerun.com/wiki/index.php?title=Xavier/Processors/HDAV_Subsystem/Compositor

Backend sample (the example above)

https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/nvvid_backend_group.html

Yolo video test (JetsonHacks)

A Yolo video test on JetPack 3.3; it reportedly runs at about 3.3 frames per second, which is very slow.
https://www.youtube.com/watch?v=p1fJFG1S6Sw

1.2 Sample FrontEnd

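This sample works because the Jetson TX2 EVM has a camera on board by default; I am not sure whether it works on other Jetson EVMs.
On the Jetson TX2 EVM the camera is connected over MIPI, and the structure is very close to a GStreamer pipeline, so it is worth comparing the two.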

 
$ cd  /usr/src/tegra_multimedia_api
$ ls
argus  data  include  LEGAL  LICENSE  Makefile  README  samples  tools

$ cd samples     // almost identical to JetPack 3.3
00_video_decode  02_video_dec_cuda  04_video_dec_trt  06_jpeg_decode    08_video_dec_drm        10_camera_recording  13_multi_camera       backend  frontend  v4l2cuda
01_video_encode  03_video_cuda_enc  05_jpeg_encode    07_video_convert  09_camera_jpeg_capture  12_camera_v4l2_cuda  14_multivideo_decode  common   Rules.mk

$ cd frontend 

// As with JetPack 3.3, connect HDMI before running the test; the video is displayed in real time

// The first run takes a while because trtModel.cache has to be generated.

$ sudo ./frontend --deploy ../../data/Model/GoogleNet_three_class/GoogleNet_modified_threeClass_VGA.prototxt \
       --model ../../data/Model/GoogleNet_three_class/GoogleNet_modified_threeClass_VGA.caffemodel

$ ll  // newly created files are present, as shown below (trt.h264 / trtModel.cache / output1.h265 ...)
total 1134276
drwxr-xr-x  2 root root      4096  6월  7 13:00 ./
drwxr-xr-x 20 root root      4096  5월 30 15:17 ../
-rwxr-xr-x  1 root root    784936  5월 30 15:19 frontend*
-rw-r--r--  1 root root     12271  5월 30 15:17 main.cpp
-rw-r--r--  1 root root    157944  5월 30 15:19 main.o
-rw-r--r--  1 root root      2821  5월 30 15:17 Makefile
-rw-r--r--  1 root root 287496172  6월  7 12:59 output1.h265
-rw-r--r--  1 root root 287555462  6월  7 12:59 output2.h265
-rw-r--r--  1 root root 287447012  6월  7 12:59 output3.h265
-rw-r--r--  1 root root      2626  5월 30 15:17 Queue.h
-rw-r--r--  1 root root      3134  5월 30 15:17 StreamConsumer.cpp
-rw-r--r--  1 root root      2854  5월 30 15:17 StreamConsumer.h
-rw-r--r--  1 root root     62112  5월 30 15:19 StreamConsumer.o
-rw-r--r--  1 root root 287331038  6월  7 12:59 trt.h264
-rw-r--r--  1 root root  10003768  6월  7 12:49 trtModel.cache
-rw-r--r--  1 root root     13293  5월 30 15:17 TRTStreamConsumer.cpp
-rw-r--r--  1 root root      3856  5월 30 15:17 TRTStreamConsumer.h
-rw-r--r--  1 root root    341584  5월 30 15:19 TRTStreamConsumer.o
-rwxr-xr-x  1 root root       208  6월  7 12:48 tst.sh*
-rw-r--r--  1 root root      9197  5월 30 15:17 VideoEncoder.cpp
-rw-r--r--  1 root root      3462  5월 30 15:17 VideoEncoder.h
-rw-r--r--  1 root root     83496  5월 30 15:19 VideoEncoder.o
-rw-r--r--  1 root root      4177  5월 30 15:17 VideoEncodeStreamConsumer.cpp
-rw-r--r--  1 root root      2480  5월 30 15:17 VideoEncodeStreamConsumer.h
-rw-r--r--  1 root root    104768  5월 30 15:19 VideoEncodeStreamConsumer.o


// check playback of each H.264 / H.265 video


$ sudo ../00_video_decode/video_decode H265 output1.h265 //480P
or 
$ sudo ../02_video_dec_cuda/video_dec_cuda output1.h265 H265 //480p
$ sudo ../02_video_dec_cuda/video_dec_cuda output2.h265 H265  //720p
$ sudo ../02_video_dec_cuda/video_dec_cuda output3.h265 H265  //1080p 
$ sudo ../02_video_dec_cuda/video_dec_cuda trt.h264 H264  //1080p

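The videos produced by the test above can be grouped as follows.
The File Sink blocks below correspond to the individual H.265 outputs.
In addition, there is a path that goes through TensorRT, renders and displays the result directly, and saves it as H.264 (trt.h264).

According to the flow below,
live video from the Jetson TX camera is analyzed (TensorRT draws boxes around detected objects) and saved to files.
I did not have a proper test environment, so I could not carry the board around and test everything myself.

I did confirm that boxes appear in real time, but although I expected cars to be detected well, detection did not seem reliable.
(The test setup may have been at fault: I pointed the camera at ordinary photos of cars.)

Left: Argus Camera API

Right: V4L2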

https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/l4t_mm_camcap_tensorrt_multichannel_group.html

FrontEnd Example

Description of the above

https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/l4t_mm_camcap_tensorrt_multichannel_group.html

Frame buffer information

https://devtalk.nvidia.com/default/topic/1017059/jetson-tx2/onboard-camera-dev-video0/

GStreamer-related features

https://devtalk.nvidia.com/default/topic/1010795/jetson-tx2/v4l2-on-jetson-tx2/
https://devtalk.nvidia.com/default/topic/1030593/how-to-control-on-board-camera-such-as-saving-images-and-videos/

Other camera solutions

https://github.com/Abaco-Systems/jetson-inference-gv

1.3 Sample video_dec_trt

A sample similar to backend, but it only performs the analysis and does not display the video.

 
$ cd  /usr/src/tegra_multimedia_api
$ ls
argus  data  include  LEGAL  LICENSE  Makefile  README  samples  tools

$ cd samples     // almost identical to JetPack 3.3
00_video_decode  02_video_dec_cuda  04_video_dec_trt  06_jpeg_decode    08_video_dec_drm        10_camera_recording  13_multi_camera       backend  frontend  v4l2cuda
01_video_encode  03_video_cuda_enc  05_jpeg_encode    07_video_convert  09_camera_jpeg_capture  12_camera_v4l2_cuda  14_multivideo_decode  common   Rules.mk

$ cd 04_video_dec_trt 

// result.txt, result0.txt, and result1.txt are created; HDMI may be connected; a different model was used, but a problem occurred

// 2-channel analysis
$ sudo ./video_dec_trt 2 ../../data/Video/sample_outdoor_car_1080p_10fps.h264 \
    ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 \
    --trt-deployfile ../../data/Model/resnet10/resnet10.prototxt \
    --trt-modelfile ../../data/Model/resnet10/resnet10.caffemodel \
    --trt-mode 0


or 
// 1-channel analysis
$ sudo ./video_dec_trt 1  ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 \
    --trt-deployfile ../../data/Model/resnet10/resnet10.prototxt \
    --trt-modelfile ../../data/Model/resnet10/resnet10.caffemodel \
    --trt-mode 0


$ cat  ../../data/Model/resnet10/labels.txt    // check the labels
Car
RoadSign
TwoWheeler
Person

// result.txt, result0.txt, and result1.txt are created (2ch)

$ cat result.txt | head -n 100        // class num 0, 1, and 2 are all written, but 1 and 2 contain no boxes, which is a problem
frame:0 class num:0 has rect:5
 x,y,w,h:0.55625 0.410326 0.040625 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.09375 0.36413 0.223438 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.418478 0.0390625 0.076087

frame:0 class num:1 has rect:0

frame:0 class num:2 has rect:0

frame:1 class num:0 has rect:5
 x,y,w,h:0.55625 0.413043 0.0390625 0.048913
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.09375 0.36413 0.225 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.418478 0.040625 0.076087

frame:1 class num:1 has rect:0

frame:1 class num:2 has rect:0

frame:2 class num:0 has rect:5
 x,y,w,h:0.55625 0.410326 0.0390625 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.0921875 0.36413 0.221875 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.11413
 x,y,w,h:0.403125 0.418478 0.040625 0.076087

frame:2 class num:1 has rect:0

frame:2 class num:2 has rect:0

frame:3 class num:0 has rect:5
 x,y,w,h:0.554688 0.410326 0.0375 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.0921875 0.361413 0.220313 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.421196 0.0390625 0.076087
........

$ cat ./result0.txt | head -n 100    // only class num 0 is written
frame:0 class num:0 has rect:5
 x,y,w,h:0.55625 0.410326 0.040625 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.09375 0.36413 0.223438 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.418478 0.0390625 0.076087

frame:1 class num:0 has rect:5
 x,y,w,h:0.55625 0.413043 0.0390625 0.048913
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.09375 0.36413 0.225 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.418478 0.040625 0.076087

frame:2 class num:0 has rect:5
 x,y,w,h:0.55625 0.410326 0.0390625 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.0921875 0.36413 0.221875 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.11413
 x,y,w,h:0.403125 0.418478 0.040625 0.076087

frame:3 class num:0 has rect:5
 x,y,w,h:0.554688 0.410326 0.0375 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.0921875 0.361413 0.220313 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.421196 0.0390625 0.076087

frame:4 class num:0 has rect:5
 x,y,w,h:0.55625 0.413043 0.0375 0.0461957
 x,y,w,h:0.596875 0.366848 0.0546875 0.0923913
 x,y,w,h:0.0921875 0.36413 0.220313 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.421196 0.0375 0.076087
..........

// play the video using the analysis results above (same as backend)
$ sudo ../02_video_dec_cuda/video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264  --bbox-file result0.txt

Each video is decoded, scaled to the required size, passed through TensorRT, and the bounding-box (BBOX) information is saved.
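
The saved bounding-box file is plain text, so it is easy to post-process. Below is a minimal parser for the result.txt format shown above, using the first frame from that output as sample data. I am assuming the x, y, w, h values are normalized to the 0..1 range (they all fall in that range in the output) and that the source video is 1920x1080; the function names are my own.

```python
import re

HEADER_RE = re.compile(r"frame:(\d+) class num:(\d+) has rect:(\d+)")
BOX_PREFIX = "x,y,w,h:"

# First frame copied from the result.txt output above.
SAMPLE = """\
frame:0 class num:0 has rect:5
 x,y,w,h:0.55625 0.410326 0.040625 0.0516304
 x,y,w,h:0.595312 0.366848 0.0546875 0.0923913
 x,y,w,h:0.09375 0.36413 0.223438 0.201087
 x,y,w,h:0.323438 0.413043 0.0984375 0.111413
 x,y,w,h:0.403125 0.418478 0.0390625 0.076087

frame:0 class num:1 has rect:0
"""


def parse_results(text):
    """Return {(frame, class_num): [(x, y, w, h), ...]} from result.txt-style text."""
    boxes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
            boxes[current] = []
        elif line.startswith(BOX_PREFIX) and current is not None:
            values = line[len(BOX_PREFIX):].split()
            boxes[current].append(tuple(float(v) for v in values))
    return boxes


def to_pixels(box, width=1920, height=1080):
    """Scale a normalized (x, y, w, h) box to pixels (assumes 0..1 coordinates)."""
    x, y, w, h = box
    return (round(x * width), round(y * height), round(w * width), round(h * height))


results = parse_results(SAMPLE)
print(len(results[(0, 0)]), "boxes in frame 0, class 0")
print(to_pixels(results[(0, 0)][0]))
```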

Command usage

$ ./video_dec_trt   // usage: takes 1-channel or 2-channel video input and produces result.txt (box information)

video_dec_trt [Channel-num]   ...  [options]

Channel-num:
 1-32, Number of file arguments should exactly match the number of channels specified

Supported formats:
 H264
 H265

OPTIONS:
 -h,--help            Prints this text
 --dbg-level   Sets the debug level [Values 0-3]

 --trt-deployfile     set deploy file name
 --trt-modelfile      set model file name
 --trt-mode           0 fp16 (if supported), 1 fp32, 2 int8
 --trt-enable-perf    1[default] to enable perf measurement, 0 otherwise




$  ../02_video_dec_cuda/video_dec_cuda 
video_dec_cuda   [options]

Supported formats:
 H264
 H265

OPTIONS:
 -h,--help            Prints this text
 --dbg-level   Sets the debug level [Values 0-3]

 --disable-rendering  Disable rendering
 --fullscreen         Fullscreen playback [Default = disabled]
 -ww           Window width in pixels [Default = video-width]
 -wh          Window height in pixels [Default = video-height]
 -wx        Horizontal window offset [Default = 0]
 -wy        Vertical window offset [Default = 0]

 -fps            Display rate in frames per second [Default = 30]

 -o         Write to output file

 -f       1 NV12, 2 I420 [Default = 1]

 --input-nalu         Input to the decoder will be nal units
 --input-chunks       Input to the decoder will be a chunk of bytes [Default]
 --bbox-file          bbox file path
 --display-text     enable nvosd text overlay with input string

04_video_dec_trt
An example similar to the above; it takes video input directly and processes it
  https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/l4t_mm_vid_decode_trt.html

JetsonTX2 Gstreamer
  https://developer.ridgerun.com/wiki/index.php?title=Gstreamer_pipelines_for_Jetson_TX2
  https://elinux.org/Jetson/H264_Codec
  https://developer.ridgerun.com/wiki/index.php?title=NVIDIA_Jetson_TX1_TX2_Video_Latency

Jetson TX2 Gstreamer and OpenCV (python)
  https://jkjung-avt.github.io/tx2-camera-with-python/
  https://devtalk.nvidia.com/default/topic/1025356/how-to-capture-and-display-camera-video-with-python-on-jetson-tx2/

5/31/2019

UFF_SSD with TensorFlow, the development workflow, and other TensorRT 5.0 features

1. TensorRT python3 sample

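These are the python3 samples (as opposed to the python 2 ones) provided with JetPack 4.2 on the Jetson TX2, and I want to test them.
I understood that UFF is a format, but I did not know what SSD was, so I looked it up as follows.

About SSD (Single Shot MultiBox Detector)

Like Yolo, it is a detection network; this sample uses a TensorFlow-based implementation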
https://ai.google/research/pubs/pub44872
http://openresearch.ai/t/ssd-single-shot-multibox-detector/74

In short, the sample takes the TensorFlow SSD model, saves it as UFF, converts it to a TensorRT engine, and runs it.
It is similar to the earlier Yolo test, and the functionality is almost the same.

1.1 UFF_SSD Sample

According to the README, SSD is a model widely used for object detection and consists of two main parts: feature extraction and detection.

convolutional feature extractor like VGG, ResNet, Inception (Inception_v2 is used here)
detection part

NVIDIA provides other SSD-related examples as well, but I could not test them all, so only the one below was tested

Python3 - UFF_SSD Example

Check the UFF-SSD sample among the TensorRT python examples

$ cd /usr/src/tensorrt/samples/python
$ ls
common.py   end_to_end_tensorflow_mnist  introductory_parser_samples  uff_custom_plugin  yolov3_onnx
common.pyc  fc_plugin_caffe_mnist        network_api_pytorch_mnist    uff_ssd

$ cd uff_ssd 

$ ls
CMakeLists.txt  detect_objects.py  images  plugin  README.md  requirements.txt  utils  voc_evaluation.py

$ cat requirements.txt 
numpy
Pillow
pycuda
requests
tensorflow-gpu

//  install the packages required by the README

$ sudo pip install -r requirements.txt    // fails: no matching tensorflow-gpu version is found
$ python3 -m pip install -r requirements.txt   // fails: no matching tensorflow-gpu version is found
Collecting numpy (from -r requirements.txt (line 1))
Collecting Pillow (from -r requirements.txt (line 2))
Collecting pycuda (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/4d/29/5a3eb66c2f1a4adc681f6c8131e9ed677af31b0c8a78726d540bd44b3403/pycuda-2019.1.tar.gz
Collecting requests (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
Collecting tensorflow-gpu (from -r requirements.txt (line 5))
  Could not find a version that satisfies the requirement tensorflow-gpu (from -r requirements.txt (line 5)) (from versions: )
No matching distribution found for tensorflow-gpu (from -r requirements.txt (line 5))

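// tensorflow-gpu cannot be installed from requirements.txt: the empty "(from versions: )" list means pip found no build for this platform on the default index
// pip has to get past Collecting to Building before anything is installed; because it stopped here, none of the packages above were installed, so they are installed separately

$ pip or pip3 search // searching this way finds tensorflow-gpu (1.13.1); to be safe I also checked the NVIDIA site (an official build exists)

$ python3 -m pip install Pillow numpy pycuda requests  // install the other modules from requirements.txt separately

At this point none of the requirements above were installed for python3, so each is installed separately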
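
Before and after the separate installs, it helps to see which of the requirements python3 can actually import. The sketch below uses only the standard library. Note that the distribution name and the import name differ for two entries (Pillow imports as PIL, tensorflow-gpu as tensorflow); the helper name `missing_packages` is my own.

```python
import importlib.util

# Distribution name -> import name (they differ for Pillow and tensorflow-gpu).
REQUIREMENTS = {
    "numpy": "numpy",
    "Pillow": "PIL",
    "pycuda": "pycuda",
    "requests": "requests",
    "tensorflow-gpu": "tensorflow",
}


def missing_packages(requirements):
    """Return the distribution names whose import name cannot be found."""
    return [
        dist
        for dist, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


print("not importable:", missing_packages(REQUIREMENTS))
```

Run it with the same interpreter that runs the sample (python3 here), since pip and pip3 may install into different environments.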

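Installing TensorFlow-GPU for python3 separately

tensorflow-gpu was not installed above; the default index does not seem to offer a matching version for this board.
So I checked NVIDIA's official site for the supported build.

$ python3 --version  // check the python3 version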
Python 3.6.7
//  NVIDIA tensorflow-gpu official version install
$ sudo apt-get install libhdf5-serial-dev hdf5-tools // required by tensorflow
$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.5 --user    //  official release of TensorFlow for Jetson TX2

How To Install Tensorflow-GPU
  https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/
  https://developer.nvidia.com/embedded/downloads#?search=tensorflow
  https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetsontx2/index.html

NVIDIA DeepLearning Frameworks
  https://developer.nvidia.com/deep-learning-frameworks

Jetson Package Download
  https://developer.nvidia.com/embedded/downloads

Jetson How to install Tensorflow Document
  https://docs.nvidia.com/deeplearning/dgx/install-tf-jetsontx2/index.html

Python3 - UFF_SSD Build

The steps are in the README; follow them as written

$ pwd
/usr/src/tensorrt/samples/python/uff_ssd

$ sudo  mkdir -p build
$ cd build
$ sudo cmake ..
$ sudo make
$ ls
CMakeCache.txt  CMakeFiles  cmake_install.cmake  libflattenconcat.so  Makefile
$ cd ..
$ pwd
/usr/src/tensorrt/samples/python/uff_ssd

Copying images for UFF_SSD and testing

According to README.md, it works as follows.

download the pretrained ssd_inception_v2_coco_2017_11_17 model (TensorFlow Object Detection API)
the model is converted for TensorRT, with the model version name appended
the TensorRT inference engine is built and saved to a file
TensorRT applies its optimizations to the frozen graph during the build, which is the time-consuming step

See README.md for details.
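
The build-once, load-afterwards behaviour described above is a plain file cache. The sketch below shows the pattern without touching TensorRT: `build_fn` is a hypothetical stand-in for the expensive engine build and serialization, and the cache path in the usage note only mimics the layout seen in the sample's log. This is not the sample's actual code.

```python
import os


def load_or_build(cache_path, build_fn):
    """Return cached bytes if cache_path exists; otherwise build, save, and return them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()
    data = build_fn()  # the slow step, executed only on the first run
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        f.write(data)
    return data
```

For example, `load_or_build("workspace/engines/FLOAT/engine_bs_1.buf", build_engine_bytes)` would call the hypothetical `build_engine_bytes` only when the file does not exist yet, which is why the second run of the sample starts much faster than the first.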

$ sudo cp ../yolov3_onnx/dog.jpg .
$ sudo cp ../yolov3_onnx/cat.jpg .

$ vi detect_objects.py // runs with TensorRT (UFF -> TensorRT format)
import os
import ctypes
import time
import sys
import argparse

import numpy as np
from PIL import Image
import tensorrt as trt

import utils.inference as inference_utils # TRT/TF inference wrappers
import utils.model as model_utils # UFF conversion
import utils.boxes as boxes_utils # Drawing bounding boxes
import utils.coco as coco_utils # COCO dataset descriptors
from utils.paths import PATHS # Path management
...............

$ sudo python3 detect_objects.py dog.jpg  // download ssd_inception_v2_coco_2017_11_17.tar.gz
Preparing pretrained model
Downloading /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17.tar.gz
Download progress [==================================================] 100%
Download complete
Unpacking /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17.tar.gz
Extracting complete
Removing /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17.tar.gz
Model ready
WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/graphsurgeon/StaticGraph.py:123: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING: To create TensorRT plugin nodes, please use the `create_plugin_node` function instead.
UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 1
      }
      dim {
        size: 3
      }
      dim {
        size: 300
      }
      dim {
        size: 300
      }
    }
  }
}
]
=========================================

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting GridAnchor as custom op: GridAnchor_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
No. nodes: 563
UFF Output written to /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.uff
UFF Text Output written to /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pbtxt
TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

Building TensorRT engine. This may take few minutes.
TensorRT inference time: 97 ms
Detected bicycle with confidence 98%  // bicycle detected, but an error follows (libfreetype problem)
Traceback (most recent call last):
  File "detect_objects.py", line 193, in <module>
    main()
  File "detect_objects.py", line 180, in main
    analyze_prediction(detection_out, det * prediction_fields, img_pil)
  File "detect_objects.py", line 87, in analyze_prediction
    color=coco_utils.COCO_COLORS[label]
  File "/usr/src/tensorrt/samples/python/uff_ssd/utils/boxes.py", line 33, in draw_bounding_boxes_on_image
    boxes[i, 3], color, thickness, display_str_list[i])
  File "/usr/src/tensorrt/samples/python/uff_ssd/utils/boxes.py", line 77, in draw_bounding_box_on_image
    font = ImageFont.truetype('arial.ttf', 24)
  File "/home/jetsontx2/.local/lib/python3.6/site-packages/PIL/ImageFont.py", line 280, in truetype
    return FreeTypeFont(font, size, index, encoding, layout_engine)
  File "/home/jetsontx2/.local/lib/python3.6/site-packages/PIL/ImageFont.py", line 136, in __init__
    if core.HAVE_RAQM:
  File "/home/jetsontx2/.local/lib/python3.6/site-packages/PIL/ImageFont.py", line 40, in __getattr__
    raise ImportError("The _imagingft C module is not installed")
ImportError: The _imagingft C module is not installed

$ sudo apt-get install libfreetype6-dev    // libfreetype problem; install it
$ pip3 uninstall Pillow;pip3 install --no-cache-dir Pillow  // reinstall the Pillow package

$ sudo python3 detect_objects.py dog.jpg   // test again after the reinstall
TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT  //32bit FLOAT, 16bit HALF
  * Max batch size - 1

Loading cached TensorRT engine from /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/engines/FLOAT/engine_bs_1.buf
TensorRT inference time: 347 ms
Detected bicycle with confidence 98%
Detected dog with confidence 95%
Detected car with confidence 79%
Total time taken for one image: 456 ms
Saved output image to: /usr/src/tensorrt/samples/python/uff_ssd/utils/../image_inferred.jpg

$ eog image_inferred.jpg   // see the figure below

$ sudo python3 detect_objects.py cat.jpg   // test again after the reinstall
TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

Loading cached TensorRT engine from /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/engines/FLOAT/engine_bs_1.buf
TensorRT inference time: 120 ms
Detected cat with confidence 98%
Total time taken for one image: 186 ms

Saved output image to: /usr/src/tensorrt/samples/python/uff_ssd/utils/../image_inferred.jpg

$ eog image_inferred.jpg  // see the figure below

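Comparing a run over SSH with a run on a directly connected HDMI display, SSH seems somewhat slower.
At first I thought only 300x300 images would work, but in testing, images of other sizes seem to work as well.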

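1.2 VOC TEST

I ran this with the options from the README. It looks like the SSD model is run against the VOC image set; the script name (voc_evaluation.py) and the log below suggest that it measures detection accuracy rather than retraining the model. (The README gives no further detail.)

The VOC set contains a wide variety of images; why it is organized this way is something to look into separately later.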
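
For background, PASCAL VOC style evaluation matches each detected box against the annotated ground-truth boxes by intersection over union (IoU), classically counting a detection as correct when IoU is at least 0.5. How voc_evaluation.py implements this internally is not shown in this post, so the function below is only a generic sketch of the metric, with boxes given as (xmin, ymin, xmax, ymax).

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```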

$ sudo wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar   // the download did not work well
$ sudo wget http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar    // found another mirror site,  https://pjreddie.com/projects/pascal-voc-dataset-mirror/

$ sudo tar xvf VOCtest_06-Nov-2007.tar

$ sudo python3 voc_evaluation.py --voc_dir /usr/src/tensorrt/samples/python/uff_ssd/VOCdevkit/VOC2007  // same model as above
Preprocessing VOC dataset. It may take few minutes.
TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 64

Building TensorRT engine. This may take few minutes.
Infering image 1/4952
Infering image 65/4952
Infering image 129/4952
Infering image 193/4952
Infering image 257/4952
Infering image 321/4952
Infering image 385/4952
Infering image 449/4952
Infering image 513/4952
Infering image 577/4952
Infering image 641/4952
Infering image 705/4952
Infering image 769/4952
Infering image 833/4952
Infering image 897/4952
Infering image 961/4952
Infering image 1025/4952
Infering image 1089/4952
Infering image 1153/4952
Infering image 1217/4952
Infering image 1281/4952
Infering image 1345/4952
Infering image 1409/4952
Infering image 1473/4952
Infering image 1537/4952
Infering image 1601/4952
Infering image 1665/4952
Infering image 1729/4952
Infering image 1793/4952
Infering image 1857/4952
Infering image 1921/4952
Infering image 1985/4952
Infering image 2049/4952
Infering image 2113/4952
Infering image 2177/4952
Infering image 2241/4952
Infering image 2305/4952
Infering image 2369/4952
Infering image 2433/4952
Infering image 2497/4952
Infering image 2561/4952
Infering image 2625/4952
Infering image 2689/4952
Infering image 2753/4952
Infering image 2817/4952
Infering image 2881/4952
Infering image 2945/4952
Infering image 3009/4952
Infering image 3073/4952
Infering image 3137/4952
Infering image 3201/4952
Infering image 3265/4952
Infering image 3329/4952
Infering image 3393/4952
Infering image 3457/4952
Infering image 3521/4952
Infering image 3585/4952
Infering image 3649/4952
Infering image 3713/4952
Infering image 3777/4952
Infering image 3841/4952
Infering image 3905/4952
Infering image 3969/4952
Infering image 4033/4952
Infering image 4097/4952
Infering image 4161/4952
Infering image 4225/4952
Infering image 4289/4952
Infering image 4353/4952
Infering image 4417/4952
Infering image 4481/4952
Infering image 4545/4952
Infering image 4609/4952
Infering image 4673/4952
Infering image 4737/4952
Infering image 4801/4952
Infering image 4865/4952
Infering image 4929/4952
Reading annotation for 1/4952
Reading annotation for 101/4952
Reading annotation for 201/4952
Reading annotation for 301/4952
Reading annotation for 401/4952
Reading annotation for 501/4952
Reading annotation for 601/4952
Reading annotation for 701/4952
Reading annotation for 801/4952
Reading annotation for 901/4952
Reading annotation for 1001/4952
Reading annotation for 1101/4952
Reading annotation for 1201/4952
Reading annotation for 1301/4952
Reading annotation for 1401/4952
Reading annotation for 1501/4952
Reading annotation for 1601/4952
Reading annotation for 1701/4952
Reading annotation for 1801/4952
Reading annotation for 1901/4952
Reading annotation for 2001/4952
Reading annotation for 2101/4952
Reading annotation for 2201/4952
Reading annotation for 2301/4952
Reading annotation for 2401/4952
Reading annotation for 2501/4952
Reading annotation for 2601/4952
Reading annotation for 2701/4952
Reading annotation for 2801/4952
Reading annotation for 2901/4952
Reading annotation for 3001/4952
Reading annotation for 3101/4952
Reading annotation for 3201/4952
Reading annotation for 3301/4952
Reading annotation for 3401/4952
Reading annotation for 3501/4952
Reading annotation for 3601/4952
Reading annotation for 3701/4952
Reading annotation for 3801/4952
Reading annotation for 3901/4952
Reading annotation for 4001/4952
Reading annotation for 4101/4952
Reading annotation for 4201/4952
Reading annotation for 4301/4952
Reading annotation for 4401/4952
Reading annotation for 4501/4952
Reading annotation for 4601/4952
Reading annotation for 4701/4952
Reading annotation for 4801/4952
Reading annotation for 4901/4952
Saving cached annotations to /usr/src/tensorrt/samples/python/uff_ssd/utils/../workspace/annotations_cache/annots.pkl
AP for aeroplane = 0.7817
AP for bicycle = 0.7939
AP for bird = 0.6812
AP for boat = 0.5579
AP for bottle = 0.4791
AP for bus = 0.8383
AP for car = 0.7645
AP for cat = 0.8259
AP for chair = 0.5948
AP for cow = 0.7847
AP for diningtable = 0.6731
AP for dog = 0.7886
AP for horse = 0.8402
AP for motorbike = 0.8103
AP for person = 0.7848
AP for pottedplant = 0.4290
AP for sheep = 0.7474
AP for sofa = 0.7683
AP for train = 0.8429
AP for tvmonitor = 0.7145
Mean AP = 0.7251
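
The "Mean AP" line at the end of the log is simply the arithmetic mean of the 20 per-class AP values. The following is a minimal sketch to check that, assuming the log lines are pasted into a string; the helper name parse_ap is made up for this note and is not part of the sample.

```python
import re

# Per-class AP lines copied from the voc_evaluation.py log above.
LOG = """\
AP for aeroplane = 0.7817
AP for bicycle = 0.7939
AP for bird = 0.6812
AP for boat = 0.5579
AP for bottle = 0.4791
AP for bus = 0.8383
AP for car = 0.7645
AP for cat = 0.8259
AP for chair = 0.5948
AP for cow = 0.7847
AP for diningtable = 0.6731
AP for dog = 0.7886
AP for horse = 0.8402
AP for motorbike = 0.8103
AP for person = 0.7848
AP for pottedplant = 0.4290
AP for sheep = 0.7474
AP for sofa = 0.7683
AP for train = 0.8429
AP for tvmonitor = 0.7145
"""


def parse_ap(log_text):
    """Return {class_name: ap} for every 'AP for <name> = <value>' line."""
    pattern = re.compile(r"^AP for (\w+) = ([0-9.]+)$")
    result = {}
    for line in log_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


ap = parse_ap(LOG)
mean_ap = sum(ap.values()) / len(ap)
print("classes: {}, mean AP: {:.4f}".format(len(ap), mean_ap))
# prints: classes: 20, mean AP: 0.7251
```

Sorting the dictionary by value also shows quickly which classes are weakest (pottedplant and bottle in this run).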

All of the tests above were run with the Jetson TX2 in its normal (default) state; to get more performance, adjust the clock settings and run them again.

2. TensorBoard test

I do not yet know exactly what TensorBoard is for or how to use it, so for now I only launch it.

$ cd ~ 
$ mkdir jhlee
$ /home/jetsontx2/.local/bin/tensorboard --logdir ~/jhlee   // tensorboard was already installed together with tensorflow above

Open the Jetson TX2's IP address in a browser: http://10.0.0.174:6006/
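
If the page does not load, it helps to first check from the host PC whether the TensorBoard port is reachable at all. The following is a small sketch using only the Python standard library; the function name is_port_open is made up for this note, and the address and port are the ones used above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For example, is_port_open("10.0.0.174", 6006) should return True while tensorboard is running on the Jetson TX2 and the network path is open.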

TensorBoard (*.PBTX)
https://www.tensorflow.org/guide/graph_viz
https://gusrb.tistory.com/21

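3. Developing with TensorFlow and TensorRT

According to the documents below, NVIDIA announced the TensorRT inference optimization tool with TensorFlow, and it appears to be available from TensorFlow 1.7.

I downloaded the example based on the documents below and tested it; when time allows, I will read the documents more closely, absorb the details, and test again.

Ways to use TensorRT from TensorFlow

TF-TRT (use the TensorRT engine directly from within TensorFlow)
UFF (convert the TensorFlow model to the UFF format, then run it with TensorRT)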

  https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#build_model

TF-TRT Guide
  https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
  https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#work
  https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt

TF-TRT-TensorBoard
  https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#debug-tools

Developing with TensorFlow and TensorRT
https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/

Example

https://developer.download.nvidia.com/devblogs/tftrt_sample.tar.xz

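Comparing plain TensorFlow development with TensorFlow + TensorRT development

You build a trained graph, convert it into a frozen graph, and run it.
I did not know the term "freeze" at first; it can be thought of as converting the graph into a kind of deployable format (in TensorFlow, freezing turns the variables into constants so that the graph and its weights are stored together in one file).
TensorRT uses the term "serialize", which can likewise be thought of as a format that can be executed or saved. I do not yet understand exactly how TensorRT works internally, but I am noting it here.

Freezing and serializing do seem to differ, though: each framework appears to store the result in its own way.
Each format appears to be compatible only within its own framework.

In the flow below, the optimized plans are what the TensorRT user sees as engines;
the TensorRT runtime engine deserializes the optimized plans and applies them.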

3.1 TensorRT Deployment Flow

Step 1 above: optimize the trained model

Layer & Tensor Fusion
This reduces the number of layers to a minimum by fusing them, so a complex network ends up with far fewer layers.
Performance improves as a result.
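
To make the idea of fusion concrete, here is a scalar sketch of one well-known case: folding a batch-normalization step into the preceding convolution so that two layers become one. This only illustrates the general technique under simplified assumptions (a single 1x1 weight and scalar inputs); it is not TensorRT's actual implementation, and all function names are made up for this note.

```python
import math


def conv1x1(x, w, b):
    """A 1x1 convolution on a single value: y = w * x + b."""
    return w * x + b


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the conv weight and bias (two layers -> one)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


# Example parameters (arbitrary values chosen for illustration).
w, b = 0.5, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
w_fused, b_fused = fold_bn(w, b, gamma, beta, mean, var)

for x in (-1.0, 0.0, 2.5):
    two_layers = batch_norm(conv1x1(x, w, b), gamma, beta, mean, var)
    one_layer = conv1x1(x, w_fused, b_fused)
    print(x, two_layers, one_layer)
```

Both columns agree up to floating-point rounding, which is why the fused network can skip the separate normalization layer at inference time.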

Weights & Activation Precision Calibration
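Switching among FP32, FP16, and INT8 changes the representable range as shown below, and the optimization takes that into account. Be sure to consult the TensorRT manual for this (check each layer and whether it is supported).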
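
The range difference between the three precisions can be computed directly. The sketch below derives the largest finite FP32 and FP16 values from the IEEE 754 layout and shows a simple symmetric INT8 quantization. It is a simplified illustration only: TensorRT's INT8 calibration chooses the scaling range from calibration data in its own way, and the function names here are made up for this note.

```python
def float_max(exponent_bits, mantissa_bits):
    """Largest finite value of an IEEE 754 binary format."""
    bias = 2 ** (exponent_bits - 1) - 1
    return (2 - 2 ** -mantissa_bits) * 2.0 ** bias


FP32_MAX = float_max(8, 23)   # about 3.4e38
FP16_MAX = float_max(5, 10)   # 65504.0
INT8_MIN, INT8_MAX = -128, 127


def quantize_int8(values, max_abs):
    """Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate real values."""
    return [q * scale for q in quantized]


q, scale = quantize_int8([0.0, 0.5, -2.0, 10.0], max_abs=2.0)
print(FP32_MAX, FP16_MAX, (INT8_MIN, INT8_MAX))
print(q, dequantize(q, scale))
```

Note how the value 10.0 lies outside the chosen range and is clipped to 127; picking the range so that such clipping costs little accuracy is what calibration is about.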

Kernel Auto Tuning and Dynamic Tensor Memory
These seem to be features that handle kernel selection and memory management; I will come back to them once I understand the other parts.

UFF Format TensorRT (python)

The figure below makes this clear: the plan file is the engine file, and it is what runs on TensorRT.

DEEP LEARNING DEPLOYMENT WITH TENSORRT

The talk below explains the above in detail; watch it repeatedly until it makes sense.
http://on-demand.gputechconf.com/gtcdc/2017/video/DC7172/
https://youtu.be/6My-daDk4zE?list=PLoS6u5SJMkUk1kk2_WWHfTrANJuYvZNIP

3.2 Summary of TensorRT's advantages

TensorRT can support a variety of frameworks, as follows:

Caffe -> Caffe parser
CNTK, MXNet, PyTorch, Caffe2 -> ONNX parser
TensorFlow -> UFF parser, or use TF-TRT

Note that the summary below is for TensorRT 4.0; TensorRT 5.0 improves on it further.

3.3 Installing TensorFlow and running tftrt_sample

Installing tensorflow-gpu for python2

Installing TensorFlow under python2 ran into a CUDA 9.0 version problem.
The python3 version then stopped working properly as well (caution: I had to reinstall tensorflow-gpu for python3).

  $ pip install --extra-index-url=https://developer.download.nvidia.com/compute/redist/jp/v33/ tensorflow-gpu
// Problem: this build expects CUDA 9.0, which does not match the CUDA 10.0 currently installed
// libcublas.so.9.0: cannot open shared object file: No such file or directory

There is a way to install it for both python2 and python3, but I have not tried it.
https://stackoverflow.com/questions/49811510/how-to-install-tensorflow-gpu-for-both-python2-and-python3?rq=1

Example test (changed from python2 to python3)

$ tar -xvf tftrt_sample.tar.xz 
$ cd tftrt/

$ cat README 
TRT Tensorflow integration example

Install tensorRT and Tensorflow with TRT contrib. Instructions are available from:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt

Run the sample with:
./run_all.sh
It will run through native, FP32, FP16 and INT8 examples


$ cat ./run_all.sh 
#!/bin/bash

python tftrt_sample.py --native --FP32 --FP16 --INT8 \
                       --num_loops 10 \
                       --topN 5 \
                       --batch_size 4 \
                       --workspace_size 2048 \
                       --log_file log.txt \
                       --network resnet_v1_50_frozen.pb \
                       --input_node input \
                       --output_nodes resnet_v1_50/predictions/Reshape_1 \
                       --img_size 224 \
                       --img_file  grace_hopper.jpg
 
$ ./run_all3.sh  // run after changing the script to use python3

Namespace(FP16=True, FP32=True, INT8=True, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=2048)
Starting at 2019-06-05 10:36:28.410090
2019-06-05 10:36:28.522480: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-05 10:36:28.524851: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x27b6b450 executing computations on platform Host. Devices:
2019-06-05 10:36:28.525015: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): , 
2019-06-05 10:36:28.657662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-05 10:36:28.658228: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x27a69610 executing computations on platform CUDA. Devices:
2019-06-05 10:36:28.658354: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2019-06-05 10:36:28.658837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.02
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 784.20MiB
2019-06-05 10:36:28.658921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-05 10:36:32.198382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 10:36:32.198520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-06-05 10:36:32.198596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-06-05 10:36:32.198931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3926 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
WARNING:tensorflow:From tftrt_sample.py:92: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
INFO:tensorflow:Starting execution
./run_all3.sh: line 13: 31992 Segmentation fault      (core dumped) python3 tftrt_sample.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 4 --workspace_size 2048 --log_file log.txt --network resnet_v1_50_frozen.pb --input_node input --output_nodes resnet_v1_50/predictions/Reshape_1 --img_size 224 --img_file grace_hopper.jpg

I expected the graph to run with the example above, but it fails with a segmentation fault. I had also hoped to view it through TensorBoard, but it looks like this needs debugging first.
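
One way to start debugging is to run each mode on its own instead of all four at once, to see which one crashes. The sketch below builds one command per mode from the flags in run_all.sh above; the helper names build_command and run_modes are made up for this note, and the script is assumed to accept a single mode flag at a time.

```python
import subprocess

# Flags copied from run_all.sh above (everything except the mode switches).
COMMON_ARGS = [
    "--num_loops", "10",
    "--topN", "5",
    "--batch_size", "4",
    "--workspace_size", "2048",
    "--log_file", "log.txt",
    "--network", "resnet_v1_50_frozen.pb",
    "--input_node", "input",
    "--output_nodes", "resnet_v1_50/predictions/Reshape_1",
    "--img_size", "224",
    "--img_file", "grace_hopper.jpg",
]
MODES = ["--native", "--FP32", "--FP16", "--INT8"]


def build_command(mode, python="python3", script="tftrt_sample.py"):
    """Return the argument list for running the sample with one mode only."""
    return [python, script, mode] + COMMON_ARGS


def run_modes(modes=MODES):
    """Run each mode separately and collect the exit codes."""
    results = {}
    for mode in modes:
        completed = subprocess.run(build_command(mode))
        results[mode] = completed.returncode
    return results
```

Calling run_modes() from the tftrt/ directory returns a dictionary of exit codes. On Linux, a negative code means the process was killed by a signal (for example -11 for a segmentation fault), which narrows down which mode is at fault.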

Later I need to look into how to use the graph view in TensorBoard.

4. Notes on NVIDIA's UFF format

TensorRT currently provides three parsers for importing models from other frameworks.
The TensorFlow path above uses UFF, so I need to understand its exact role and related features properly.

TensorRT deployment architecture
  https://devblogs.nvidia.com/deploying-deep-learning-nvidia-tensorrt/

UFF Parser
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/parsers/Uff/pyUff.html

UFF Converter
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/index.html
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/uff/uff.html

UFF Operator
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/uff/Operators.html

5/30/2019

TensorRT5.0-Yolov3

1. TensorRT 5.0

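After installing JetPack 4.2 and all of the SDK components, look at the bundled samples again, as with JetPack 3.3:
Python is finally supported, and the C++ examples are still there.

There are more samples than in the previous SDK, and my guess is that a server-based API is supported as well.
To go through every feature I will need to work through the manual and the sources bit by bit.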

1.1 TensorRT Sample TEST

As before, go to the TensorRT sample sources, build them, and test each of the resulting binaries.

Sample build

$ ssh -X jetsontx2@192.168.55.1   // connect to the Jetson TX2

jetsontx2@jetsontx2-desktop:~$  ls /usr/src/tensorrt/samples/
common     Makefile.config  sampleFasterRCNN  sampleINT8API  sampleMNISTAPI   sampleOnnxMNIST  sampleUffMNIST
getDigits  python           sampleGoogleNet   sampleMLP      sampleMovieLens  samplePlugin     sampleUffSSD
Makefile   sampleCharRNN    sampleINT8        sampleMNIST    sampleNMT        sampleSSD        trtexec

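jetsontx2@jetsontx2-desktop:~$ cd /usr/src/tensorrt/samples
jetsontx2@jetsontx2-desktop:/usr/src/tensorrt/samples$ sudo make

Sample test

jetsontx2@jetsontx2-desktop:/usr/src/tensorrt/samples$ cd ..
jetsontx2@jetsontx2-desktop:/usr/src/tensorrt$ ls
bin  data  python  samples

jetsontx2@jetsontx2-desktop:/usr/src/tensorrt$ cd ./bin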
jetsontx2@jetsontx2-desktop:/usr/src/tensorrt/bin$ ls
chobj                     sample_googlenet        sample_mnist            sample_onnx_mnist        sample_uff_ssd
dchobj                    sample_googlenet_debug  sample_mnist_api        sample_onnx_mnist_debug  sample_uff_ssd_debug
download-digits-model.py  sample_int8             sample_mnist_api_debug  sample_plugin            trtexec
giexec                    sample_int8_api         sample_mnist_debug      sample_plugin_debug      trtexec_debug
sample_char_rnn           sample_int8_api_debug   sample_movielens        sample_ssd
sample_char_rnn_debug     sample_int8_debug       sample_movielens_debug  sample_ssd_debug
sample_fasterRCNN         sample_mlp              sample_nmt              sample_uff_mnist
sample_fasterRCNN_debug   sample_mlp_debug        sample_nmt_debug        sample_uff_mnist_debug

:/usr/src/tensorrt/bin$ ./sample_mlp  // a simple test; I ran all of the samples this way



---------------------------



@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@%+-:  =@@@@@@@@@@@@
@@@@@@@%=      -@@@**@@@@@@@
@@@@@@@   :%#@-#@@@. #@@@@@@
@@@@@@*  +@@@@:*@@@  *@@@@@@
@@@@@@#  +@@@@ @@@%  @@@@@@@
@@@@@@@.  :%@@.@@@. *@@@@@@@
@@@@@@@@-   =@@@@. -@@@@@@@@
@@@@@@@@@%:   +@- :@@@@@@@@@
@@@@@@@@@@@%.  : -@@@@@@@@@@
@@@@@@@@@@@@@+   #@@@@@@@@@@
@@@@@@@@@@@@@@+  :@@@@@@@@@@
@@@@@@@@@@@@@@+   *@@@@@@@@@
@@@@@@@@@@@@@@: =  @@@@@@@@@
@@@@@@@@@@@@@@ :@  @@@@@@@@@
@@@@@@@@@@@@@@ -@  @@@@@@@@@
@@@@@@@@@@@@@# +@  @@@@@@@@@
@@@@@@@@@@@@@* ++  @@@@@@@@@
@@@@@@@@@@@@@*    *@@@@@@@@@
@@@@@@@@@@@@@#   =@@@@@@@@@@
@@@@@@@@@@@@@@. +@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

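2. YOLOv3 Python test

Starting with TensorRT 5.0, YOLOv3 is supported from Python via ONNX. As the test below shows, it ultimately works by generating a TensorRT-specific engine file.

Checking the Python packages

$ dpkg -l | grep TensorRT     // check that the TensorRT-related packages are installed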
ii  graphsurgeon-tf                               5.0.6-1+cuda10.0                                arm64        GraphSurgeon for TensorRT package
ii  libnvinfer-dev                                5.0.6-1+cuda10.0                                arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                            5.0.6-1+cuda10.0                                all          TensorRT samples and documentation
ii  libnvinfer5                                   5.0.6-1+cuda10.0                                arm64        TensorRT runtime libraries
ii  python-libnvinfer                             5.0.6-1+cuda10.0                                arm64        Python bindings for TensorRT
ii  python-libnvinfer-dev                         5.0.6-1+cuda10.0                                arm64        Python development package for TensorRT
ii  python3-libnvinfer                            5.0.6-1+cuda10.0                                arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                        5.0.6-1+cuda10.0                                arm64        Python 3 development package for TensorRT
ii  tensorrt                                      5.0.6.3-1+cuda10.0                              arm64        Meta package of TensorRT
ii  uff-converter-tf                              5.0.6-1+cuda10.0                                arm64        UFF converter for TensorRT package

TensorRT installation guide (note: it is written for x86, so use it for reference only)
https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html

Installing the Python packages

$ sudo apt-get install -y python-pip python-dev  // python2: the sample above is python2, so this alone is enough for it
$ sudo apt-get install -y python3-pip python3-dev // python3: other sources use python3, so install these too

$ pip install wget

// install the following to resolve the onnx errors
$ sudo apt-get install cmake  // required by onnx
$ sudo apt-get install build-essential // installs the full build toolchain
$ sudo apt-get install protobuf-compiler libprotoc-dev

$ pip install onnx     // needs cmake and protobuf to install; this gives version 1.5.0
$ pip uninstall onnx; pip install onnx==1.4.1  // onnx 1.5.0 above is listed by NVIDIA as a known issue
$ pip uninstall onnx; pip install onnx=1.2.2    // tried this, but it did not install (note that pip expects ==, as in onnx==1.2.2)
$ pip install Pillow==2.2.1  // libjpeg was not installed, so the sample below failed
$ sudo apt-get install libjpeg-dev  // install libjpeg, then reinstall PIL
$ pip uninstall Pillow; pip install --no-cache-dir -I pillow  // now Pillow==6.0.0

Installing pip and a basic TensorFlow setup
https://github.com/jetsonhacks/installTensorFlowJetsonTX

Problems when installing onnx
https://github.com/onnx/onnx-tensorrt/issues/62

  https://github.com/onnx/onnx/issues/389

NVIDIA's known onnx issue
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-release-notes/tensorrt-5.html
  https://devtalk.nvidia.com/default/topic/1047487/tensorrt-5-0-2-6-yolov3_onnx-sample-error-/

PIL (it uses the JPEG decoder, so libjpeg must be installed first)
  https://stackoverflow.com/questions/29649941/pil-decoder-jpeg-not-available-raspberry

Other information
  https://blog.csdn.net/xxradon/article/details/89160576

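2.1 Running the python yolov3_onnx test

All of the Python packages listed above must be installed for this to work. In my case I installed each package as the need came up during testing, which is why they are recorded above in that order.

Be sure to read the README and understand the details.
- The conversion chain is YOLOv3 -> ONNX -> TensorRT.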

$ cd /usr/src/tensorrt/samples/python
$ ls
common.py                    fc_plugin_caffe_mnist        network_api_pytorch_mnist  uff_ssd
end_to_end_tensorflow_mnist  introductory_parser_samples  uff_custom_plugin          yolov3_onnx

$ cd yolov3_onnx/

$ vi README.md  // confirms that python3 is not supported; check the commands below and what each Python script does

$ cat requirements.txt   // Python packages that must be installed
numpy>=1.15.1
onnx
pycuda>=2017.1.1
Pillow>=5.2.0
wget>=3.2

$ python2 -m pip install -r requirements.txt  // python2: this installs onnx 1.5.0 (I switched to 1.4.1 separately before continuing)
$ python3 -m pip install -r requirements.txt  // python3 (I have not installed this yet)


// python2: when it runs, it downloads yolov3.weights and yolov3.cfg and then converts to ONNX
$ sudo python yolov3_to_onnx.py    // works with onnx 1.4.1
Layer of type yolo not supported, skipping ONNX node generation.
Layer of type yolo not supported, skipping ONNX node generation.
Layer of type yolo not supported, skipping ONNX node generation.
graph YOLOv3-608 (
  0_net[FLOAT, 64x3x608x608]
) initializers (
  1_convolutional_bn_scale[FLOAT, 32]
  1_convolutional_bn_bias[FLOAT, 32]
  1_convolutional_bn_mean[FLOAT, 32]
  1_convolutional_bn_var[FLOAT, 32]
  1_convolutional_conv_weights[FLOAT, 32x3x3x3]
  2_convolutional_bn_scale[FLOAT, 64]
  2_convolutional_bn_bias[FLOAT, 64]
  2_convolutional_bn_mean[FLOAT, 64]
  2_convolutional_bn_var[FLOAT, 64]
  2_convolutional_conv_weights[FLOAT, 64x32x3x3]
  3_convolutional_bn_scale[FLOAT, 32]
  3_convolutional_bn_bias[FLOAT, 32]
  3_convolutional_bn_mean[FLOAT, 32]
  3_convolutional_bn_var[FLOAT, 32]
  3_convolutional_conv_weights[FLOAT, 32x64x1x1]
  4_convolutional_bn_scale[FLOAT, 64]
  4_convolutional_bn_bias[FLOAT, 64]
  4_convolutional_bn_mean[FLOAT, 64]
  4_convolutional_bn_var[FLOAT, 64]
  4_convolutional_conv_weights[FLOAT, 64x32x3x3]
  6_convolutional_bn_scale[FLOAT, 128]
  6_convolutional_bn_bias[FLOAT, 128]
  6_convolutional_bn_mean[FLOAT, 128]
  6_convolutional_bn_var[FLOAT, 128]
..........
  %104_convolutional_conv_weights[FLOAT, 128x256x1x1]
  %105_convolutional_bn_scale[FLOAT, 256]
  %105_convolutional_bn_bias[FLOAT, 256]
  %105_convolutional_bn_mean[FLOAT, 256]
  %105_convolutional_bn_var[FLOAT, 256]
  %105_convolutional_conv_weights[FLOAT, 256x128x3x3]
  %106_convolutional_conv_bias[FLOAT, 255]
  %106_convolutional_conv_weights[FLOAT, 255x256x1x1]
...
) {
  1_convolutional = Conv[auto_pad = u'SAME_LOWER', dilations = [1, 1], kernel_shape = [3, 3], strides = [1, 1]](0_net, 1_convolutional_conv_weights)
  1_convolutional_bn = BatchNormalization[epsilon = 1e-05, momentum = 0.99](1_convolutional, 1_convolutional_bn_scale, 1_convolutional_bn_bias, 1_convolutional_bn_mean, 1_convolutional_bn_var)
  1_convolutional_lrelu = LeakyRelu[alpha = 0.1](1_convolutional_bn)
  2_convolutional = Conv[auto_pad = u'SAME_LOWER', dilations = [1, 1], kernel_shape = [3, 3], strides = [2, 2]](1_convolutional_lrelu, 2_convolutional_conv_weights)
...
  %103_convolutional_bn = BatchNormalization[epsilon = 1e-05, momentum = 0.99](%103_convolutional, %103_convolutional_bn_scale, %103_convolutional_bn_bias, %103_convolutional_bn_mean, %103_convolutional_bn_var)
  %103_convolutional_lrelu = LeakyRelu[alpha = 0.1](%103_convolutional_bn)
  %104_convolutional = Conv[auto_pad = u'SAME_LOWER', dilations = [1, 1], kernel_shape = [1, 1], strides = [1, 1]](%103_convolutional_lrelu, %104_convolutional_conv_weights)
  %104_convolutional_bn = BatchNormalization[epsilon = 1e-05, momentum = 0.99](%104_convolutional, %104_convolutional_bn_scale, %104_convolutional_bn_bias, %104_convolutional_bn_mean, %104_convolutional_bn_var)
  %104_convolutional_lrelu = LeakyRelu[alpha = 0.1](%104_convolutional_bn)
  %105_convolutional = Conv[auto_pad = u'SAME_LOWER', dilations = [1, 1], kernel_shape = [3, 3], strides = [1, 1]](%104_convolutional_lrelu, %105_convolutional_conv_weights)
  %105_convolutional_bn = BatchNormalization[epsilon = 1e-05, momentum = 0.99](%105_convolutional, %105_convolutional_bn_scale, %105_convolutional_bn_bias, %105_convolutional_bn_mean, %105_convolutional_bn_var)
  %105_convolutional_lrelu = LeakyRelu[alpha = 0.1](%105_convolutional_bn)
  %106_convolutional = Conv[auto_pad = u'SAME_LOWER', dilations = [1, 1], kernel_shape = [1, 1], strides = [1, 1]](%105_convolutional_lrelu, %106_convolutional_conv_weights, %106_convolutional_conv_bias)
  return %082_convolutional, %094_convolutional, %106_convolutional

// python2: convert from ONNX to TensorRT (see the README)
$ sudo python onnx_to_tensorrt.py   // requires the PIL module (Pillow 6.0.0) and libjpeg
Loading ONNX file from path yolov3.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file yolov3.onnx; this may take a while...
Completed creating Engine
Running inference on image dog.jpg...
[[135.04631129 219.14287094 184.31729756 324.86083388]
 [ 98.95616386 135.5652711  499.10095358 299.16207424]
 [477.88943795  81.22835189 210.86732516  86.96319981]] [0.99852328 0.99881124 0.93929232] [16  1  7]
Saved image with bounding boxes of detected objects to dog_bboxes.png

$ eog dog_bboxes.png  // the boxes and labels are drawn on the image

$ ls
coco_labels.txt     data_processing.pyc  dog.jpg              README.md         yolov3.cfg   yolov3_to_onnx.py   yolov3.trt
data_processing.py  dog_bboxes.png       onnx_to_tensorrt.py  requirements.txt  yolov3.onnx  yolov3_to_onnx.pyc  yolov3.weights

$ sudo python onnx_to_tensorrt.py   // on the second run the onnx->trt conversion is not needed, so it starts quickly
Reading engine from file yolov3.trt
Running inference on image dog.jpg...
[[135.04631129 219.14287094 184.31729756 324.86083388]
 [ 98.95616386 135.5652711  499.10095358 299.16207424]
 [477.88943795  81.22835189 210.86732516  86.96319981]] [0.99852328 0.99881124 0.93929232] [16  1  7]
Saved image with bounding boxes of detected objects to dog_bboxes.png.

$ vi coco_labels.txt  // lists the classes that can be detected
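
The three arrays printed by onnx_to_tensorrt.py are the boxes, the confidence scores, and the class indices. The sketch below turns them into readable rows. It rests on two assumptions that should be checked against the sample: each box is [x, y, width, height], and the class index is the zero-based line number in coco_labels.txt (only the first 17 labels are listed here, in the usual YOLO COCO order). The function names are made up for this note.

```python
# First 17 COCO labels in the order commonly used by YOLO (assumption:
# coco_labels.txt uses the same order; verify against the file).
COCO_LABELS_HEAD = [
    "person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog",
]


def xywh_to_corners(box):
    """Convert [x, y, width, height] to (left, top, right, bottom)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def describe(boxes, scores, classes, labels=COCO_LABELS_HEAD):
    """Return (label, score, integer corner box) for each detection."""
    rows = []
    for box, score, cls in zip(boxes, scores, classes):
        corners = tuple(int(round(v)) for v in xywh_to_corners(box))
        rows.append((labels[cls], round(score, 4), corners))
    return rows


# Values copied from the dog.jpg run above.
boxes = [
    [135.04631129, 219.14287094, 184.31729756, 324.86083388],
    [98.95616386, 135.5652711, 499.10095358, 299.16207424],
    [477.88943795, 81.22835189, 210.86732516, 86.96319981],
]
scores = [0.99852328, 0.99881124, 0.93929232]
classes = [16, 1, 7]

for row in describe(boxes, scores, classes):
    print(row)
```

Under these assumptions the dog.jpg result reads as a dog, a bicycle, and a truck, which matches what the saved dog_bboxes.png shows.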

$ pip list   // pip defaults to pip2, so these are the python2 package versions (use pip3 for python3)
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
appdirs (1.4.3)
asn1crypto (0.24.0)
atomicwrites (1.3.0)
attrs (19.1.0)
configparser (3.7.4)
contextlib2 (0.5.5)
cryptography (2.1.4)
decorator (4.4.0)
enum34 (1.1.6)
funcsigs (1.0.2)
gps (3.17)
graphsurgeon (0.3.2)
idna (2.6)
importlib-metadata (0.17)
ipaddress (1.0.17)
keyring (10.6.0)
keyrings.alt (3.0)
Mako (1.0.10)
MarkupSafe (1.1.1)
more-itertools (5.0.0)
numpy (1.16.4)
onnx (1.4.1)
pathlib2 (2.3.3)
Pillow (6.0.0)
pip (9.0.1)
pluggy (0.12.0)
protobuf (3.8.0)
py (1.8.0)
pycairo (1.16.2)
pycrypto (2.6.1)
pycuda (2019.1)
pygobject (3.26.1)
pytest (4.5.0)
pytools (2019.1.1)
pyxdg (0.25)
scandir (1.10.0)
SecretStorage (2.3.1)
setuptools (41.0.1)
six (1.12.0)
tensorrt (5.0.6.3)
typing (3.6.6)
typing-extensions (3.7.2)
uff (0.5.5)
unity-lens-photos (1.0)
wcwidth (0.1.7)
wget (3.2)
wheel (0.30.0)
zipp (0.5.1)
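
The list above can be checked against requirements.txt automatically. The following is a minimal sketch that parses the legacy "name (version)" format printed by this pip version and compares it with the minimum versions from requirements.txt; the function names are made up for this note, and only plain dotted numeric versions are handled.

```python
import re

# Minimum versions from requirements.txt above (None = any version).
REQUIREMENTS = {
    "numpy": "1.15.1",
    "onnx": None,
    "pycuda": "2017.1.1",
    "Pillow": "5.2.0",
    "wget": "3.2",
}


def version_tuple(text):
    """Turn '1.16.4' into (1, 16, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_pip_list(output):
    """Parse legacy 'pip list' lines such as 'numpy (1.16.4)'."""
    pattern = re.compile(r"^(\S+) \(([\d.]+)\)$")
    installed = {}
    for line in output.splitlines():
        match = pattern.match(line.strip())
        if match:
            installed[match.group(1)] = match.group(2)
    return installed


def check(installed, requirements=REQUIREMENTS):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    for name, minimum in requirements.items():
        have = installed.get(name)
        if have is None:
            problems.append("{}: not installed".format(name))
        elif minimum and version_tuple(have) < version_tuple(minimum):
            problems.append("{}: {} < {}".format(name, have, minimum))
    return problems
```

Feeding the output of pip list above into parse_pip_list and then check returns an empty list, meaning every requirement is met for python2.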

In conclusion, the yolo -> onnx -> TensorRT conversion takes a long time, but once it has been converted the speed is fine.
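
The reason the second run is fast is a simple build-once, reuse-later pattern: if yolov3.trt already exists it is read from disk, otherwise the engine is built and saved. The sketch below shows that pattern in plain Python. It is generic, not the sample's code: load_or_build is a made-up name and build_fn stands in for the slow ONNX-to-engine build step.

```python
import os


def load_or_build(engine_path, build_fn):
    """Return engine bytes, building and caching them only when needed."""
    if os.path.exists(engine_path):
        # Fast path: a previous run already saved the serialized engine.
        print("Reading engine from file {}".format(engine_path))
        with open(engine_path, "rb") as f:
            return f.read()
    # Slow path: build once, then save the result for later runs.
    data = build_fn()
    with open(engine_path, "wb") as f:
        f.write(data)
    return data
```

One consequence of this design is that a stale engine file keeps being used after the model or the TensorRT version changes, so deleting yolov3.trt forces a rebuild.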

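Testing other images

A small change to the source makes it easy to test other images. So far I have not run any of the tests at maximum performance.
If the Jetson TX2's fan is not spinning, it is in its normal state.

$ sudo cp ~/download/*.jpg .     // car and cat photos downloaded with Chrome
$ sudo cp onnx_to_tensorrt.py jhleetest.py  // copy for testing other images (owned by root)

$ sudo vi jhleetest.py  // change the image name in the source to car.jpg
$ sudo python jhleetest.py  // runs quickly (the engine is reused), now on the car image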
Reading engine from file yolov3.trt
Running inference on image car.jpg...
[[ 120.05153578  152.42545467  966.89172317  486.4402317 ]
 [  89.13414976  131.88476328 1018.99139214  434.55479845]] [0.96183921 0.78680305] [2 7]
Saved image with bounding boxes of detected objects to car_bboxes.png.

$ sudo vi jhleetest.py  // change the image name in the source to cat.jpg
$ sudo python jhleetest.py  // runs quickly
Reading engine from file yolov3.trt
Running inference on image cat.jpg...
[[113.97585209  53.73459241 781.95893924 365.30765023]] [0.85985616] [15]
Saved image with bounding boxes of detected objects to cat_bboxes.png.

$ eog car_bboxes.png
$ eog cat_bboxes.png

Testing at maximum performance

Run the following commands first, then run the test.

$ sudo nvpmodel -m 0
$ sudo ~/jetson_clocks.sh

jetson_clocks.sh (JetPack 4.2)
  https://devtalk.nvidia.com/default/topic/1049117/jetson-agx-xavier/jetpack-4-2-missing-jetson_clocks-sh-/

Related
  https://devtalk.nvidia.com/default/topic/1047018/tensorrt/yolov3_to_onnx-py-sample-failure/

TensorRT backend for ONNX
  https://github.com/onnx/onnx-tensorrt#tests

NVIDIA Multimedia
https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-281/index.html

2.2 ONNX Model

There is a separate ONNX model example, linked below; I will run it and study the material when time allows.

The document below covers profiling, optimization, and much other useful material, so be sure to learn it later.

How to Speed Up Deep Learning Inference Using TensorRT
  https://devblogs.nvidia.com/speed-up-inference-tensorrt/

3. Visionworks Sample

I am not sure whether I will use it, but the Jetson TX2 comes with the samples below, so I am adding them here.
They are examples that use VisionWorks, and following the JetsonHacks videos makes them easy to run.

https://www.youtube.com/watch?v=tFrrCrSTCig
https://www.youtube.com/watch?v=KROP46Wte4Q&t=552s
