Jeonghun (James) Lee

8/02/2019

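DeepStream SDK 4.0 Changes, Plugin Structure, and How to Create a Plugin (GStreamer Changes)

1. DeepStream SDK 4.0 Changes (GStreamer Changes)

Much of DeepStream SDK 4.0 is incompatible with DeepStream SDK 3.0, so to get up to speed quickly I need to go through the Plugin Manual and the sources and work out what has changed.
Many of the GStreamer commands that worked under DeepStream SDK 3.0 no longer work.

Note that the NVIDIA-related content here is based on the guides on the NVIDIA site and on my own tests.

For further details, check the NVIDIA site directly.

Complete DeepStream documentation (must read)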

https://docs.nvidia.com/metropolis/index.html

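Of the documents above, the following three are the ones you will probably consult most.

DeepStream Release Notes

Check the changes from the previous DeepStream version, the differences between x86 and Jetson, and the new features.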
https://docs.nvidia.com/metropolis/deepstream/4.0/DeepStream_4.0_Release_Notes.pdf

DeepStream Quick Guide (basic usage)

Installation is easy with sdkmanager, and the guide summarizes development and related topics in a form that is easy to read.
https://docs.nvidia.com/metropolis/deepstream/4.0/dev-guide/index.html

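Plugin Manual and DeepStream API for DeepStream development

When developing DeepStream components you need to know each plugin's properties and features as well as the APIs used internally. These are the documents that cover them, so treat them as required reading.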
https://docs.nvidia.com/metropolis/deepstream/4.0/DeepStream_Plugin_Manual.pdf
https://docs.nvidia.com/metropolis/deepstream/4.0/dev-guide/DeepStream_Development_Guide/baggage/index.html

1.1 INT8 on Jetson AGX Xavier

Unlike the other Jetson modules, Xavier has a DLA, which provides INT8 inference.
It also has OpenVX support, but whether that applies to OpenCV as well needs more investigation.
This is not new in DeepStream 4.0 and reportedly existed before, but since this is my first time using Xavier I summarize it briefly here. Jetson Nano and TX2 are excluded from this part.

INT8 inference document (supported on Jetson AGX Xavier)

http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

For INT8 to be supported, the GPU must be SM 61 or later and must support the operations shown below.

About SM61

https://devtalk.nvidia.com/default/topic/1026069/jetson-tx2/how-to-use-int8-inference-with-jetsontx2-tensorrt-2-1-2/

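The goal of INT8 inference is faster computation: values are converted to INT8 without a significant loss of accuracy compared with FP32/FP16.

Need for the bias (my guess: it becomes unnecessary once values are treated as relative rather than absolute)

However, as shown below, the bias drops out entirely in the product of the two terms, which I find somewhat confusing.
I need to look into this further.

The FP32 bias below is said to be removed because it is unnecessary.

Quantization (values are scaled by a ratio to fit INT8)

When quantizing as shown below, a threshold is set to control saturation.

Accuracy comparison of INT8 inference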

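To summarize the INT8 document above: the weights are left as they are, the bias is removed, and something like a hash table is built
so that FP32 values are mapped to INT8 through the table (quantization).
The bias seems to become unnecessary because values are treated as relative rather than absolute, and from reading the document there does not appear to be much data loss (my guess).

The interesting part is accuracy during quantization: a threshold can be set to control saturation.
Values judged unnecessary are cut off by setting that threshold.
When exactly this is needed is something I will have to look into later.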
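
To make the threshold and saturation idea concrete, here is a minimal C sketch of a symmetric linear mapping from FP32 to INT8 with no bias term. It is only an illustration of the idea summarized above, not TensorRT's implementation: the function names and the `threshold` parameter are hypothetical, and how the threshold is actually chosen (calibration) is not shown.

```c
#include <stdint.h>

/* Quantize one FP32 value to INT8 with a symmetric linear mapping.
 * `threshold` (hypothetical parameter, must be > 0) is the saturation bound:
 * [-threshold, +threshold] maps onto [-127, 127] and anything outside
 * that range is clipped. No bias (zero point) term is used. */
int8_t quantize_int8(float x, float threshold)
{
    float scale = threshold / 127.0f;   /* FP32 units per INT8 step */
    float scaled = x / scale;

    if (scaled > 127.0f)
        scaled = 127.0f;                /* saturate on the positive side */
    if (scaled < -127.0f)
        scaled = -127.0f;               /* saturate on the negative side */

    /* Round half away from zero, then narrow to 8 bits. */
    int q = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    return (int8_t)q;
}

/* Map an INT8 value back to an approximate FP32 value. */
float dequantize_int8(int8_t q, float threshold)
{
    return (float)q * (threshold / 127.0f);
}
```

With a threshold of 2.0, an input of 5.0 saturates to 127, while values inside the range keep their relative spacing. A smaller threshold gives finer resolution for small values at the cost of clipping more outliers.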

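I do not know whether the mapping step of quantization actually uses a hash table, but I once implemented something similar
with a hash table, so that is what I would have used.

INT8 inference is quite an interesting feature, and I have become very interested in it.
It is just a pity that I could only read the slides rather than hear the talk that goes with them.

INT8 inference settings to check in the config file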

model-engine-file : TensorRT model engine (serialized, INT8 in this case)
int8-calib-file : calibration table file; the file name indicates the TensorRT version
network-mode : 0=FP32, 1=INT8, 2=FP16; if no model engine exists yet, one is generated according to this setting
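
As a small illustration of how these settings fit together, the C sketch below maps a network-mode value to a precision name and composes an engine file name. The `<model>_b<batch>_<precision>.engine` pattern is an assumption inferred from the file names that appear in this post (for example resnet10.caffemodel_b1_int8.engine), not a documented rule, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Map a network-mode value (0=FP32, 1=INT8, 2=FP16) to a precision name.
 * Returns NULL for an unknown mode. */
const char *network_mode_name(int network_mode)
{
    switch (network_mode) {
    case 0:  return "fp32";
    case 1:  return "int8";
    case 2:  return "fp16";
    default: return NULL;
    }
}

/* Write "<model>_b<batch>_<precision>.engine" into `out`.
 * The naming pattern is an assumption based on the file names in this post.
 * Returns 0 on success, -1 on bad input or if the buffer is too small. */
int engine_file_name(char *out, size_t out_size,
                     const char *model, int batch_size, int network_mode)
{
    const char *precision = network_mode_name(network_mode);

    if (out == NULL || model == NULL || precision == NULL)
        return -1;
    if (strlen(model) == 0 || batch_size < 1)
        return -1;

    int n = snprintf(out, out_size, "%s_b%d_%s.engine",
                     model, batch_size, precision);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, calling `engine_file_name(buf, sizeof buf, "resnet10.caffemodel", 1, 1)` fills `buf` with the INT8, batch-size-1 name used in the TEST2 configuration.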

Example config file for Jetson AGX Xavier

model-engine-file=model_b1_int8.engine
int8-calib-file=yolov3-calibration.table.trt5.1
int8-calib-file=../../models/Primary_Detector/cal_trt4.bin

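1.2 DeepStream SDK 3.0 vs. 4.0

Going from 3.0 to 4.0 added many features but also introduced many incompatibilities, so these need to be written down.
I built and ported various things on DS3.0; DS4.0 seems to support much of that, but all of it has to be tested again.

According to NVIDIA's documentation, the biggest change is the unification of the Jetson and dGPU platforms; put simply, I expect this to mean better performance through optimization.
For the details I will have to read through the Plugin Manual.

In addition, NGC Docker containers, previously available only on x86, are now supported on ARM as well, which should make setting up the environment easier.

Gst-nvinfer changes

New interface for custom models beyond UFF/ONNX/Caffe (TensorRT IPlugin)
Segmentation and grayscale model support
FP16 / INT8 DLA (Jetson Xavier) support (previously INT8 only)
Source code provided (to be analyzed later)

New plugins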

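H.265 and H.264 encode and decode based on Gst-V4L2 (changed from before)
JPEG and MJPEG decoder support
gst-nvvideoconvert (extends the earlier gst-nvvidconv)
gst-nvof (optical flow)
nvofvisual and nvsegvisual are said to be supported; I still need to configure and test them.
A dewarper is also provided, and gst-msgbroker has been extended considerably.
Beyond that, the names of existing plugins are not compatible.

See the Release Notes below for details.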
https://docs.nvidia.com/metropolis/deepstream/4.0/DeepStream_4.0_Release_Notes.pdf

1.3 Plugins in DeepStream 4.0

DS4.0 Plugin Manual
https://docs.nvidia.com/metropolis/deepstream/4.0/DeepStream_Plugin_Manual.pdf

At first I could not check plugin features with gst-inspect as I used to, so the manual above was the only source for the details.

Checking element features with gst-inspect

gst-inspect works fine in a local terminal, but it runs into problems over an SSH connection.

$ ssh  nvidia@192.168.55.1  
$ echo $DISPLAY  // not set

$ gst-inspect-1.0 -a   // lists all plugins
.......
$ gst-inspect-1.0 -a |  grep dsexample     
.....
$ gst-inspect-1.0 dsexample    
Factory Details:
  Rank                     primary (256)
  Long-name                DsExample plugin
  Klass                    DsExample Plugin
......
GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstBaseTransform
                         +----GstDsExample


$ ssh -X  nvidia@192.168.55.1         // with X forwarding

$ echo $DISPLAY   // now set, unlike above, and this is what causes the failure
localhost:10.0

$ export DISPLAY=:1   // set to :1 or :0; the value must start with a colon

$ gst-inspect-1.0 dsexample    
Factory Details:
  Rank                     primary (256)
  Long-name                DsExample plugin
  Klass                    DsExample Plugin
......
GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstBaseTransform
                         +----GstDsExample

Gstreamer gst-inspect
https://gstreamer.freedesktop.org/documentation/tools/gst-inspect.html?gi-language=c#

Caveats when using gst-inspect over SSH

If you connect over SSH with X forwarding and do not set DISPLAY as above, the commands above do not work.

I did not know why it failed until NVIDIA pointed out the exact cause, which solved it.
https://devtalk.nvidia.com/default/topic/1058525/deepstream-sdk/gst-inspect-is-not-work-properly-in-ds4-0/post/5368380/#5368380

Ubuntu DISPLAY settings
https://help.ubuntu.com/community/EnvironmentVariables

If problems remain, delete the cache as follows.

$ rm ~/.cache/gstreamer-1.0/registry.aarch64.bin   // delete the GStreamer registry cache, then try again
......

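1.4 DeepStream plugin structure and locations

This is the same as in DS SDK 3.0: pick a sample plugin from the sources below, rename it to create your own sample, and test it.

DeepStream Gst plugin example layout

Get familiar with the GStreamer plugin structure shown below, then copy the provided dsexample under a new name and run a simple test.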

$ cd ~/deepstream-4.0/sources/gst-plugins
$ tree .
.
├── gst-dsexample   // copy this one under a new name for testing
│   ├── dsexample_lib
│   │   ├── dsexample_lib.c
│   │   ├── dsexample_lib.h
│   │   ├── dsexample_lib.o
│   │   ├── libdsexample.a
│   │   └── Makefile
│   ├── gstdsexample.cpp
│   ├── gstdsexample.h
│   ├── gstdsexample.o
│   ├── libnvdsgst_dsexample.so
│   ├── Makefile
│   └── README
├── gst-nvinfer                 // newly added nvinfer sources
│   ├── gstnvinfer_allocator.cpp
│   ├── gstnvinfer_allocator.h
│   ├── gstnvinfer_allocator.o
│   ├── gstnvinfer.cpp
│   ├── gstnvinfer.h
│   ├── gstnvinfer_meta_utils.cpp
│   ├── gstnvinfer_meta_utils.h
│   ├── gstnvinfer_meta_utils.o
│   ├── gstnvinfer.o
│   ├── gstnvinfer_property_parser.cpp
│   ├── gstnvinfer_property_parser.h
│   ├── gstnvinfer_property_parser.o
│   ├── libnvdsgst_infer.so
│   ├── Makefile
│   └── README
├── gst-nvmsgbroker
│   ├── gstnvmsgbroker.c
│   ├── gstnvmsgbroker.h
│   ├── gstnvmsgbroker.o
│   ├── libnvdsgst_msgbroker.so
│   ├── Makefile
│   └── README
└── gst-nvmsgconv
    ├── gstnvmsgconv.c
    ├── gstnvmsgconv.h
    ├── gstnvmsgconv.o
    ├── libnvdsgst_msgconv.so
    ├── Makefile
    └── README

Locating the Gst plugins and DeepStream plugins

I searched as follows to find where the DeepStream plugins and the GStreamer plugins are installed.

$ ls /opt/nvidia/deepstream/deepstream-4.0/lib/gst-plugins/    // install location of the DeepStream plugins only
libnvdsgst_dewarper.so   libnvdsgst_msgbroker.so    libnvdsgst_multistreamtiler.so  libnvdsgst_osd.so        libnvdsgst_tracker.so
libnvdsgst_dsexample.so  libnvdsgst_msgconv.so      libnvdsgst_of.so          libnvdsgst_infer.so      libnvdsgst_multistream.so  libnvdsgst_ofvisual.so       
libnvdsgst_segvisual.so

$ cat /etc/ld.so.conf.d/deepstream.conf              // dynamic linker configuration for the DeepStream libraries
/opt/nvidia/deepstream/deepstream-4.0/lib

$ echo $PATH   // PATH lists the directories searched for executables
/usr/local/cuda-10.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

$ echo $LD_LIBRARY_PATH   // extra search path for shared libraries, used in addition to the ld.so.conf settings above
/usr/local/cuda-10.0/lib64:

/*
I checked every GStreamer-related environment variable, but none of them revealed the plugin location.
*/
 
$ echo $GST_PLUGIN_PATH  // not set
$ echo $GST_PLUGIN_PATH_1_0 // not set
$ echo $GST_PLUGIN_SYSTEM_PATH  // not set
$ echo $GST_PLUGIN_SYSTEM_PATH_1_0  // not set
$ ls ~/.local/share/gstreamer-1.0/presets/    // empty

/*
 So I will look for the GStreamer plugin directory directly.
*/

$ find / -name gstreamer-1.0 2> /dev/null    // found the relevant directories
/usr/share/gstreamer-1.0
/usr/include/gstreamer-1.0
/usr/lib/aarch64-linux-gnu/gstreamer1.0/gstreamer-1.0
/usr/lib/aarch64-linux-gnu/gstreamer-1.0
/home/nvidia/.cache/gstreamer-1.0
/home/nvidia/.local/share/gstreamer-1.0

$ cat /home/nvidia/.cache/gstreamer-1.0/registry.aarch64.bin   // analyzing this file reveals the directory structure

$ ls /usr/lib/aarch64-linux-gnu/gstreamer-1.0  // the DeepStream plugins sit alongside the other GStreamer plugins here (deepstream is a symbolic link)
deepstream                 libgstcurl.so                libgstisomp4.so            libgstomx.so              libgsttaglib.so
include                    libgstcutter.so              libgstivfparse.so          libgstopenal.so           libgsttcp.so
libcluttergst3.so          libgstdashdemux.so           libgstivtc.so              libgstopenexr.so          libgstteletext.so
libgst1394.so              libgstdc1394.so              libgstjack.so              libgstopenglmixers.so     libgsttheora.so
..............

1.5 Developing DeepStream plugins

Now that the basic layout is clear, read the GStreamer plugin development manual and keep it in mind.

GStreamer plugin development (must read)

https://gstreamer.freedesktop.org/documentation/plugin-development/basics/boiler.html?gi-language=c

Pads in GStreamer plugin development (must read)

https://gstreamer.freedesktop.org/documentation/plugin-development/basics/pads.html?gi-language=c

GStreamer property settings (must read)

https://gstreamer.freedesktop.org/documentation/plugin-development/basics/args.html?gi-language=c

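Other callback functions to connect

Chain function: a callback you implement to process the data passing through the element
Event function: a callback installed on a pad; events can also be sent to a pad depending on the state
Query function: a callback invoked when a query is received

GStreamer Plugin Writer's Guide

To understand the plugin structure in detail, read the Plugin Writer's Guide below closely.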
https://gstreamer.freedesktop.org/documentation/plugin-development/index.html?gi-language=c

Example GStreamer boilerplate for SAMPLE

To create a GStreamer plugin named SAMPLE, define the macros as shown below.
Function names likewise follow the gst_sample_xxx pattern.

$ vi sample.h
G_BEGIN_DECLS
....
#define GST_TYPE_SAMPLE (gst_sample_get_type())
#define GST_SAMPLE(obj) (G_TYPE_CHECK_INSTANCE_CAST((obj),GST_TYPE_SAMPLE,GstSAMPLE))
#define GST_SAMPLE_CLASS(klass) (G_TYPE_CHECK_CLASS_CAST((klass),GST_TYPE_SAMPLE,GstSAMPLEClass))
#define GST_SAMPLE_GET_CLASS(obj) (G_TYPE_INSTANCE_GET_CLASS((obj), GST_TYPE_SAMPLE, GstSAMPLEClass))
#define GST_IS_SAMPLE(obj) (G_TYPE_CHECK_INSTANCE_TYPE((obj),GST_TYPE_SAMPLE))
#define GST_IS_SAMPLE_CLASS(klass) (G_TYPE_CHECK_CLASS_TYPE((klass),GST_TYPE_SAMPLE))
#define GST_SAMPLE_CAST(obj)  ((GstSAMPLE *)(obj))

..
G_END_DECLS

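Creating your own plugin from dsexample

Copy dsexample, change only the names, and test that build first; it is easy to confirm that it works.
Once it is installed, gst-inspect shows its description as well.
Further details are omitted here (see the GStreamer manual).

Comparing dsexample and nvmsgbroker

The two plugins are structured and operate differently, which the short comparison below shows.
It is worth comparing the class structure used in each gst_xxxxx_class_init function and the functions attached to it.

//msgbroker 
//GstBaseSinkClass: built around the sink pad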

  GObjectClass *gobject_class = G_OBJECT_CLASS (klass);
  GstBaseSinkClass *base_sink_class = GST_BASE_SINK_CLASS (klass);
.....
  base_sink_class->set_caps = GST_DEBUG_FUNCPTR (gst_nvmsgbroker_set_caps);
  base_sink_class->start = GST_DEBUG_FUNCPTR (gst_nvmsgbroker_start);
  base_sink_class->stop = GST_DEBUG_FUNCPTR (gst_nvmsgbroker_stop);
  base_sink_class->render = GST_DEBUG_FUNCPTR (gst_nvmsgbroker_render);

// dsexample
//GstBaseTransformClass: built around transforming data inside the plugin (check how the data is actually transformed)

  GObjectClass *gobject_class;
  GstElementClass *gstelement_class;
  GstBaseTransformClass *gstbasetransform_class;

  gobject_class = (GObjectClass *) klass;
  gstelement_class = (GstElementClass *) klass;
  gstbasetransform_class = (GstBaseTransformClass *) klass;

  /* Override base class functions */
  gobject_class->set_property = GST_DEBUG_FUNCPTR (gst_dsexample_set_property);
  gobject_class->get_property = GST_DEBUG_FUNCPTR (gst_dsexample_get_property);

  gstbasetransform_class->set_caps = GST_DEBUG_FUNCPTR (gst_dsexample_set_caps);
  gstbasetransform_class->start = GST_DEBUG_FUNCPTR (gst_dsexample_start);
  gstbasetransform_class->stop = GST_DEBUG_FUNCPTR (gst_dsexample_stop);

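Each class gets its callback functions installed as above; the next step is to find out when each one is called.

Understanding GstBaseSink and GstBaseTransform

Both have GstElement as their parent class, so learn what GstElement subclasses have in common,
and study the methods of each.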
https://gstreamer.freedesktop.org/documentation/base/gstbasetransform.html?gi-language=c#GstBaseTransform
https://gstreamer.freedesktop.org/documentation/base/gstbasesink.html?gi-language=c#GstBaseSink

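Questions I asked NVIDIA directly while writing a DeepStream program and plugin

Depending on the full-frame property of dsexample, OpenCV and the crop function come into play; I applied this to my own source and confirmed that it works.
To understand it properly you have to read the DeepStream SDK API documentation and understand each operation.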
https://devtalk.nvidia.com/default/topic/1061422/deepstream-sdk/how-to-crop-the-image-and-save/

I have already implemented ROI as a plugin; I was told it can also be done with lines, which I will test later.
https://devtalk.nvidia.com/default/topic/1061791/deepstream-sdk/about-roi-in-ds4-0-on-xavier-/post/5378191/#5378191

1.6 Performance comparison of each model (TensorRT)

When I worked with TensorRT before I did not really know how to use trtexec; now I do.
The tool is used to measure the performance of UFF/Caffe/ONNX models and TensorRT engines.

// use trtexec to measure the performance of each model
$ /usr/src/tensorrt/bin/trtexec --help
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --help
[I] help

Mandatory params:
  --deploy=          Caffe deploy file
  OR --uff=          UFF file
  OR --onnx=         ONNX Model file
  OR --loadEngine=   Load a saved engine

Mandatory params for UFF:
  --uffInput=,C,H,W Input blob name and its dimensions for UFF parser (can be specified multiple times)
  --output=      Output blob name (can be specified multiple times)

Mandatory params for Caffe:
  --output=      Output blob name (can be specified multiple times)

Optional params:
  --model=          Caffe model file (default = no model, random weights used)
  --batch=N               Set batch size (default = 1)
  --device=N              Set cuda device to N (default = 0)
  --iterations=N          Run N iterations (default = 10)
  --avgRuns=N             Set avgRuns to N - perf is measured as an average of avgRuns (default=10)
  --percentile=P          For each iteration, report the percentile time at P percentage (0<=P<=100, with 0 representing min, and 100 representing max; default = 99.0%)
  --workspace=N           Set workspace size in megabytes (default = 16)
  --safe                  Only test the functionality available in safety restricted flows.
  --fp16                  Run in fp16 mode (default = false). Permits 16-bit kernels
  --int8                  Run in int8 mode (default = false). Currently no support for ONNX model.
  --verbose               Use verbose logging (default = false)
  --saveEngine=     Save a serialized engine to file.
  --loadEngine=     Load a serialized engine from file.
  --calib=          Read INT8 calibration cache file.  Currently no support for ONNX model.
  --useDLACore=N          Specify a DLA engine for layers that support DLA. Value can range from 0 to n-1, where n is the number of DLA engines on the platform.
  --allowGPUFallback      If --useDLACore flag is present and if a layer can't run on DLA, then run on GPU. 
  --useSpinWait           Actively wait for work completion. This option may decrease multi-process synchronization time at the cost of additional CPU usage. (default = false)
  --dumpOutput            Dump outputs at end of test. 
  -h, --help              Print usage

// look at each config file to collect the parameters trtexec needs
$ cd ~/deepstream-4.0/sources/apps/sample_apps/deepstream-test2
$ cat dstest2_pgie_config.txt

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file

model-file=../../../../samples/models/Primary_Detector/resnet10.caffemodel
proto-file=../../../../samples/models/Primary_Detector/resnet10.prototxt
...
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

//TensorRT Engine 
$ /usr/src/tensorrt/bin/trtexec   --loadEngine=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b4_int8.engine 
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b4_int8.engine
[I] loadEngine: /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b4_int8.engine
[I] /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b4_int8.engine has been successfully loaded.
[I] Average over 10 runs is 1.01944 ms (host walltime is 1.10421 ms, 99% percentile time is 1.06387).
[I] Average over 10 runs is 1.00943 ms (host walltime is 1.06669 ms, 99% percentile time is 1.02454).
[I] Average over 10 runs is 1.00519 ms (host walltime is 1.06144 ms, 99% percentile time is 1.01184).
[I] Average over 10 runs is 1.00898 ms (host walltime is 1.07056 ms, 99% percentile time is 1.02982).
[I] Average over 10 runs is 1.00417 ms (host walltime is 1.06018 ms, 99% percentile time is 1.02707).
[I] Average over 10 runs is 1.00541 ms (host walltime is 1.06557 ms, 99% percentile time is 1.02682).
[I] Average over 10 runs is 1.00323 ms (host walltime is 1.0602 ms, 99% percentile time is 1.03834).
[I] Average over 10 runs is 1.00476 ms (host walltime is 1.06061 ms, 99% percentile time is 1.02954).
[I] Average over 10 runs is 1.00358 ms (host walltime is 1.05957 ms, 99% percentile time is 1.00902).
[I] Average over 10 runs is 1.00232 ms (host walltime is 1.05585 ms, 99% percentile time is 1.00704).

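// Caffe model, for comparison with the TensorRT engine above (the elapsed time is clear; per the --percentile help text above, the "99% percentile time" appears to be the run time at the 99th percentile, where 0 is the minimum and 100 the maximum)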
$ /usr/src/tensorrt/bin/trtexec --deploy=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt \
  --model=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel \
  --output=conv2d_bbox

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --deploy=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt --model=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel --output=conv2d_bbox
[I] deploy: /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt
[I] model: /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel
[I] output: conv2d_bbox
[I] Input "input_1": 3x368x640
[I] Output "conv2d_bbox": 16x23x40
[I] Average over 10 runs is 4.64663 ms (host walltime is 4.72585 ms, 99% percentile time is 4.7319).
[I] Average over 10 runs is 4.62393 ms (host walltime is 4.69601 ms, 99% percentile time is 4.64422).
[I] Average over 10 runs is 4.6295 ms (host walltime is 4.69154 ms, 99% percentile time is 4.64858).
[I] Average over 10 runs is 4.62978 ms (host walltime is 4.68834 ms, 99% percentile time is 4.64538).
[I] Average over 10 runs is 4.62103 ms (host walltime is 4.68236 ms, 99% percentile time is 4.63843).
[I] Average over 10 runs is 4.62193 ms (host walltime is 4.68143 ms, 99% percentile time is 4.64042).
[I] Average over 10 runs is 4.61595 ms (host walltime is 4.67465 ms, 99% percentile time is 4.62768).
[I] Average over 10 runs is 4.61807 ms (host walltime is 4.67505 ms, 99% percentile time is 4.63514).
[I] Average over 10 runs is 4.61827 ms (host walltime is 4.68276 ms, 99% percentile time is 4.62362).
[I] Average over 10 runs is 4.62702 ms (host walltime is 4.69345 ms, 99% percentile time is 4.64864).
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --deploy=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt --model=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel --output=conv2d_bbox

$ /usr/src/tensorrt/bin/trtexec --deploy=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt \
  --model=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel \
  --output=conv2d_cov/Sigmoid
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --deploy=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt --model=/home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel --output=conv2d_cov/Sigmoid
[I] deploy: /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt
[I] model: /home/nvidia/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel
[I] output: conv2d_cov/Sigmoid
[I] Input "input_1": 3x368x640
[I] Output "conv2d_cov/Sigmoid": 4x23x40
[I] Average over 10 runs is 4.6327 ms (host walltime is 4.70868 ms, 99% percentile time is 4.6904).
[I] Average over 10 runs is 4.62576 ms (host walltime is 4.69326 ms, 99% percentile time is 4.66947).
[I] Average over 10 runs is 4.62769 ms (host walltime is 4.68794 ms, 99% percentile time is 4.65818).
[I] Average over 10 runs is 4.62516 ms (host walltime is 4.69319 ms, 99% percentile time is 4.66173).
[I] Average over 10 runs is 4.62184 ms (host walltime is 4.68396 ms, 99% percentile time is 4.64534).
[I] Average over 10 runs is 4.62518 ms (host walltime is 4.67966 ms, 99% percentile time is 4.64067).
[I] Average over 10 runs is 4.62082 ms (host walltime is 4.68281 ms, 99% percentile time is 4.64358).
[I] Average over 10 runs is 4.62256 ms (host walltime is 4.68476 ms, 99% percentile time is 4.65318).
[I] Average over 10 runs is 4.62129 ms (host walltime is 4.68117 ms, 99% percentile time is 4.64125).
[I] Average over 10 runs is 4.62561 ms (host walltime is 4.6864 ms, 99% percentile time is 4.64435).
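
To make the "percentile time" in the output above more concrete, here is a minimal C sketch that computes a percentile over a set of timing samples using the nearest-rank method. How trtexec computes its figure internally is not confirmed here, so treat the method as an assumption; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int compare_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile of `count` timing samples (count must be > 0).
 * Sorts `samples` in place. p <= 0 returns the minimum and p >= 100 the
 * maximum, matching the meaning given in the --percentile help text. */
double percentile_time(double *samples, size_t count, double p)
{
    qsort(samples, count, sizeof samples[0], compare_double);

    if (p <= 0.0)
        return samples[0];
    if (p >= 100.0)
        return samples[count - 1];

    /* rank = ceil(p / 100 * count), computed without math.h */
    double pos = p / 100.0 * (double)count;
    size_t rank = (size_t)pos;
    if ((double)rank < pos)
        rank++;

    return samples[rank - 1];
}
```

Read this way, a 99% percentile time of 4.64 ms means that roughly 99% of the measured runs finished in 4.64 ms or less, which makes it a tail-latency figure rather than a percentage.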

//UFF Model 
$ cd ~/deepstream-4.0/sources/objectDetector_SSD
$ cat config_infer_primary_ssd.txt
.......
uff-file=sample_ssd_relu6.uff
uff-input-dims=3;300;300;0
uff-input-blob-name=Input
...
output-blob-names=MarkOutput_0
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=nvdsinfer_custom_impl_ssd/libnvdsinfer_custom_impl_ssd.so

// the UFF model uses a custom layer, so it probably will not run correctly (I also do not yet understand exactly how the uffInput values relate to the config values above)
$ /usr/src/tensorrt/bin/trtexec --uff=/home/nvidia/deepstream-4.0/sources/objectDetector_SSD/sample_ssd_relu6.uff \
  --uffInput=Input,3,300,300 \
  --output=MarkOutput_0
[I] uff: /home/nvidia/deepstream-4.0/sources/objectDetector_SSD/sample_ssd_relu6.uff
[I] uffInput: Input,3,300,300
[I] output: MarkOutput_0
[E] [TRT] UffParser: Validator error: concat_box_loc: Unsupported operation _FlattenConcat_TRT
[E] Engine could not be created
[E] Engine could not be created
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --uff=/home/nvidia/deepstream-4.0/sources/objectDetector_SSD/sample_ssd_relu6.uff --uffInput=Input,3,300,300 --output=MarkOutput_0

Best Practices For TensorRT Performance (trtexec and other tools)
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html

Using trtexec with a Caffe model in practice
  https://devtalk.nvidia.com/default/topic/1061845/deepstream-sdk/resnet50-classification-as-primary-gie/

Model parser errors
  https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#error-messaging

Try the yais tool later
  https://github.com/NVIDIA/tensorrt-laboratory/tree/master/examples/00_TensorRT

2. DeepStream GStreamer tests

DeepStream SDK 3.0 and GStreamer notes
https://ahyuo79.blogspot.com/2019/07/deepstream-sdk-30-gstreamer.html

2.1 Debugging GStreamer

Set the GST_DEBUG environment variable to the level you want and the corresponding debug messages are printed.

Levels from 1 to 9 can be selected (8 is unused).
( 1 - ERROR ,2 - WARNING ,3 - FIXME ,4 - INFO )
( 5 - DEBUG, 6 - LOG, 7 - TRACE, 9 - MEMDUMP )

GStreamer environment variables (GST_DEBUG and many others)
https://gstreamer.freedesktop.org/documentation/gstreamer/running.html?gi-language=c

How to use GST_DEBUG
https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=c

Debugging everything

You can debug all of GStreamer as shown below, but I do not really recommend it.

$ export GST_DEBUG="*:2" //  WARNING for everything; the most reasonable global setting
$ export GST_DEBUG="*:4" //  INFO for everything
$ export GST_DEBUG=""   // clear the setting

Debugging only the plugins you want

Rather than debugging every plugin, debug only a specific plugin to track down the problem in that area.

 // debug every plugin; useful for discovering the plugin structure when you do not know it yet
$ GST_DEBUG="*:5" deepstream-app -c deepstream_app_config_yoloV3.txt 

// debug a specific plugin only
$ GST_DEBUG="dsexample:5" ./deepstream-rtsp-app rtsp://10.0.0.199:554/h264    // detailed debug output for dsexample only
$ GST_DEBUG="qtdemux:5" deepstream-app -c deepstream_app_config_yoloV3.txt  // detailed debug output for qtdemux only

// debug several plugins at once
$ GST_DEBUG="qtdemux:5,dsexample:4" deepstream-app -c deepstream_app_config_yoloV3.txt  // qtdemux at DEBUG, dsexample at INFO
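
To show how a comma-separated spec such as the ones above can be read, here is a simplified C sketch that looks up the level a spec assigns to one category. It is not GStreamer's own parser: it only handles numeric `name:level` entries and a bare `*` wildcard, and it assumes that later entries override earlier ones. The function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return the level that a GST_DEBUG-style spec assigns to `category`,
 * or 0 if no entry matches. Simplified: entries are "name:level" with a
 * numeric level, "*" matches every category, and a later matching entry
 * overrides an earlier one (an assumption of this sketch). */
int debug_level_for(const char *spec, const char *category)
{
    int level = 0;
    const char *p = spec;

    while (*p != '\0') {
        size_t entry_len = strcspn(p, ",");          /* length up to next comma */
        const char *colon = memchr(p, ':', entry_len);

        if (colon != NULL) {
            size_t name_len = (size_t)(colon - p);
            int is_wildcard = (name_len == 1 && p[0] == '*');
            int is_match = (name_len == strlen(category) &&
                            strncmp(p, category, name_len) == 0);

            if (is_wildcard || is_match)
                level = atoi(colon + 1);             /* stops at the comma */
        }

        p += entry_len;
        if (*p == ',')
            p++;                                     /* skip the separator */
    }
    return level;
}
```

With the spec `qtdemux:5,dsexample:4`, the sketch returns 5 for qtdemux, 4 for dsexample, and 0 for any category that is not listed.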

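Debugging by plugin category

You can also declare your own category and debug just the code that uses it. If you look inside the plugins above, they all declare and use categories as well.

// define the category you want directly in the source (my_category)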
GST_DEBUG_CATEGORY_STATIC (my_category);  // define category (statically)
#define GST_CAT_DEFAULT my_category       // set as default

  if (!my_category) {
   GST_DEBUG_CATEGORY_INIT (my_category, "MY_CAT", 0, NULL);
  }

  GST_CAT_INFO (my_category, "TEST Info %s", "Category TEST");
  GST_CAT_DEBUG (my_category, "TEST Debug %s", "Category TEST");
  GST_CAT_ERROR (my_category, "TEST error %s", "Category TEST");

// actual test
$ GST_DEBUG="MY_CAT:5" deepstream-app -c deepstream_app_config_yoloV3.txt 
$ GST_DEBUG="MY_CAT:9" deepstream-app -c deepstream_app_config_yoloV3.txt

https://gstreamer.freedesktop.org/documentation/gstreamer/gstinfo.html?gi-language=c

2.2 Rechecking the SDK 3.0 GStreamer pipelines

I already ran many GStreamer tests on SDK 3.0, so this is brief. Many GStreamer commands do not behave the same on SDK 4.0 as on SDK 3.0.
Plugin names and properties have changed, so check them with gst-inspect as described above before running anything.

TEST2 example on DeepStream SDK 3.0

I tested with decodebin or uridecodebin as shown below; see the commands for details.

$ pwd
/home/nvidia/deepstream_sdk_on_jetson/sources/apps/sample_apps/deepstream-test2

// play the sample video full screen using the primary GIE, secondary GIEs, and nvtracker (X-Window playback is covered separately)

$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.h264 ! \
        decodebin ! nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        capsfilter caps=video/x-raw(memory:NVMM), format=NV12 ! \
        nvvidconv ! \
        capsfilter caps=video/x-raw(memory:NVMM), format=RGBA ! \
        nvosd font-size=15 ! nvoverlaysink

// play an RTSP stream full screen using the primary GIE, secondary GIEs, and nvtracker

$ gst-launch-1.0 uridecodebin uri=rtsp://10.0.0.199:554/h264 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        capsfilter caps=video/x-raw(memory:NVMM), format=NV12 ! \
        nvvideoconvert ! \
        capsfilter caps=video/x-raw(memory:NVMM), format=RGBA ! \
        nvdsosd ! nvoverlaysink

// play the sample video in an X-Window using the primary GIE, secondary GIEs, and nvtracker

$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! \
        decodebin ! nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvidconv ! nvosd ! nvegltransform ! nveglglessink

DeepStream SDK 3.0 notes

Compared with the link below, a lot has changed since DeepStream SDK 3.0, and the GStreamer pipelines are not compatible either.
https://ahyuo79.blogspot.com/2019/07/deepstream-sdk-30-gstreamer.html

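2.3 Checking the GStreamer pipelines on SDK 4.0

The biggest difference in DeepStream SDK 4.0 is that nvstreammux must always be used, even with a single channel.
In addition, nvvideoconvert must be used instead of nvvidconv, and the caps filters used before no longer seem to be necessary.
The nvvidconv pipeline appears to fail because nvosd no longer exists.
Element properties have become more varied and have changed in SDK 4.0, so be careful.

Basic TEST2 setup on DeepStream SDK 4.0

First set up the TensorRT engines (primary and secondary) as follows.
As explained above, Jetson AGX Xavier supports INT8 mode, and when you enable it the calibration table must be set as well (see above).
- model-engine-file
- int8-calib-file
- network-mode=1 # 0=FP32, 1=INT8, 2=FP16 mode

DeepStream SDK 4.0 TEST4 and IPlugin notes (see the earlier post)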
https://ahyuo79.blogspot.com/2019/08/ds-sdk-40-test4-iplugin-sample.html

$ cd ~/deepstream-4.0/sources/apps/sample_apps/deepstream-test2
$ pwd
/home/nvidia/deepstream-4.0/sources/apps/sample_apps/deepstream-test2

$ vi dstest2_pgie_config.txt 
model-engine-file=../../../../samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine

$ vi dstest2_sgie1_config.txt 
model-engine-file=../../../../samples/models/Secondary_CarColor/resnet18.caffemodel_b16_int8.engine

$ vi dstest2_sgie2_config.txt
model-engine-file=../../../../samples/models/Secondary_CarMake/resnet18.caffemodel_b16_int8.engine

$ vi dstest2_sgie3_config.txt
model-engine-file=../../../../samples/models/Secondary_VehicleTypes/resnet18.caffemodel_b16_int8.engine

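Basic nvstreammux usage

It is mandatory, so learn what each of its settings does (it goes before nvinfer).
As shown below, naming the sink pad m.sink_0 in front of the element lets it accept multiple channels.
In the settings that follow, batch-size sets the number of channels (frames) per batch, and width and height set the resolution, so scaling is possible too.
If you do not use nvinfer, it seems you can leave nvstreammux out, as in the tests below.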

m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720

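Question posted to the NVIDIA community

I had questions about DeepStream, so I joined the community and asked (for TI I already have several accounts).
https://devtalk.nvidia.com/default/topic/1061785/deepstream-sdk/about-streammux-in-ds-on-xavier/

Plugin Manual

Note that some features are not supported yet, and that Jetson and Tesla are covered separately.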
https://docs.nvidia.com/metropolis/deepstream/4.0/DeepStream_Plugin_Manual.pdf

Basic TEST2 run on DeepStream SDK 4.0

First run the pipeline with h264parse and nvv4l2decoder (new in 4.0), sending the output to an X-Window (using OpenGL).
As shown below, nvosd has been removed, so the second pipeline does not work.

// sample TEST2: confirmed working
$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.h264 ! \
        h264parse !  nvv4l2decoder ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

// sample with nvvidconv and nvosd swapped in (fails because nvosd is no longer supported); from 4.0 on, nvvideoconvert and nvdsosd are recommended
$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.h264 ! \
        h264parse !  nvv4l2decoder ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvidconv ! nvosd ! nvegltransform ! nveglglessink

TEST2 decode tests in DeepStream SDK 4.0

As before, decodebin and uridecodebin can conveniently be used with deepstream-test2.

 
// Note: on the first run no TensorRT engine exists yet, so building it takes a long time.
// Setting model-engine-file in each config file avoids this on later runs (the Jetson AGX Xavier can use INT8).
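
Before launching, it can help to see which config files already point at an engine file. The following is a minimal sketch of such a check; engine_status is a hypothetical helper, not an SDK tool, and it assumes that a relative model-engine-file path is resolved from the directory of the config file.

```shell
#!/bin/sh
# Sketch: report whether each nvinfer config names a TensorRT engine that
# already exists. engine_status is a hypothetical helper, not an SDK tool.
engine_status() {
    for cfg in "$@"; do
        # Take the first model-engine-file=... line of the config, if any.
        engine=$(sed -n 's/^model-engine-file=//p' "$cfg" | head -n 1)
        case "$engine" in
            "") echo "$cfg: model-engine-file not set"; continue ;;
            /*) path=$engine ;;
            # Assumption: relative paths are resolved from the config's directory.
            *)  path=$(dirname "$cfg")/$engine ;;
        esac
        if [ -f "$path" ]; then
            echo "$cfg: engine present"
        else
            echo "$cfg: engine missing, it will be built on the next run"
        fi
    done
}
```

For the tests in this post it would be called as engine_status dstest2_pgie_config.txt dstest2_sgie1_config.txt dstest2_sgie2_config.txt dstest2_sgie3_config.txt from the deepstream-test2 directory.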

//Sample TEST 2 with decodebin (verified working)
$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.h264 ! \
        decodebin ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

//Sample TEST 2 with uridecodebin (RTSP input, verified working)
$ gst-launch-1.0 uridecodebin uri=rtsp://10.0.0.199:554/h264 ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

//Sample TEST 2 with uridecodebin (file input, verified working)
$ gst-launch-1.0 uridecodebin uri=file:///home/nvidia/deepstream-4.0/samples/streams/sample_720p.h264  ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

//Sample TEST 2 with uridecodebin (RTSP input), nvstreammux removed: does not work
$ gst-launch-1.0 uridecodebin uri=rtsp://10.0.0.199:554/h264 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

//Sample TEST 2 with uridecodebin (RTSP input), nvstreammux and nvinfer/nvtracker removed: works
$ gst-launch-1.0 uridecodebin uri=rtsp://10.0.0.199:554/h264 ! \
        nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

It appears that nvstreammux is required because of nvinfer (a guess).

TEST2 with different output sinks in DeepStream SDK 4.0

nvegltransform ! nveglglessink renders to an X Window.
nvoverlaysink renders full screen.

//Sample TEST 2 with nvoverlaysink (full screen, verified working)
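
Since only the tail of the pipeline changes between these tests, the choice can be sketched as a small lookup. sink_tail and its mode names (window, fullscreen, none) are hypothetical and exist only for this illustration; the element names are the ones used in this post.

```shell
#!/bin/sh
# Sketch: choose the display tail of the pipeline by mode.
# sink_tail and the mode names are hypothetical.
sink_tail() {
    case "$1" in
        window)     echo "nvegltransform ! nveglglessink" ;;  # X Window output
        fullscreen) echo "nvoverlaysink" ;;                   # full-screen output
        none)       echo "fakesink" ;;                        # discard the frames
        *)          echo "unknown sink mode: $1" >&2; return 1 ;;
    esac
}

sink_tail window
sink_tail fullscreen
```

The result would be appended after nvvideoconvert ! nvdsosd, for example "... ! nvvideoconvert ! nvdsosd ! $(sink_tail fullscreen)".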
$ gst-launch-1.0 uridecodebin uri=file:///home/nvidia/deepstream-4.0/samples/streams/sample_720p.h264  ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvoverlaysink

MPEG4 -> H.264 transcoding test

This works the same as before; unless the stream is being played back, nvstreammux does not need to be configured.

 $  gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! omxh264enc !  h264parse ! qtmux ! filesink location=test.mp4

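Using tee to split into two channels for simultaneous playback and transcoding

As with SDK 3.0, I tried to use tee to run playback and transcoding at the same time, but it does not work; looking at it element by element (PlugIn by PlugIn), I cannot see why it fails.

// Basic video playback (verified working; nvstreammux recommended)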
$  gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvegltransform ! nveglglessink

// For RTSP, switch to uridecodebin and run without nvstreammux
$ gst-launch-1.0 uridecodebin uri=rtsp://10.0.0.201:554/h264 ! nvvideoconvert ! nvegltransform ! nveglglessink

// tee with one channel for playback (verified working)
$  gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! tee name=t ! queue ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvegltransform ! nveglglessink 

// tee with two channels (fails)
$  gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! tee name=t ! queue ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvegltransform ! nveglglessink t. ! queue ! omxh264enc !  h264parse ! qtmux  ! filesink location=test.mp4 

// tee with two channels (NVIDIA's recommendation): works, but errors occur occasionally
$ gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! tee name=t ! queue ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvegltransform ! nveglglessink t. ! queue ! nvvideoconvert ! nvv4l2h264enc !  h264parse ! qtmux  ! filesink location=test.mp4

// tee with two channels and fakesink (verified working); from 4.0 on, NVIDIA does not seem to recommend omxh264enc
$  gst-launch-1.0 filesrc location=../../../../samples/streams/sample_720p.mp4 ! decodebin ! tee name=t ! queue ! fakesink t. ! queue ! omxh264enc !  h264parse ! qtmux  ! filesink location=test.mp4

Question posted to the NVIDIA Community

I asked directly and was told to use nvv4l2h264enc instead of omxh264enc.
https://devtalk.nvidia.com/default/topic/1061803/deepstream-sdk/tee-in-ds-4-0-on-xavier-/
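
The tee layout used in these tests can be sketched as a helper that takes one upstream chain and any number of branches. tee_pipeline is a hypothetical name, and the short sample_720p.mp4 path is a placeholder for the sample stream used above. It writes every branch as "t. ! queue ! ...", which is an equivalent spelling of the "tee name=t ! queue ! ... t. ! queue ! ..." form used in this post.

```shell
#!/bin/sh
# Sketch: print a pipeline description that fans one upstream chain out
# to several branches through tee. tee_pipeline is a hypothetical helper.
tee_pipeline() {
    desc="$1 ! tee name=t"
    shift
    for branch in "$@"; do
        # Every branch gets its own queue so the branches run independently.
        desc="$desc t. ! queue ! $branch"
    done
    printf '%s\n' "$desc"
}

# Same shape as the fakesink test above, with omxh264enc swapped for
# nvvideoconvert ! nvv4l2h264enc as NVIDIA suggested.
tee_pipeline "filesrc location=sample_720p.mp4 ! decodebin" \
    "fakesink" \
    "nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=test.mp4"
```

When the output is passed to gst-launch-1.0, adding the -e option makes it send EOS on Ctrl-C, which gives qtmux a chance to finalize the MP4 file.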

Simultaneous on-screen playback and H.264 encoding with nvinfer and nvdsosd

Unlike DS 3.0, nvv4l2h264enc must be used instead of omxh264enc, as shown above, for this to work properly.

//Sample TEST 2 with nvoverlaysink (full screen, verified working)

$ gst-launch-1.0 uridecodebin uri=file:///home/nvidia/deepstream-4.0/samples/streams/sample_720p.h264  ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! nvoverlaysink

// Same as above, but split into two channels after nvdsosd for simultaneous on-screen playback and H.264 encoding
$ gst-launch-1.0 uridecodebin uri=file:///home/nvidia/deepstream-4.0/samples/streams/sample_720p.h264  ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! \
        tee name=t ! queue ! nvegltransform ! nveglglessink \
        t. ! queue ! nvvideoconvert ! nvv4l2h264enc !  h264parse ! qtmux  ! filesink location=test.mp4

// NVIDIA later suggested switching to nvvideoconvert, so I retested
$ gst-launch-1.0 uridecodebin uri=file:///home/nvidia/deepstream-4.0/samples/streams/sample_720p.h264  ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
        nvinfer config-file-path= dstest2_pgie_config.txt ! \
        nvtracker tracker-width=640 tracker-height=368  gpu-id=0  \
        ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so \
        ll-config-file=tracker_config.yml enable-batch-process=1 ! \
        nvinfer config-file-path= dstest2_sgie1_config.txt ! \
        nvinfer config-file-path= dstest2_sgie2_config.txt ! \
        nvinfer config-file-path= dstest2_sgie3_config.txt ! \
        nvvideoconvert ! nvdsosd ! \
        tee name=t ! queue ! nvvideoconvert ! nveglglessink \
        t. ! queue ! nvvideoconvert ! nvv4l2h264enc !  h264parse ! qtmux  ! filesink location=test.mp4
