Jeonghun (James) Lee: 2019

12/29/2019

Terminal Color Settings

1. Terminal Color Settings

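ANSI escape codes can change the terminal's colors and many other display attributes.
Support differs from terminal to terminal, so check what yours actually implements.

ANSI escape codes are sequences that begin with the ESC key code and take effect when printed to the terminal. The part I mainly use is CSI->SGR, which sets colors.

Everything below is taken from Wikipedia; see the Wikipedia articles for details.

ANSI escape code references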
  https://en.wikipedia.org/wiki/ANSI_escape_code
  https://en.wikipedia.org/wiki/C0_and_C1_control_codes

Terminal colors (color value reference)
  https://misc.flogisoft.com/bash/tip_colors_and_formatting

1.1 Escape Sequences

Escape sequences are the sequences that start with the ESC character (dec 27 / hex 0x1B / oct 033).
To change the text color, use CSI->SGR as described below.
Many other functions exist as well; the table below lists them.

Sequence	C1	Short	Name	Effect
ESC N	0x8E	SS2	Single Shift Two	Select a single character from one of the alternative character sets. In xterm, SS2 selects the G2 character set, and SS3 selects the G3 character set.^[19]
ESC O	0x8F	SS3	Single Shift Three
ESC P	0x90	DCS	Device Control String	Terminated by ST. Xterm's uses of this sequence include defining User-Defined Keys, and requesting or setting Termcap/Terminfo data.^[19]
ESC [	0x9B	CSI	Control Sequence Introducer	Most of the useful sequences, see next section.
ESC \	0x9C	ST	String Terminator	Terminates strings in other controls.^[18]^:8.3.143
ESC ]	0x9D	OSC	Operating System Command	Starts a control string for the operating system to use, terminated by ST.^[18]^:8.3.89 In xterm, they may also be terminated by BEL.^[19] In xterm, the window title can be set by `OSC 0;this is the window title BEL`.
ESC X	0x98	SOS	Start of String	Takes an argument of a string of text, terminated by ST. The uses for these string control sequences are defined by the application^[18]^{:8.3.2,8.3.128} or privacy discipline.^[18]^:8.3.94 These functions are not implemented and the arguments are ignored by xterm.^[19]
ESC ^	0x9E	PM	Privacy Message
ESC _	0x9F	APC	Application Program Command
ESC c		RIS	Reset to Initial State	Resets the device to its original state. This may include (if applicable): reset graphic rendition, clear tabulation stops, reset to default font, and more.^[20]
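The OSC row above notes that xterm sets the window title with `OSC 0;title BEL`. Below is a minimal Python sketch of how that byte sequence is assembled. The function name `osc_window_title` is my own, and whether the title actually changes depends on the terminal.

```python
ESC = "\x1b"   # escape character (dec 27 / hex 0x1B / oct 033)
BEL = "\x07"   # bell character, accepted by xterm as an OSC terminator


def osc_window_title(title: str) -> str:
    """Build 'ESC ] 0 ; title BEL' (hypothetical helper name)."""
    return f"{ESC}]0;{title}{BEL}"


def reset_to_initial_state() -> str:
    """Build the RIS sequence 'ESC c' from the last row of the table."""
    return f"{ESC}c"
```

Writing `osc_window_title("this is the window title")` to standard output should set the title on terminals that follow xterm here.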

CSI sequences

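CSI sequences start with CSI (ESC [ ). In the table below, n and m normally stand for numeric parameters; in SGR, however, the trailing m is the literal final character of the sequence.

See the Effect column on the right for details.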

Code	Short	Name	Effect
CSI `n` A	CUU	Cursor Up	Moves the cursor `n` (default `1`) cells in the given direction. If the cursor is already at the edge of the screen, this has no effect.
CSI `n` B	CUD	Cursor Down
CSI `n` C	CUF	Cursor Forward
CSI `n` D	CUB	Cursor Back
CSI `n` E	CNL	Cursor Next Line	Moves cursor to beginning of the line `n` (default `1`) lines down. (not ANSI.SYS)
CSI `n` F	CPL	Cursor Previous Line	Moves cursor to beginning of the line `n` (default `1`) lines up. (not ANSI.SYS)
CSI `n` G	CHA	Cursor Horizontal Absolute	Moves the cursor to column `n` (default `1`). (not ANSI.SYS)
CSI `n` ; `m` H	CUP	Cursor Position	Moves the cursor to row `n`, column `m`. The values are 1-based, and default to `1` (top left corner) if omitted. A sequence such as `CSI ;5H` is a synonym for `CSI 1;5H` as well as `CSI 17;H` is the same as `CSI 17H` and `CSI 17;1H`
CSI `n` J	ED	Erase in Display	Clears part of the screen. If `n` is `0` (or missing), clear from cursor to end of screen. If `n` is `1`, clear from cursor to beginning of the screen. If `n` is `2`, clear entire screen (and moves cursor to upper left on DOS ANSI.SYS). If `n` is `3`, clear entire screen and delete all lines saved in the scrollback buffer (this feature was added for xterm and is supported by other terminal applications).
CSI `n` K	EL	Erase in Line	Erases part of the line. If `n` is `0` (or missing), clear from cursor to the end of the line. If `n` is `1`, clear from cursor to beginning of the line. If `n` is `2`, clear entire line. Cursor position does not change.
CSI `n` S	SU	Scroll Up	Scroll whole page up by `n` (default `1`) lines. New lines are added at the bottom. (not ANSI.SYS)
CSI `n` T	SD	Scroll Down	Scroll whole page down by `n` (default `1`) lines. New lines are added at the top. (not ANSI.SYS)
CSI `n` ; `m` f	HVP	Horizontal Vertical Position	Same as CUP, but counts as a format effector function (like CR or LF) rather than an editor function (like CUD or CNL).^[1]^:AppA.1 This can lead to different handling in certain terminal modes.^[18]^:AnnexA
CSI `n` m	SGR	Select Graphic Rendition	Sets the appearance of the following characters, see SGR parameters below.
CSI 5i		AUX Port On	Enable aux serial port usually for local serial printer
CSI 4i		AUX Port Off	Disable aux serial port usually for local serial printer
CSI 6n	DSR	Device Status Report	Reports the cursor position (CPR) to the application (as though typed at the keyboard) as `ESC[n;mR`, where `n` is the row and `m` is the column.
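The DSR row says the terminal answers `CSI 6n` with `ESC[n;mR`. The sketch below only parses such a reply string; reading the reply from the terminal (raw mode, timeouts) is left out. `parse_cpr` is a name chosen for this example.

```python
import re

# Matches the cursor position report: ESC [ <row> ; <column> R
CPR_PATTERN = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: str):
    """Return (row, column) from an 'ESC[n;mR' reply, or None if it does not match."""
    match = CPR_PATTERN.fullmatch(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```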

Let's try a few of the CSI commands above.

Simple examples using echo

$ man echo
NAME
       echo - display a line of text

SYNOPSIS
       echo [SHORT-OPTION]... [STRING]...
       echo LONG-OPTION

DESCRIPTION
       Echo the STRING(s) to standard output.

       -n     do not output the trailing newline
       -e     enable interpretation of backslash escapes   // lets ESC codes be written with backslash escapes
       -E     disable interpretation of backslash escapes (default)

If -e is in effect, the following sequences are recognized:    // backslash sequences available with echo -e
       \\     backslash              
       \a     alert (BEL)
       \b     backspace
       \c     produce no further output
       \e     escape
       \f     form feed
       \n     new line
       \r     carriage return
       \t     horizontal tab
       \v     vertical tab
       \0nnn  the character whose ASCII code is NNN (octal).  NNN can be 0 to 3 octal digits
       \xHH   the eight-bit character whose value is HH (hexadecimal).  HH can be one or two hex digits

// TEST Normal 
$ echo -e "TEST " 

// TEST CSI n J ( Erase in Display ) 
$ echo -e "\x1b[1J TEST " 

// TEST CSI n;m H (Cursor Position ) 
$ echo -e "\x1b[1;10H TEST "

When echo is used on Linux to emit colors or the commands above, the -e option is required.
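The echo tests above can be mirrored in Python by building the same CSI strings. This is a small sketch, not a full terminal library; the helper names `cursor_position` and `erase_in_display` are assumptions made for illustration.

```python
CSI = "\x1b["  # Control Sequence Introducer (ESC [)


def cursor_position(row: int = 1, col: int = 1) -> str:
    """CUP: move the cursor to 1-based (row, col)."""
    return f"{CSI}{row};{col}H"


def erase_in_display(mode: int = 0) -> str:
    """ED: 0 = cursor to end, 1 = start to cursor, 2 = whole screen, 3 = screen plus scrollback."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    return f"{CSI}{mode}J"


if __name__ == "__main__":
    # Same effect as: echo -e "\x1b[1;10H TEST "
    print(cursor_position(1, 10) + " TEST ")
```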

1.2 Changing Terminal Colors (CSI->SGR)

SGR can set terminal colors as well as blink, bold, underline, and other attributes.
Colors can be chosen freely within what the terminal supports.

SGR parameters (Select Graphic Rendition)

SGR is sent as CSI n m (ESC[ n m), where n is one of the codes below.
Three color tables are available: 3/4-bit, 8-bit, and 24-bit.

Code	Effect	Note
0	Reset / Normal	all attributes off
1	Bold or increased intensity
2	Faint (decreased intensity)
3	Italic	Not widely supported. Sometimes treated as inverse.
4	Underline
5	Slow Blink	less than 150 per minute
6	Rapid Blink	MS-DOS ANSI.SYS; 150+ per minute; not widely supported
7	Reverse video	swap foreground and background colors
8	Conceal	Not widely supported.
9	Crossed-out	Characters legible, but marked for deletion.
10	Primary(default) font
11–19	Alternative font	Select alternative font $n - 10$
20	Fraktur	Rarely supported
21	Doubly underline or Bold off	Double-underline per ECMA-48.^[26]
22	Normal color or intensity	Neither bold nor faint
23	Not italic, not Fraktur
24	Underline off	Not singly or doubly underlined
25	Blink off
27	Inverse off
28	Reveal	conceal off
29	Not crossed out
30–37	Set foreground color	See color table below 3/4bit color
38	Set foreground color	Next arguments are `5;n` or `2;r;g;b`, see below
39	Default foreground color	implementation defined (according to standard)
40–47	Set background color	See color table below 3/4bit color
48	Set background color	Next arguments are `5;n` or `2;r;g;b`, see below
49	Default background color	implementation defined (according to standard)
51	Framed
52	Encircled
53	Overlined
54	Not framed or encircled
55	Not overlined
60	ideogram underline or right side line	Rarely supported
61	ideogram double underline or double line on the right side
62	ideogram overline or left side line
63	ideogram double overline or double line on the left side
64	ideogram stress marking
65	ideogram attributes off	reset the effects of all of `60`–`64`
90–97	Set bright foreground color	aixterm (not in standard) 3/4bit
100–107	Set bright background color	aixterm (not in standard) 3/4bit

Blink does not work in my terminal; behavior will differ from terminal to terminal.
The 3/4-bit color mode is the most commonly used and the simplest to apply.
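Since SGR is just `CSI`, a list of codes joined by `;`, and a final `m`, a tiny helper covers every row of the table. A minimal sketch follows; `sgr` and `styled` are names I made up for this example.

```python
CSI = "\x1b["
RESET = CSI + "0m"  # SGR 0: all attributes off


def sgr(*codes: int) -> str:
    """Join SGR codes with ';' and terminate with the final character 'm'."""
    return CSI + ";".join(str(code) for code in codes) + "m"


def styled(text: str, *codes: int) -> str:
    """Wrap text in the given SGR codes and reset afterwards."""
    return f"{sgr(*codes)}{text}{RESET}"


if __name__ == "__main__":
    # 1: Bold, 93: Bright Yellow, 4: Underline
    print(styled(" TEST ", 1, 93, 4))
```

Always appending the reset keeps a forgotten attribute from leaking into later output.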

3/4-bit color and bold test

\x1b[1m : Bold
\x1b[93m : Bright Yellow (3/4bit)
\x1b[0m : Reset; clears every attribute set above

$ echo -e "\x1b[1m \x1b[93m TEST \x1b[0m"

3/4-bit color: applying several SGR codes at once with ";"

\x1b[1;93;4m : Bold/Bright Yellow (3/4bit) /Underline
\x1b[0m : Reset

// Applying several codes at once on Linux
$ echo -e "\x1b[1;93;4;6m TEST \x1b[0m"   // Hex ESC, 1: Bold, 93: Bright Yellow, 4: Underline, 6: Rapid Blink
$ echo -e "\033[1;93;4;6m TEST \033[0m"   // Octal ESC, same as above

//Bright foreground color (3/4 bit)  
$ echo -e "\x1b[1;90m TEST \x1b[0m" // Hex ESC 1: Bold , 90: Bright Black (Gray)
$ echo -e "\x1b[1;91m TEST \x1b[0m" // Hex ESC 1: Bold , 91: Bright Red
$ echo -e "\x1b[1;92m TEST \x1b[0m" // Hex ESC 1: Bold , 92: Bright Green
$ echo -e "\x1b[1;93m TEST \x1b[0m" // Hex ESC 1: Bold , 93: Bright Yellow
$ echo -e "\x1b[1;94m TEST \x1b[0m" // Hex ESC 1: Bold , 94: Bright Blue
$ echo -e "\x1b[1;95m TEST \x1b[0m" // Hex ESC 1: Bold , 95: Bright Magenta
$ echo -e "\x1b[1;96m TEST \x1b[0m" // Hex ESC 1: Bold , 96: Bright Cyan
$ echo -e "\x1b[1;97m TEST \x1b[0m" // Hex ESC 1: Bold , 97: Bright White

//Foreground color (3/4 bit) , 
$ echo -e "\x1b[1;30m TEST \x1b[0m" // Hex ESC 1: Bold , 30: Black 
$ echo -e "\x1b[1;31m TEST \x1b[0m" // Hex ESC 1: Bold , 31: Red
$ echo -e "\x1b[1;32m TEST \x1b[0m" // Hex ESC 1: Bold , 32: Green
$ echo -e "\x1b[1;33m TEST \x1b[0m" // Hex ESC 1: Bold , 33: Yellow
$ echo -e "\x1b[1;34m TEST \x1b[0m" // Hex ESC 1: Bold , 34: Blue
$ echo -e "\x1b[1;35m TEST \x1b[0m" // Hex ESC 1: Bold , 35: Magenta
$ echo -e "\x1b[1;36m TEST \x1b[0m" // Hex ESC 1: Bold , 36: Cyan
$ echo -e "\x1b[1;37m TEST \x1b[0m" // Hex ESC 1: Bold , 37: White

//Bell Sound, 
$ echo -e '\007'
$ echo -e '\a'

https://rosettacode.org/wiki/Terminal_control/Ringing_the_terminal_bell

// In the Windows CMD shell, press Ctrl+[ instead of typing "\x1b" or "\033"; it appears as "^["
> echo ^[[1;93;4;6m TEST ^[[0m      // Hex ESC 1: Bold , 93: Bright Yellow , 4: Underline 6: Rapid Blink

https://stackoverflow.com/questions/2048509/how-to-echo-with-different-colors-in-the-windows-command-line

// The ESC character is hard to type into a batch file, so generate it with setESC as below
setlocal
call :setESC


echo %ESC%[1;31m 
ECHO -----------------------------
ECHO ------     COLOR TEST
ECHO -----------------------------
echo %ESC%[0m

echo %ESC%[101;93m STYLES %ESC%[0m
echo %ESC%[0mReset%ESC%[0m
echo %ESC%[1mBold%ESC%[0m
echo %ESC%[4mUnderline%ESC%[0m
echo %ESC%[7mInverse%ESC%[0m
echo.
echo %ESC%[101;93m NORMAL FOREGROUND COLORS %ESC%[0m
echo %ESC%[30mBlack%ESC%[0m (black)
echo %ESC%[31mRed%ESC%[0m
echo %ESC%[32mGreen%ESC%[0m
echo %ESC%[33mYellow%ESC%[0m
echo %ESC%[34mBlue%ESC%[0m
echo %ESC%[35mMagenta%ESC%[0m
echo %ESC%[36mCyan%ESC%[0m
echo %ESC%[37mWhite%ESC%[0m
echo.
echo %ESC%[101;93m NORMAL BACKGROUND COLORS %ESC%[0m
echo %ESC%[40mBlack%ESC%[0m
echo %ESC%[41mRed%ESC%[0m
echo %ESC%[42mGreen%ESC%[0m
echo %ESC%[43mYellow%ESC%[0m
echo %ESC%[44mBlue%ESC%[0m
echo %ESC%[45mMagenta%ESC%[0m
echo %ESC%[46mCyan%ESC%[0m
echo %ESC%[47mWhite%ESC%[0m
echo.

PAUSE

:setESC
for /F "tokens=1,2 delims=#" %%a in ('"prompt #$H#$E# & echo on & for %%b in (1) do rem"') do (
  set ESC=%%b
  exit /B 0
)
exit /B 0

https://gist.github.com/mlocati/fdabcaeb8071d5c75a2d51712db24011

Using 8-bit color

ESC[ 38;5;⟨n⟩ m Select foreground color (n : 0~255)
ESC[ 48;5;⟨n⟩ m Select background color (n : 0~255)

Using 24-bit color (I currently see no need for 24-bit)

ESC[ 38;2;⟨r⟩;⟨g⟩;⟨b⟩ m Select RGB foreground color
ESC[ 48;2;⟨r⟩;⟨g⟩;⟨b⟩ m Select RGB background color

A simple shell script such as the one below prints the colors

8bit TEST Color Shell Script

$ vi 8bitTestColor.sh 
#!/bin/bash

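# Foreground index counts up from 0 while the background index counts down from 255.
for ((FOR=0, BAK=255; FOR<=255; FOR++, BAK--)); do
    # Semicolon-separated parameters, matching the 38;5;n / 48;5;n forms above.
    echo -e "\x1b[38;5;${FOR}m \x1b[48;5;${BAK}m TEST \x1b[0m"
done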

$ chmod +x ./8bitTestColor.sh  
$ ./8bitTestColor.sh

Color settings in Python

The same escape sequences can be used from Python in the same way.
http://www.lihaoyi.com/post/BuildyourownCommandLinewithANSIescapecodes.html
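As a rough Python counterpart to the 8-bit test script, the sketch below builds the `38;5;n`, `48;5;n`, and `38;2;r;g;b` forms listed earlier. The function names are assumptions for this example only, and 24-bit output depends on terminal support.

```python
RESET = "\x1b[0m"


def _check_byte(value: int) -> int:
    """Color indexes and RGB components must be in 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("value must be in 0..255")
    return value


def fg_256(n: int) -> str:
    """8-bit foreground color: ESC[38;5;n m."""
    return f"\x1b[38;5;{_check_byte(n)}m"


def bg_256(n: int) -> str:
    """8-bit background color: ESC[48;5;n m."""
    return f"\x1b[48;5;{_check_byte(n)}m"


def fg_rgb(r: int, g: int, b: int) -> str:
    """24-bit foreground color: ESC[38;2;r;g;b m."""
    return f"\x1b[38;2;{_check_byte(r)};{_check_byte(g)};{_check_byte(b)}m"


def color_ramp():
    """Yield one line per index: foreground counts up, background counts down."""
    for n in range(256):
        yield f"{fg_256(n)} {bg_256(255 - n)} TEST {RESET}"


if __name__ == "__main__":
    for line in color_ramp():
        print(line)
```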

C example and test

$ vi color_test.c 
#include <stdio.h>

#define RST  "\x1B[0m"   // Normal Reset 
#define BLD  "\x1B[1m"   // Bold
#define BLK  "\x1B[30m"  // Black
#define RED  "\x1B[31m"  // Red
#define GRN  "\x1B[32m"  // Green
#define YEL  "\x1B[33m"  // Yellow
#define BLU  "\x1B[34m"  // Blue
#define MAG  "\x1B[35m"  // Magenta
#define CYN  "\x1B[36m"  // Cyan
#define WHT  "\x1B[37m"  // Light Gray

#define GRA  "\x1B[90m"  // Dark Gray
#define LRD  "\x1B[91m"  // Light RED
#define LGR  "\x1B[92m"  // Light Green
#define LYL  "\x1B[93m"  // Light Yellow
#define LBL  "\x1B[94m"  // Light Blue
#define LMA  "\x1B[95m"  // Light Magenta
#define LCY  "\x1B[96m"  // Light Cyan
#define LWH  "\x1B[97m"  // White

int main(void)
{
    // Color
    printf("%s %s TEST MY Color %s \n", BLD, RED, RST);
    // Bell sound
    printf("\a");
    return 0;
}

11/29/2019

Installing Docker GitLab-CE and Checking Basic Usage

1. Installing and Running a GitLab-CE Server with Docker

Docker GitLab-CE installation and basic operation

The GitLab documentation explains this clearly and is easy to follow.
https://docs.gitlab.com/omnibus/docker/
https://hub.docker.com/r/gitlab/gitlab-ce/tags/

Installing GitLab directly on Ubuntu

https://about.gitlab.com/install/#ubuntu

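According to the documentation, the current Docker image does not include an SMTP server for e-mail.
If I need one later, I will install an SMTP server and test it as well.

Stop the SSH server already installed on the host (the container below maps host port 22)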

$ sudo service ssh status  // check the SSH server status
or 
$ sudo systemctl status ssh

$ sudo service ssh stop // stop the SSH server
or
$ sudo systemctl stop ssh

$ sudo systemctl enable/disable ssh // enable or disable the SSH service

$ systemctl list-units --type service
$ systemctl list-units --type service --all

1.1 Image Download and Run Gitlab-CE Container

The container runs in the background (detached), so Docker's terminal-mode options and other extra settings are not needed.
To look inside the container from a terminal, use docker exec.

$ docker pull gitlab/gitlab-ce

$ docker run --detach \
  --hostname gitlab.example.com \
  -p 443:443 -p 80:80 -p 22:22 \
  --name gitlab \
  --restart always \
  -v /srv/gitlab/config:/etc/gitlab \
  -v /srv/gitlab/logs:/var/log/gitlab \
  -v /srv/gitlab/data:/var/opt/gitlab \
  gitlab/gitlab-ce:latest

$ docker ps -a  // check that the container is running
CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS                 PORTS                                                          NAMES
e96d2500ff76        gitlab/gitlab-ce:latest   "/assets/wrapper"   2 hours ago         Up 2 hours (healthy)   0.0.0.0:22->22/tcp, 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   gitlab

PortMapping

SSH : 22
HTTP: 80
HTTPS:443

If another server on the host already uses these ports, map them to different host ports.
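To make the remapping concrete, here is a small Python sketch that assembles the same `docker run` arguments with configurable host ports. The helper name `gitlab_run_args` and the example ports 2222/8080/8443 are my own assumptions; the container-side ports stay at 22/80/443.

```python
def gitlab_run_args(hostname: str, ssh_port: int = 22, http_port: int = 80,
                    https_port: int = 443, data_root: str = "/srv/gitlab") -> list:
    """Return the docker run argument list; only host-side ports change."""
    return [
        "docker", "run", "--detach",
        "--hostname", hostname,
        "-p", f"{https_port}:443",
        "-p", f"{http_port}:80",
        "-p", f"{ssh_port}:22",
        "--name", "gitlab",
        "--restart", "always",
        "-v", f"{data_root}/config:/etc/gitlab",
        "-v", f"{data_root}/logs:/var/log/gitlab",
        "-v", f"{data_root}/data:/var/opt/gitlab",
        "gitlab/gitlab-ce:latest",
    ]


if __name__ == "__main__":
    # Example: the host's own SSH and web servers keep 22/80/443.
    print(" ".join(gitlab_run_args("gitlab.example.com", 2222, 8080, 8443)))
```

The list form can be passed to `subprocess.run` as-is, which avoids shell quoting issues.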

Sharing data between the host and the container

/srv/gitlab/data For storing application data
/srv/gitlab/logs For storing logs
/srv/gitlab/config For storing the GitLab configuration files

With the settings above, everything is stored under /srv/gitlab on the host.

hostname

I do not have a hostname, so I ran the command as-is; later I will create a hostname with DDNS and test again.

1.2 GitLab Config Settings

GitLab is configured in /etc/gitlab/gitlab.rb; see the detailed settings linked below.

Editing and checking the GitLab config

$ docker exec -it gitlab /bin/bash
root@gitlab:/#   vi /etc/gitlab/gitlab.rb 

or 

$ docker exec -it gitlab editor /etc/gitlab/gitlab.rb

SMTP settings in the config
https://docs.gitlab.com/omnibus/settings/smtp.html

HTTPS settings in the config
https://docs.gitlab.com/omnibus/settings/nginx.html#enable-https

Restarting the GitLab container

$ docker restart gitlab  // restart the container after changing settings

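1.3 GitLab Administration

Delete and restart the container to verify that data persists

After creating several user IDs and storing some data, delete the GitLab container and start it again to confirm that everything was kept under /srv/gitlab.

$ docker ps -a  // check that the container is running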
CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS                 PORTS                                                          NAMES
e96d2500ff76       gitlab/gitlab-ce:latest   "/assets/wrapper"   2 hours ago         Up 2 hours (healthy)   0.0.0.0:22->22/tcp, 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   gitlab

$ docker stop e96d2500ff76   // stop the container

$ docker rm e96d2500ff76    // remove the container

Run the docker run command above again and retest (confirmed: no problems).

Checking the GitLab log

$ docker logs gitlab | tail

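2. GitLab-CE Basic Setup and Verification

Open the link below (your own IP address or hostname also works)
http://localhost

Set the root password

Register a new user and start managing
(Perhaps because I am signed in as root, a Register option appears next to Sign in)

Separate usage into groups and individuals

Create a group

Create a personal project and run a basic test

I have not yet tried a separate setup with Docker Compose; for now I use this only for testing.
I also do not run several Docker containers at the same time.

Differences between GitLab-CE and EE
http://developer.gaeasoft.co.kr/development-guide/gitlab/gitlab-introduce/

Using a GitLab server directly on a Raspberry Pi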

https://hackernoon.com/create-your-own-git-server-using-raspberry-pi-and-gitlab-f64475901a66
https://projects.raspberrypi.org/en/projects/getting-started-with-git

11/14/2019

Running and Analyzing Custom Object Detection with SSD / Faster RCNN (Third Analysis)

1. Preparing TensorFlow and Custom Object Detection

To prepare for object detection, install the following:

Install TensorFlow
Install the required Python packages and other required packages
Download the model
NVIDIA Docker and SSD Training (second analysis)
https://ahyuo79.blogspot.com/2019/10/docker-tensorflow.html

NVIDIA Docker and basic TensorFlow usage
https://ahyuo79.blogspot.com/2019/10/nvidia-docker.html

TensorFlow Models
https://github.com/tensorflow/models

Checking the TensorFlow Models branch
Download the branch of the model source that matches your TensorFlow version
https://github.com/tensorflow/models/branches

1.1 Installing and Configuring TensorFlow Directly

To use TensorFlow Object Detection, first install TensorFlow, then download the Object Detection models and install the related packages.

Custom Object Detection installation guide
  https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html

1.2 Using the General TensorFlow Docker Image

Starting from a TensorFlow Docker image, download the matching model version into it and build a single image to work from.
The points to watch are the image tag and the TensorFlow version.

Meaning of the Docker tags
  https://www.tensorflow.org/install/docker?hl=ko

TensorFlow Docker
Keep the TensorFlow version and the model version above in sync
  https://hub.docker.com/r/tensorflow/tensorflow
  https://hub.docker.com/r/tensorflow/tensorflow/tags

1.3 Using the NVIDIA TensorFlow Docker Image

The NVIDIA TensorFlow Docker image installed earlier can also be used for object detection.

NVIDIA TensorFlow Docker
  https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow/tags

See the earlier NVIDIA SSD Docker analysis
  https://ahyuo79.blogspot.com/2019/10/docker-tensorflow.html

2. Building the Custom Dataset

Most people start with a custom dataset of dog and cat photos, so I began the same easy way.

Getting dog and cat photos (dataset)

$ cd ~/works/custom
$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download
$ python google_images_download/google_images_download.py --keywords "dogs" --size medium --output_directory ~/works/custom/data/
$ python google_images_download/google_images_download.py --keywords "cats" --size medium --output_directory ~/works/custom/data/

The images obtainable with google-images-download are currently limited to at most 100 per download.
Raising the limit option above 100 still does not return more than 100 images in a single run.

google_image_download
https://google-images-download.readthedocs.io/en/latest/installation.html

google_image_download argument
https://google-images-download.readthedocs.io/en/latest/arguments.html

Organizing the images

$ cd ~/works/custom/data/
$ mkdir images        // collect all images in one place
$ mkdir annotation    // where the labelImg XML files are saved
$ mv ./dogs/*.jpg images/
$ mv ./cats/*.jpg images/
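The same reorganization can be scripted. The sketch below is one possible Python version, with one deliberate difference from the `mv` commands: it prefixes each file with its class directory name so that a `1.jpg` in dogs and a `1.jpg` in cats do not overwrite each other. The function name `collect_images` is made up for this example.

```python
from pathlib import Path
import shutil


def collect_images(data_dir, class_dirs=("dogs", "cats"), pattern="*.jpg") -> int:
    """Move images from each class directory into data_dir/images; return the count."""
    data_dir = Path(data_dir)
    images = data_dir / "images"
    images.mkdir(exist_ok=True)
    (data_dir / "annotation").mkdir(exist_ok=True)  # target for the labelImg XML files
    moved = 0
    for name in class_dirs:
        for src in sorted((data_dir / name).glob(pattern)):
            # Prefix with the class name to avoid filename collisions (my own choice).
            dst = images / f"{name}_{src.name}"
            shutil.move(str(src), str(dst))
            moved += 1
    return moved
```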

Annotation (using labelImg, saved as PascalVOC)

$ cd ~/works/custom/labelImg    // labelImg was installed earlier
$ cat data/predefined_classes.txt   // check the default classes (dog and cat are present); add any missing name
dog
person
cat
tv
car
meatballs
marinara sauce
tomato soup
chicken noodle soup
french onion soup
chicken breast
ribs
pulled pork
hamburger

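$ python3 labelImg.py   ~/works/custom/data/images    // by default the XML is saved alongside the images

Notes
After starting labelImg, be sure to set the XML save location with Change Save Dir to ~/works/custom/data/annotation
The order of the classes defined above does not matter; only the names must match those in label_map.pbtxt.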

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#annotating-images
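Because the class names in the annotation XML must match label_map.pbtxt, a quick check before building the TFRecord can save a failed run. The sketch below reads a PascalVOC XML string and reports names that are not in a known set; the helper names and the sample XML are illustrative only.

```python
import xml.etree.ElementTree as ET


def object_names(voc_xml: str) -> list:
    """Return the <name> of every <object> in a PascalVOC annotation."""
    root = ET.fromstring(voc_xml)
    return [obj.findtext("name") for obj in root.iter("object")]


def unknown_labels(voc_xml: str, known) -> list:
    """Return annotation names that are missing from the known label set."""
    return sorted(set(object_names(voc_xml)) - set(known))


SAMPLE = """<annotation>
  <filename>1.jpg</filename>
  <object><name>dog</name></object>
  <object><name>Cat</name></object>
</annotation>"""

if __name__ == "__main__":
    # 'Cat' differs from 'cat' in case, so it is reported.
    print(unknown_labels(SAMPLE, {"cat", "dog"}))
```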

Defining the label map

$ cd ~/works/custom/data
$ vi label_map.pbtxt
item {
    id: 1
    name: 'cat'
}

item {
    id: 2
    name: 'dog'
}

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#creating-label-map
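For more than a couple of classes, the label map can be generated instead of typed. A minimal sketch follows, assuming ids are assigned in list order starting at 1 as in the file above; `label_map_text` is a hypothetical helper, not part of the Object Detection API.

```python
def label_map_text(names) -> str:
    """Render class names as label_map.pbtxt items with ids starting at 1."""
    items = []
    for idx, name in enumerate(names, start=1):
        items.append(f"item {{\n    id: {idx}\n    name: '{name}'\n}}\n")
    return "\n".join(items)


if __name__ == "__main__":
    # Reproduces the two-class map shown above.
    print(label_map_text(["cat", "dog"]))
```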

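1.1 Creating the TFRecord File

Other blogs and the TensorFlow example sites convert XML -> CSV and then CSV -> TFRecord.
A quick look at other sources shows that the TFRecord step is nearly the same everywhere, and I could not see why two conversions are needed, so I tried converting directly as shown below.

How to create a TFRecord

I am not currently using this method
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#creating-tensorflow-records

Creating the TFRecord

The TFRecord script can only be run where TensorFlow is installed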

root@3aac229c45c3:/workdir/models/research# pip install lxml
## As before, take care with the --data_dir path
root@3aac229c45c3:/workdir/models/research# python create_pascal_tf_record.py \
 --data_dir=/data \
 --annotations_dir=/data/annotation \
 --label_map_path=/data/label_map.pbtxt \
 --output_path=/data/pascal.record

Creating a TFRecord from labelImg output
https://ahyuo79.blogspot.com/2019/11/coco-set-annotation-tools.html

2. Custom Training/Evaluation

Test with two custom models and compare them

2.1 Pre-trained Model Download

SSD (Single Shot MultiBox Detector) uses a separate network as its feature extractor, so download that network first to set up the base configuration.

Base layout of check

$ cd ~/works/custom/check
$ mkdir -p models/configs
$ mkdir -p models/resnet_v1_50_2016_08_28
$ mkdir -p train_resnet                  // checkpoint directory for SSD-Resnet50 (filled after training)
$ mkdir -p train_inception               // checkpoint directory for SSD-Inceptionv2 (filled after training)
$ mkdir -p fasterrcnn_train_resnet       // checkpoint directory for Faster RCNN-Resnet50 (filled after training)

Resnet 50 Download

$ cd ~/works/custom/check/models
$ wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
$ tar -xzf resnet_v1_50_2016_08_28.tar.gz
$ mv resnet_v1_50.ckpt resnet_v1_50_2016_08_28/model.ckpt

InceptionV2 Download

$ cd ~/works/custom/check/models
$ wget http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_11_06_2017.tar.gz
$ tar -xzf ssd_inception_v2_coco_11_06_2017.tar.gz

Pre-trained model information
https://github.com/tensorflow/models/tree/master/research/slim

Model layout under check

$ cd ~/works/custom/check/models
$ tree 
.
├── configs                          // where the pipeline configs are stored (Resnet, Inceptionv2)
├── resnet_v1_50_2016_08_28          // Resnet 50 (Pre-trained Model)
│   └── model.ckpt                   // checkpoint   
├── resnet_v1_50_2016_08_28.tar.gz
├── ssd_inception_v2_coco_11_06_2017    // Inception V2 (Pre-trained Model)
│   ├── frozen_inference_graph.pb          // Inception Pb file 
│   ├── graph.pbtxt                        // Inception graph definition
│   ├── model.ckpt.data-00000-of-00001     // checkpoint
│   ├── model.ckpt.index
│   └── model.ckpt.meta
└── ssd_inception_v2_coco_11_06_2017.tar.gz

2.2 SSD / Faster RCNN Pipeline Settings

SSD can use Resnet 50 or Inception V2 as its feature extractor, and other networks can be used as well.
The fields of the pipeline config apparently must be declared in the *.proto files for them to work.
I should look at this more closely when time permits.

Running the Docker container

$ docker run --gpus all --rm -it \
--shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8888:8888 -p 6006:6006  \
-v /home/jhlee/works/custom/data:/data \
-v /home/jhlee/works/custom/check:/checkpoints \
--ipc=host \
--name nvidia_ssd \
nvidia_ssd

Changing the SSD-Resnet 50 pipeline settings

root@f46c490016e0:/workdir/models/research# cp configs/ssd320_full_1gpus.config  /checkpoints/models/configs
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/ssd320_full_1gpus.config 

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: true
    num_classes: 2    # number of labels (Cat/Dog)
    box_coder {
      faster_rcnn_box_coder {   
        y_scale: 10.0              
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5    ## IoU threshold for matching anchors to ground-truth boxes during training (not a detection_scores cutoff)
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
...
    image_resizer {            # 
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
....

    feature_extractor {
      type: 'ssd_resnet50_v1_fpn'  # use Resnet 50 as the SSD feature extractor
      fpn {
        min_level: 3
        max_level: 7
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.0004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {                    ## check the post-processing settings
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100    ## at most 100 per class;        output_dict['detection_classes']
        max_total_detections: 100        ## at most 100 detections total; output_dict['num_detections']
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "/checkpoints/models/resnet_v1_50_2016_08_28/model.ckpt"
  fine_tune_checkpoint_type: "classification"
  batch_size: 2            # changed 32 -> 2 because of out-of-memory errors; keep the original if the GPU has enough memory
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 100         # reduced from 100000 for a quick test; restore the original for real training
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
....


train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # train TF Record 
  } 
  label_map_path: "/data/label_map.pbtxt" # label_map.pbtxt
}

eval_config: {
  #metrics_set: "coco_detection_metrics"
  #use_moving_averages: false
  num_examples: 8000   # left unchanged because evaluation is not run
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # no separate eval TFRecord yet (same file as training)
  }
  label_map_path: "/data/label_map.pbtxt" # only the path was changed; revisit later
  shuffle: false
  num_readers: 1
}
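With `score_threshold: 1e-8` in post-processing, nearly every candidate box is returned, so low-score detections are usually dropped by the caller at inference time. The sketch below filters an `output_dict` by score using plain Python lists. The key names follow the ones in the config comments; `detection_boxes`, the 0.5 cutoff, and the helper name `filter_detections` are assumptions for illustration.

```python
def filter_detections(output_dict, min_score=0.5):
    """Keep (class_id, score, box) triples whose score is at least min_score."""
    count = int(output_dict["num_detections"])
    kept = []
    for i in range(count):
        score = output_dict["detection_scores"][i]
        if score >= min_score:
            kept.append((output_dict["detection_classes"][i],
                         score,
                         output_dict["detection_boxes"][i]))
    return kept


if __name__ == "__main__":
    sample = {
        "num_detections": 3,
        "detection_scores": [0.9, 0.4, 0.6],
        "detection_classes": [1, 2, 2],
        "detection_boxes": [[0.0, 0.0, 1.0, 1.0],
                            [0.0, 0.0, 0.5, 0.5],
                            [0.1, 0.1, 0.9, 0.9]],
    }
    for class_id, score, box in filter_detections(sample):
        print(class_id, score, box)
```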

Changing the Faster RCNN-Resnet 50 pipeline settings

root@f46c490016e0:/workdir/models/research# cp ./object_detection/samples/configs/faster_rcnn_resnet50_coco.config  /checkpoints/models/configs
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/faster_rcnn_resnet50_coco.config
model {
  faster_rcnn {
    num_classes: 2     # number of labels 90 -> 2 (Cat/Dog)
    image_resizer {                 #  
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'     ## confirms Resnet 50 is used
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }

....
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6       ## the IoU threshold is also adjustable
        max_detections_per_class: 100      ## as before, post-processing keeps at most 100 per class
        max_total_detections: 300          ## unlike before, the total maximum is 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}


train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/checkpoints/models/resnet_v1_50_2016_08_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 300          ### total steps 200000 -> 300 (changed for a temporary test)
  data_augmentation_options { 
    random_horizontal_flip {
    }
  }
}

....  
###  same as the SSD config above

train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # train TF Record 
  } 
  label_map_path: "/data/label_map.pbtxt" # label_map.pbtxt
}

eval_config: {
  num_examples: 8000                                  ## evaluation
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"  # only the path was changed; update again if eval is used later
  }
  label_map_path: "/data/label_map.pbtxt" # only the path was changed; revisit later
  shuffle: false
  num_readers: 1
}

Faster RCNN has to be run with the precision changed to FP32, and the optimizer part currently has a problem.
Training does run for now, but this part needs a closer look.

Changing the SSD Inception v2 pipeline settings

root@f46c490016e0:/workdir/models/research# cp ./object_detection/samples/configs/ssd_inception_v2_coco.config  /checkpoints/models/configs 
root@f46c490016e0:/workdir/models/research# vi /checkpoints/models/configs/ssd_inception_v2_coco.config 

model {
  ssd {
    num_classes: 2   ## number of labels; see label_map.pbtxt
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5   ## IoU threshold for matching anchors to ground-truth boxes during training (not a detection_scores cutoff)
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }

..........

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }

..........
    feature_extractor {
      type: 'ssd_inception_v2'    # use Inception_v2 as the SSD feature extractor
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100  ## at most 100 per class;        output_dict['detection_classes']
        max_total_detections: 100      ## at most 100 detections total; output_dict['num_detections']
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 6     ## 24 -> 6, reduced because of the limits of my GPU
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/checkpoints/ssd_inception_v2_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 1000      ## 20000 -> 1000, reduced to run only a short test on a laptop
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"          ## Train Record
  }
  label_map_path: "/data/label_map.pbtxt"      ## Train Labelmap
}

eval_config: {
  num_examples: 8000                                  ## evaluation
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/data/pascal.record"                 ## test record for evaluation
  }
  label_map_path: "/data/label_map.pbtxt"             ## label map for evaluation
  shuffle: false
  num_readers: 1
}
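
The note in train_config says that limiting num_steps effectively bypasses the learning rate schedule. A rough way to check this is to recompute the exponential decay by hand with the values from the config above. This is only a sketch: it assumes the staircase form of the decay (the rate drops by decay_factor once every decay_steps steps), and the function name decayed_learning_rate is made up for this example and is not part of the Object Detection API.

```python
def decayed_learning_rate(step,
                          initial_learning_rate=0.004,
                          decay_steps=800720,
                          decay_factor=0.95,
                          staircase=True):
    """Exponential decay: lr = initial * decay_factor ** (step / decay_steps).

    With staircase=True (an assumption here) the exponent is floored to an
    integer, so the rate only changes once every `decay_steps` steps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_learning_rate * decay_factor ** exponent
```

With num_steps: 1000 the exponent stays at 0, so decayed_learning_rate(1000) is still the initial 0.004; the first drop would only happen at step 800720.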

For a detailed analysis, see the earlier SSD analysis

About num_examples in eval_config
https://github.com/tensorflow/models/issues/5059
https://stackoverflow.com/questions/47086630/what-does-num-examples-2000-mean-in-tensorflow-object-detection-config-file

2.3 SSD / Faster RCNN Training

NVIDIA provides shell scripts that make training easy to run. SSD uses FP16 precision,
but Faster RCNN raises an error with FP16, so take care.
A quick look inside the shell script shows that it runs ./object_detection/model_main.py, so running that directly works just as well.

Editing the training shell script and basic analysis

root@1bfb89078878:/workdir/models/research# vi ./examples/SSD320_FP16_1GPU.sh     // check and edit the Pipeline Config part
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
### Add pipelines and select either Resnet50 or InceptionV2
## SSD-Resnet 50 Pipeline  (FP16 supported, default setting)
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

## SSD-Inception v2 Pipeline  (FP16 supported, added setting)
#PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd_inception_v2_coco.config"

## Faster RCNN-Resnet 50 Pipeline (FP32 only)
#PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/faster_rcnn_resnet50_coco.config"

# Enable FP16 precision mode (comment this out to use FP32)
export TF_ENABLE_AUTO_MIXED_PRECISION=1


TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

time python -u ./object_detection/model_main.py \
       --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
       --model_dir=${CKPT_DIR} \
       --alsologtostder \
       "${@:3}"

Choose the pipeline you want to use in the script above, then run it as follows

SSD-Resnet 50 Training

root@f46c490016e0:/workdir/models/research#  bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/train_resnet /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path
..........

The checkpoints produced by training are saved here: /checkpoints/train_resnet

Faster RCNN-Resnet 50 Training

root@f46c490016e0:/workdir/models/research#  bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/fasterrcnn_train_resnet /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path
..........

The checkpoints produced by training are saved here: /checkpoints/fasterrcnn_train_resnet

SSD-Inception V2 Training

root@1bfb89078878:/workdir/models/research# bash ./examples/SSD320_FP16_1GPU.sh /checkpoints/train_inception /checkpoints/models/configs 
// 1st checkpoints path , output
// 2nd pipeline path

The checkpoints produced by training are saved here: /checkpoints/train_inception
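
The three runs above pass the same two positional arguments: the first is the checkpoint output directory, and the second is the directory holding the pipeline config, to which the script appends the config file name. The sketch below rebuilds the resulting model_main.py command in Python to make that mapping explicit. The function name build_train_command and its parameters are hypothetical; the defaults and the flag spellings are copied from the script listing above.

```python
import posixpath


def build_train_command(ckpt_dir="/results/SSD320_FP16_1GPU",
                        config_dir="/workdir/models/research/configs",
                        config_name="ssd320_full_1gpus.config",
                        extra_args=()):
    """Return the model_main.py command line that the shell script would run."""
    # $2 in the script is a directory; the config file name is appended to it.
    pipeline_config_path = posixpath.join(config_dir, config_name)
    return [
        "python", "-u", "./object_detection/model_main.py",
        "--pipeline_config_path=" + pipeline_config_path,
        "--model_dir=" + ckpt_dir,   # $1: where checkpoints are written
        "--alsologtostder",          # flag spelled exactly as in the script above
    ] + list(extra_args)             # "${@:3}": any remaining arguments
```

For example, build_train_command("/checkpoints/train_inception", "/checkpoints/models/configs", "ssd_inception_v2_coco.config") corresponds to the SSD-Inception V2 run, provided the matching PIPELINE_CONFIG_PATH line is the one left uncommented in the script.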

Checking the checkpoint files created after training (e.g. SSD-Resnet50)

root@f46c490016e0:/workdir/models/research# ls /checkpoints/train_resnet/
checkpoint                                   graph.pbtxt                       model.ckpt-0.index                  model.ckpt-300.data-00001-of-00002
eval                                         model.ckpt-0.data-00000-of-00002  model.ckpt-0.meta                   model.ckpt-300.index
events.out.tfevents.1574317170.7bdf29dc41cb  model.ckpt-0.data-00001-of-00002  model.ckpt-300.data-00000-of-00002  model.ckpt-300.meta
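
Because the step number is part of the checkpoint file name, it helps to look up the newest checkpoint prefix before exporting. The sketch below does this from a plain list of file names such as the listing above. The helper name latest_checkpoint_prefix is hypothetical; in real code tf.train.latest_checkpoint (used in model_main.py) serves the same purpose by reading the checkpoint file.

```python
import re


def latest_checkpoint_prefix(filenames, ckpt_dir):
    """Return '<ckpt_dir>/model.ckpt-<N>' for the largest step N, or None."""
    pattern = re.compile(r"^model\.ckpt-(\d+)\.index$")
    # Each checkpoint has exactly one .index file, so count steps from those.
    steps = [int(m.group(1)) for m in map(pattern.match, filenames) if m]
    if not steps:
        return None
    return "{}/model.ckpt-{}".format(ckpt_dir.rstrip("/"), max(steps))
```

A typical call would be latest_checkpoint_prefix(os.listdir(d), d) with d = "/checkpoints/train_resnet"; the returned prefix is the kind of value that --trained_checkpoint_prefix expects.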

About TF_ENABLE_AUTO_MIXED_PRECISION
https://medium.com/tensorflow/automatic-mixed-precision-in-tensorflow-for-faster-ai-training-on-nvidia-gpus-6033234b2540
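
If training is launched from Python instead of the shell script, the FP16/FP32 choice can be made by setting or clearing the same environment variable for the child process. This is a minimal sketch under that assumption; the helper name precision_env is hypothetical, and it only prepares an environment dictionary to hand to something like subprocess.run(cmd, env=env).

```python
import os


def precision_env(use_fp16, base_env=None):
    """Return a copy of the environment with mixed precision switched on or off."""
    env = dict(os.environ if base_env is None else base_env)
    if use_fp16:
        # Same effect as `export TF_ENABLE_AUTO_MIXED_PRECISION=1` in the script.
        env["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"
    else:
        # Same effect as commenting the export line out (FP32, e.g. Faster RCNN).
        env.pop("TF_ENABLE_AUTO_MIXED_PRECISION", None)
    return env
```

Returning a copy keeps the setting local to one training run, so an SSD run in FP16 and a Faster RCNN run in FP32 can be started from the same Python session.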

2.4 SSD validation/evaluation

It is said to use part of the training data and to serve as verification during training, but I still need to understand the exact settings and the related parts.

Editing the shell script

root@f46c490016e0:/workdir/models/research# vi examples/SSD320_evaluate.sh  // set the pipeline as below
CHECKPINT_DIR=$1

TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}

## Choose Resnet or Inception
python object_detection/model_main.py --checkpoint_dir $CHECKPINT_DIR --model_dir /results --run_once --pipeline_config_path /checkpoints/models/configs/ssd320_full_1gpus.config

# python object_detection/model_main.py --checkpoint_dir $CHECKPINT_DIR --model_dir /results --run_once --pipeline_config_path /checkpoints/models/configs/ssd_inception_v2_coco.config

Running validation

root@f46c490016e0:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints/train_resnet 
or 
root@f46c490016e0:/workdir/models/research# bash examples/SSD320_evaluate.sh /checkpoints/train_inception

To view the results above in Tensorboard, point it at the location below

root@f46c490016e0:/workdir/models/research# ls /results/eval/    // the Tensorboard log is created under /results/eval
events.out.tfevents.1574322912.74244b7e90c7

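2.5 Basic analysis of Training and Validation

Training and validation both use the same model_main.py command shown below, and my current understanding is that running training alone also runs validation.
The relevant options are already set in the pipeline config, so I believe validation is carried out as well.

The evidence is that running only training still produces the validation log in Tensorboard.

The validation-only command used earlier seems to do nothing more than add --run_once to run a single eval-only pass.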

root@f46c490016e0:/workdir/models/research# python object_detection/model_main.py -h

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Binary to run train and evaluation on object detection model.
flags:

object_detection/model_main.py:
  --[no]allow_xla: Enable XLA compilation
    (default: 'false')
  --checkpoint_dir: Path to directory holding a checkpoint.  If `checkpoint_dir` is provided, this binary operates in eval-only mode, writing
    resulting metrics to `model_dir`.
  --eval_count: How many times the evaluation should be run
    (default: '1')
    (an integer)
  --[no]eval_training_data: If training data should be evaluated for this job. Note that one call only use this in eval-only mode, and
    `checkpoint_dir` must be supplied.
    (default: 'false')
  --hparams_overrides: Hyperparameter overrides, represented as a string containing comma-separated hparam_name=value pairs.
  --model_dir: Path to output model directory where event and checkpoint files will be written.
  --num_train_steps: Number of train steps.
    (an integer)
  --pipeline_config_path: Path to pipeline config file.
  --[no]run_once: If running in eval-only mode, whether to run just one round of eval vs running continuously (default).
    (default: 'false')
  --sample_1_of_n_eval_examples: Will sample one of every n eval input examples, where n is provided.
    (default: '1')
    (an integer)
  --sample_1_of_n_eval_on_train_examples: Will sample one of every n train input examples for evaluation, where n is provided. This is only used if
    `eval_training_data` is True.
    (default: '5')
    (an integer)

root@f46c490016e0:/workdir/models/research# vi object_detection/model_main.py
..........
  if FLAGS.checkpoint_dir:
    if FLAGS.eval_training_data:    ## FALSE by default
      name = 'training_data'
      input_fn = eval_on_train_input_fn
    else:
      name = 'validation_data'     ## name is set to this
      # The first eval input will be evaluated.
      input_fn = eval_input_fns[0]
    if FLAGS.run_once:             ## with --run_once (the validation script), only this branch runs
      estimator.evaluate(input_fn,
                         steps=None,
                         checkpoint_path=tf.train.latest_checkpoint(
                             FLAGS.checkpoint_dir))
    else:                          ## eval-only without --run_once: evaluates continuously (training itself is in the elided part, taken when checkpoint_dir is not set)
      model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
                                train_steps, name)  
.........
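
The branch structure above can be summarized as a small decision function. This is only a sketch of my reading of the excerpt and the help text: the function name select_mode and the returned labels are made up for illustration, and the "train_and_evaluate" case stands for the elided part of model_main.py that runs when --checkpoint_dir is not given, which is assumed here to train and evaluate together.

```python
def select_mode(checkpoint_dir=None, run_once=False):
    """Describe which path model_main.py takes for the given flags."""
    if checkpoint_dir:
        # Eval-only mode: metrics are written to model_dir.
        if run_once:
            return "evaluate_once"      # estimator.evaluate(...) a single time
        return "continuous_eval"        # model_lib.continuous_eval(...)
    # No checkpoint_dir: the training path (elided in the excerpt above).
    return "train_and_evaluate"
```

Under this reading, the training script corresponds to select_mode() and SSD320_evaluate.sh to select_mode(checkpoint_dir, run_once=True), which is consistent with validation logs appearing in Tensorboard even when only training is run.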

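There are also other, simpler commands for training, and it is fine to use those instead.

3. Inference (ckpt -> pb)

When training finishes, convert the checkpoint to a pb file for final inference as shown below.
The checkpoint file name depends on the step count in the pipeline, so adjust the commands below to match your own settings.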

SSD-Resnet 50 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/ssd320_full_1gpus.config \
    --trained_checkpoint_prefix  /checkpoints/train_resnet/model.ckpt-100 \
    --output_directory /checkpoints/train_resnet/inference_graph_100

SSD-Inception V2 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/ssd_inception_v2_coco.config \
    --trained_checkpoint_prefix  /checkpoints/train_inception/model.ckpt-100 \
    --output_directory /checkpoints/train_inception/inference_graph_100

Faster RCNN-Resnet 50 inference

root@f46c490016e0:/workdir/models/research# python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /checkpoints/models/configs/faster_rcnn_resnet50_coco.config \
    --trained_checkpoint_prefix  /checkpoints/fasterrcnn_train_resnet/model.ckpt-100 \
    --output_directory /checkpoints/fasterrcnn_train_resnet/inference_graph_100

4. Checking with Tensorboard

After Training or Validation finishes, analyze the Tensorflow logs
Only the training-related part is analyzed here

Tensorboard

root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/train_resnet  // SSD-Resnet50
or 
root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/train_inception     //SSD-Inceptionv2  
or
root@f46c490016e0:/workdir/models/research# tensorboard --logdir=/checkpoints/fasterrcnn_train_resnet     //Faster RCNN-Resnet50

Connecting to Tensorboard from a browser

http://localhost:6006/

5. Object Detection TEST

Using jupyter and the pb file created above, prepare test images, modify the related source, and run the final test

root@f46c490016e0:/workdir/models/research# jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root

Connecting to Jupyter

http://localhost:8888/

Run object_detection/object_detection_tutorial.ipynb to verify

5.1 Changes to object_detection_tutorial.ipynb

With the pb file exported above, a quick test is possible by editing the source in object_detection/object_detection_tutorial.ipynb.

Skip the Download step and modify the Variables

MODEL_NAME = '/checkpoints/train_resnet/inference_graph_100'
PATH_TO_LABELS = os.path.join('/data', 'label_map.pbtxt')
PATH_TO_TEST_IMAGES_DIR = '/data/test_images'

The original source downloads a pre-trained model and uses it directly; change this to the model we exported,
then prepare the test images and run the test
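
Before running the notebook it is worth checking that the export directory really contains the frozen graph, since a wrong step number in the export command leaves MODEL_NAME pointing at nothing useful. The sketch below assumes the exported file is named frozen_inference_graph.pb inside the output directory, which is what the tutorial notebook loads in the versions I have seen; the helper name frozen_graph_path is hypothetical.

```python
import os


def frozen_graph_path(model_name):
    """Return the path of the frozen graph inside an export directory."""
    # Assumed file name written by export_inference_graph.py.
    path = os.path.join(model_name, "frozen_inference_graph.pb")
    if not os.path.isfile(path):
        raise FileNotFoundError("no frozen graph found at " + path)
    return path
```

In the notebook this could be used as PATH_TO_FROZEN_GRAPH = frozen_graph_path(MODEL_NAME), so a missing export fails early with a clear message.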

Add this if GPU memory problems occur

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

Allocator (GPU_0_bfc) ran out of memory trying to allocate
https://eehoeskrap.tistory.com/360

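5.2 Understanding the basic source of object_detection_tutorial.ipynb

The key part of this source is run_inference_for_single_image; its output values are applied to the test images to try out the model.

The entries listed under "for key in" below are tied to the pipeline config defined above, so keep that in mind. (SSD-based)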

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])       ### the pipeline sets max_total_detections to 100, so it always comes out as 100 here
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)                                ### 100 class ids, based on the label_map.pbtxt I defined
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]          ### bboxes
      output_dict['detection_scores'] = output_dict['detection_scores'][0]        ### confidence of each of the 100; only those above the threshold are drawn on screen
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,                                          ## image output
      output_dict['detection_boxes'],      ## array of 100 bboxes, 4 normalized values each: [ymin, xmin, ymax, xmax]; y * height and x * width give pixels
      output_dict['detection_classes'],    ## array of 100 class ids (1 or 2)
      output_dict['detection_scores'],     ## array of 100 confidences (only 0.5 or above is drawn)
      category_index,                                    ## the label_map.pbtxt information I defined above
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)                                  ## line thickness
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.title(image_path)

 # print("num_detections",output_dict['num_detections'])         ### after training, finds up to 100 (see the SSD pipeline section above)
 # print("detection_boxes",output_dict['detection_boxes'])       ### box positions for the 100 detections
 # print("detection_classes",output_dict['detection_classes'])   ### class 1 or 2 for the 100 detections (only 1 and 2 are declared)
 # print("detection_scores",output_dict['detection_scores'])     ### confidence for the 100 detections; only those above the display threshold are drawn

  for i,v in enumerate (output_dict['detection_scores']):   ### i: index, v: list element
    if v > 0.5:                                             ### print only those above 0.5 out of the 100
      print("  - class-name:", category_index.get(output_dict['detection_classes'][i]).get('name') )   ### look up the name in category_index, built from the label_map.pbtxt defined above
      print("  - confidence: ",v * 100 )                    ### convert to percent
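
The same filtering can be pulled into a small helper that also converts the normalized boxes to pixel coordinates. This is a sketch, not part of the tutorial notebook: the function name filter_detections and the result keys are made up, and it assumes each box is stored as normalized [ymin, xmin, ymax, xmax], as is usual for the Object Detection API.

```python
def filter_detections(boxes, classes, scores, image_height, image_width,
                      min_score=0.5):
    """Keep detections scoring above min_score and convert boxes to pixels."""
    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if not score > min_score:
            continue
        ymin, xmin, ymax, xmax = box  # normalized to the range 0..1
        results.append({
            "class_id": int(cls),
            "score": float(score),
            # y values scale with the image height, x values with the width
            "box_pixels": (int(round(ymin * image_height)),
                           int(round(xmin * image_width)),
                           int(round(ymax * image_height)),
                           int(round(xmax * image_width))),
        })
    return results
```

It accepts plain lists as well as the numpy arrays in output_dict, for example filter_detections(output_dict['detection_boxes'], output_dict['detection_classes'], output_dict['detection_scores'], image_np.shape[0], image_np.shape[1]).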

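6. Conclusion

SSD and Faster RCNN basically work well, and simple tests are possible on my laptop,

but increasing STEPS for the final test was too much for it, so I ran that on a server.
(Almost every problem on the laptop was related to GPU memory.)
On a laptop you always have to watch GPU memory, which is limited, and it behaves differently from a server, so take care.

Also, I worked on Transfer Learning and Fine Tuning for about a month because of a job for an acquaintance, and I think I would get used to them quickly with a little more practice.

I will try this again when I get the chance, but there is no need to think of it as too difficult.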