Jeonghun (James) Lee

2/05/2019

MNIST Resources

1. MNIST Basics

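MNIST, trained on the most basic Neural Network, is the first exercise whether you use TensorFlow or Keras. Like Hello World, you need to understand this structure first; once you do, CNNs and other architectures become much easier to understand.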

Basic operation

Input data is a 28-pixel square: 28x28 = 784 pixels
The data in this square passes through each stage and is finally recognized as one of the digits 0 to 9

For example, the digit 1 is shown as the 28x28 = 784 pixels below, and viewed as a matrix it looks like the following.

https://en.wikipedia.org/wiki/MNIST_database

Assumptions

The images are black and white rather than color, with shades of intensity (Gray Scale)
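The following is a minimal sketch in NumPy, which is an assumption since the post itself uses no code. It shows how a 28x28 grayscale image becomes the 784-value input vector. The image is a made-up array roughly shaped like a "1", not a real MNIST sample. Real MNIST pixels are 8-bit grayscale values from 0 to 255, and scaling them to 0..1 is a common preprocessing step.

```python
import numpy as np

# Hypothetical 28x28 grayscale image of a "1": background 0, stroke 255.
# (Real MNIST pixels are 8-bit values in 0..255; this array is synthetic.)
image = np.zeros((28, 28), dtype=np.uint8)
image[4:24, 13:15] = 255  # a vertical stroke roughly where a "1" would be

# Flatten the 28x28 matrix row by row into the 784-value input vector.
x = image.reshape(-1)

# Scale grayscale intensities from 0..255 to 0.0..1.0.
x = x.astype(np.float32) / 255.0

print(image.shape, x.shape)  # (28, 28) (784,)
```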

MNIST model structure and operation

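The Input Layer matches the matrix above (784 pixels in total)
Hidden Layer1 (ReLU) has 128 Nodes (the number of Nodes in a Hidden Layer can be changed)
Hidden Layer2 (ReLU) has 64 Nodes (the number of Nodes in a Hidden Layer can be changed)
Output Layer (softmax) has 10 Nodes for classification; the digit is chosen here, and the layer gives a probability for every one of the 10 digits
Of these 10 candidates, the one with the highest final probability is the final decision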
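Below is a minimal NumPy sketch of the 784-128-64-10 structure listed above, showing one forward pass. The weights are random and untrained, so the predicted digit is meaningless; only the shapes and the final "pick the highest probability" step are the point. The helper names (`relu`, `softmax`, `forward`, `hidden_activation`) are hypothetical and are not taken from any library.

```python
import numpy as np

def relu(z):
    # Off (0) for negative inputs, passes positive inputs through
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row max for numerical stability; each row sums to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Randomly initialized (untrained) weights; layer sizes follow the list above.
sizes = [784, 128, 64, 10]
weights = [rng.normal(0.0, 0.01, (n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x, hidden_activation=relu):
    # Hidden Layer1 and Hidden Layer2
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = hidden_activation(h @ W + b)
    # Output Layer: 10 scores turned into probabilities
    return softmax(h @ weights[-1] + biases[-1])

x = rng.random((1, 784))     # random stand-in for one flattened 28x28 image
probs = forward(x)
print(probs.shape)           # (1, 10)
print(probs.sum())           # ~1.0
print(probs.argmax(axis=1))  # index of the digit with the highest probability
```

To try Sigmoid instead of ReLU in the hidden layers, pass `hidden_activation=lambda z: 1.0 / (1.0 + np.exp(-z))`.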

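This structure is called an MLP (MultiLayer Perceptron) and is said to be the foundation of CNNs, so I want to understand each of its functions.

It works by extracting features at each Layer and then making a final selection.

Activation Functions (Softmax/Sigmoid/ReLU, etc.) determine each Layer's output, including that final selection.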

Activation Function

Think of it as switching On/Off against some criterion once the Weight and Bias have been applied.

So it is worth knowing, at least roughly, the characteristics of each function.

Put simply, its role is to decide where and how a Threshold is applied.
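Here is a minimal NumPy sketch of the three functions named above (NumPy is an assumption), evaluated on a few inputs so you can see the "switch" behavior. ReLU switches hard at 0. Sigmoid switches smoothly around 0 and gives 0.5 there. Softmax turns a vector of scores into probabilities.

```python
import numpy as np

def sigmoid(z):
    # Smooth switch: near 0 for large negative z, 0.5 at z = 0, near 1 for large positive z
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Hard switch at 0: Off (0) for z <= 0, passes z through for z > 0
    return np.maximum(0.0, z)

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # approx [0.119 0.5 0.881]
print(relu(z))     # [0. 0. 2.]
print(softmax(z))  # approx [0.016 0.117 0.867]
```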

https://en.wikipedia.org/wiki/Activation_function

Checking the basic MNIST model

ReLU is used as the Activation function instead of Sigmoid; you can switch it back if you like.

https://mxnet.apache.org/versions/1.3.1/tutorials/python/mnist.html

Details and references

https://mxnet.apache.org/versions/1.3.1/tutorials/python/mnist.html

https://mlfromscratch.com/neural-network-tutorial/#/

1.1 How the Weight matrix is laid out

Connecting Layers requires Weights and Biases, and these are arranged as matrices to extract Features.

This section covers how to lay out the Input and Weight matrices. Put simply, it is about how to arrange the matrices so that Features can be extracted.

Weight matrix layouts

Features as Columns
Features as Rows

Feature as Column layout

Column: Features (multiplied by the Weights)
Row: number of Samples (each Node)

The X values above are the Samples, i.e. the number of Nodes.

Feature as Row layout

Column: number of Samples (each Node)
Row: Features

Basic matrix operation and layout

Feature as Column layout: X * W + B
Feature as Row layout: W * X + B
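The sketch below uses NumPy with made-up random data to check that the two layouts compute the same numbers. The Feature as Row result is the transpose of the Feature as Column result, because (X W)ᵀ = Wᵀ Xᵀ. The sizes (3 samples, 4 inputs, 6 outputs) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_in, n_out = 3, 4, 6

# Feature as Column: one sample per row, one feature per column
X_col = rng.random((n_samples, n_in))  # (3, 4)
W_col = rng.random((n_in, n_out))      # (4, 6)
B = rng.random(n_out)                  # (6,)
Y_col = X_col @ W_col + B              # X * W + B -> (3, 6)

# Feature as Row: the same data transposed, one feature per row
X_row = X_col.T                        # (4, 3)
W_row = W_col.T                        # (6, 4)
Y_row = W_row @ X_row + B[:, None]     # W * X + B -> (6, 3)

# Both layouts compute the same numbers, just transposed
print(np.allclose(Y_col, Y_row.T))     # True
```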

Depending on where the Features are placed, the Weight moves to the other side of the product, as shown above.

Combining Weight and Bias in a matrix

Each Layer is connected through a weighted sum, which passes through the Activation function to determine the final Output activation.

Activation functions: Sigmoid, ReLU, and many others

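1.2 Feature as Column matrix example

Assume the Layers are configured as follows, and connect Layer0 and Layer1 with a matrix.

Connecting two Layers in the Feature as Column layout

Layer0: 4 Nodes
Layer1: 6 Nodes

The 4 Input Nodes of Layer0 must match the number of rows of the Weight (4).

Layer1 holds the Output Nodes, so the Bias is sized to match its 6 Output Nodes.

The resulting matrix for Layers 0 and 1

All that remains is to apply the final Activation function to it.

Let's look at how each matrix above is laid out (Red: X, Blue: W, Green: B/Y).

Use the figure above to make this easy to follow.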
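Here is a minimal NumPy sketch of the 4-node to 6-node example. The input values, the constant 0.1 weights, and the biases are made up so the arithmetic is easy to follow. Every column of X·W equals (1 + 2 + 3 + 4) × 0.1 = 1.0, so Z is just 1.0 plus each bias, and ReLU then clips anything negative to 0.

```python
import numpy as np

# Layer0: 4 nodes, Layer1: 6 nodes (Feature as Column layout)
X = np.array([[1.0, 2.0, 3.0, 4.0]])            # (1, 4): one sample, made-up values
W = np.full((4, 6), 0.1)                        # (4, 6): rows must equal Layer0's 4 nodes
B = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])  # (6,): one bias per Layer1 node

Z = X @ W + B           # (1, 6) weighted sums plus bias
Y = np.maximum(0.0, Z)  # apply ReLU as the final activation
print(Z.shape, Y.shape) # (1, 6) (1, 6)
print(Y)                # roughly [[0. 0.5 1. 1.5 2. 2.5]]
```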

Understanding the Weight/Bias matrices

https://medium.com/from-the-scratch/deep-learning-deep-guide-for-all-your-matrix-dimensions-and-calculations-415012de1568

https://cs231n.github.io/linear-classify/

1.3 MNIST matrix layout and why Bias is needed

Now let's lay out the Input Layer and Hidden Layer1 of the MNIST model at the top of this post as a matrix.

Connecting the MNIST Input Layer and Hidden Layer1

Input Layer: 784 Nodes
Hidden Layer1: 128 Nodes

With this configuration, build the matrix following the layout above, then apply ReLU as the final Activation.

Why Bias is needed

At first I didn't see why Bias was needed. From what I understand so far, it should be viewed not merely as an offset but as a correction value that adjusts the final Threshold of the Activation.

Activation functions and Bias

There are many Activation functions, and I don't yet fully understand why so many are used. My guess is that each one behaves under different conditions.

For example, the output (y) range changes with the input (x) range, or each function has a different output range, so each one is chosen to fit its purpose.

The Bias then acts as an offset that shifts where that range sits. This sets the final Threshold that decides On/Off.
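A small NumPy sketch illustrates the "Bias moves the Threshold" idea for a single Sigmoid neuron. The weight value 2.0 and the bias values are made up, and `switch_point` is a hypothetical helper. The neuron's output crosses 0.5 where w·x + b = 0, i.e. at x = -b/w, so changing b slides the On/Off point along x without changing the shape of the curve.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def switch_point(w, b):
    # sigmoid(w*x + b) crosses 0.5 (the On/Off midpoint) where w*x + b = 0, i.e. x = -b/w
    return -b / w

w = 2.0  # a single weight (made-up value)
for b in (-4.0, -2.0, 2.0, 4.0):
    x0 = switch_point(w, b)
    # Below x0 the output is < 0.5 ("Off"); above x0 it is > 0.5 ("On")
    print(f"b={b:+.1f}: switches at x={x0:+.1f}, sigmoid there = {sigmoid(w * x0 + b):.2f}")
```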

Weight, Bias, and Activation

Weight: each Neuron has a Weight (a multiplier) that extracts features
Bias: a kind of final offset that shifts the value fed into the Activation
Activation: in Machine Learning, the function that decides On/Off; each one has its own characteristics

Sigmoid
ReLU
Softmax, and many others

https://en.wikipedia.org/wiki/Activation_function

Weight and Bias seen as matrices

Input Layer: 784 Nodes
Hidden Layer1: 128 Nodes

Using the example above, let's build a simple matrix and add the Weight and Bias.

W: Weight, 784x128
X: Input Nodes (Sample), 784 values laid out horizontally as a row
B: Bias, 128 values (always the same size as the Output)
Y: Output Nodes (Result), 128 values, with the final Activation function (ReLU) applied
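The following is a minimal NumPy sketch of exactly these shapes: X (1x784) times W (784x128), plus B (128), passed through ReLU to give Y (1x128). The weights are random and untrained. The last lines show that the same matrices also handle a batch of images at once. The batch size of 32 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.random((1, 784))               # one input sample laid out as a row (1 x 784)
W = rng.normal(0.0, 0.01, (784, 128))  # Weight: 784 x 128 (random, untrained)
B = np.zeros(128)                      # Bias: one value per output node (128)

Y = np.maximum(0.0, X @ W + B)         # ReLU(X * W + B) -> (1, 128)
print(W.shape, Y.shape)                # (784, 128) (1, 128)

# The same layer handles a batch of N images: (N x 784) @ (784 x 128) -> (N x 128)
batch = rng.random((32, 784))
print(np.maximum(0.0, batch @ W + B).shape)  # (32, 128)
```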

https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html

Setting up matrix Input/Output

Adjust the matrices so that the number of Input/Output Nodes lines up between Layers

https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html

Connecting two Layers

Apply Weights to Inputs: (W1 * X1 + W2 * X2 + ... + W64 * X64)
Add Bias: (W1 * X1 + W2 * X2 + ... + W64 * X64) + B1
Apply Activation: Active((W1 * X1 + W2 * X2 + ... + W64 * X64) + B1)

After adding the Weights and Biases, the final Softmax is computed

See the links below for the characteristics of each Activation function.
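The sketch below (NumPy, with random inputs and weights and a made-up bias) computes one output node both ways. The first way is the explicit sum W1 * X1 + ... + W64 * X64 + B1 written in the list above. The second is a single dot product. ReLU stands in for `Active`, which is an assumption; any activation would do.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random(64)  # 64 input values X1..X64 (random stand-ins)
W = rng.random(64)  # weights W1..W64 feeding one output node
B1 = 0.5            # bias of that node (made-up value)

def active(z):
    # ReLU used as the example activation
    return max(0.0, z)

# Step by step, exactly as written in the list
total = 0.0
for w_i, x_i in zip(W, X):
    total += w_i * x_i            # W1*X1 + W2*X2 + ... + W64*X64
out_loop = active(total + B1)     # Active((...) + B1)

# The same thing as one dot product
out_dot = active(np.dot(W, X) + B1)
print(np.isclose(out_loop, out_dot))  # True
```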

ReLU

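An Activation Function that stays Off (outputs 0) for inputs at or below 0 and turns On above 0, passing the input through unchanged: ReLU(x) = max(0, x)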

https://en.wikipedia.org/wiki/Rectifier_%28neural_networks%29

Softmax

https://en.wikipedia.org/wiki/Softmax_function

Softmax vs Sigmoid

https://medium.com/arteos-ai/the-differences-between-sigmoid-and-softmax-activation-function-12adee8cf322#:~:text=Softmax%20is%20used%20for%20multi,similar%20to%20the%20Sigmoid%20function.&text=This%20is%20main%20reason%20why%20the%20Softmax%20is%20cool.
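To make the Softmax vs Sigmoid difference concrete, here is a minimal NumPy sketch on made-up scores for three classes. Sigmoid squashes each score independently, so the outputs need not sum to 1; this suits independent yes/no decisions. Softmax normalizes the scores together into a probability distribution, which is why the MNIST Output Layer uses it to choose one digit out of 10.

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # made-up output-layer scores for 3 classes

sigmoid = 1.0 / (1.0 + np.exp(-scores))          # each score squashed independently
softmax = np.exp(scores) / np.exp(scores).sum()  # scores normalized together

print(sigmoid, sigmoid.sum())  # each in (0, 1), but the sum is not 1
print(softmax, softmax.sum())  # sums to 1: a probability distribution over the classes
```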

Get started with TensorFlow's High-Level APIs (Google I/O '18)

The talk covers Colab and explains both MNIST and CNNs.
https://www.youtube.com/watch?v=tjsHSIG8I08

TensorFlow MNIST tutorials

Basic MNIST

https://tensorflowkorea.gitbooks.io/tensorflow-kr/content/g3doc/tutorials/mnist/beginners/

MNIST with a CNN

https://tensorflowkorea.gitbooks.io/tensorflow-kr/content/g3doc/tutorials/mnist/pros/
