Training MNIST with a Neural Network (NN): ReLU, Xavier initialization, Dropout in TensorFlow


HwaniL.choi 2018. 1. 4. 16:44

The softmax-only classifier I studied earlier reached an accuracy of about 89-91%.

This time I ran code that uses a neural network (NN) instead, and the results were much better.

1. ReLU

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Fix the graph-level seed so that runs are repeatable
tf.set_random_seed(777)

# Downloads MNIST into MNIST_data/ on the first run; labels are one-hot vectors
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

training_epochs = 15
batch_size = 100

# Inputs: flattened 28x28 images (784 pixels) and one-hot labels (10 classes)
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Hidden layer 1: 784 -> 256, ReLU activation
w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.random_normal([256]))
l1 = tf.nn.relu(tf.matmul(x, w1) + b1)

# Hidden layer 2: 256 -> 256, ReLU activation
w2 = tf.Variable(tf.random_normal([256, 256]))
b2 = tf.Variable(tf.random_normal([256]))
l2 = tf.nn.relu(tf.matmul(l1, w2) + b2)

# Output layer: 256 -> 10 raw logits (softmax is applied inside the loss)
w3 = tf.Variable(tf.random_normal([256, 10]))
b3 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(l2, w3) + b3

# Softmax cross-entropy loss, minimized with Adam
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {x: batch_xs, y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        # Accumulate the mean cost over all batches of this epoch
        avg_cost += c / total_batch

    print("Epoch:", "%04d" % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print("Learning Finished!")

# Accuracy on the test set: fraction of images whose arg-max logit matches the label
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
 

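The dataset is the MNIST data set that ships with TensorFlow. It is downloaded the first time the code runs and simply reused after that.

In this code I replaced the sigmoid activation used previously with ReLU.

The reason is that ReLU is said to perform better than sigmoid. Sigmoid squashes its output into the range 0 to 1, and its slope is small everywhere (at most 0.25, and close to 0 for inputs far from 0). Why that is a problem follows from how an NN is trained.

When training an NN, we differentiate to find out how much the output changes in response to each input and weight. Because this uses the chain rule, the local derivatives are multiplied together layer by layer. If every sigmoid contributes a very small value, the product ends up close to 0, so the gradient that reaches the early layers almost vanishes and their weights barely change, whatever the input is.

With as few layers as in this code, sigmoid would probably still work, but as the number of layers grows, sigmoid is said to perform worse and worse.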
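The chain-rule argument can be illustrated with a minimal sketch that is not part of the lecture code. It assumes a deliberately simplified picture: the weights are ignored, and the same local derivative is multiplied once per layer. The helper names (`sigmoid_grad`, `relu_grad`, `chained_gradient`) are made up for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    # Derivative of sigmoid: s * (1 - s); its maximum is 0.25 at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if z > 0 else 0.0


def chained_gradient(grad_fn, z, depth):
    """Multiply one local derivative `depth` times, as the chain rule would.

    Simplifying assumption: weights are ignored and every layer sees the same z.
    """
    product = 1.0
    for _ in range(depth):
        product *= grad_fn(z)
    return product


for depth in (2, 5, 10):
    print(depth,
          chained_gradient(sigmoid_grad, 0.0, depth),  # best case for sigmoid
          chained_gradient(relu_grad, 1.0, depth))     # an active ReLU unit
```

Even in the best case for sigmoid (z = 0, slope 0.25), the product shrinks by a factor of four per layer, while an active ReLU unit passes the gradient through unchanged.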

ReLU is the function y = max(0, x).

[Figure: ReLU function]

[Figure: sigmoid function]
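The shapes of the two functions can also be read off a few sample values. This is a small standard-library sketch for illustration only; the sample points are arbitrary.

```python
import math


def relu(x):
    # y = max(0, x): zero for negative inputs, identity for positive inputs
    return max(0.0, x)


def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print("x = {:5.1f}  relu = {:4.1f}  sigmoid = {:.4f}".format(x, relu(x), sigmoid(x)))
```

ReLU keeps growing with x, while sigmoid flattens out toward 0 and 1 at both ends.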

In any case, the output shows an accuracy of 94% or higher.

2. Xavier initialization

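Can we get somewhat better performance?

One way is to choose the initial values of W (the weights) with more care, instead of drawing them from a plain random distribution as before.

The method is called Xavier initialization.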

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Fix the graph-level seed so that runs are repeatable
tf.set_random_seed(777)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

training_epochs = 15
batch_size = 100

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Weights now come from tf.get_variable with the Xavier initializer;
# biases are still drawn from a standard normal distribution.
w1 = tf.get_variable("W1", shape=[784, 256], initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([256]))
l1 = tf.nn.relu(tf.matmul(x, w1) + b1)

w2 = tf.get_variable("W2", shape=[256, 256], initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([256]))
l2 = tf.nn.relu(tf.matmul(l1, w2) + b2)

# Third hidden layer (the ReLU example above had only two)
w3 = tf.get_variable("W3", shape=[256, 256], initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([256]))
l3 = tf.nn.relu(tf.matmul(l2, w3) + b3)

# Output layer: 256 -> 10 raw logits
w4 = tf.get_variable("W4", shape=[256, 10], initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(l3, w4) + b4

# Softmax cross-entropy loss, minimized with Adam
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {x: batch_xs, y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print("Epoch:", "%04d" % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print("Learning Finished!")

# Accuracy on the test set
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
 
 

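The main change in the code is that tf.Variable() has been replaced with tf.get_variable() for the weights, with the parameters adjusted to match. The network also has one more hidden layer than before (three instead of two).

The part initializer=tf.contrib.layers.xavier_initializer() is what says "use Xavier initialization". I don't really understand how it works yet ... I'll have to look into it more later.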
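For reference, the commonly cited uniform form of Xavier (Glorot) initialization draws each weight from the range [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The following is a minimal standard-library sketch of that formula, not TensorFlow's actual implementation; `xavier_uniform` is a hypothetical helper name, and the 784 x 256 shape is borrowed from the first layer above.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_in x fan_out weight matrix drawn from U(-limit, limit).

    limit = sqrt(6 / (fan_in + fan_out)), which gives each weight a variance
    of 2 / (fan_in + fan_out). Hypothetical helper for illustration only.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(777)
w1 = xavier_uniform(784, 256, rng)

limit = math.sqrt(6.0 / (784 + 256))
flat = [v for row in w1 for v in row]
mean = sum(flat) / len(flat)
variance = sum((v - mean) ** 2 for v in flat) / len(flat)

print("limit:", limit)
print("sample variance:", variance, "target:", 2.0 / (784 + 256))
```

The point of the scaling is that layers with many inputs and outputs get smaller weights, so the size of the activations stays roughly the same from layer to layer instead of blowing up.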

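In any case, the output shows an accuracy of 97.87%.

The performance has improved.

One thing worth looking at here is the cost value.

With Xavier initialization the cost is very low right from the first epoch, compared with the random_normal version. Because the weights start out at a sensible scale, the network's initial outputs are not wildly off, so the cost starts small.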
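Why the starting scale matters for the cost can be sketched without TensorFlow. The example below assumes that badly scaled initial weights show up as very large random logits, and uses the illustrative scales 0.1 and 100.0; these numbers are not measured from the networks in this post. With all-zero logits over 10 classes the softmax cross-entropy is exactly ln(10), about 2.30.

```python
import math
import random


def softmax_cross_entropy(logits, label_index):
    # Numerically stable: loss = log(sum(exp(logits))) - logits[label]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label_index]


def mean_initial_loss(logit_scale, rng, trials=2000, num_classes=10):
    """Average loss of random logits ~ N(0, logit_scale^2) against random labels."""
    total = 0.0
    for _ in range(trials):
        logits = [rng.gauss(0.0, logit_scale) for _ in range(num_classes)]
        total += softmax_cross_entropy(logits, rng.randrange(num_classes))
    return total / trials


rng = random.Random(777)
print("all-zero logits:", softmax_cross_entropy([0.0] * 10, 3), "ln(10):", math.log(10))
print("small logits (scale 0.1):  ", mean_initial_loss(0.1, rng))
print("large logits (scale 100.0):", mean_initial_loss(100.0, rng))
```

Small initial logits behave like a uniform guess, while large random logits are confidently wrong most of the time, which is one plausible reason the first-epoch cost differs so much between the two initializations.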

3. Dropout

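The next method for improving performance a little further is called dropout.

Dropout is a regularization technique for reducing overfitting. During training, some of the units in the neural network are randomly switched off and the network is trained with only the remaining ones. Honestly, it is surprising that this works, but apparently it does.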
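The mechanism can be sketched in a few lines of plain Python. This is an illustration of "inverted" dropout, where the surviving units are scaled by 1 / keep_prob (the same scaling tf.nn.dropout applies); the function below is a hypothetical stand-in, not TensorFlow code.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and scale it by 1 / keep_prob.

    Dropped units become 0. The scaling keeps the expected value of each
    activation unchanged. Hypothetical helper for illustration only.
    """
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)
        else:
            out.append(0.0)
    return out


rng = random.Random(777)
layer = [1.0] * 10

print("keep_prob = 0.7:", dropout(layer, 0.7, rng))  # some units zeroed, rest scaled up
print("keep_prob = 1.0:", dropout(layer, 1.0, rng))  # every unit passes through unchanged
```

Because a different random subset is dropped on every training step, no single unit can be relied on too heavily, which is the intuition behind the regularizing effect.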

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Fix the graph-level seed so that runs are repeatable
tf.set_random_seed(777)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

training_epochs = 15
batch_size = 100

# Probability of keeping a unit: below 1 while training, exactly 1 while testing
keep_prob = tf.placeholder(tf.float32)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# No initializer is passed to tf.get_variable here; in TF 1.x it then falls
# back to glorot_uniform_initializer, which is the Xavier uniform scheme.
# Each hidden layer is 512 units wide, followed by ReLU and dropout.
w1 = tf.get_variable("W1", shape=[784, 512])
b1 = tf.Variable(tf.random_normal([512]))
l1 = tf.nn.relu(tf.matmul(x, w1) + b1)
l1 = tf.nn.dropout(l1, keep_prob=keep_prob)

w2 = tf.get_variable("W2", shape=[512, 512])
b2 = tf.Variable(tf.random_normal([512]))
l2 = tf.nn.relu(tf.matmul(l1, w2) + b2)
l2 = tf.nn.dropout(l2, keep_prob=keep_prob)

w3 = tf.get_variable("W3", shape=[512, 512])
b3 = tf.Variable(tf.random_normal([512]))
l3 = tf.nn.relu(tf.matmul(l2, w3) + b3)
l3 = tf.nn.dropout(l3, keep_prob=keep_prob)

w4 = tf.get_variable("W4", shape=[512, 512])
b4 = tf.Variable(tf.random_normal([512]))
l4 = tf.nn.relu(tf.matmul(l3, w4) + b4)
l4 = tf.nn.dropout(l4, keep_prob=keep_prob)

# Output layer: 512 -> 10 raw logits, no dropout
w5 = tf.get_variable("W5", shape=[512, 10])
b5 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(l4, w5) + b5

# Softmax cross-entropy loss, minimized with Adam
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Training: keep 70% of the units in each hidden layer
        feed_dict = {x: batch_xs, y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print("Epoch:", "%04d" % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print("Learning Finished!")

# Testing: keep_prob must be 1 so that every unit is used
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1}))
 

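The thing to be careful about here is keep_prob. During training you choose a value below 1 to decide what fraction of the units to keep (0.7 in this code), but at test time you must always set it to 1, because every unit should be used when testing!

Looking at the result,

the accuracy comes out at over 98%.

The differences between the results above may not look like much, but raising accuracy by even 1% is said to be extremely hard.

Anyway, that's it for today's notes!

The code and content here are what I studied from Professor Sung Kim's (김성훈) video lectures. I'm just getting started, so there is probably a lot that is still rough. I'll keep working at it :)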

License: Attribution, NonCommercial, NoDerivatives
