Trick_1
Batch_Normalization
with tf.variable_scope('fc_1'):
    out = tf.layers.dense(out, 4000)
    # BN before the non-linearity; training= switches between batch stats and moving averages
    out = tf.layers.batch_normalization(out, momentum=bn_momentum, training=is_training)
    out = tf.nn.relu(out)
...
# the BN moving-average update ops must run together with the train op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
- BN can be placed before or after the ReLU; the difference is slight (see the benchmark below):
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md
- is_training = False
With is_training = False, BN uses its internally stored moving averages of mean and variance to normalize the batch, not the batch's own mean and variance. The BN internal variables also don't get updated.
- update_ops
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
When training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be executed alongside the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.
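A rough sketch of how is_training is typically fed at train vs. eval time (the placeholder name, the session, and the feed values here are assumptions, not part of the snippet above):
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')
# ... build model, loss_op, acc_op and train_op as above ...
# training step: BN uses batch statistics and updates its moving averages
sess.run(train_op, feed_dict={X: x_batch, Y: y_batch, is_training: True})
# evaluation: BN uses the stored moving mean/variance, nothing gets updated
sess.run(acc_op, feed_dict={X: x_val, Y: y_val, is_training: False})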
No activation layer @ output
In general, no other activation function is used after the output layer (besides the softmax itself).
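A minimal sketch of an output layer that produces raw logits (out, num_classes and labels are placeholders here, not names from the original):
# last layer: no ReLU/sigmoid, just raw logits
logits = tf.layers.dense(out, num_classes, activation=None)
# the softmax is applied inside the loss op, not in the model
loss_op = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)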
Learning rate decay
global_step = tf.train.get_or_create_global_step()
# decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
learning_rate = tf.train.exponential_decay(0.0001, global_step, decay_steps=50, decay_rate=0.1)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
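A quick plain-Python sanity check of the schedule above (staircase defaults to False, so the decay is continuous):
for step in (0, 50, 100):
    print(step, 0.0001 * 0.1 ** (step / 50))
# prints roughly 1e-4, 1e-5, 1e-6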
tf.argmax
import tensorflow as tf
import numpy as np
A = [[0, 1, 2, 3, 4, 3, 2, 1, 0]]
B = [
    [1, 3, 4],
    [2, 4, 1],
    [9, 4, 1],
]

with tf.Session() as sess:
    print(sess.run(tf.argmax(A, 1)))
    print(sess.run(tf.argmax(B, 1)))

# [4]
# [2 1 0]
sparse_softmax_cross_entropy & softmax_cross_entropy
They produce the same result; the difference is simple: the sparse_* variants take integer class indices as labels instead of one-hot vectors. As such, with the sparse functions the dimensions of logits and labels differ: labels contains one class index per example, whereas logits carries one score per class, i.e. has shape [batch_size, num_classes].
>> For sparse_softmax_cross_entropy
labels shape = [batch_size]
each label is an int class index
labels dtype: int32 or int64
logits shape = [batch_size, num_classes]
= = =
>> For softmax_cross_entropy
labels shape = [batch_size, num_classes]
each label is a one-hot encoding
labels dtype: float32 or float64
logits shape = [batch_size, num_classes]
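A minimal sketch showing that both losses agree when the labels encode the same classes (the logits values are made up for illustration):
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 3.0, 0.2]])       # [batch_size=2, num_classes=3]
sparse_labels = tf.constant([0, 1])           # one int class index per example
onehot_labels = tf.one_hot(sparse_labels, 3)  # [[1,0,0], [0,1,0]]

loss_sparse = tf.losses.sparse_softmax_cross_entropy(labels=sparse_labels, logits=logits)
loss_dense = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([loss_sparse, loss_dense]))  # same value twice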
tf.losses already reduce_mean
logits = build_model(is_training, X)
predictions = tf.argmax(logits, 1)
labels = tf.argmax(Y, 1)

# tf.losses.* already apply reduce_mean over the batch, so loss_op is a scalar
loss_op = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
acc_op = tf.reduce_mean(tf.cast(tf.equal(labels, predictions), tf.float32))

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.AdamOptimizer(learning_rate=0.00001)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=global_step)
https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow
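To see the built-in reduction, compare against the tf.nn version, which returns one loss per example; a sketch with made-up logits/labels, not the original code:
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 3.0, 0.2]])
labels = tf.constant([0, 1])

per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)  # shape [2]
mean_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)            # scalar

with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(per_example)))  # equals mean_loss below
    print(sess.run(mean_loss))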
Sigmoid vs softmax
In simple binary classification, there's no big difference between the two.
In the multi-class case, sigmoid lets you handle non-exclusive labels (a.k.a. multi-label classification), while softmax deals with mutually exclusive classes.
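A minimal sketch of the two setups (shapes and label values are illustrative assumptions):
import tensorflow as tf

logits = tf.constant([[1.2, -0.5, 2.0]])    # [batch_size=1, num_classes=3]

# exclusive classes: exactly one true class per example -> softmax
onehot = tf.constant([[0.0, 0.0, 1.0]])
softmax_loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot, logits=logits)

# non-exclusive labels: each class is an independent yes/no -> sigmoid
multi = tf.constant([[1.0, 0.0, 1.0]])      # several classes can be active at once
sigmoid_loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=multi, logits=logits)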