Trick_1
Batch_Normalization
# Dense -> BN -> ReLU block; training=is_training switches between batch statistics and the stored moving averages
with tf.variable_scope('fc_1'):
    out = tf.layers.dense(out, 4000)
    out = tf.layers.batch_normalization(out, momentum=bn_momentum, training=is_training)
    out = tf.nn.relu(out)
...
# The moving mean/variance update ops live in tf.GraphKeys.UPDATE_OPS and must run together with the train op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
- BN can be placed before or after the ReLU; the difference is slight:
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md
- is_training = False
When is_training is False, BN uses its internally stored moving averages of mean and variance to normalize the batch, not the batch's own mean and variance.
The BN internal variables also don't get updated. (See the sketch after this list for how is_training is typically fed.)
- update_ops
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
When training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be executed alongside the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.
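A minimal sketch of how is_training is commonly wired up, assuming it is a tf.bool placeholder (the names and feed values here are illustrative, not from the original snippet):
import tensorflow as tf

# Assumed setup: a boolean placeholder toggled between training and evaluation.
is_training = tf.placeholder(tf.bool, name='is_training')

# ... build the model with tf.layers.batch_normalization(..., training=is_training) ...
# ... build train_op with the control_dependencies(update_ops) pattern shown above ...

# Training step: BN normalizes with batch statistics and updates its moving averages.
# sess.run(train_op, feed_dict={..., is_training: True})

# Evaluation: BN normalizes with the stored moving averages; nothing gets updated.
# sess.run(eval_op, feed_dict={..., is_training: False})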
No activation layer @ output
In general, no other activation function is applied after the output layer (besides the softmax itself).
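A minimal sketch of what this looks like, assuming a TF 1.x graph with illustrative placeholder shapes (the softmax is applied inside the loss, not as an output activation):
import tensorflow as tf

num_classes = 10
features = tf.placeholder(tf.float32, [None, 128])
labels = tf.placeholder(tf.int64, [None])

# Output layer: plain linear projection, no activation function.
logits = tf.layers.dense(features, num_classes, activation=None)

# The cross-entropy loss applies the softmax internally.
loss_op = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

# Explicit softmax only if actual probabilities are needed (e.g. at inference time).
probs = tf.nn.softmax(logits)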
Learning rate decay
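No details were given here; a minimal sketch using tf.train.exponential_decay, with illustrative hyperparameters:
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()

# Multiply the learning rate by 0.9 every 10,000 steps (values are illustrative).
learning_rate = tf.train.exponential_decay(
    learning_rate=1e-3,
    global_step=global_step,
    decay_steps=10000,
    decay_rate=0.9,
    staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate)
# train_op = optimizer.minimize(loss_op, global_step=global_step)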
tf.argmax
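A common use, sketched with illustrative names: tf.argmax over the class axis turns logits (or probabilities) into predicted class indices, e.g. for computing accuracy:
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.int64, [None])

# Index of the largest logit along the class axis = predicted class (int64).
predictions = tf.argmax(logits, axis=1)

# Fraction of correct predictions.
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))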
sparse_softmax_cross_entropy & softmax_cross_entropy
They produce the same result, and the difference is simple.
sparse_*
The sparse_* variants take one integer class index per example as labels, while the non-sparse variants expect one-hot labels.
As such, with the sparse functions the shapes of logits and labels are not the same: labels contains one number per example, whereas logits has one score per class for each example.
>> For sparse_softmax_cross_entropy
labels shape = [batch_size]
each label is an int (the class index)
logits shape = [batch_size, num_classes]
labels dtype: int32 or int64
= = =
>> For softmax_cross_entropy
labels shape = [batch_size, num_classes]
each label is a one-hot encoding
logits shape = [batch_size, num_classes]
labels dtype: float32 or float64
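A minimal sketch (with made-up numbers) showing that the two give identical per-example losses when the labels encode the same classes:
import numpy as np
import tensorflow as tf

num_classes = 3
logits = tf.placeholder(tf.float32, [None, num_classes])
int_labels = tf.placeholder(tf.int64, [None])          # [batch_size]
onehot_labels = tf.one_hot(int_labels, num_classes)    # [batch_size, num_classes]

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=int_labels, logits=logits)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)

with tf.Session() as sess:
    l1, l2 = sess.run([sparse_loss, dense_loss],
                      feed_dict={logits: [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2]],
                                 int_labels: [0, 1]})
    print(np.allclose(l1, l2))  # True: same per-example losses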
tf.losses already reduce_mean
https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow
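In other words (a sketch with illustrative shapes): the tf.nn.* functions return one loss per example and need an explicit reduction, while the tf.losses.* functions already return the batch mean as a scalar:
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.int64, [None])

# tf.nn.* returns one loss value per example: shape [batch_size].
per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
loss_nn = tf.reduce_mean(per_example)  # explicit reduction needed

# tf.losses.* already averages over the batch: a scalar, no extra reduce_mean.
loss_tf_losses = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)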
Sigmoid vs softmax
In simple binary classification there is no big difference between the two.
In multinomial classification, sigmoid makes it possible to deal with non-exclusive labels (a.k.a. multi-label classification), while softmax deals with mutually exclusive classes.
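A minimal sketch of the two setups, with illustrative shapes and names:
import tensorflow as tf

num_classes = 5
logits = tf.placeholder(tf.float32, [None, num_classes])

# Multi-label (non-exclusive classes): one independent sigmoid per class;
# labels are 0/1 indicators and several can be 1 for the same example.
multi_labels = tf.placeholder(tf.float32, [None, num_classes])
multilabel_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=multi_labels, logits=logits))

# Multi-class (exclusive classes): softmax over the classes, exactly one is correct.
int_labels = tf.placeholder(tf.int64, [None])
multiclass_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=int_labels, logits=logits))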