Trick_1
At inference time (training=False), BN uses its internally stored moving averages of the mean and variance to normalize the batch, not the batch's own mean and variance. The BN internal variables (moving_mean, moving_variance) also don't get updated.
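A minimal sketch of this behaviour, assuming TensorFlow 1.x and tf.layers.batch_normalization (the placeholder names are made up):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

x = tf.placeholder(tf.float32, shape=[None, 64])
is_training = tf.placeholder(tf.bool)

# training=True  -> normalize with the batch's own mean/variance;
#                   update ops for the moving averages are created
#                   (they still have to be run, see the last note on this page).
# training=False -> normalize with the stored moving_mean / moving_variance;
#                   the internal variables are not updated.
bn_out = tf.layers.batch_normalization(x, training=is_training)
```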
In general, no other activation function is used after the output layer (besides the softmax itself).
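As an illustration (TensorFlow 1.x assumed, all names made up): the output layer produces raw logits with no activation, and tf.argmax can be applied directly to the logits, since softmax is monotonic and does not change the argmax.

```python
import tensorflow as tf  # assumes TensorFlow 1.x; shapes and names are made up

features = tf.placeholder(tf.float32, shape=[None, 128])

# Output layer: no activation, so these are raw logits.
logits = tf.layers.dense(features, units=10, activation=None)

# The cross-entropy losses expect these raw logits; softmax is applied
# internally. For predictions, argmax over the logits equals argmax over
# the softmax probabilities.
predictions = tf.argmax(logits, axis=1)
```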
tf.argmax
sparse_softmax_cross_entropy & softmax_cross_entropy
They produce the same result; the difference is in how the labels are represented. The sparse_* variants take the labels as integer class indices instead of one-hot vectors. As such, with the sparse functions the dimensions of logits and labels are not the same: labels contains one number (the class index) per example, whereas logits contains one score per class, i.e. has shape [batch_size, num_classes].
>> For sparse_softmax_cross_entropy
labels shape = [batch_size]
each label is an int (class index)
labels dtype: int32 or int64
logits shape = [batch_size, num_classes]

>> For softmax_cross_entropy
labels shape = [batch_size, num_classes]
each label is a one-hot vector
labels dtype: float32 or float64
logits shape = [batch_size, num_classes]
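A small sketch, assuming TensorFlow 1.x (the values are made up), showing that the two variants give the same per-example losses once the labels are converted between the two formats:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

num_classes = 3
logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 1.2, 3.3]])                    # [batch_size, num_classes]
int_labels = tf.constant([0, 2], dtype=tf.int64)           # [batch_size]
onehot_labels = tf.one_hot(int_labels, depth=num_classes)  # [batch_size, num_classes]

# sparse version: integer class indices
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=int_labels, logits=logits)

# non-sparse version: one-hot labels
dense_loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=onehot_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))  # the per-example losses match
```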
The tf.losses versions already apply reduce_mean over the per-example losses, so they return a scalar.
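For illustration (TensorFlow 1.x assumed, values made up), the tf.nn version returns one value per example while the tf.losses version is already reduced to a scalar:

```python
import tensorflow as tf  # assumes TensorFlow 1.x; values are illustrative

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 1.2, 3.3]])
labels = tf.constant([0, 2], dtype=tf.int64)

# tf.nn.*: one loss value per example, shape [batch_size]
per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

# tf.losses.*: already reduced to a single scalar (mean over the batch
# with the default weights)
scalar_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
```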
In simple binary classification, there's no big difference between the two. For multiclass problems, sigmoid can handle non-exclusive labels (a.k.a. multi-label classification), while softmax handles mutually exclusive classes.
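A short sketch, assuming TensorFlow 1.x (shapes and values made up), contrasting the two cases:

```python
import tensorflow as tf  # assumes TensorFlow 1.x; shapes/values are made up

logits = tf.constant([[1.2, -0.3, 0.8]])  # one example, 3 classes

# Exclusive classes: exactly one class is correct -> softmax cross-entropy
exclusive_label = tf.constant([[0.0, 0.0, 1.0]])   # one-hot
softmax_loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=exclusive_label, logits=logits)

# Non-exclusive labels (multi-label): any subset of classes may be active
# -> per-class sigmoid cross-entropy
multi_label = tf.constant([[1.0, 0.0, 1.0]])       # two labels active at once
sigmoid_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=multi_label, logits=logits)
```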
When training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be executed alongside the train_op. Also, be sure to add the batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.
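A sketch of the usual pattern, assuming TensorFlow 1.x and tf.layers.batch_normalization (the model and loss here are dummies):

```python
import tensorflow as tf  # assumes TensorFlow 1.x; the model/loss are dummies

x = tf.placeholder(tf.float32, shape=[None, 64])
is_training = tf.placeholder(tf.bool)

# Build the batch_normalization ops BEFORE fetching UPDATE_OPS,
# otherwise the collection is still empty.
h = tf.layers.batch_normalization(x, training=is_training)
loss = tf.reduce_mean(tf.square(h))  # dummy loss for illustration

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # moving_mean / moving_variance now get updated on every training step
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```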