```python
with tf.variable_scope('fc_1'):
    out = tf.layers.dense(out, 4000)
    # batch norm sits between the linear layer and the non-linearity
    out = tf.layers.batch_normalization(out, momentum=bn_momentum, training=is_training)
    out = tf.nn.relu(out)
...
# the moving mean/variance updates live in UPDATE_OPS; run them together with the train op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
```
- batch normalization can go before or after the ReLU; in practice the difference is slight
When training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be executed alongside the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection; otherwise update_ops will be empty, and training/inference will not work properly.
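As a minimal end-to-end sketch (the shapes, layer sizes and Adam learning rate here are arbitrary illustrations, not taken from the snippet above), note that the UPDATE_OPS collection is only fetched after the batch-norm layers have been built:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

# Building the graph registers the batch-norm update ops in UPDATE_OPS.
h = tf.layers.dense(x, 64)
h = tf.layers.batch_normalization(h, momentum=0.9, training=is_training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)
loss_op = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)

# Fetch UPDATE_OPS only now; fetching it before the layers exist yields an empty list.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss_op)

# Feed is_training=True when running train_op, and False at inference so the
# stored moving averages are used instead of the batch statistics.
```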
No activation layer @ output
In general, no other activation function is used after the output layer (besides the softmax itself).
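A common TF pattern consistent with this is to leave the last dense layer linear (raw logits) and let the loss op apply the softmax internally; a minimal sketch with made-up shapes:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
onehot_labels = tf.placeholder(tf.float32, [None, 10])

# Final layer: activation=None, i.e. raw logits.
logits = tf.layers.dense(x, 10, activation=None)

# The softmax is applied inside the loss, not as an extra layer on the output.
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)

# Apply tf.nn.softmax only where actual probabilities are needed (e.g. predictions).
probs = tf.nn.softmax(logits)
```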
They produce the same result, and the difference is simple.
sparse_*
As such, with the sparse functions, the dimensions of logits and labels are not the same: labels contain one class index per example (shape [batch_size]), whereas logits hold one score per class (shape [batch_size, num_classes], i.e. the one-hot-sized layout).
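A small sketch of the two label formats (the shapes and the 5-class depth are arbitrary); for hard one-hot labels both losses give the same values:

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 5])   # one score per class
class_ids = tf.placeholder(tf.int64, [None])     # one integer per example

# sparse_* variant: labels are class indices.
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=class_ids, logits=logits)

# non-sparse variant: labels must be one-hot (or a full probability distribution).
one_hot = tf.one_hot(class_ids, depth=5)
loss_dense = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=one_hot, logits=logits)
```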
In simple binary classification, there's no big difference between the two: a two-class softmax is equivalent to a sigmoid applied to the difference of the two logits.
In the case of multinomial classification, sigmoid lets you deal with non-exclusive labels (a.k.a. multi-label classification), while softmax deals with mutually exclusive classes.
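A sketch contrasting the two losses (shapes are illustrative): tf.nn.sigmoid_cross_entropy_with_logits treats each class independently, which is what multi-label targets need, while the softmax loss assumes exactly one true class per example:

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 5])

# Mutually exclusive classes: one class index per example, softmax loss.
class_ids = tf.placeholder(tf.int64, [None])
softmax_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=class_ids, logits=logits)

# Non-exclusive (multi-label) targets: a 0/1 indicator per class, sigmoid loss.
multi_hot = tf.placeholder(tf.float32, [None, 5])
sigmoid_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=multi_hot, logits=logits)

# The sigmoid loss scores each output unit on its own, so several classes can
# be "on" at once; the softmax loss forces a single winner.
```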