# Trick\_1

## Batch\_Normalization

```python
with tf.variable_scope('fc_1'):
    out = tf.layers.dense(out, 4000)
    out = tf.layers.batch_normalization(out, momentum=bn_momentum, training=is_training)
    out = tf.nn.relu(out)

...

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
```

### - could be before or after relu, slight difference

<https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md>

### - is\_training = false

1. BN normalizes the batch with its internally stored moving averages of mean and variance, not with the batch's own mean and variance.
2. BN's internal moving averages are not updated.
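
The two modes can be sketched in plain Python, without TensorFlow. This is a simplified illustration for a single feature, not the library's implementation: the helper name `batch_norm_1d`, the `epsilon` value, and the omission of the learned scale and offset are assumptions made for brevity.

```python
import math

def batch_norm_1d(batch, moving_mean, moving_var, is_training, epsilon=1e-3):
    """Normalize one feature across a batch (hypothetical helper, no scale/offset)."""
    if is_training:
        # Training: use the statistics of the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Inference: use the stored moving averages instead.
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + epsilon) for x in batch]

batch = [1.0, 2.0, 3.0, 4.0]
train_out = batch_norm_1d(batch, moving_mean=10.0, moving_var=4.0, is_training=True)
infer_out = batch_norm_1d(batch, moving_mean=10.0, moving_var=4.0, is_training=False)
```

With `is_training=True` the output is centered on the batch itself; with `is_training=False` the same batch is shifted and scaled by the stored statistics, so the two results differ whenever the moving averages do not match the batch.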

### - update\_ops

<https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization>

During training, the *moving\_mean* and *moving\_variance* need to be updated. By default the update ops are placed in [`tf.GraphKeys.UPDATE_OPS`](https://www.tensorflow.org/api_docs/python/tf/GraphKeys#UPDATE_OPS), so they need to be executed alongside the `train_op`. Also, be sure to add any batch\_normalization ops before getting the update\_ops collection; otherwise update\_ops will be empty and training/inference will not work properly.
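
To see why skipping the update ops hurts, here is a minimal sketch of the moving-average update that those ops perform. The helper name `update_moving_average` and the sample values are hypothetical; the formula is the usual exponential moving average controlled by `momentum`.

```python
def update_moving_average(moving, batch_stat, momentum=0.99):
    # moving <- momentum * moving + (1 - momentum) * batch_stat
    return momentum * moving + (1.0 - momentum) * batch_stat

moving_mean = 0.0  # starting value before any update has run
for batch_mean in [2.0, 2.0, 2.0]:
    moving_mean = update_moving_average(moving_mean, batch_mean, momentum=0.9)
```

If the update never runs, `moving_mean` stays at its starting value, and inference then normalizes with statistics that have nothing to do with the data.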

## No activation layer @ output

In general, no activation function other than the softmax itself is applied after the output layer.

## Learning rate decay

```python
global_step = tf.train.get_or_create_global_step()

# decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
learning_rate = tf.train.exponential_decay(0.0001, global_step, decay_steps=50, decay_rate=0.1)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
```
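
The formula in the comment above can be checked by hand. The sketch below re-implements it in plain Python; the function is a stand-in written for illustration (including the optional `staircase` behavior), not TensorFlow's code.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Plain-Python stand-in for the decay formula (illustrative only)."""
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate drops in discrete steps.
        exponent = global_step // decay_steps
    return learning_rate * decay_rate ** exponent

# Same settings as the snippet above, evaluated at a few steps.
schedule = [
    exponential_decay(0.0001, step, decay_steps=50, decay_rate=0.1)
    for step in (0, 50, 100)
]
```

With `decay_steps=50` and `decay_rate=0.1`, the learning rate shrinks by a factor of ten every 50 steps, which is an aggressive schedule for long training runs.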

## `tf.argmax`

```python
import tensorflow as tf
import numpy as np

A = [[0, 1, 2, 3, 4, 3, 2, 1, 0]]
B = [
    [1,3,4], 
    [2,4,1],
    [9,4,1]
]

with tf.Session() as sess:
    # axis=1: index of the largest value within each row
    print(sess.run(tf.argmax(A, 1)))
    print(sess.run(tf.argmax(B, 1)))

# [4]
# [2 1 0]
```

## `sparse_softmax_cross_entropy` & `softmax_cross_entropy`

Both compute the same loss; they differ only in how the labels are encoded.

`sparse_*`

With the sparse functions, `logits` and `labels` do ***not*** have the same shape: `labels` holds one integer class index per example, whereas `logits` holds one unnormalized score per class for each example.

\>> For `sparse_softmax_cross_entropy`

1. labels shape = \[batch\_size]
2. each label is an int (class index)
3. logits shape = \[batch\_size, num\_classes], unnormalized scores (not one-hot)
4. labels dtype int32 or int64

\= = =

\>> For `softmax_cross_entropy`

1. labels shape = \[batch\_size, num\_classes]
2. each label is a ***one-hot encoding***
3. logits shape = \[batch\_size, num\_classes], unnormalized scores (not one-hot)
4. labels dtype float32 or float64
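
The equivalence can be illustrated for a single example in plain Python. The helper names below (`softmax`, `sparse_cross_entropy`, `dense_cross_entropy`) and the sample logits are hypothetical; this is a sketch of the math, not the TensorFlow functions themselves.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(label, logits):
    # label is an integer class index.
    return -math.log(softmax(logits)[label])

def dense_cross_entropy(onehot, logits):
    # onehot is a list with one entry per class.
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(onehot, probs))

logits = [2.0, 1.0, 0.1]
sparse_loss = sparse_cross_entropy(0, logits)
dense_loss = dense_cross_entropy([1.0, 0.0, 0.0], logits)
```

Both calls return the same value because the one-hot vector simply selects the log-probability of the labelled class.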

## `tf.losses` already `reduce_mean`

```python
logits = build_model(is_training, X)
predictions = tf.argmax(logits, 1)

labels = tf.argmax(Y, 1)

loss_op = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
acc_op = tf.reduce_mean(tf.cast(tf.equal(labels, predictions), tf.float32))

global_step = tf.train.get_or_create_global_step()

optimizer = tf.train.AdamOptimizer(learning_rate=0.00001)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss_op, global_step=global_step)
```

<https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow>

## Sigmoid vs softmax

In simple binary classification, there's no big difference between the two.

In multinomial classification, sigmoid can handle non-exclusive labels (a.k.a. *multi-labels*), while softmax handles mutually exclusive classes.
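
A small plain-Python sketch of both points, using hypothetical helper names and sample logits: softmax probabilities compete and sum to 1, sigmoid probabilities are independent per class, and in the binary case the two coincide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]
softmax_probs = softmax(logits)               # exclusive classes: sums to 1
sigmoid_probs = [sigmoid(z) for z in logits]  # independent per-class probabilities

# Binary case: softmax over [z, 0] gives the same probability as sigmoid(z).
z = 0.7
binary_softmax = softmax([z, 0.0])[0]
binary_sigmoid = sigmoid(z)
```

Because each sigmoid output is computed on its own, several classes can be close to 1 at once, which is what multi-label problems need.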

