Hyperparameters

level of importance

Listed roughly from most to least important to tune:

    • learning rate α
    • momentum term β
    • mini-batch size
    • number of hidden units
    • number of layers
    • learning rate decay

random hyperparameters

  • previous practice
    • sample the points in a grid
  • current practice
    • choose the points at random, since you rarely know in advance which hyperparameter matters most, and random sampling tries more distinct values of each one
    • coarse to fine: zoom in on the region where the best points fell and sample more densely there (see the sketch after this list)
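A minimal sketch of the difference, assuming a hypothetical `objective` function that stands in for "train the model and return its dev-set error":

```python
import numpy as np

def objective(alpha, beta):
    # Hypothetical stand-in for training a model and measuring dev-set error.
    return (np.log10(alpha) + 3) ** 2 + (beta - 0.95) ** 2

# Grid search: 25 trials, but only 5 distinct values per hyperparameter.
grid_trials = [(alpha, beta)
               for alpha in np.logspace(-4, 0, 5)
               for beta in np.linspace(0.8, 0.999, 5)]

# Random search: 25 trials, and 25 distinct values per hyperparameter.
rng = np.random.default_rng(0)
random_trials = [(10 ** rng.uniform(-4, 0), rng.uniform(0.8, 0.999))
                 for _ in range(25)]

print("grid best:  ", min(objective(*p) for p in grid_trials))
print("random best:", min(objective(*p) for p in random_trials))
```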

learning rate

  • sample on a log scale rather than uniformly: reasonable values of α span several orders of magnitude, and uniform sampling would spend most trials in the largest decade
  • draw the exponent with np.random.rand(), then exponentiate (see the sketch below)
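A minimal sketch, assuming the search range α ∈ [0.0001, 1]:

```python
import numpy as np

r = -4 * np.random.rand()  # exponent r is uniform in (-4, 0]
alpha = 10 ** r            # alpha is log-uniform in (0.0001, 1]
```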

exponentially weighted averages

  • β behaves very differently near 1 (β = 0.999 averages roughly the last 1000 values, β = 0.9 only the last 10), so sample 1 − β on a log scale instead of sampling β uniformly with np.random.rand() alone (see the sketch below)
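A minimal sketch, assuming the search range β ∈ [0.9, 0.999], i.e. 1 − β ∈ [0.001, 0.1]:

```python
import numpy as np

r = -1 - 2 * np.random.rand()  # exponent r is uniform in (-3, -1]
beta = 1 - 10 ** r             # beta lies in [0.9, 0.999), denser near 1
```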

Train Models

Two regimes, depending on the compute available:

  1. babysitting one model: watch a single model and nudge its hyperparameters day by day (when compute is scarce)
  2. training many models in parallel: launch many hyperparameter settings at once and keep the one with the best learning curve

Batch Normalization

  • reduces how much the distribution of a layer's input values shifts as earlier layers' weights update
    • later layers see more stable inputs, so training is easier
  • limits the amount of effect that earlier-layer updates can have on the distribution of values a layer has to learn from

Formula

Given some intermediate values $z^{(1)}, \dots, z^{(m)}$ in one layer of the NN (one per example in the mini-batch):

  • normalize

    $\mu = \frac{1}{m} \sum_i z^{(i)}$

    $\sigma^2 = \frac{1}{m} \sum_i (z^{(i)} - \mu)^2$

    $z_{\text{norm}}^{(i)} = \dfrac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}$

  • hidden units

    $\tilde{z}^{(i)} = \gamma \, z_{\text{norm}}^{(i)} + \beta$

    • $\gamma$, $\beta$: learnable parameters
      • they arrange the mean and variance of the values; setting $\gamma = \sqrt{\sigma^2 + \epsilon}$ and $\beta = \mu$ recovers $z^{(i)}$ exactly
    • use $\tilde{z}^{(i)}$ instead of $z^{(i)}$ in the rest of the layer
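A minimal numpy sketch of the forward computation above; the function name and shapes are my own convention, with one column per example:

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then scale by gamma and shift by beta.

    z     : (n_units, m) pre-activations, one column per example
    gamma : (n_units, 1) learnable scale
    beta  : (n_units, 1) learnable shift
    """
    mu = np.mean(z, axis=1, keepdims=True)       # per-unit mean over the batch
    sigma2 = np.var(z, axis=1, keepdims=True)    # per-unit variance over the batch
    z_norm = (z - mu) / np.sqrt(sigma2 + eps)    # zero mean, unit variance
    z_tilde = gamma * z_norm + beta              # learnable mean and variance
    return z_tilde, (z_norm, mu, sigma2)         # cache the stats for backprop

# Example: 3 hidden units, mini-batch of 4 examples
z = np.random.randn(3, 4)
gamma, beta = np.ones((3, 1)), np.zeros((3, 1))
z_tilde, cache = batchnorm_forward(z, gamma, beta)
```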

Batch Normalization With Mini-Batch Gradient Descent

  • handle one mini-batch at a time: $\mu$ and $\sigma^2$ are computed from the current mini-batch only
  • eliminate the bias $b^{[l]}$
    • subtracting the mini-batch mean cancels any constant added to $z^{[l]}$, so $b^{[l]}$ has no effect (see the quick check after this list)
    • $\beta^{[l]}$ controls the shift or the bias terms instead
  • ...
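A quick numerical check that a constant bias vanishes under the mean subtraction, reusing the `batchnorm_forward` sketch from above:

```python
import numpy as np

z = np.random.randn(3, 4)
b = 5.0 * np.ones((3, 1))  # a constant bias per hidden unit

gamma, beta = np.ones((3, 1)), np.zeros((3, 1))
out_without_b, _ = batchnorm_forward(z, gamma, beta)
out_with_b, _ = batchnorm_forward(z + b, gamma, beta)

# True: the mean subtraction removes b, so b^[l] is redundant under batch norm
print(np.allclose(out_without_b, out_with_b))
```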

Implement

  • for t = 1 ... number of mini-batches

    • forward propagation on $X^{\{t\}}$
      • in each hidden layer, use $\tilde{z}^{[l]}$ to replace $z^{[l]}$
      • compute $\hat{y}$
    • backpropagation on $X^{\{t\}}$
      • compute $dW^{[l]}$, $d\beta^{[l]}$, $d\gamma^{[l]}$
      • update $W^{[l]}$, $\beta^{[l]}$, $\gamma^{[l]}$

At Test Time

  • $\mu$ and $\sigma^2$ come from whole mini-batches during training, but at test time you may need to process a single example
  • keep an exponentially weighted average of $\mu$ and $\sigma^2$ across mini-batches during training, and use those running estimates to normalize at test time (see the sketch below)
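A minimal sketch of the running estimates, using toy data in place of real pre-activations; the momentum value 0.9 is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
mini_batches = [rng.normal(size=(3, 4)) for _ in range(100)]  # toy z values

# Exponentially weighted averages of mu and sigma^2 across mini-batches
momentum = 0.9
running_mu = np.zeros((3, 1))
running_sigma2 = np.ones((3, 1))

for z in mini_batches:  # during training
    mu = np.mean(z, axis=1, keepdims=True)
    sigma2 = np.var(z, axis=1, keepdims=True)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_sigma2 = momentum * running_sigma2 + (1 - momentum) * sigma2

# At test time a single example has no mini-batch to average over,
# so normalize it with the running estimates instead:
z_test = rng.normal(size=(3, 1))
z_norm = (z_test - running_mu) / np.sqrt(running_sigma2 + 1e-8)
```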
