python - How to set parameters of the Adadelta algorithm in TensorFlow correctly?
I've been using TensorFlow for regression purposes. My neural net is very small, with 10 input neurons, 12 hidden neurons in a single layer, and 5 output neurons.
- The activation function is ReLU.
- The cost is the squared distance between the output and the real value.
- My neural net trains correctly with other optimizers such as GradientDescent, Adam, and Adagrad.
However, when I try to use Adadelta, the neural net simply won't train: the variables stay the same at every step.
I have tried every initial learning_rate possible (from 1.0e-6 to 10) and different weight initializations, and the result is always the same.
Does anyone have a slight idea of what is going on?
Thanks so much
Short answer: don't use Adadelta
Very few people use it today; you should instead stick to one of the following:
- tf.train.MomentumOptimizer with momentum=0.9: this value of momentum is standard and works well. The drawback is that you have to find the best learning rate yourself.
- tf.train.RMSPropOptimizer: the results are less dependent on a good learning rate. The algorithm is very similar to Adadelta, but in my opinion it performs better.
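As a minimal sketch with the TF 1.x tf.train API (the learning rates below are placeholder values that you would still have to tune for your problem):

    import tensorflow as tf

    # Momentum: the standard choice, but the learning rate has to be tuned by hand.
    momentum_opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

    # RMSProp: results are less sensitive to the exact learning rate.
    rmsprop_opt = tf.train.RMSPropOptimizer(learning_rate=0.001)

    # Either one is then used like any other optimizer:
    # train_op = rmsprop_opt.minimize(loss)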
If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A bigger epsilon helps at the start, but be prepared to wait a bit longer than with other optimizers to see convergence. Note that in the paper they don't even use a learning rate, which is the same as keeping it equal to 1.
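For example (again with the TF 1.x API; if I remember correctly the constructor's defaults are much smaller, something like learning_rate=0.001 and epsilon=1e-8, so you have to pass the paper's values explicitly):

    import tensorflow as tf

    # Adadelta with the parameters recommended in the paper
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=1., rho=0.95, epsilon=1e-6)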
Long answer
Adadelta has a very slow start. The full algorithm from the paper is:
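Written out, the per-parameter update rules (with decay rate $\rho$ and small constant $\epsilon$) are roughly:

    E[g^2]_t        = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \Delta x_t      = - \frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
    E[\Delta x^2]_t = \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2
    x_{t+1}         = x_t + \Delta x_t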
The issue is that they accumulate the square of the updates.
- At step 0, the running average of these updates is zero, so the first update will be very small.
- As the first update is very small, the running average of the updates will be very small at the beginning, which is kind of a vicious circle at the beginning (see the small sketch right after this list).
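To make that concrete, here is a tiny plain-Python sketch of one Adadelta step on the same toy loss v * v that I use below; it is not exactly what TensorFlow computes internally, but it shows why the first step is so small even with a gradient of 20:

    import math

    rho, eps = 0.95, 1e-6
    v = 10.0
    accum = 0.0          # running average of squared gradients, E[g^2]
    accum_update = 0.0   # running average of squared updates, E[dx^2]

    grad = 2 * v                                  # gradient of v*v at v=10 is 20
    accum = rho * accum + (1 - rho) * grad ** 2   # -> 20.0
    step = math.sqrt(accum_update + eps) / math.sqrt(accum + eps) * grad
    accum_update = rho * accum_update + (1 - rho) * step ** 2

    print(step)          # ~0.0045: the parameter barely moves despite a gradient of 20
    print(accum_update)  # ~1e-6: so the next step will be almost as small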
I think Adadelta performs better with bigger networks than yours, and after some iterations it should equal the performance of RMSProp or Adam.
Here is my code to play a bit with the Adadelta optimizer:
    import tensorflow as tf

    v = tf.Variable(10.)
    loss = v * v

    optimizer = tf.train.AdadeltaOptimizer(1., 0.95, 1e-6)
    train_op = optimizer.minimize(loss)

    accum = optimizer.get_slot(v, "accum")                # accumulator of the squared gradients
    accum_update = optimizer.get_slot(v, "accum_update")  # accumulator of the squared updates

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for i in range(100):
        sess.run(train_op)
        print "%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update]))
The first 10 lines of output:
    v        accum      accum_update
    9.994    20.000     0.000001
    9.988    38.975     0.000002
    9.983    56.979     0.000003
    9.978    74.061     0.000004
    9.973    90.270     0.000005
    9.968    105.648    0.000006
    9.963    120.237    0.000006
    9.958    134.077    0.000007
    9.953    147.205    0.000008
    9.948    159.658    0.000009
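If you are on a more recent TF 1.x release with Python 3, the snippet needs two small cosmetic changes (the logic stays the same):

    sess.run(tf.global_variables_initializer())   # initialize_all_variables() is deprecated
    for i in range(100):
        sess.run(train_op)
        print("%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update])))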