python - How to set parameters of the Adadelta algorithm in TensorFlow correctly?
I've been using TensorFlow for regression purposes. My neural net is very small, with 10 input neurons, 12 hidden neurons in a single layer, and 5 output neurons.
- The activation function is ReLU.
- The cost is the squared distance between the output and the real value.
- My neural net trains correctly with other optimizers such as GradientDescent, Adam, and Adagrad.
However, when I try to use Adadelta, the neural net simply won't train: the variables stay the same at every step.
I have tried every initial learning_rate possible (from 1.0e-6 to 10) and different weight initializations, and the result is always the same.
Does anyone have a slight idea of what is going on?
Thanks so much
Short answer: don't use Adadelta
Very few people use it today; you should instead stick to one of the following:
- tf.train.MomentumOptimizer with momentum=0.9: this value of momentum is standard and works well. The drawback is that you have to find the best learning rate yourself.
- tf.train.RMSPropOptimizer: the results are less dependent on a good learning rate. The algorithm is very similar to Adadelta, but in my opinion it performs better.
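As a minimal sketch with the TF 1.x tf.train API (the learning rates below are placeholder values that you would still have to tune for your problem):

    import tensorflow as tf

    # Momentum: the standard choice, but the learning rate has to be tuned by hand.
    momentum_opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

    # RMSProp: results are less sensitive to the exact learning rate.
    rmsprop_opt = tf.train.RMSPropOptimizer(learning_rate=0.001)

    # Either one is then used like any other optimizer:
    # train_op = rmsprop_opt.minimize(loss)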
If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A bigger epsilon helps at the start, but be prepared to wait a bit longer than with other optimizers to see convergence. Note that in the paper they don't even use a learning rate, which is the same as keeping it equal to 1.
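For example (again with the TF 1.x API; if I remember correctly the constructor's defaults are much smaller, something like learning_rate=0.001 and epsilon=1e-8, so you have to pass the paper's values explicitly):

    import tensorflow as tf

    # Adadelta with the parameters recommended in the paper
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=1., rho=0.95, epsilon=1e-6)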
Long answer
Adadelta has a very slow start. The full algorithm from the paper is:
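Written out, the per-parameter update rules (with decay rate $\rho$ and small constant $\epsilon$) are roughly:

    E[g^2]_t        = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \Delta x_t      = - \frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
    E[\Delta x^2]_t = \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2
    x_{t+1}         = x_t + \Delta x_t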
The issue is that they accumulate the square of the updates.
- At step 0, the running average of these updates is zero, so the first update will be very small.
- As the first update is very small, the running average of the updates will be very small at the beginning, which is kind of a vicious circle at the beginning (see the small sketch right after this list).
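To make that concrete, here is a tiny plain-Python sketch of one Adadelta step on the same toy loss v * v that I use below; it is not exactly what TensorFlow computes internally, but it shows why the first step is so small even with a gradient of 20:

    import math

    rho, eps = 0.95, 1e-6
    v = 10.0
    accum = 0.0          # running average of squared gradients, E[g^2]
    accum_update = 0.0   # running average of squared updates, E[dx^2]

    grad = 2 * v                                  # gradient of v*v at v=10 is 20
    accum = rho * accum + (1 - rho) * grad ** 2   # -> 20.0
    step = math.sqrt(accum_update + eps) / math.sqrt(accum + eps) * grad
    accum_update = rho * accum_update + (1 - rho) * step ** 2

    print(step)          # ~0.0045: the parameter barely moves despite a gradient of 20
    print(accum_update)  # ~1e-6: so the next step will be almost as small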
I think Adadelta performs better with bigger networks than yours, and after some iterations it should equal the performance of RMSProp or Adam.
Here is my code to play a bit with the Adadelta optimizer:
    import tensorflow as tf

    v = tf.Variable(10.)
    loss = v * v

    optimizer = tf.train.AdadeltaOptimizer(1., 0.95, 1e-6)
    train_op = optimizer.minimize(loss)

    accum = optimizer.get_slot(v, "accum")                # accumulator of the squared gradients
    accum_update = optimizer.get_slot(v, "accum_update")  # accumulator of the squared updates

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    for i in range(100):
        sess.run(train_op)
        print "%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update]))
The first 10 lines of output:
    v        accum      accum_update
    9.994    20.000     0.000001
    9.988    38.975     0.000002
    9.983    56.979     0.000003
    9.978    74.061     0.000004
    9.973    90.270     0.000005
    9.968    105.648    0.000006
    9.963    120.237    0.000006
    9.958    134.077    0.000007
    9.953    147.205    0.000008
    9.948    159.658    0.000009
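If you are on a more recent TF 1.x release with Python 3, the snippet needs two small cosmetic changes (the logic stays the same):

    sess.run(tf.global_variables_initializer())   # initialize_all_variables() is deprecated
    for i in range(100):
        sess.run(train_op)
        print("%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update])))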