
tf.get_variable() error, variable does not exist or was not created #3

Open · SimpleXP opened this issue Feb 17, 2017 · 19 comments

@SimpleXP

My tensorflow version is 0.12.1.

When I run run_main.py, I get this error:

"ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/cond/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?"

Does anyone have any idea?

@davidz-zzz

Maybe you could add:

    with tf.variable_scope(tf.get_variable_scope(), reuse=False):

before ema.apply.

carpedm20/DCGAN-tensorflow#59
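
For reference, a minimal sketch of where that line would sit in utils.py, assuming the usual EMA batch-norm pattern that the tracebacks in this thread point at (batch_mean, batch_var, and ema come from that helper; this is an illustration, not the repo's exact code):

    def mean_var_with_update():
        # Reopen the current scope with reuse=False so that ema.apply() is
        # allowed to create its shadow variables even when the enclosing
        # discriminator scope was entered with reuse=True.
        with tf.variable_scope(tf.get_variable_scope(), reuse=False):
            ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)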

@loliverhennigh

This worked for me! (tensorflow 1.0 alpha)

@chulaihunde

chulaihunde commented Mar 5, 2017

This did not work for me! (tensorflow 1.0 nightly)

Traceback (most recent call last):

File "main.py", line 55, in <module>
  tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
  _sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 44, in main
  FLAGS.optimizer_param)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 197, in create_network
  scope_reuse=True)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 118, in _discriminator
  h_bn = utils.batch_norm(h_conv, dims[index + 1], train_phase, scope="disc_bn%d" % index)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 145, in batch_norm
  lambda: (ema.average(batch_mean), ema.average(batch_var)))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1741, in cond
  orig_res, res_t = context_t.BuildCondBranch(fn1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1642, in BuildCondBranch
  r = fn()
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 139, in mean_var_with_update
  ema_apply_op = ema.apply([batch_mean, batch_var])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 375, in apply
  colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 135, in create_zeros_slot
  colocate_with_primary=colocate_with_primary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 112, in create_slot
  return _create_slot_var(primary, val, "")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
  validate_shape=val.get_shape().is_fully_defined())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1033, in get_variable
  use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 932, in get_variable
  use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
  validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
  use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
  "VarScope?" % name)
ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

@kunrenzhilu

It took me two days to figure out a workaround, and I still ended up failing. The reason seems to be that although the fake discriminator sets scope_reuse to True, tf.cond() creates a new control-flow context every time, so get_variable() cannot retrieve the corresponding variables from the real discriminator and throws a ValueError about .../discriminator_1/disc_bn1/... As far as I understand, there shouldn't be a nested .../discriminator_1 scope or a nested .../disc_bn1 scope in the variable name. Tell me if I am wrong.
Anyway, I couldn't make the fix work on top of the original code. My workaround was to switch to tf.contrib.layers.batch_norm(). Done with one statement, roughly as sketched below.
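
A rough sketch of that replacement (parameter values here are illustrative, not necessarily the exact ones used):

    # Swap the custom utils.batch_norm(...) helper for the built-in layer,
    # which manages its own moving averages and variable reuse:
    h_bn = tf.contrib.layers.batch_norm(h_conv, is_training=train_phase,
                                        scope="disc_bn%d" % index)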

@lengoanhcat

@kunrenzhilu: could you be more specific about how you swapped in tf.contrib.layers.batch_norm()? I am struggling with the same problem stated above.

@bottlecapper

bottlecapper commented May 14, 2017

I have the same problem. After adding

    with tf.variable_scope(tf.get_variable_scope(), reuse=False):

before ema.apply, another problem comes up at model.initialize_network(FLAGS.logs_dir):

Traceback (most recent call last):
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 45, in main
    model.initialize_network(FLAGS.logs_dir)
  File "/media/jg/F/20170514/models/GAN_models.py", line 225, in initialize_network
    self.sess.run(tf.global_variables_initializer())
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Placeholder', defined at:
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 43, in main
    FLAGS.optimizer_param)
  File "/media/jg/F/20170514/models/GAN_models.py", line 173, in create_network
    self._setup_placeholder()
  File "/media/jg/F/20170514/models/GAN_models.py", line 149, in _setup_placeholder
    self.train_phase = tf.placeholder(tf.bool)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1587, in placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2043, in _placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]


Process finished with exit code 1
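
If it helps: the trace shows that the initializers of the variables created inside the cond branch now depend on the boolean train_phase placeholder, so even tf.global_variables_initializer() cannot run without feeding it. One possible workaround (an untested assumption, not something from the repo) is to feed the placeholder during initialization:

    # self.train_phase is the tf.bool placeholder from _setup_placeholder()
    self.sess.run(tf.global_variables_initializer(),
                  feed_dict={self.train_phase: True})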

@AshishBora

AshishBora commented May 14, 2017

Check this out. It seems to have fixed the problems for me.
https://github.com/AshishBora/WassersteinGAN.tensorflow/commit/1c6cfa1c20959e9dcca01f0a96f7ca8c54403d1a

UPDATE: After training for 8+ hours with this change, the GAN seems to not learn anything and the loss ranges (for d_loss and g_loss) are way off.

UPDATE 2: I trained with this commit and TF v1.1.0. It seems to have learned to produce faces.

@kinsumliu

@AshishBora
Hi, could you report the numbers you get for the generator and discriminator losses?
I am doing WGAN on MNIST images and I see a g_loss of ~200 and a d_loss of ~0.003 in the first hour.

@RyanHangZhou

@kunrenzhilu could you give a concrete solution to the problem? I can't solve it either.

@ayrtondenner

@AshishBora your commits are giving a 404 for me; could you show how you fixed it?

@AshishBora

AshishBora commented Feb 14, 2018

@ayrtondenner I changed line 115 here to something like:

h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5, is_training=train_phase, scope="disc_bn%d" % index)
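
One caveat worth flagging, as a general property of tf.contrib.layers.batch_norm rather than anything specific to this repo: by default the layer adds its moving-average update ops to the tf.GraphKeys.UPDATE_OPS collection instead of running them inline, so the train op should depend on them or the moving statistics never update:

    # Run the batch-norm moving-average updates together with each train step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss)  # optimizer/loss: the repo's own objects

Passing updates_collections=None instead forces the updates to run in place, at some cost in speed.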

@ayrtondenner

ayrtondenner commented Feb 14, 2018

I should change line 326 too, right? They are both batch_norm calls inside the discriminator network.

@AshishBora

Yup, that seems right.

@ayrtondenner

I'm already running it; it seems like it's going to work now. Anyway, do you know why your commits are 404'd now?

@AshishBora

Great. Oh, the 404 is because I deleted my fork some time ago since I wasn't using it anymore.

@ayrtondenner

I see. I had to re-run it since some minor changes were still needed for TensorFlow 1.0 compatibility. Anyway, do you still have these commits? It would be nice to see whether you made any other code changes.

@AshishBora

I have a local copy of the whole repo. I have uploaded a zip here.

@ayrtondenner

I had the network training for 10 hours, 11k epochs, and this is the result I got. It's still not a human face, but I wanted to know whether the training is going OK, because as you said above, being able to run the network doesn't necessarily mean it's working. Also, I changed both utils.batch_norm calls in the discriminator network, but I just realized there are also calls in the generator network; maybe I can replace those too to see if it works better.

Loss functions

Network images

@shimafoolad

On tensorflow 1.12.0, I had the same problem and fixed it by adding the line:

        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):

before ema.apply.
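
tf.AUTO_REUSE (available since TF 1.4) makes get_variable() create a variable on the first call and reuse it on later calls, which is presumably why this works where a plain reuse=True scope fails. In the same helper as the earlier suggestion, the sketch would become:

    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        ema_apply_op = ema.apply([batch_mean, batch_var])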
