How can I confirm that the deterministic environment variable is working? #2

Closed
cklsoft opened this issue Sep 17, 2019 · 5 comments
Labels
question Further information is requested

Comments

@cklsoft

cklsoft commented Sep 17, 2019

I'm using the newest NGC container and set os.environ['TF_DETERMINISTIC_OPS'] = '1' at the beginning of my main entry point. But I don't know whether the environment variable is taking effect. I set the TF log level to debug and didn't find any log output related to TF determinism.

@duncanriach
Collaborator

Hi @cklsoft, there is currently nothing printed to the logs by TensorFlow that confirms that the environment variable has been acted upon. I have made a note to potentially add this in the future.
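
Since TensorFlow prints no confirmation itself, one option is to print your own. The following is a minimal sketch, not something TensorFlow provides: the helper name determinism_requested is hypothetical, and it only confirms that the variable is visible in the process environment, not that TensorFlow has acted on it.

```python
import os

# Set the variable at the top of the entry point, before TensorFlow is
# imported or used.
os.environ["TF_DETERMINISTIC_OPS"] = "1"


def determinism_requested():
    """Return True if TF_DETERMINISTIC_OPS holds an enabling value.

    Hypothetical helper: it inspects the environment only and does not
    query TensorFlow.
    """
    value = os.environ.get("TF_DETERMINISTIC_OPS", "").lower()
    return value in ("1", "true")


print("TF_DETERMINISTIC_OPS requested:", determinism_requested())
```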

@cklsoft
Author

cklsoft commented Sep 18, 2019

> there is currently nothing printed to the logs by TensorFlow that confirms that the environment variable has been acted upon. I have made a note to potentially add this in the future.

Will setting TF_DETERMINISTIC_OPS increase the number of graph nodes?

@duncanriach
Collaborator

That's a great idea.

Yes. With the current implementation (NGC 19.06, NGC 19.07, and stock TF 1.14), you should see the number of graph nodes increase when TF_DETERMINISTIC_OPS is set to '1' or 'true', compared with when it is not set (or is set to '0' or 'false').

Note that TF_DETERMINISTIC_OPS is sticky within the Python process: TensorFlow queries it the first time it's used and then caches the value. So, to run without it, you need to start a fresh Python process.
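
Because the value is cached per process, a comparison of node counts needs two separate processes. The following is a rough sketch of such a harness using only the standard library. The function name run_in_fresh_process and the script name count_nodes.py are hypothetical; the script is assumed to build your graph and print its node count.

```python
import os
import subprocess
import sys


def run_in_fresh_process(python_args, deterministic):
    """Run a new Python interpreter with TF_DETERMINISTIC_OPS set or unset.

    python_args is the argument list passed to the interpreter, for example
    ["count_nodes.py"]. Returns whatever the child printed to stdout.
    """
    env = dict(os.environ)
    env.pop("TF_DETERMINISTIC_OPS", None)  # start from a clean state
    if deterministic:
        env["TF_DETERMINISTIC_OPS"] = "1"
    result = subprocess.run(
        [sys.executable] + list(python_args),
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Example usage (count_nodes.py is a hypothetical script that builds the
# graph and ends with something like
#   print(len(tf.get_default_graph().as_graph_def().node))
# on TF 1.x):
#
#   without = int(run_in_fresh_process(["count_nodes.py"], False))
#   with_det = int(run_in_fresh_process(["count_nodes.py"], True))
#   print("nodes without:", without, "nodes with:", with_det)
```

Each call starts a new interpreter, so the cached value from one run cannot leak into the other.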

@duncanriach
Collaborator

duncanriach commented Oct 31, 2019

The ultimate test is whether your weights at the end of training change from run to run.

For Keras models, you can call the following at the end of training, and make sure it produces the same result on two consecutive runs:

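def summarize_keras_weights(model):
  # get_weights() returns a list of NumPy arrays, one per weight tensor.
  weights = model.get_weights()
  # Collapse all weights into a single scalar that can be compared across runs.
  summary = sum(w.sum() for w in weights)
  print("Summary of weights: %.13f" % summary)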

If you're not using Keras, it would look something like this:

import tensorflow as tf  # TF 1.x API

def summarize_weights(session):
  # Unwrap wrappers such as tf.train.SingularMonitoredSession to get the raw session.
  if hasattr(session, 'raw_session'):
    session = session.raw_session()
  weights = session.run(tf.trainable_variables())
  summary = sum(w.sum() for w in weights)
  print("Summary of weights: %.13f" % summary)

It's also good to confirm that your weights are the same, on both runs, before training starts.

Please note that while the above code is based on code I've used, the code as given above has not been tested. It may contain bugs and/or may not work on more recent versions of TensorFlow or Keras.
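
A sum can, in principle, hide differences that cancel each other out. A stricter alternative, not part of the code above, is to hash the raw bytes of every weight array so that any bitwise difference changes the result. This is a minimal sketch: the name fingerprint_weights is hypothetical, and it assumes each weight exposes a tobytes() method, as NumPy arrays do.

```python
import hashlib


def fingerprint_weights(weights):
    """Return a SHA-256 hex digest over the raw bytes of each weight, in order.

    Assumes every element of `weights` has a tobytes() method (true for
    NumPy arrays). Shapes and dtypes are not included in the digest.
    """
    digest = hashlib.sha256()
    for w in weights:
        digest.update(w.tobytes())
    return digest.hexdigest()


# Example usage with a Keras model (call before and after training, on
# both runs, and compare the printed values):
#
#   print("Weights fingerprint:", fingerprint_weights(model.get_weights()))
```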

@duncanriach
Collaborator

This question has been answered, and there is nothing else to be done here. Closing.

@duncanriach changed the title from "How to confirm determinism environ is work?" to "[question] How can I confirm that the deterministic environment variable is working?" on Jan 17, 2020
@duncanriach added the "question" (Further information is requested) label on Jan 17, 2020
@duncanriach changed the title from "[question] How can I confirm that the deterministic environment variable is working?" to "How can I confirm that the deterministic environment variable is working?" on Jan 17, 2020