Sync "https://github.com/jakeret/tf_unet/pull/202" with master and resolve conflicts #276

Open: ashahba wants to merge 4 commits into base: master

@ashahba ashahba commented Jul 9, 2019

When using the image gcr.io/deeplearning-platform-release/tf-cpu.1-14 and following these steps: https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md
I get the following error:

2019-07-09 17:03:43.718942: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2019-07-09 17:03:43.771372: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0709 17:03:43.853285 139741926975296 deprecation_wrapper.py:119] From /workspace/models/tf_unet/unet.py:301: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0709 17:03:43.874177 139741926975296 deprecation.py:323] From /root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/workspace/benchmarks/image_segmentation/tensorflow/unet/inference/fp32/unet_infer.py", line 78, in <module>
    prediction = net.predict(arg_parser.parse_args().ckpt_path, x_test)
  File "/workspace/models/tf_unet/unet.py", line 274, in predict
    self.restore(sess, model_path)
  File "/workspace/models/tf_unet/unet.py", line 302, in restore
    saver.restore(sess, model_path)
  File "/root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1278, in restore
    compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt
Ran inference with batch size 1
Log location outside container: /jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks/common/tensorflow/logs/benchmark_unet_inference_fp32_20190709_170331.log
nrvalgo_jenkinsadm@aipg-fm-skx-48:/jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks$ ls $CHECKPOINT_DIR/
checkpoint  events.out.tfevents.1548972182.4e4b03cdde24  model.ckpt.data-00000-of-00001  model.ckpt.index  model.ckpt.meta

@ashahba ashahba changed the title Ashahba/unet hotfix Fix for The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt Jul 9, 2019
@ashahba ashahba changed the title Fix for The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt Fix for "passed save_path is not a valid checkpoint: /checkpoints/model.cpkt" Jul 9, 2019
@ashahba ashahba changed the title Fix for "passed save_path is not a valid checkpoint: /checkpoints/model.cpkt" Sync "https://github.com/jakeret/tf_unet/pull/202" with master and resolve conflicts Jul 9, 2019
ashahba (Author) commented Jul 9, 2019

@jakeret this basically just brings #202 up to date with master.
I also realized that the issue with https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md was that I was using checkpoint_name=model.cpkt, not realizing that it is now checkpoint_name=model.ckpt. I have fixed our docs.

Thanks.
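
For anyone hitting the same ValueError: the deprecation notice in the log suggests using standard file APIs to check for files with the checkpoint prefix. A minimal sketch of such a check is below. It is not part of tf_unet or of this PR; the helper names are hypothetical, and it assumes the checkpoint layout shown in the directory listing above, where a prefix such as model.ckpt has a matching model.ckpt.index file.

```python
import os


def checkpoint_prefix_exists(prefix):
    """Return True if an index file exists for this checkpoint prefix.

    Assumption: the checkpoint uses the layout seen in the listing above
    (<prefix>.index, <prefix>.meta, <prefix>.data-...).
    """
    return os.path.isfile(prefix + ".index")


def find_checkpoint_prefixes(directory):
    """List the checkpoint prefixes in a directory, inferred from '*.index' files."""
    suffix = ".index"
    return sorted(
        os.path.join(directory, name[: -len(suffix)])
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )


def resolve_checkpoint(directory, checkpoint_name):
    """Return the checkpoint prefix, or raise and report the prefixes that do exist."""
    prefix = os.path.join(directory, checkpoint_name)
    if checkpoint_prefix_exists(prefix):
        return prefix
    available = find_checkpoint_prefixes(directory)
    raise FileNotFoundError(
        "No checkpoint at %r; prefixes found: %s" % (prefix, available)
    )
```

Calling resolve_checkpoint("/checkpoints", "model.cpkt") before saver.restore would then fail early and list /checkpoints/model.ckpt as the available prefix, which makes the transposed letters easier to spot.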

ashahba (Author) commented Jul 9, 2019

@mpjlu, would you also please review and provide feedback if needed?

Thanks.

jakeret (Owner) commented Jul 11, 2019

Hi @ashahba, thank you for your contribution.
I wasn't aware that this repo is being used in the IntelAI benchmarks, nice.

I hadn't merged #202 for two reasons:

  • The thread handling should not be part of the PR, as it has nothing to do with the dropout.

  • As I understand it, if we set keep_prob to a value other than 1 (e.g. 0.5), it can't be changed for validation or prediction (where we don't want any regularization), because it is a fixed part of the graph. Or am I missing something?
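
To make the second point concrete, here is a small framework-free sketch of the difference between a keep probability that is baked in when the model is built and one that stays a run-time input. It is an illustration only, not the code from #202 or from tf_unet; the function names are hypothetical, and plain Python closures stand in for the graph. In TensorFlow 1.x the run-time variant would typically be a fed placeholder (for example via tf.compat.v1.placeholder_with_default), though whether that suits this repo is an open question.

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and rescale survivors."""
    if keep_prob >= 1.0:
        return list(values)  # no regularization
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def build_fixed(keep_prob):
    """keep_prob is captured at build time; callers cannot change it afterwards."""
    def run(values, rng):
        return dropout(values, keep_prob, rng)
    return run


def build_feedable(default_keep_prob=1.0):
    """keep_prob is a run-time input with a default, so each call may override it."""
    def run(values, rng, keep_prob=default_keep_prob):
        return dropout(values, keep_prob, rng)
    return run
```

With build_feedable, training can call run(x, rng, keep_prob=0.5) while validation and prediction call run(x, rng) and get the input back unchanged. With build_fixed(0.5), every call applies dropout, which is the concern raised above.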

ashahba (Author) commented Jul 24, 2019

Thanks @jakeret
That sounds great. I'm unblocked for now, but I'll keep an eye out for any activity on #202.
