
Conversation

Contributor

@saimidu commented May 11, 2021

Issue #, if available:

PR Checklist

  • I've prepended the PR tag with the frameworks/job this applies to: [mxnet, tensorflow, pytorch] | [ei/neuron] | [build] | [test] | [benchmark] | [ec2, ecs, eks, sagemaker]
  • (If applicable) I've documented below the DLC image/dockerfile this relates to
  • (If applicable) I've documented below the tests I've run on the DLC image
  • (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the Apache Software Foundation Third Party License Policy Category A or Category B license list. See https://www.apache.org/legal/resolved.html.
  • (If applicable) I've scanned the updated and new binaries to make sure they do not have vulnerabilities associated with them.

Benchmark Checklist

  • When creating a PR:
  • I've modified src/config/test_config.py in my PR branch by setting ENABLE_BENCHMARK_DEV_MODE = True
  • When PR is reviewed and ready to be merged:
  • I've reverted the code change on the config file mentioned above

Reviewer Checklist

  • For reviewer, before merging, please cross-check:
  • I've verified the code change on the config file mentioned above has already been reverted

Description:
PR to fix a bug in the TF2 SageMaker benchmark tests, where tests fail when aggregating throughput results due to overlapping log file names.
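
For context, here is a minimal sketch of how a device/CUDA string of this form can be folded into a benchmark log file name so that parallel runs write to distinct files. The helper name get_cuda_version_from_tag appears in the diff below; the benchmark_log_path helper, the log-path pattern, and the tag-parsing logic are illustrative assumptions, not the actual test code.

```python
import os


def get_cuda_version_from_tag(image_uri):
    # Assumed behaviour: pull a CUDA token such as "cu110" out of the image tag.
    for part in image_uri.split("-"):
        if part.startswith("cu") and part[2:].isdigit():
            return part
    return "unknown-cuda"


def benchmark_log_path(image_uri, log_dir="/tmp/tf2_sm_benchmark"):
    # Hypothetical helper: build a per-image log file name.
    processor = "gpu" if "gpu" in image_uri else "cpu"
    # Including the CUDA version for GPU images keeps runs against e.g. cu110
    # and cu112 variants of the same framework version in separate log files.
    device_cuda_str = (
        f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else processor
    )
    return os.path.join(log_dir, f"throughput_{device_cuda_str}.txt")


# Example: GPU images with different CUDA versions and the CPU image each map
# to a distinct log file (illustrative image tags).
print(benchmark_log_path("tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04"))
print(benchmark_log_path("tensorflow-training:2.4.1-gpu-py37-cu112-ubuntu18.04"))
print(benchmark_log_path("tensorflow-training:2.4.1-cpu-py37-ubuntu18.04"))
```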

Tests run:

DLC image/dockerfile:

Additional context:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@saimidu saimidu self-assigned this May 11, 2021
@saimidu saimidu requested a review from junpuf May 11, 2021 02:34
pytest.skip("Skipping benchmark test on TF 1.x images.")

processor = "gpu" if "gpu" in image_uri else "cpu"
device_cuda_str = f"gpu-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
Contributor

How about this

Suggested change
device_cuda_str = f"gpu-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
device_cuda_str = f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else processor

jeet4320 previously approved these changes May 11, 2021
Contributor

@jeet4320 left a comment

OK to merge after reverting the change that was made in the config file.

pytest.skip("Skipping benchmark test on TF 1.x images.")

processor = "gpu" if "gpu" in image_uri else "cpu"
device_cuda_str = f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
Contributor

nit: else "cpu" should be else processor

Contributor Author

My bad. Adding this as well.

@saimidu saimidu merged commit 8469a38 into aws:master May 12, 2021
@saimidu saimidu deleted the fix_parallel_testing_on_tf_sm_benchmark branch May 12, 2021 02:05
aws-vrnatham pushed a commit to aws-vrnatham/deep-learning-containers that referenced this pull request May 13, 2021