
Conversation

Contributor

@saimidu commented May 11, 2021

Issue #, if available:

PR Checklist

  • I've prepended the PR tag with the frameworks/job this applies to: [mxnet, tensorflow, pytorch] | [ei/neuron] | [build] | [test] | [benchmark] | [ec2, ecs, eks, sagemaker]
  • (If applicable) I've documented below the DLC image/dockerfile this relates to
  • (If applicable) I've documented below the tests I've run on the DLC image
  • (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the Apache Software Foundation Third Party License Policy Category A or Category B license list. See https://www.apache.org/legal/resolved.html.
  • (If applicable) I've scanned the updated and new binaries to make sure they do not have vulnerabilities associated with them.

Benchmark Checklist

  • When creating a PR:
  • I've modified src/config/test_config.py in my PR branch by setting ENABLE_BENCHMARK_DEV_MODE = True
  • When PR is reviewed and ready to be merged:
  • I've reverted the code change on the config file mentioned above

Reviewer Checklist

  • For reviewer, before merging, please cross-check:
  • I've verified the code change on the config file mentioned above has already been reverted

Description:
PR to fix a bug in the TF2 SageMaker benchmark tests, where tests fail when aggregating throughput results due to overlapping log file names.
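
For context, here is a minimal sketch of how a device/CUDA string of this form can be folded into a benchmark log file name so that parallel runs write to distinct files. The helper name get_cuda_version_from_tag appears in the diff below; the benchmark_log_path helper, the log-path pattern, and the tag-parsing logic are illustrative assumptions, not the actual test code.

```python
import os


def get_cuda_version_from_tag(image_uri):
    # Assumed behaviour: pull a CUDA token such as "cu110" out of the image tag.
    for part in image_uri.split("-"):
        if part.startswith("cu") and part[2:].isdigit():
            return part
    return "unknown-cuda"


def benchmark_log_path(image_uri, log_dir="/tmp/tf2_sm_benchmark"):
    # Hypothetical helper: build a per-image log file name.
    processor = "gpu" if "gpu" in image_uri else "cpu"
    # Including the CUDA version for GPU images keeps runs against e.g. cu110
    # and cu112 variants of the same framework version in separate log files.
    device_cuda_str = (
        f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else processor
    )
    return os.path.join(log_dir, f"throughput_{device_cuda_str}.txt")


# Example: GPU images with different CUDA versions and the CPU image each map
# to a distinct log file (illustrative image tags).
print(benchmark_log_path("tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04"))
print(benchmark_log_path("tensorflow-training:2.4.1-gpu-py37-cu112-ubuntu18.04"))
print(benchmark_log_path("tensorflow-training:2.4.1-cpu-py37-ubuntu18.04"))
```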

Tests run:

DLC image/dockerfile:

Additional context:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@saimidu saimidu self-assigned this May 11, 2021
@saimidu saimidu requested a review from junpuf May 11, 2021 02:34
pytest.skip("Skipping benchmark test on TF 1.x images.")

processor = "gpu" if "gpu" in image_uri else "cpu"
device_cuda_str = f"gpu-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
Contributor

How about this

Suggested change
device_cuda_str = f"gpu-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
device_cuda_str = f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else processor

jeet4320 previously approved these changes May 11, 2021
Contributor

@jeet4320 left a comment

OK to merge after reverting the change that was made in the config file.

pytest.skip("Skipping benchmark test on TF 1.x images.")

processor = "gpu" if "gpu" in image_uri else "cpu"
device_cuda_str = f"{processor}-{get_cuda_version_from_tag(image_uri)}" if processor == "gpu" else "cpu"
Contributor

nit: else "cpu" should be else processor

Contributor Author

My bad. Adding this as well.

@saimidu saimidu merged commit 8469a38 into aws:master May 12, 2021
@saimidu saimidu deleted the fix_parallel_testing_on_tf_sm_benchmark branch May 12, 2021 02:05
aws-vrnatham pushed a commit to aws-vrnatham/deep-learning-containers that referenced this pull request May 13, 2021