Adding support for bf16_full_eval #610
@regisss I have enabled bf16_full_eval and verified it with T5 inference.
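For context, here is a minimal sketch of the precision that full bfloat16 evaluation gives up. As far as we understand the Hugging Face Transformers docs, bf16_full_eval runs evaluation entirely in bfloat16 instead of 32-bit, which is faster and saves memory but can affect metric values. bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 explicit mantissa bits. The helper float32_to_bfloat16 is hypothetical and not part of optimum-habana. It emulates the float32-to-bfloat16 conversion (round-to-nearest-even) in pure Python and assumes a finite input within float32 range.

```python
import math
import struct


def float32_to_bfloat16(x: float) -> float:
    """Round x to float32, then to bfloat16 (round-to-nearest-even), as a Python float.

    Hypothetical helper for illustration; assumes x is finite and within float32 range.
    """
    if math.isnan(x):
        return x  # leave NaN untouched; the bit trick below is meant for finite values
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the upper 16 bits: round away the lower 16 bits
    # to nearest, breaking ties toward an even upper half.
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    # Reinterpret back as float32; the lower 16 bits are now zero.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Values that need more than 7 mantissa bits lose precision.
print(float32_to_bfloat16(1.0 + 2**-7))  # 1.0078125 (exactly representable)
print(float32_to_bfloat16(1.0 + 2**-8))  # 1.0 (tie rounds to even)
print(float32_to_bfloat16(math.pi))      # 3.140625
```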
@bhargaveede Please also update (1) the summarization README to include this extra argument in the t5-3b prediction example, and (2) the performance targets in the test for both Gaudi1 and Gaudi2.
@regisss Can you review and merge this?
@regisss Can we merge this?
Looks good to me. We can merge it once @regisss also reviews.
I'll review it this week 👍
```diff
             ("facebook/bart-large-cnn", "Habana/bart", 4.691, 26.0688, 2, 1),
-            ("t5-3b", "Habana/t5", 2.28, 21.56, 2, 1),
+            ("t5-3b", "Habana/t5", 2.88, 21.56, 2, 1),
         ],
     }
 else:
     # Gaudi1 CI baselines
     MODELS_TO_TEST = {
         "bf16": [
             ("facebook/bart-large-cnn", "Habana/bart", 2.588, 26.0688, 2, 1),
-            ("t5-3b", "Habana/t5", 0.585, 21.72, 2, 1),
+            ("t5-3b", "Habana/t5", 0.98, 21.56, 2, 1),
```
For Gaudi1, I get a RougeLsum of 21.3831 and a throughput of 1.005. It doesn't matter much since the test passes (no need to update the numbers).
For Gaudi2, however, runs do not seem to be reproducible: I get a different RougeLsum from one run to another. Is this something you also observed?
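Why the Gaudi1 numbers above still pass can be illustrated with a minimal sketch. It assumes that the third and fourth fields of each baseline tuple are throughput and RougeLsum (which matches how the numbers are compared in this thread), and that the test lets each metric fall slightly below its baseline. The tolerance factors and the passes_baseline helper are hypothetical; the actual factors used by the CI test are not shown here.

```python
# Hypothetical tolerance factors (assumed, not taken from the real test).
THROUGHPUT_TOLERANCE = 0.95  # accept throughput down to 95% of the baseline
ACCURACY_TOLERANCE = 0.99    # accept RougeLsum down to 99% of the baseline


def passes_baseline(
    measured_throughput: float,
    measured_rouge: float,
    baseline_throughput: float,
    baseline_rouge: float,
) -> bool:
    """Return True if both metrics (higher is better) are within tolerance of the baseline."""
    return (
        measured_throughput >= THROUGHPUT_TOLERANCE * baseline_throughput
        and measured_rouge >= ACCURACY_TOLERANCE * baseline_rouge
    )


# New Gaudi1 t5-3b baseline from the diff: throughput 0.98, RougeLsum 21.56.
# Reviewer's Gaudi1 run: throughput 1.005, RougeLsum 21.3831.
print(passes_baseline(1.005, 21.3831, 0.98, 21.56))  # True: 21.3831 >= 0.99 * 21.56 = 21.3444
```

Under these assumed factors, a RougeLsum that drifts by a few tenths between runs can still pass, which is why the baseline numbers would not need to change for such a run.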
I didn't get a different RougeLsum. When I added the performance numbers, I ran the test twice to check and got the same RougeLsum both times. Let me check again.
Interesting, did you run it with Synapse 1.13?
I could see the variation. However, I'm also seeing variation on v1.9-release for the test "test_run_summarization_t5-small_multi_card". Can you confirm whether it's the same on your end?
I cannot run multi-card tests on my Gaudi2 instance at the moment, but if you observed the same behavior for "test_run_summarization_t5-small_multi_card", it means this "issue" already existed before.
Anyway, the tests still pass, so I'm going to merge it and investigate this later.