Fix skip of test_training_gradient_checkpointing #34723
ydshieh merged 1 commit into huggingface:main from
Conversation
|
@amyeroberts : test failures seem unrelated. Please advise when I should rebase/try again.
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, the test body was not moved under the "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama, this caused attribute errors. Fixes: huggingface#34722 Fixes: 19d58d3 ("Add MLLama (huggingface#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
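The bug shape described in the commit message can be sketched under the plain unittest runner (the class and test names here are hypothetical, not the actual transformers test code):

```python
import io
import unittest

executed = []

class GradientCheckpointingTest(unittest.TestCase):
    # Buggy shape, as described above: skipTest() is raised inside the
    # subTest context manager, but the body sits AFTER the "with"
    # statement, so it still runs even though the subtest is skipped.
    def test_buggy(self):
        with self.subTest("supports gradient checkpointing"):
            self.skipTest("model does not support gradient checkpointing")
        executed.append("buggy body ran")  # still reached

    # Fixed shape: the whole body lives under the "with" statement,
    # so skipTest() really prevents it from running.
    def test_fixed(self):
        with self.subTest("supports gradient checkpointing"):
            self.skipTest("model does not support gradient checkpointing")
            executed.append("fixed body ran")  # never reached

suite = unittest.TestLoader().loadTestsFromTestCase(GradientCheckpointingTest)
unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(executed)
```

Under plain unittest, the subTest context manager catches the SkipTest exception and records the subtest as skipped, then execution simply continues after the `with` block — which is exactly why the body still ran in the buggy version.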
|
@amyeroberts : reran tests, no side failures now. Please help review.
|
Hi @dvrogozh, thank you for the PR. When I run things like …
I see … and I also check the body after the … Could you tell us in which cases it will enter that body while …?
|
Could you set breakpoint(s) to check which … Also, is …?
Shortly: if … @ydshieh : this seems interesting. I was wondering why HF CI does not see this issue. Your comment above also suggests that you don't see the issue on your side. It seems I've found the reason. I have 2 systems, one with XPU, another with CUDA. Initially I saw the issue on both systems. Yesterday, I fully cleaned and reinstalled the environment for XPU, and the issue was gone. So, the issue which I observe is triggered by an environment difference. In particular, it shows up if the following package is installed: …
|
Most likely I got …
Most likely no. I am just building pytorch from source on my side, since I purposely look for XPU backend issues in the most recent code. I did not try to check other pytorch versions, but the issue I see does not seem to be related to pytorch.
|
Hi, thanks for the information. Indeed, our CI env. doesn't have … I will check it and see what a fix would look like. Currently, without …
This sounds like we need to add …
As discussed in [1], pytest-subtests changes the behavior of .skipTest(), making it either really skip individual subtests, or skip the entire test if the module is not installed. Hugging Face Accelerate has the module in its dependencies. It makes sense to add it for Transformers as well, to avoid divergent environments between users and CI. See [1]: huggingface#34723 (comment) Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
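Whether the plugin is present can be detected the usual way, in the spirit of the `is_*_available()` helpers Transformers uses for optional dependencies (the helper name here is hypothetical, not an existing transformers API):

```python
import importlib.util

def is_pytest_subtests_available() -> bool:
    # Hypothetical helper: detect whether the pytest-subtests plugin is
    # installed, since its presence changes how skipTest() behaves
    # inside subTest blocks, per the discussion above.
    return importlib.util.find_spec("pytest_subtests") is not None

print(is_pytest_subtests_available())
```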
|
Although I think installing … fails with … while change … I am not sure, maybe I am doing stupid things 😭
|
without …
|
with … So, it loops through all 6 cases. It correctly skips the first 3, but after that I don't understand what's going on. It should have reported 4 failures, but it reported 1 failure, 1 pass, and ate up the 2 others. And in the end it only printed the assertion failure for the last iteration :).
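For reference, a reduced stand-in for such a looped subtest (hypothetical case mix, not the real test), run under the plain unittest runner, where every skipped and failed subtest is accounted for individually:

```python
import io
import unittest

class SixCasesTest(unittest.TestCase):
    # Reduced stand-in for a looped subtest: the first 3 cases are
    # skipped via skipTest(), the remaining 3 deliberately fail.
    def test_six_cases(self):
        for i in range(6):
            with self.subTest(case=i):
                if i < 3:
                    self.skipTest(f"case {i} unsupported")
                self.fail(f"case {i} fails")

suite = unittest.TestLoader().loadTestsFromTestCase(SixCasesTest)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
# Plain unittest records each subtest separately: 3 skips, 3 failures.
print(len(result.skipped), len(result.failures))
```

The confusion reported above is that, under pytest with the pytest-subtests plugin, the per-subtest reporting does not match this plain-unittest accounting.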
|
Yeah. Maybe it's a bug. I will open an issue. |
You mean without …? Output:
|
Yes, sorry. I mean …
|
@ydshieh : I filed #34755 to specifically consider what to do with the subtests story. See the breakdown there on how it works in different cases. Really confusing... As for the …
|
ArthurZucker
left a comment
Leaving this up to you @ydshieh 🤗 I think this one makes sense!
ydshieh
left a comment
Despite the weird issue from pytest-subtests with subTest + skipTest, this PR itself makes sense.
Furthermore, it doesn't change the current behavior when pytest-subtests is not installed (although that behavior is not desirable).
Therefore LGTM to merge.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, the test body was not moved under the "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama, this caused attribute errors.
Fixes: #34722
Fixes: 19d58d3 ("Add MLLama (#33703)")
CC: @amyeroberts, @ArthurZucker