Implement # sage.doctest: flaky marker #39539
user202729 wants to merge 22 commits into sagemath:develop
|
Documentation preview for this PR (built with commit e9b1b35; changes) is ready! 🎉 |
@@ -1,3 +1,4 @@
# sage.doctest: flaky
From the log it looks like this was correctly retried once. But it still timed out…? (What's the probability this fails twice?)
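A rough way to reason about that question, as a minimal sketch: if each run timed out independently with the same probability, the chance that the original run and its single rerun both fail would be the square of that probability. The function name and the value 0.1 below are purely illustrative assumptions, not measurements from CI, and the independence assumption may well not hold here.

```python
def prob_all_attempts_fail(p_single, attempts=2):
    """Probability that every one of `attempts` runs fails.

    Assumes the runs are independent and each fails with probability
    `p_single`. Both assumptions are hypothetical; nothing here is measured.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability in [0, 1]")
    return p_single ** attempts


# Illustrative value only: a file that times out in 1 run out of 10.
double_failure = prob_all_attempts_fail(0.1)
```

If double failures show up noticeably more often than this estimate suggests, the runs are probably not independent (for instance, an overloaded runner would slow down both attempts).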
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure

As in the title. I don't think there's any advantage in running the test again. There's only a very small risk of the fixer forgetting to delete the marker, but that seems like a nonexistent issue (whichever pull request fixes it should also remove the `# not tested`).

At least for those that don't segfault or hang. (For those that do, the only solution I can think of is sagemath#39539.)

Side note: not sure what's a good solution to this. Maybe we can do sagemath#39470 instead? (But then it doesn't apply to meson…)

### 📝 Checklist

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation preview.

URL: sagemath#39664
Reported by: user202729
Reviewer(s):
sagemathgh-40814: Rerun plural and singular/function on failure

This pull request:

* adds a new feature `--all-except` to `sage -t` (does what you expect)
* modifies `ci-meson.yml` to work around sagemath#29528, the cause of which is still unknown. (I also tried porting an old pull request that purportedly fixes the issue at sagemath#39628, but the result is even worse.)

Controlling this in bash seems easier than sagemath#39539, for now.

I suspect testing these files separately will make it stop failing, however (doesn't really matter, the bug remains).

sagemath#40729 (comment) contains a traceback, but I think it isn't of much help.

(Thoughts? Is `--all --exclude=a --exclude=b` better?)

### 📝 Checklist

- [ ] The title is concise and informative.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation preview.

URL: sagemath#40814
Reported by: user202729
Reviewer(s): Tobias Diez
|
Now that #40814 is merged, what's the plan to go forward here? |
|
I don't know; it depends on whether tests fail. (Like, I hope tests don't fail, but…?) On the other hand, the other solution won't work for someone who is not running the CI. |
|
There are enough doctests that randomly fail and it would be nice to have a solution for this. It's annoying locally as well, but the main headache is CI. We could add essentially every file that has known flaky tests to the list in #40814. But I got the impression you would like to keep that list reserved for tests that fail with segfaults/timeouts which are hard to catch in python/doctest module. So your plan is to use this PR here for "normal" flaky tests? |
|
I mean, normal flaky test can also be tested with say ugly but works. (or migrate to pytest, where you probably gain some pytest marker thing (https://pytest-rerunfailures.readthedocs.io/stable/mark.html?), but lose preparsing and need to explicitly import, and make the test far away from the code…) |
|
I've now also added those flaky tests to the exception list in CI (#41235); that was the easiest solution for now. |
The intention is to avoid the annoying failing doctests.
Background on timed out tests
Basically, one of the problems with occasional timeouts is the following: sometimes malloc needs to hold a lock while doing something. If a signal arrives while it is holding the lock, the next call to malloc (for example, from inside the signal handler) will try to acquire the lock again and deadlock there.
A workaround is to unlock the malloc lock inside the signal handler — but how do you know which lock it is?
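The situation described above can be imitated in pure Python. This is only an analogy under stated assumptions: `fake_malloc_lock` and `fake_malloc` are hypothetical stand-ins built on `threading.Lock`, not glibc's allocator or anything in Sage, and the non-blocking acquire is there so the demonstration reports the problem instead of hanging. It assumes a POSIX system (for `SIGUSR1`) and that it runs in the main thread.

```python
import signal
import threading

# Hypothetical stand-in for malloc's internal, non-reentrant lock.
fake_malloc_lock = threading.Lock()
observations = []


def fake_malloc():
    """Pretend allocator: needs the lock for the duration of its work.

    Returns True if the lock could be taken. A real malloc would block
    instead; the non-blocking acquire keeps this demo from deadlocking.
    """
    acquired = fake_malloc_lock.acquire(blocking=False)
    if acquired:
        fake_malloc_lock.release()
    return acquired


def handler(signum, frame):
    # The handler "allocates" while the interrupted code still holds the lock.
    observations.append(fake_malloc())


def run_demo():
    """Return (result inside the handler, result after the lock is released)."""
    observations.clear()
    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        with fake_malloc_lock:                    # inside the critical section
            signal.raise_signal(signal.SIGUSR1)   # the signal arrives now
            # Give the interpreter a chance to run the Python-level handler.
            for _ in range(1000):
                if observations:
                    break
        after_release = fake_malloc()
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return observations[0], after_release
```

Inside the handler the lock cannot be taken, which is the point where a blocking acquire would hang forever; once the interrupted code leaves its critical section, the same call succeeds.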
I can't reproduce this on my machine (in fact, on my machine a gdb breakpoint set on `__lll_lock_wait_private` isn't even hit during the computation), so I can't figure out a way to fix it.
Workaround
Files starting with `# sage.doctest: flaky` will be run one more time if they time out. Same for segmentation faults. E.g. plural.pyx → #39098
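The rerun rule from the workaround can be sketched as follows. This is not the code in this PR: `is_marked_flaky`, `run_with_rerun`, the outcome strings, and the fake runner are all hypothetical names chosen for illustration, and the sketch assumes the marker must be exactly the first line of the file.

```python
import tempfile
from pathlib import Path

FLAKY_MARKER = "# sage.doctest: flaky"


def is_marked_flaky(path):
    """Return True if the first line of the file is the flaky marker."""
    with open(path, encoding="utf-8") as f:
        return f.readline().strip() == FLAKY_MARKER


def run_with_rerun(path, run_once):
    """Run a file once; rerun it once if it is marked flaky and timed out or crashed.

    `run_once` is a hypothetical callable returning one of
    "ok", "failed", "timeout" or "crash". Ordinary failures are not retried.
    """
    outcomes = [run_once(path)]
    if outcomes[0] in ("timeout", "crash") and is_marked_flaky(path):
        outcomes.append(run_once(path))
    return outcomes


def demo():
    """Exercise the sketch with a fake runner that times out, then passes."""
    results = iter(["timeout", "ok", "timeout"])

    def fake_runner(path):
        return next(results)

    with tempfile.TemporaryDirectory() as tmp:
        flaky = Path(tmp) / "flaky_file.py"
        flaky.write_text(FLAKY_MARKER + "\nprint('hello')\n", encoding="utf-8")
        plain = Path(tmp) / "plain_file.py"
        plain.write_text("print('hello')\n", encoding="utf-8")
        return run_with_rerun(flaky, fake_runner), run_with_rerun(plain, fake_runner)
```

In `demo()`, the marked file gets a second attempt after its timeout, while the unmarked file's timeout is reported straight away.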