python312Packages.reflex: disable flaky test #423001
Conversation
GaetanLepage left a comment:
Can this be reported upstream?
I'm checking if an update will fix it first. That may trigger a large fanout due to also updating ….
Looks like it was rewritten in the next release along with a substantial amount of code. But the update requires updating both …. Can we just apply the simple fix for now and deal with that massive project later? This is to fix ….

UPDATE: Updating …
Blocking #421308
Force-pushed from 0b48f4b to cfb613f
@GaetanLepage I'm not sure what more I can do. Before this fix, the problem shows up under …. Blame on the test shows numerous changes since the release in nixpkgs: https://github.com/reflex-dev/reflex/blame/35da6a25a2e9f3eb9ca200d3f40e94fe2afb5e05/tests/units/test_state.py#L2205

I locally overrode the dependencies to build the next release and it didn't show the bug, so I said it "appears" fixed. I can't get any more precise than this, as it's apparently flaky and may be due to an interaction with our infrastructure.
Force-pushed from cfb613f to 8e990d8
It's quite strange, and annoying, that this test is failing only in nixpkgs-review and not in a normal … build. There should be no difference between ….
If the test is failing only under heavy load, then it's still a flaky test and should be disabled.
Definitely, but then it should be commented as such.
I'm honestly puzzled now. The request started out as "can this be reported upstream?", which is generally a nice-to-have (and which I'm generally happy to do). But this is resisting characterization, and if I can't file a useful bug, I won't. And the new version has changed significantly enough that filing a bug is likely moot. Now it's an argument about what comment should go with disabling a single test, and it appears to be blocking the merge. The package maintainer has approved the PR as is.

This package has 3-4 small downstream apps, and the flaky test is fairly minor (generate an update running in the background, write to disk, then compare the result with a hardcoded object). It could be a race condition, a timeout, a timestamp, or needing a writable directory. This came up as a side-effect of running nixpkgs-review for ….

Have we perhaps gone down a rabbit hole?
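For illustration only (this is not reflex's actual test; `run_once` and the file contents are invented), a minimal sketch of the failure mode being described: a background task writes to disk, and an assertion on the result races that write unless the test synchronizes first. Under heavy load the background thread is scheduled late, so the unsynchronized check fails intermittently.

```python
import os
import tempfile
import threading
import time

def run_once(synchronize: bool) -> bool:
    """Return True if the file exists at the moment the test would assert on it."""
    path = os.path.join(tempfile.mkdtemp(), "state.json")

    def background_task():
        time.sleep(0.05)  # stands in for real work; stretches further under load
        with open(path, "w") as f:
            f.write('{"final": true}')

    t = threading.Thread(target=background_task)
    t.start()
    if synchronize:
        t.join()                      # wait for the background write first
    exists = os.path.exists(path)     # stands in for "compare with hardcoded object"
    t.join()                          # clean up the thread either way
    return exists

print(run_once(synchronize=True))     # True: the write is guaranteed complete
print(run_once(synchronize=False))    # False here: the check races the write
```

With synchronization the check is deterministic; without it, the outcome depends on scheduling, which is exactly what makes such a test load-sensitive.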
Thanks for pointing this out. I think the PR is good as-is, and none of the concerns raised should IMO be blocking.
I've run into cases where it builds the package for a different python version, see #419973 (comment).
They were not meant to be, indeed. Thanks for dealing with this @sarahec and sorry for the very slow review on my behalf.
The way I see it, "slow reviews" are IMO generally not a problem. You should not have to push yourself to respond in an overly frequent and timely manner. Instead, aim to make each cycle significant, i.e. make your reviews actionable or conclude it all with an approve / merge / close. Open-ended observations and questions generally tend to lead to a "guessing game" for the PR author about what it is you think the PR is missing.
I think it's load-related: during that review, the load average was over 100 and the swap partition was nearly full (and this is a fairly hefty machine). A test that hits the disk at that point has several ways to fail. But I needed the test to fail reliably a few times to verify the root cause. Though, interestingly, I ran the review in one screen and tried an ordinary ….

I dug through the code to look for weak points -- timestamps, possible timeouts, etc. -- and couldn't find any. The code in question is several years old, so it's probably an interaction between the test and our build tools. In any case, load-sensitive failures often turn into Hydra failures, and I'll typically disable such cases.
Before I had tuned my machines with the correct max-jobs and max-cores per job, I also frequently saw flaky build failures with nixpkgs-review. Once I tuned my machines to have a minimum of 4 cores and 8 GB RAM per job, however, the problem mostly went away. The nixpkgs-review …

Luckily, however, I find that the infra team has done a great job tuning the Hydra runners. While flakiness still happens, it's nowhere near as frequent as for most users running builds locally.
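The tuning described above goes in nix.conf via the `max-jobs` and `cores` settings. A minimal sketch; the concrete numbers are assumptions for a hypothetical 16-core, 32 GB machine, not values taken from this thread:

```ini
# /etc/nix/nix.conf -- illustrative values only.
# Aim for at least 4 cores and 8 GB RAM per concurrent build.
max-jobs = 4   # at most 4 derivations building in parallel
cores = 4      # each build may use up to 4 cores
```

Lowering `max-jobs` trades total throughput for headroom per build, which is what keeps load-sensitive tests from starving.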
I have a 14-core 38GB M3 Max and run ….
Oof, then the test should indeed be disabled. I also frequently open PRs for failures I only see once in nixpkgs-review or on ofborg as well :)
checkPhase fails with:

    FAILED tests/units/test_state.py::test_background_task_no_block[disk] - AssertionError: assert StateUpdate(d...], final=True) == StateUpdate(d...],...

Disabled test due to flaky behavior. (Doesn't pass or fail reliably.)
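A sketch of the kind of change this PR describes, assuming the package's test suite runs under nixpkgs' pytestCheckHook; the comment wording and surrounding file are illustrative, not a quote from the actual diff:

```nix
# pkgs/development/python-modules/reflex/default.nix (sketch)
{
  disabledTests = [
    # Flaky under heavy load (e.g. during nixpkgs-review): the background
    # task races the on-disk comparison. Doesn't pass or fail reliably.
    "test_background_task_no_block"
  ];
}
```

`disabledTests` matches test names, so this skips both the `[disk]` and any other parametrized variants of the test.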
Discovered in nixpkgs-review of #421308.

Things done
- Is sandboxing enabled in nix.conf? (See Nix manual)
  - sandbox = relaxed
  - sandbox = true
- Ran nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage.
- Tested basic functionality of all binary files (usually in ./result/bin/).

Add a 👍 reaction to pull requests you find important.