python312Packages.reflex: disable flaky test by sarahec · Pull Request #423001 · NixOS/nixpkgs

sarahec · 2025-07-06T19:01:36Z

The check phase fails with `FAILED tests/units/test_state.py::test_background_task_no_block[disk] - AssertionError: assert StateUpdate(d...], final=True) == StateUpdate(d...],...`

Disabled test due to flaky behavior. (Doesn't pass or fail reliably.)

Discovered in nixpkgs-review of #421308

sarahec · 2025-07-06T19:09:51Z

`nixpkgs-review` result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 423001
Commit: 0b48f4bdbcdd6bfd1ca3c2d81a95b4b96fadff62

`aarch64-darwin`

✅ 8 packages built:

python312Packages.reflex
python312Packages.reflex-chakra
python312Packages.reflex-chakra.dist
python312Packages.reflex.dist
python313Packages.reflex
python313Packages.reflex-chakra
python313Packages.reflex-chakra.dist
python313Packages.reflex.dist

GaetanLepage

Can this be reported upstream?

sarahec · 2025-07-06T19:32:37Z

> Can this be reported upstream?

I'm checking if an update will fix it first. That may trigger a large fanout due to also updating python3Packages.click.

sarahec · 2025-07-06T20:24:06Z

> > Can this be reported upstream?
>
> I'm checking if an update will fix it first. That may trigger a large fanout due to also updating python3Packages.click.

Looks like the test was rewritten in the next release, along with a substantial amount of code. But the update requires updating both python3Packages.click (massive fanout via flask, black, magic-wormhole, mitmproxy, typer, and flit-core) and granian (possibly large fanout).

Can we just apply the simple fix for now and deal with that massive project later? This is, ultimately, to fix nixpkgs-review for duckdb.

UPDATE: Updating click directly affects 11977 packages, and I have no idea of the fanout past that. We'd be well into staging territory.

sarahec · 2025-07-06T21:26:02Z

`nixpkgs-review` result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 423001
Commit: 0b48f4bdbcdd6bfd1ca3c2d81a95b4b96fadff62

`x86_64-linux`

✅ 8 packages built:

python312Packages.reflex
python312Packages.reflex-chakra
python312Packages.reflex-chakra.dist
python312Packages.reflex.dist
python313Packages.reflex
python313Packages.reflex-chakra
python313Packages.reflex-chakra.dist
python313Packages.reflex.dist

pkgs/development/python-modules/reflex/default.nix

Pandapip1 · 2025-07-07T13:49:49Z

Blocking #421308

sarahec · 2025-07-07T18:21:07Z

@GaetanLepage I'm not sure what more I can do.

Before this fix, the problem shows up under nixpkgs-review but not under nix-build. The build log from review truncates the strings such that I can't point to a specific error. There's nothing in the upstream repo's issues. There's not enough detail to file an issue.

Blame on the test shows numerous changes since the release in nixpkgs. https://github.com/reflex-dev/reflex/blame/35da6a25a2e9f3eb9ca200d3f40e94fe2afb5e05/tests/units/test_state.py#L2205

I locally overrode the dependencies to build the next release and it didn't show the bug, so I said it "appears" fixed. I can't get any more precise on this, as it's apparently flaky and may be due to an interaction in our infrastructure.

GaetanLepage · 2025-07-08T09:35:53Z

It's quite strange, and annoying, that this test is failing only in nixpkgs-review and not in a normal nix-build.
Are you sure the failure doesn't occur when you are trying to build too many packages at once on the same system?

There should be no difference between `nixpkgs-review --build-args "--max-jobs 1"` and `nix-build -A python3Packages.reflex`.

Pandapip1 · 2025-07-08T12:38:48Z

If the test is failing only under heavy load, then it's still a flaky test and should be disabled.

GaetanLepage · 2025-07-08T12:56:46Z

> If the test is failing only under heavy load, then it's still a flaky test and should be disabled.

Definitely, but then it should be commented as such.

sarahec · 2025-07-09T02:47:31Z

I'm honestly puzzled now. The request started out as "can this be reported upstream?", which is generally a nice-to-have (and which I'm generally happy to do). But this failure is resisting characterization, and if I can't file a useful bug, I won't. And the new version has changed significantly enough that filing a bug is likely moot.

Now it's an argument about what comment should go with disabling a single test and it appears to be blocking the merge.

The package maintainer has approved the PR as is.

This package has 3-4 small downstream apps, and the flaky test is fairly minor (generate an update running in the background, write to disk, then compare the result with a hardcoded object). It could be a race condition, a timeout, a timestamp, or needing a writeable directory.
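To make one of those candidate causes concrete, here is a minimal sketch of how a background task checked against a fixed deadline can pass on an idle machine and fail on a busy one. This is not reflex's test code: `background_update`, `run_with_deadline`, and the delay values are hypothetical, and the `delay` argument merely stands in for slowness caused by load.

```python
import asyncio


async def background_update(delay: float) -> str:
    # Hypothetical stand-in for a background task; `delay` simulates
    # how long the work takes (longer on a heavily loaded machine).
    await asyncio.sleep(delay)
    return "final update"


async def run_with_deadline(delay: float, deadline: float) -> str:
    # The caller waits at most `deadline` seconds before giving up.
    try:
        return await asyncio.wait_for(background_update(delay), timeout=deadline)
    except asyncio.TimeoutError:
        return "timed out"


if __name__ == "__main__":
    # Idle machine: the task finishes well inside the deadline.
    print(asyncio.run(run_with_deadline(delay=0.0, deadline=1.0)))
    # Loaded machine: same code, but the task overruns the deadline.
    print(asyncio.run(run_with_deadline(delay=0.5, deadline=0.05)))
```

The same code gives two different outcomes depending only on timing, which is the general shape of a load-sensitive test failure.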

This came up as a side-effect of running nixpkgs-review for chart-studio (a fairly significant package) which in turn is needed for ibis-framework. If I can get clean reviews on these, we can finally merge the duckdb update. (gpytorch has thrown a late wrench in the works, but I think I have identified the major issue and will file a bug here.)

Have we perhaps gone down a rabbit-hole?

pbsds · 2025-07-09T09:33:49Z

Thanks for pointing this out. I think the PR is good as-is, and none of the concerns raised should IMO be blocking.

> There should be no difference between `nixpkgs-review --build-args "--max-jobs 1"` and `nix-build -A python3Packages.reflex`.

I've run into cases where it builds the package for a different python version, see #419973 (comment).

GaetanLepage · 2025-07-09T12:00:00Z

> Thanks for pointing this out. I think the PR is good as-is, and none of the concerns raised should IMO be blocking.

They were indeed not meant to be. Thanks for dealing with this @sarahec, and sorry for the very slow review on my part.

pbsds · 2025-07-09T15:32:32Z

The way I see it, slow reviews are IMO generally not a problem. You should not have to push yourself to respond in an overly frequent and timely manner. Instead, aim to make each cycle significant, i.e. make your reviews actionable or conclude it all with an approve / merge / close. Open-ended observations and questions tend to lead to a "guessing game" for the PR author about what it is you think the PR is missing.

sarahec · 2025-07-09T15:58:58Z

> > There should be no difference between `nixpkgs-review --build-args "--max-jobs 1"` and `nix-build -A python3Packages.reflex`.
>
> I've run into cases where it builds the package for a different python version, see #419973 (comment).

I think it's load-related: during that review, the load average was over 100 and the swap partition was nearly full (and this is a fairly hefty machine). A test that hits the disk at that point has several ways to fail. But I would need the test to fail reliably a few times to verify the root cause.

Though, interestingly, I ran review in one screen and tried an ordinary nix-build in the other to try and simulate testing under load and couldn't reproduce it.

I dug through the code to look for weak points -- timestamps, possible timeouts, etc. -- and couldn't find any. The code in question is several years old, so it's probably an interaction between the test and our build tools.

In any case, load-sensitive failures often turn into Hydra failures and I'll typically disable such cases.
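A load average is easier to judge relative to the number of cores. The following is a small sketch for checking that during a review run; the helper names are hypothetical and not part of any Nix tooling, and it assumes a Unix-like system where `os.getloadavg()` is available.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    # Load average divided by core count; values well above 1.0 mean
    # more runnable work is queued than the CPUs can service at once.
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def current_load_per_core() -> float:
    # Uses the 1-minute load average reported by the operating system.
    one_minute, _, _ = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


if __name__ == "__main__":
    # os.getloadavg() only exists on Unix-like systems.
    if hasattr(os, "getloadavg"):
        print(f"load per core: {current_load_per_core():.2f}")
```

For instance, a load average of 100 on a hypothetical 16-core machine works out to 6.25 per core, far beyond what the CPUs can keep up with.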

pbsds · 2025-07-09T18:00:34Z

Before I had tuned my machines with the correct max-jobs and max-cores per job, I also frequently saw flaky build failures with nixpkgs-review. Once I tuned my machines to have a minimum of 4 cores and 8GB RAM per job, however, the problem mostly went away. The `nixpkgs-review --eval local` spilling into swap, however, is difficult to account for...

Luckily, however, I find that the infra team has done a great job tuning the hydra runners. While flakiness still happens, it's nowhere near as frequent as for most users running builds locally.

sarahec · 2025-07-09T18:29:09Z

> Before I had tuned my machines with the correct max-jobs and max-cores per job, I also frequently saw flaky build failures with nixpkgs-review. Once I tuned my machines to have a minimum of 4 cores and 8GB RAM per job, however, the problem mostly went away. The `nixpkgs-review --eval local` spilling into swap, however, is difficult to account for...

I have a 14-core 38GB M3 Max and run nixpkgs-review with --max-jobs=4 so this one fell inside your approach. (Well, 3.5 cores/build). It drove me a little nuts trying to isolate it.
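The per-job figures in this exchange are simple division. Below is a small sketch, with hypothetical helper names, that uses the numbers quoted above (14 cores, 38GB, `--max-jobs=4`) and the 4-core / 8GB-per-job rule of thumb. It assumes cores and RAM are split evenly across jobs, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerJobBudget:
    cores: float
    ram_gb: float


def per_job_budget(total_cores: int, total_ram_gb: float, max_jobs: int) -> PerJobBudget:
    # Assumes an even split of the machine across concurrent build jobs.
    if max_jobs <= 0:
        raise ValueError("max_jobs must be positive")
    return PerJobBudget(total_cores / max_jobs, total_ram_gb / max_jobs)


def meets_rule_of_thumb(
    budget: PerJobBudget, min_cores: float = 4.0, min_ram_gb: float = 8.0
) -> bool:
    # Strict check against the "4 cores and 8GB RAM per job" guideline.
    return budget.cores >= min_cores and budget.ram_gb >= min_ram_gb


if __name__ == "__main__":
    budget = per_job_budget(total_cores=14, total_ram_gb=38, max_jobs=4)
    print(budget, meets_rule_of_thumb(budget))
```

With these inputs the budget is 3.5 cores and 9.5GB per job, so the RAM guideline is met while the core count falls slightly short of a strict reading of it.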

pbsds · 2025-07-09T18:31:18Z

Oof, then the test should indeed be disabled. I also frequently open PRs for failures I only see once in nixpkgs-review or on ofborg :)

sarahec requested a review from GaetanLepage July 6, 2025 19:01

nixpkgs-ci bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Jul 6, 2025

nix-owners bot requested a review from pbsds July 6, 2025 19:08

GaetanLepage reviewed Jul 6, 2025

nixpkgs-ci bot added 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. labels Jul 6, 2025

GaetanLepage reviewed Jul 7, 2025

pbsds approved these changes Jul 7, 2025

nixpkgs-ci bot added 12.approvals: 1 This PR was reviewed and approved by one person. 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in any of the changed packages. labels Jul 7, 2025

Pandapip1 added the 1.severity: blocker This is preventing another PR or issue from being completed label Jul 7, 2025

sarahec force-pushed the reflex-magic-string branch from 0b48f4b to cfb613f Compare July 7, 2025 17:24

nixpkgs-ci bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. and removed 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. labels Jul 7, 2025

python312Packages.reflex: disable broken test due to use of magic string

8e990d8

sarahec force-pushed the reflex-magic-string branch from cfb613f to 8e990d8 Compare July 7, 2025 18:39

sarahec changed the title ~~python312Packages.reflex: disable broken test due to use of magic string~~ python312Packages.reflex: disable flaky test Jul 9, 2025

pbsds merged commit 73290a5 into NixOS:master Jul 9, 2025
26 of 29 checks passed

sarahec deleted the reflex-magic-string branch July 9, 2025 16:45
