python312Packages.reflex: disable flaky test #423001

Merged
pbsds merged 1 commit into NixOS:master from sarahec:reflex-magic-string
Jul 9, 2025

Conversation

@sarahec
Contributor

@sarahec sarahec commented Jul 6, 2025

CheckPhase fails with FAILED tests/units/test_state.py::test_background_task_no_block[disk] - AssertionError: assert StateUpdate(d...], final=True) == StateUpdate(d...],...

Disabled test due to flaky behavior. (Doesn't pass or fail reliably.)

Discovered in nixpkgs-review of #421308
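
The diff itself is not shown in this conversation. As a rough sketch of what disabling a single pytest test looks like in nixpkgs, assuming the package runs its tests through `pytestCheckHook` (which honors `disabledTests`), an equivalent overlay could look like the following. The overlay form is only an illustration; the test name is taken from the failure message above.

```nix
# Hypothetical overlay, not the actual change in this PR.
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      reflex = pyPrev.reflex.overridePythonAttrs (old: {
        disabledTests = (old.disabledTests or [ ]) ++ [
          # Flaky: does not pass or fail reliably (seen under nixpkgs-review)
          "test_background_task_no_block"
        ];
      });
    })
  ];
}
```

`pytestCheckHook` matches entries in `disabledTests` by name, so this skips every parametrization of the test, not only the `[disk]` variant that failed.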

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Nixpkgs 25.11 Release Notes (or backporting 25.05 Nixpkgs Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
  • NixOS 25.11 Release Notes (or backporting 25.05 NixOS Release notes)
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other contributing documentation in corresponding paths.

Add a 👍 reaction to pull requests you find important.

@sarahec sarahec requested a review from GaetanLepage July 6, 2025 19:01
@nixpkgs-ci nixpkgs-ci bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Jul 6, 2025
@nix-owners nix-owners bot requested a review from pbsds July 6, 2025 19:08
@sarahec
Contributor Author

sarahec commented Jul 6, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 423001
Commit: 0b48f4bdbcdd6bfd1ca3c2d81a95b4b96fadff62


aarch64-darwin

✅ 8 packages built:
  • python312Packages.reflex
  • python312Packages.reflex-chakra
  • python312Packages.reflex-chakra.dist
  • python312Packages.reflex.dist
  • python313Packages.reflex
  • python313Packages.reflex-chakra
  • python313Packages.reflex-chakra.dist
  • python313Packages.reflex.dist

Contributor

@GaetanLepage GaetanLepage left a comment

Can this be reported upstream?

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. labels Jul 6, 2025
@sarahec
Contributor Author

sarahec commented Jul 6, 2025

Can this be reported upstream?

I'm checking if an update will fix it first. That may trigger a large fanout due to also updating python3Packages.click.

@sarahec
Contributor Author

sarahec commented Jul 6, 2025

Can this be reported upstream?

I'm checking if an update will fix it first. That may trigger a large fanout due to also updating python3Packages.click.

Looks like it was rewritten in the next release along with a substantial amount of code. But the update requires updating both python3Packages.click (massive fanout via flask, black, magic-wormhole, mitmproxy, typer, and flit-core) and granian (possibly large fanout).

Can we just apply the simple fix for now and deal with that massive project later? This is to fix nixpkgs-review for, ultimately, duckdb.

UPDATE: Updating click directly affects 11977 packages, and I have no idea of the fanout past that. We'd be well into staging territory.

@sarahec
Contributor Author

sarahec commented Jul 6, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 423001
Commit: 0b48f4bdbcdd6bfd1ca3c2d81a95b4b96fadff62


x86_64-linux

✅ 8 packages built:
  • python312Packages.reflex
  • python312Packages.reflex-chakra
  • python312Packages.reflex-chakra.dist
  • python312Packages.reflex.dist
  • python313Packages.reflex
  • python313Packages.reflex-chakra
  • python313Packages.reflex-chakra.dist
  • python313Packages.reflex.dist

@nixpkgs-ci nixpkgs-ci bot added 12.approvals: 1 This PR was reviewed and approved by one person. 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in any of the changed packages. labels Jul 7, 2025
@Pandapip1 Pandapip1 added the 1.severity: blocker This is preventing another PR or issue from being completed label Jul 7, 2025
@Pandapip1
Member

Blocking #421308

@sarahec sarahec force-pushed the reflex-magic-string branch from 0b48f4b to cfb613f Compare July 7, 2025 17:24
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. and removed 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. labels Jul 7, 2025
@sarahec
Contributor Author

sarahec commented Jul 7, 2025

@GaetanLepage I'm not sure what more I can do.

Before this fix, the problem shows up under nixpkgs-review but not under nix-build. The build log from review truncates the strings such that I can't point to a specific error. There's nothing in the upstream repo's issues. There's not enough detail to file an issue.

Blame on the test shows numerous changes since the release in nixpkgs. https://github.com/reflex-dev/reflex/blame/35da6a25a2e9f3eb9ca200d3f40e94fe2afb5e05/tests/units/test_state.py#L2205

I locally overrode the dependencies to build the next release and it didn't show the bug, so I said it "appears" fixed. I can't get any more precise on this as it's apparently flaky and may be due to an interaction in our infrastructure.

@sarahec sarahec force-pushed the reflex-magic-string branch from cfb613f to 8e990d8 Compare July 7, 2025 18:39
@GaetanLepage
Contributor

It's quite strange, and annoying, that this test is failing only in nixpkgs-review and not in a normal nix-build.
Are you sure the failure doesn't occur when you are trying to build too many packages at once on the same system?

There should be no difference between nixpkgs-review --build-args "--max-jobs 1" and nix-build -A python3Packages.reflex.

@Pandapip1
Member

Pandapip1 commented Jul 8, 2025

If the test is failing only under heavy load, then it's still a flaky test and should be disabled.

@GaetanLepage
Contributor

If the test is failing only under heavy load, then it's still a flaky test and should be disabled.

Definitely, but then it should be commented as such.

@sarahec
Contributor Author

sarahec commented Jul 9, 2025

I'm honestly puzzled now. The request started out as "can this be reported upstream?" which is generally a nice-to-have (and which I'm generally happy to do). But this is resisting characterization, and if I can't file a useful bug I won't. And the new version has changed significantly enough that filing a bug is likely moot.

Now it's an argument about what comment should go with disabling a single test and it appears to be blocking the merge.

The package maintainer has approved the PR as is.

This package has 3-4 small downstream apps, and the flaky test is fairly minor (generate an update running in the background, write to disk, then compare the result with a hardcoded object). It could be a race condition, a timeout, a timestamp, or needing a writeable directory.

This came up as a side-effect of running nixpkgs-review for chart-studio (a fairly significant package) which in turn is needed for ibis-framework. If I can get clean reviews on these, we can finally merge the duckdb update. (gpytorch has thrown a late wrench in the works, but I think I have identified the major issue and will file a bug here.)

Have we perhaps gone down a rabbit-hole?

@sarahec sarahec changed the title from "python312Packages.reflex: disable broken test due to use of magic string" to "python312Packages.reflex: disable flaky test" Jul 9, 2025
@pbsds
Member

pbsds commented Jul 9, 2025

Thanks for pointing this out. I think the PR is good as-is, and none of the concerns raised should IMO be blocking.

There should be no difference between nixpkgs-review --build-args "--max-jobs 1" and nix-build -A python3Packages.reflex.

I've run into cases where it builds the package for a different python version, see #419973 (comment).

@pbsds pbsds merged commit 73290a5 into NixOS:master Jul 9, 2025
26 of 29 checks passed
@GaetanLepage
Contributor

Thanks for pointing this out. I think the PR is good as-is, and none of the concerns raised should IMO be blocking.

They were not meant to be indeed. Thanks for dealing with this @sarahec and sorry for the very slow review on my part.

@pbsds
Member

pbsds commented Jul 9, 2025

The way I read it, "slow reviews" are IMO generally not a problem. You should not have to push yourself to respond in an overly frequent and timely manner. Instead, aim to make each cycle significant, i.e. make your reviews actionable or conclude them with an approve / merge / close. Open-ended observations and questions generally tend to lead to a "guessing game" for the PR author about what it is you think the PR is missing.

@sarahec
Contributor Author

sarahec commented Jul 9, 2025

There should be no difference between nixpkgs-review --build-args "--max-jobs 1" and nix-build -A python3Packages.reflex.

I've run into cases where it builds the package for a different python version, see #419973 (comment).

I think it's load-related: during that review, the load average was over 100 and the swap partition was nearly full (and this is a fairly hefty machine). A test that hits the disk at that point has several ways to fail. But I needed the test to fail reliably a few times to verify the root cause.

Though, interestingly, I ran review in one screen and tried an ordinary nix-build in the other to try and simulate testing under load and couldn't reproduce it.

I dug through the code to look for weak points -- timestamps, possible timeouts, etc. -- and couldn't find any. The code in question is several years old, so it's probably an interaction between the test and our build tools.

In any case, load-sensitive failures often turn into Hydra failures and I'll typically disable such cases.

@sarahec sarahec deleted the reflex-magic-string branch July 9, 2025 16:45
@pbsds
Member

pbsds commented Jul 9, 2025

Before I had tuned my machines with the correct max-jobs and max-cores per job, I also frequently saw flaky build failures with nixpkgs-review. Once I tuned my machines to have a minimum of 4 cores and 8GB RAM per job, however, the problem mostly went away. The nixpkgs-review --eval local spilling into swap, however, is difficult to account for...

Luckily, however, I find that the infra team has done a great job tuning the Hydra runners. While flakiness still happens, it's nowhere near as frequent as for most users running builds locally.
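
As an illustration of the "4 cores and 8GB RAM per job" rule of thumb above, here is a minimal sketch of a NixOS module fragment. The 16-core, 32 GB machine is a hypothetical example, not one mentioned in this thread; `nix.settings.max-jobs` and `nix.settings.cores` are the NixOS options that map to the `max-jobs` and `cores` settings in nix.conf.

```nix
# Hypothetical builder with 16 cores and 32 GB RAM (assumed for illustration).
{
  nix.settings = {
    # 16 cores / 4 cores per job = 4 jobs; 32 GB / 4 jobs = 8 GB per job.
    max-jobs = 4;
    # Cores offered to each build (exposed to builders as NIX_BUILD_CORES).
    cores = 4;
  };
}
```

These settings only bound parallelism. Nix does not enforce a per-job memory limit, so the 8 GB figure is a budgeting target rather than a guarantee.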

@sarahec
Contributor Author

sarahec commented Jul 9, 2025

Before I had tuned my machines with the correct max-jobs and max-cores per job, I also frequently saw flaky build failures with nixpkgs-review. Once I tuned my machines to have a minimum of 4 cores and 8GB RAM per job, however, the problem mostly went away. The nixpkgs-review --eval local spilling into swap, however, is difficult to account for...

I have a 14-core 38GB M3 Max and run nixpkgs-review with --max-jobs=4 so this one fell inside your approach. (Well, 3.5 cores/build). It drove me a little nuts trying to isolate it.

@pbsds
Member

pbsds commented Jul 9, 2025

Oof, then the test should indeed be disabled. I also frequently open PRs for failures I only see once in nixpkgs-review or on ofborg :)

Labels

  • 1.severity: blocker (This is preventing another PR or issue from being completed)
  • 6.topic: python (Python is a high-level, general-purpose programming language.)
  • 10.rebuild-darwin: 0 (This PR does not cause any packages to rebuild on Darwin.)
  • 10.rebuild-linux: 1-10 (This PR causes between 1 and 10 packages to rebuild on Linux.)
  • 12.approvals: 1 (This PR was reviewed and approved by one person.)
  • 12.approved-by: package-maintainer (This PR was reviewed and approved by a maintainer listed in any of the changed packages.)

4 participants