Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Prelude.hs does not always rebuild when Prelude.sawcore is changed #1856

Closed
m-yac opened this issue Apr 13, 2023 · 17 comments
Closed

[CI] Prelude.hs does not always rebuild when Prelude.sawcore is changed #1856

m-yac opened this issue Apr 13, 2023 · 17 comments
Labels
tooling: CI Issues involving CI/CD scripts or processes type: bug Issues reporting bugs or unexpected/unwanted behavior

Comments

@m-yac
Copy link
Contributor

m-yac commented Apr 13, 2023

I noticed while working on #1849 that it appears Prelude.hs does not always rebuild on CI when Prelude.sawcore is changed on the branch relative to master.

Specifically, it appears that every commit which does not itself change Prelude.sawcore or Prelude.hs uses a cached version of Prelude.sawcore from master – even when the branch as a whole relative to master does change Prelude.sawcore.

Here's the timeline of commits and CI passes/failures that makes me suspect this:

  1. Commit 6d81c44 changes Prelude.sawcore and mr-solver-tests such that mr-solver-tests will fail if the version of Prelude.sawcore from master is used
  2. Commit c0f364c does not change Prelude.sawcore, but also does not change mr-solver-tests in any way that should make it fail – however mr-solver-tests fails! The failure seems to indicate that the saw executable used for mr-solver-tests has the version of Prelude.sawcore from master (test_fun0 was imported from Prelude).
  3. Commit e99d03d changes Prelude.sawcore in a way which should have no impact on mr-solver-tests, but now mr-solver-tests passes again!
  4. Commit b25fb57 (in a similar way to the one before last) does not change Prelude.sawcore or mr-solver-tests, but now mr-solver-tests fails – in the same way which indicates that the Prelude.sawcore used is the one from master.
  5. I also tried making a commit which just makes a whitespace change to Prelude.hs, and the result was that mr-solver-tests passes (unfortunately I deleted this commit by force-pushing in an attempt to clean up the branch, but you can still see the CI run).

As I wrote all this out I realized that based on my hypothesis, the force-push cleanup I did will not actually work because I didn't change Prelude.sawcore or Prelude.hs in my last commit – so we'll see what happens.

Tagging @RyanGlScott and @kquick because I think maybe you guys are the CI guys? If not, let me know who I should ping!

@m-yac m-yac added type: bug Issues reporting bugs or unexpected/unwanted behavior tooling: CI Issues involving CI/CD scripts or processes labels Apr 13, 2023
@RyanGlScott
Copy link
Contributor

I believe this is an unfortunate limitation of Template Haskell's ability to detect when files need to be recompiled. In particular:

  • Verifier.SAW.Prelude uses Template Haskell to load Prelude.sawcore,
  • CI builds Verifier.SAW.Prelude and caches the result, and
  • If a subsequent commit changes Prelude.sawcore, GHC is likely not to notice the fact that Verifier.SAW.Prelude needs to be rebuilt.

I believe the typical way to work around this issue is by calling Template Haskell's addDependentFile function, which explicitly signals to GHC that the file using Template Haskell needs to be recompiled whenever the dependent file (in this case, Prelude.sawcore) is changed. This is the technique that libraries like file-embed use, and it seems appropriate to mimic file-embed here.

@m-yac
Copy link
Contributor Author

m-yac commented Apr 13, 2023

I did a bit of digging, and it looks like we actually do use (at least a version of) addDependentFile!

Prelude.sawcore gets loaded by defineModuleFromFileWithFns in Prelude.hs, which calls defineModuleFromFile, which calls addDep, which finally calls qAddDependentFile! (Assuming the TH version is less than 2.7.0.)

This makes sense, because on my local machine, cabal does seem to always know to rebuild Prelude.hs when I change Prelude.sawcore – which seems to be the behavior you're talking about. On the other hand, it seems like the problem here is that when Prelude.sawcore doesn't change the CI uses an outdated version of the resulting Prelude.hs (i.e. one from master, not from the last commit on the branch).

@kquick
Copy link
Member

kquick commented Apr 13, 2023

My theory is that the main build is building saw and uploading it (https://github.com/GaloisInc/saw-script/blob/master/.github/workflows/ci.yml#L212), then the mr-solver-tests job is downloading that same artifact (https://github.com/GaloisInc/saw-script/blob/master/.github/workflows/ci.yml#L245). If there were local changes, then saw gets rebuilt, overwriting that artifact and you get the good tests, but if there were no local changes, the artifact version runs. Since the artifact name does not include the branch, you probably get the one uploaded from main (but who knows?).

To test this, I'd change the name (line 216, line 247, line 281, line 348, line 408, line 567, and anywhere else I missed) to include the branch (e.g. name: "${{ runner.os }}-${{ github.ref_name }}-bins").

If the above fixes it for you, I think we definitely need it because it could be affecting other things here as well.

@RyanGlScott
Copy link
Contributor

Unfortunately, these errors are now occurring on the master branch (example).

@m-yac
Copy link
Contributor Author

m-yac commented Apr 19, 2023

Crap. Well maybe we ought to push a hotfix which changes the name of test_fun0 in mr_solver_tests so it doesn't conflict with the test_fun0 from the old version of Prelude.sawcore?

Of course there's still the deeper problem of why saw executable used during mr_solver_tests test is using the old version of Prelude.sawcore which defines a test_fun0. Now that this error is happening on master I really have no clue what could be happening...

@bboston7
Copy link
Contributor

Just wanted to add as a datapoint that I've seen this issue locally of cabal not rebuilding the prelude when only Prelude.sawcore changes, so it's not purely a CI caching issue.

@bboston7
Copy link
Contributor

A few other things I found:

@bboston7
Copy link
Contributor

It's not immediately clear to me which Cabal version includes the fix for this.

Looks like the fix went out in Cabal 3.10.1.0. I use Cabal 3.8 locally, so that could be the reason?

@RyanGlScott
Copy link
Contributor

Interesting, I hadn't considered the possibility of this happening due to a cabal bug. We are currently using 3.8.1.0 in CI:

cabal: ["3.8.1.0"]

It might be worth bumping that to see if that makes a difference.

@m-yac
Copy link
Contributor Author

m-yac commented Apr 19, 2023

Very interesting @bboston7! Somehow I get away with using cabal 3.4.0 – so maybe I'm on an old enough version that these features still work for me locally?

@m-yac
Copy link
Contributor Author

m-yac commented Apr 19, 2023

From the longstanding issue Brett linked, it looks like others have fixed this issue in their CIs by downgrading from cabal 3.8 to 3.6:
Screen Shot 2023-04-19 at 15 36 24

@RyanGlScott
Copy link
Contributor

Cruelly, there is a different bug in cabal-3.6 that imperils our use of cabal list-bin in SAW's CI, which is one of the reasons that I chose cabal-3.8.1.0. My hope is that we can use cabal-3.10.1.0 instead to avoid this bug (if it is indeed affecting us), as downgrading to cabal-3.6 would make things more complicated for us.

@bboston7
Copy link
Contributor

Updating to cabal-3.10.1.0 seems to have fixed the problem locally for me! It's worth a shot on the CI system.

@m-yac
Copy link
Contributor Author

m-yac commented Apr 19, 2023

Testing in #1861!

@kquick
Copy link
Member

kquick commented Apr 19, 2023

That's unfortunate news about cabal, but I'm also worried that maybe we have two problems here: cabal issues and also my hypothesis about the archived binary we are using for these tests, so we should check into both.

@RyanGlScott
Copy link
Contributor

After merging #1861, the CI for the master branch is now restored. I suppose it's still possible that there is some misconfiguration in our CI's use of {upload,download}-action, but the immediate issue appears to have been resolved. Would you be OK with closing this, @kquick?

@RyanGlScott
Copy link
Contributor

After a discussion with @kquick, we've agreed that the primary issue (Prelude.sawcore modifications not triggering a rebuild) has been addressed, and as such, this issue can be closed.

@kquick is separately looking into whether the lack of branch information on the download-artifact could cause a job on a pull request to accidentally grab artifacts from a separate branch, and if this is the case, he will open a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tooling: CI Issues involving CI/CD scripts or processes type: bug Issues reporting bugs or unexpected/unwanted behavior
Projects
None yet
Development

No branches or pull requests

4 participants