chore(do not merge): Experiment with caching for "Build Doc" Job#2271
chore(do not merge): Experiment with caching for "Build Doc" Job#2271kevaundray wants to merge 7 commits intoethereum:forks/amsterdamfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2271 +/- ##
===================================================
- Coverage 86.00% 85.86% -0.15%
===================================================
Files 600 599 -1
Lines 39357 39390 +33
Branches 3770 3770
===================================================
- Hits 33850 33822 -28
- Misses 4877 4938 +61
Partials 630 630
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
It skips around 70% of pairs and fails after 40 minutes. Barring the actual error, this makes sense since it was previously taking an hour and we shaved off 30% of the work. This gives weight to the claim that the bottleneck is the tree diffing, but need to have better logs to confirm |
|
Seems the diffs rely on https://github.com/SamWilsn/fladrif |
|
Currently it seems to build docs for all forks unconditionally -- I think most people will modify the latest fork(currently amsterdam), so a lot of this is also repeated work that should never fail. I think one reasonable optimization here is to have two settings:
Rationale here is that docs are not released on the PRs anyways, so building everything is not needed |
|
Added the above suggestion, but it won't be noticeable in this PR because any change outside of a fork(like the github action fileS) will by default trigger a full rebuild -- should be runnable locally by setting |
|
Diffs are published here for reference. @SamWilsn Do you know if folks are using this? (Asking because whenever I visit the specs, I usually just look at the code) If so I think we could rewrite it in Rust to be faster, given that this will grow in time as we add more forks. I wrote a proof of concept here https://kevaundray.github.io/execution-specs-viewer/ that compiles in about a minute using Rust. It uses a different stack etc, but largely gives the features I think you originally intended? |
|
So if we just build docs for amsterdam, it takes around 40 minutes, which I think is still pretty high |
Thanks for taking a look at this, @kevaundray. Yes, the docs build is too slow. I got it down to about ~4-5 minutes locally by tweaking the existing build tool, but I never PR'd these changes / followed up with @SamWilsn. Perhaps it's time to do that. My changes didn't change the fork diffset, however, which is a also great idea! |
Ah nice, curious to know what you changed! I guess I'm still curious to know if this is actually used, given the work that needs to be done, grows with the number of hardforks |
Will try and get some PRs up soon! But quite a few cache optimizations and replaced
I think the diff is really nice feature, but I suspect it has a limited number of users, but there was recently this comment:
|
Ah interesting, I didn't get to the parser, amazing! In that issue, it seemed like the user could benefit from just a normal diff between two forks but I could be mistaken as I didn't look carefully at their project. Though I think that if the time can be brought down to 4-5 minutes then my question no longer becomes less consequential since it is no longer the bottleneck |
|
I mean, I use it, especially whenever I'm trying to explain a change to someone else. One of the biggest selling points for creating EELS was the ability to show both snapshot (the source itself) and consensus-specs-style diffs. That said, I don't think https://ethereum.github.io/execution-specs gets much attention. A lot of the fault there lies with me not continuing to improve
I'd love to write this in Rust! Would give an opportunity to fix a lot of the architectural issues in
Holy shit that's impressive! Tree sitter was what I was going to use originally before I found libcst (which actually has a partial implementation in rust). Regardless of where we go, I'd love to get your help on the frontend side of things.
Most of the complexity of Screencast_20260223_143534.webm |
Thanks for the clarification and context! This all makes sense and I think the go-to definition functionality is very useful :)
When you get the cycles, it may be interesting to scope out your vision of it in an issue, so others can maybe come and implement it.
This was what I originally wanted to do! Intuitively, I would think an architecture that allows one to do something like Didn't dive too deep into this, so there might be something I was missing here.
Ah I was using claude 🤣 it's pretty likely that you have more design skill points than me :)
Would leave this upto you and Dan, happy to follow with what you folks think makes sense. My initial investigation was aimed at reducing the CI times since this gets ran on every PR, but it seems Dan may have a way to reduce this down to less than 10 minutes, so its not that much of a concern |
SamWilsn
left a comment
There was a problem hiding this comment.
Code-wise, I think this makes a lot of sense!
What about just disabling diffs when building on a branch? The artifacts don't get uploaded anywhere, and any missing references will get caught by the non-diff build anyway.
I think this would be much better |
|
Closing due to #2296 |
🗒️ Description
Was briefly looking into this to check why it takes so long.
Wrote some unorganized notes below for my own reference. (It could be wrong)
From my understanding, "Build Doc" is creating diffs across each fork so we can see how a file has changed over forks.
If this is correct, then this jobs run time will increase if more forks are added.
I believe it does a pairwise diff, ie if we have the three forks:
Then it will create a diff for:
This is a file by file diff, so if frontier has 38 files and homestead has 38 files, then we would be diffing 38 file pairs.
If we take the Frontier -> Homestead diff as an example. Lets say the code wants to check the diff between
frontier/vm/gas.pyandhomestead/vm/gas.py. It does the following:Hypothesis: The tree diff algorithm is the expensive part. Noting that even if there is no change between Frontier and Homestead, I believe it still runs the algorithm, to then conclude that nothing has changed.
🔗 Related Issues or PRs
N/A.
✅ Checklist
toxchecks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx tox -e statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_frommarker.Cute Animal Picture