Conversation
|
Can you add docs on how this works with op-conductor? From the description, it seems like each sequencer in the HA setup has one builder. How would op-conductor behave if every builder was down and unhealthy? |
Yeah, this seems to be the typical setup. A one-to-many builder-sequencer relationship is undefined behaviour according to OP, so you need a 1:1 mapping between sequencers and builders.
Currently op-conductor doesn't have any logic for this. Once we have the appropriate probes here, we can try to get op-conductor support.
If all 3 are not healthy then we're in a pretty bad state. I suppose a random instance would be chosen to be the primary sequencer in that case. |
Co-authored-by: shana <avalonche@protonmail.com>
|
@avalonche @0xOsiris I've added some docs. I'm second guessing my use of the
What do you think? |
avalonche
left a comment
LGTM. The only thing I would add to the docs is that /healthz returns 206 or 503 as soon as the builder or the L2 fails to produce a block even once, and that the endpoint will still return 200 if the builder is up but the local L2 is not.
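The status-code behaviour described in this comment can be sketched roughly as follows. This is only an illustration: the `Health` enum, `status_code`, and `after_block_attempt` are hypothetical names, and the assignment of 206 to "builder failed, L2 fine" and 503 to "both failed" is an assumption inferred from the comment, not taken from the rollup-boost source.

```rust
/// Hypothetical health states exposed by `/healthz`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    PartialContent,
    ServiceUnavailable,
}

impl Health {
    /// HTTP status code returned for each state.
    fn status_code(self) -> u16 {
        match self {
            Health::Healthy => 200,
            Health::PartialContent => 206,
            Health::ServiceUnavailable => 503,
        }
    }
}

/// Assumed rule: a single failed block attempt flips the status immediately.
/// If the builder produced a block, the result is 200 regardless of the L2.
fn after_block_attempt(builder_ok: bool, l2_ok: bool) -> Health {
    if builder_ok {
        Health::Healthy
    } else if l2_ok {
        Health::PartialContent
    } else {
        Health::ServiceUnavailable
    }
}
```

Note that under this sketch there is no hysteresis: one failed attempt is enough to change what `/healthz` reports.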
|
FYI the op-conductor change ethereum-optimism/optimism#15316 depends on this PR now. |
This should probably be debounced locally if this is expected to happen frequently. |
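One possible shape for the local debouncing suggested here is to require several consecutive failed probes before treating the target as unhealthy. The sketch below is an assumption about what such debouncing could look like; `Debouncer`, `observe`, and the threshold value are hypothetical and do not come from op-conductor or rollup-boost.

```rust
/// Hypothetical debouncer: reports unhealthy only after `threshold`
/// consecutive failed probes, and recovers on the first success.
struct Debouncer {
    threshold: u32,
    consecutive_failures: u32,
}

impl Debouncer {
    fn new(threshold: u32) -> Self {
        Debouncer { threshold, consecutive_failures: 0 }
    }

    /// Records one probe result and returns `true` while the target
    /// should still be treated as healthy.
    fn observe(&mut self, probe_ok: bool) -> bool {
        if probe_ok {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }
        self.consecutive_failures < self.threshold
    }
}
```

With a threshold of 3, two isolated failures are absorbed and only the third consecutive failure changes the reported state.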
|
I have been integration testing this with our conductor rollup-boost monitoring PR. It seems the /healthz response is sticky, which I don't think will work well for conductor, for ex:
Similarly, you can kill r-builder on a non-active sequencer and conductor will report it as healthy. Is there any way we can move the health probe to the background so we get async rbuilder health updates? |
Yeah this makes sense - this is because the health probe is only updated during a |
…for non-sequencing el's
Osiris/background health check
|
@zhwrd @teddyknox Thanks for the callout on the sticky health status. I've added a background health check to the rollup-boost server that continuously monitors unsafe head progression of the builder, which should functionally work the same as the health check op-conductor performs to ensure the unsafe head is progressing within This should resolve the issue of the health status not being updated on non-sequencing ELs. In the sequencing case we now have 2 health checks running in parallel.
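As a rough illustration of the background check described above, the sketch below polls a head source on a separate thread and updates a shared status when the builder's unsafe head is too old. Everything here is hypothetical: `evaluate_head`, `spawn_health_check`, the `max_unsafe_interval` parameter, and the `fetch` closure (standing in for an RPC call to the builder) are assumptions made for the example, not the actual rollup-boost implementation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    PartialContent,
}

/// Assumed staleness rule: the builder is healthy only if its latest unsafe
/// block is at most `max_unsafe_interval` seconds older than `now`.
fn evaluate_head(head_timestamp: u64, now: u64, max_unsafe_interval: u64) -> Health {
    if now.saturating_sub(head_timestamp) <= max_unsafe_interval {
        Health::Healthy
    } else {
        Health::PartialContent
    }
}

/// Spawns a background thread that calls `fetch` (returning the head
/// timestamp and the current time, both in seconds) `rounds` times,
/// updating the shared status after each poll.
fn spawn_health_check<F>(
    status: Arc<Mutex<Health>>,
    mut fetch: F,
    max_unsafe_interval: u64,
    poll: Duration,
    rounds: usize,
) -> thread::JoinHandle<()>
where
    F: FnMut() -> (u64, u64) + Send + 'static,
{
    thread::spawn(move || {
        for _ in 0..rounds {
            let (head_timestamp, now) = fetch();
            *status.lock().unwrap() = evaluate_head(head_timestamp, now, max_unsafe_interval);
            thread::sleep(poll);
        }
    })
}
```

Because the status is updated independently of any request handling, a probe reading the shared value sees fresh builder health even on an instance that is not currently sequencing. A real service would loop indefinitely rather than for a fixed number of rounds; the bound is there only to keep the sketch easy to run.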
|
* wip
* wip
* wip
* wip
* clean things up
* fix for cloned service
* cleanup process_response
* eyre bail
* remove unnecessary deps
* Add kubernetes probe layer
* implement health/ready check logic
* modify ready logic
* fix comment/feature
* delete old file
* working
* Update src/client/http.rs Co-authored-by: shana <avalonche@protonmail.com>
* parse response cod
* clippy fix
* Probe docs
* Switch to returning health status only from /healthz using http status codes
* Update docs to describe health status codes
* remove stray comments
* cleanup, add tests
* remove stray comment
* chore: rm mocks
* chore: fmt
* fix: default to healthy status
* chore: fix dockerignore
* feat: add background process to query block height as a health check for non-sequencing el's
* fix: signatures
* chore: update comments
* chore: clippy
* test: add tests
* fix: stress tests
* fix: change health check to check unsafe head progression on builder
* chore: update doc comments
* fix: loop
* chore: update comments
* merge main
---------
Co-authored-by: shana <avalonche@protonmail.com>
Co-authored-by: 0xOsiris <djosiris@proton.me>
This PR is dependent on #141
You can view the diff from that PR here
Health, readiness, liveness checks layer added.
rollup-boost.