ci: ensure bootstrap node is ready before starting other nodes #7321
steviez merged 1 commit into anza-xyz:master
# 1 == bootstrap validator, wait until it boots before starting
# other validators
# wait for bootstrap validator to boot before starting other validators
if [[ "$i" -eq 1 ]]; then
Maybe move this whole if block above the loop?
would you mind explaining that in a bit more detail?
I'm not sure I follow either - the loop initializes all validators; we just need to do different stuff for validators > 0
I don't want to speak for what Alex was thinking, but if he means start the bootstrap validator first, separately and then start the rest, I agree with him.
I also don't want to bikeshed this while I'm seeing failures on 50% of my runs, though. I'd prefer we get something that works merged first.
yes, I meant exactly what Rory mentioned: that we set up the bootstrap first to avoid special-casing based on index. But it is good to have CI fixed, thank you!
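The restructuring discussed in this thread could look roughly like the sketch below. It is not the merged change: startNode and waitForNodeToInit are stubbed here so the ordering can be observed, the record helper is hypothetical, and the file name init-complete-node0.log is taken from the suggestion elsewhere in this review rather than from the script itself.

```shell
# Sketch only: stubs record what would happen instead of launching validators.
events=""
record() { events="${events}${events:+ | }$1"; }

startNode() { record "start node$1"; }          # stub: pretend to launch node $1
waitForNodeToInit() { record "wait $1"; }       # stub: pretend to block on file $1

startNodes() {
  numNodes=$1

  # Bootstrap validator (index 0) is handled on its own, before the loop,
  # so the loop body needs no special case based on the index.
  startNode 0
  waitForNodeToInit "init-complete-node0.log"

  i=1
  while [ "$i" -lt "$numNodes" ]; do
    startNode "$i"
    i=$((i + 1))
  done
}

startNodes 3
echo "$events"
```

The trade-off is a slightly longer function in exchange for a loop that treats every remaining validator the same way.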
nit: Maybe explicitly mention that bootstrap == index 0.
Also, this might have "worked" at one point. Passing initCompleteFile to waitForNodeToInit is tricky because the value (i.e. which integer is attached to the log file name) is getting updated in the loop.
Maybe we add a new variable to make the intent a little more clear. Something like:
if [[ "$i" -eq 1 ]]; then
  declare bootstrapInitCompleteFile="init-complete-node0.log"
  waitForNodeToInit "$bootstrapInitCompleteFile"
  ...
Added @roryharr for review as he was looking at this test in depth yesterday
steviez left a comment
Given how people have been running into this, I'm fine to merge it as-is
I'm going to merge this as it is probably pretty late for @yihau
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
(cherry picked from commit e0b720b)
We are seeing flaky tests that hopefully this will help resolve.
Problem
context: https://discord.com/channels/428295358100013066/560503042458517505/1402093272872128613
localnet test is flaky
Summary of Changes
In the startNodes function, we have a comment that makes sense, but the actual logic doesn't quite align with it.
Let's say we have three nodes: the bootstrap node, node 1, and node 2. The current logic starts the nodes in order: first the bootstrap node (index 0), then node 1 (index 1), and so on. However, when starting node 1, the code waits for node 1's own init-complete file instead of checking whether the bootstrap node (node 0) is ready.
I've moved the init-complete check to the top of the loop. This way, when we start node 1, we correctly wait for node 0's init-complete log (the bootstrap node's) before proceeding.
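As a rough illustration of why moving the check to the top of the loop matters, here is a minimal sketch under stated assumptions. The real startNodes is not reproduced: startNode and waitForNodeToInit are stubs that only record calls, the record helper is hypothetical, and the init-complete-node$i.log naming and the point where initCompleteFile is reassigned are assumed from the review discussion.

```shell
# Sketch only: stubs record what would happen instead of launching validators.
events=""
record() { events="${events}${events:+ | }$1"; }

startNode() { record "start node$1"; }          # stub: pretend to launch node $1
waitForNodeToInit() { record "wait $1"; }       # stub: pretend to block on file $1

startNodes() {
  numNodes=$1
  initCompleteFile=""
  i=0
  while [ "$i" -lt "$numNodes" ]; do
    # Check first: at i == 1, initCompleteFile still names node 0's file,
    # because it has not yet been reassigned for the current node.
    if [ "$i" -eq 1 ]; then
      waitForNodeToInit "$initCompleteFile"
    fi

    # Assumed: the per-node file name is updated on every iteration.
    initCompleteFile="init-complete-node$i.log"
    startNode "$i"
    i=$((i + 1))
  done
}

startNodes 3
echo "$events"
```

If the wait were placed after the reassignment instead, the same call at i == 1 would receive node 1's file name, which is the mismatch described above.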