This repository was archived by the owner on Jan 8, 2024. It is now read-only.
serverinstall/nomad: Spend more time looking at waypoint-runner allocation on install to ensure start up #2698
Prior to this commit, the server install command treated the install as
successful the first time the waypoint-runner allocation was reported as
running. This is generally fine. However, if the Nomad job fails a moment
later (for example, a static runner failing to connect back to Waypoint
Server), the runner allocation exits, but the server install command has
already reported the installation as successful.
This is especially important when users configure Consul DNS with
Waypoint Server installed to Nomad. The static runner might be unable
to connect through the Consul DNS hostname and exit, leaving the
installation without a static runner while the CLI claims the install
succeeded.
This commit fixes that by rechecking the scheduled allocation several times
over a few seconds once it is in a "running" state, to validate that it
stays up beyond the first few moments of the job.
Fixes #2683