Skip to content
This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Check for "active" status #233

Merged
merged 9 commits into from
Jun 4, 2020
Merged

Conversation

wadells
Copy link
Contributor

@wadells wadells commented May 28, 2020

Description

This change introduces more rigorous status checking in all suite tests cases. Specifically, we go from state != degraded (30m timeout) to state = active && system_status = Running (5m timeout).

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Internal change (not necessarily a bug fix or a new feature)

I mention breaking because the 30m -> 5m timeout may break some of the tests cases we don't run presently. It should be safe for all currently run test cases, because they wait for operations to complete before querying status.

Linked tickets and other PRs

Fixes #225.
Heavily rewrites the code introduced in #136.

TODOs

  • Self-review the change
  • Write documentation
  • Address review feedback
  • Write automated tests

Manual testing

  • install testing
  • upgrade testing
  • shrink testing
  • resize/expand testing

Testing done

Added unit tests for status validations.

Logs from end-to-end runs:

Copy link
Contributor Author

@wadells wadells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor issues in need of followup tickets or cleanup (e.g. when the old .Status() is replaced with the new .Status() which no longer loops.

infra/gravity/cluster_install.go Outdated Show resolved Hide resolved
infra/gravity/cluster_install.go Show resolved Hide resolved
lib/constants/constants.go Outdated Show resolved Hide resolved
suite/sanity/autoscale.go Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/gravity_test.go Outdated Show resolved Hide resolved
infra/gravity/gravity_test.go Show resolved Hide resolved
infra/gravity/testdata/status-degraded-1521.json Outdated Show resolved Hide resolved
suite/sanity/resize.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/node_commands.go Outdated Show resolved Hide resolved
lib/constants/constants.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@wadells wadells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed all comments in a string of 'squash-*' commits I'll squash back into the original history before merging/rebasing. I've left them separate for now to aid-re-review.

To review just the new changes, see:

https://github.com/gravitational/robotest/pull/233/files/d4ad793b0409fe0d889ba1074ce15044ee08503c..67c8aeb172d284a8c0439f489c48ae32c4ff1581

infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/gravity_test.go Outdated Show resolved Hide resolved
infra/gravity/gravity_test.go Show resolved Hide resolved
infra/gravity/node_commands.go Outdated Show resolved Hide resolved
@wadells wadells force-pushed the active-status branch 2 times, most recently from ee5a6bc to 67c8aeb Compare June 3, 2020 21:47
infra/gravity/cluster_status.go Show resolved Hide resolved
infra/gravity/cluster_status.go Outdated Show resolved Hide resolved
infra/gravity/node_commands.go Show resolved Hide resolved
@wadells wadells requested a review from a-palchikov June 4, 2020 17:34
@wadells
Copy link
Contributor Author

wadells commented Jun 4, 2020

Testing updated with an install that includes 0809da1.

These do not need the lenient 30 minute status timeout. Splitting them
out allows independent tuning and opens the status timeout up for
refactoring without breaking these code paths.
Previously the code shared one context for two separate ssh command
invocations.  Using independent contexts gives more control as to how
long each command should take.
Previously, these operations all shared the 30 minute status timeout
which is far too much time (especially when we want to fail fast during
debugging). These new independent timeouts are my best estimate as a
overly cautious window for the operations.
Fixes gravitational#225.

This will help robotest catch issues when Gravity is not 'degraded'
but is also not healthy, such as gravitational/gravity#1523.
This allows unit testing of the status validation function as well as
plugging in different status validation functions (e.g. if a test case
wants to expect that a cluster is degraded).

Contributes to gravitational#225.
This is consistent with the language Gravity uses.
During testing I discovered that robotest was proceeding with upgrades
before the cluster was ready.  I captured the status json, added it as a
test case, and updated the check to mark this status as invalid.

Contributes to gravitational#225.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assert more about gravity status than 'not degraded' after install
3 participants