Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-drain: Fix link to experiment #2024

Merged
merged 2 commits into from
May 16, 2024

Conversation

kosstennbl
Copy link
Collaborator

In 69dedb3, Litmus version was updated And all links for experiments were changed accordingly except for the node_drain. This commit fixes that.

ref: #2022

In 69dedb3, Litmus version was updated
And all links for experiments were changed accordingly except for the node_drain.
This commit fixes that.

ref: cnti-testcatalog#2022
Signed-off-by: Konstantin Yarovoy <[email protected]>
@taylor
Copy link
Collaborator

taylor commented May 13, 2024

Lgtm

@martin-mat martin-mat self-requested a review May 14, 2024 12:31
@martin-mat
Copy link
Collaborator

lgtm

@taylor taylor self-requested a review May 14, 2024 15:16
Copy link
Collaborator

@taylor taylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec test before this change should fail and the spec should pass after this fix.

If not the spec should be updated to produce these results before this is merged to ensure we have coverage.

@martin-mat
Copy link
Collaborator

martin-mat commented May 15, 2024

there is a good spec test for node_drain
https://github.com/cnti-testcatalog/testsuite/blob/main/spec/workload/resilience/node_drain_spec.cr
which behaves correctly (verified).

The real reason why the issue was not detected earlier during github actions is that they use one-node kind setup for testing. node_drain test needs multi-node setup and the spec tests "passes" because the test is "skipped".
Example: https://github.com/cnti-testcatalog/testsuite/actions/runs/9025631332/job/24802869636

⏭️ 🏆SKIPPED: [node_drain] node_drain chaos test requires the cluster to have atleast two schedulable nodes 🗡️💀♻

The spec tests is happy with such skipping:

      if KubectlClient::Get.schedulable_nodes_list.size > 1
        (/(PASSED).*(node_drain chaos test passed)/ =~ result[:output]).should_not be_nil
      else
        (/(SKIPPED).*(node_drain chaos test requires the cluster to have atleast two)/ =~ result[:output]).should_not be_nil
      end

So I propose to adapt github actions so they run on kind with 2 schedulable nodes
Since it is more generic adaptation I suggest to handle this in a separate ticket.
#2026

@daniel-wilmes
Copy link

@martin-mat I have verified that the fix for node drain works as a single test and in the cert command. However, I will say the cert command does not ever finish, which appears to be a seperate issue from node drain. It may pertain to either sig_term_handled, zombie_handled, or specialized_init_system. The logging doesn't appear to indicate where we are stuck. But for this ticket I think the fix for node_drain should go in.

`---
name: cnf testsuite
testsuite_version: node-drain-fix-2024-05-15-142132-3258c691
status:
command: /home/dwilmes/.mtx/konstruxx/working/tests/testCHF/cnf-testsuite cert essential
points: 100
exit_code: 0
items:

  • name: increase_decrease_capacity
    status: passed
    type: essential
    points: 100
  • name: volume_hostpath_not_found
    status: passed
    type: essential
    points: 100
  • name: node_drain
    status: passed
    type: essential
    points: 100
  • name: privileged_containers
    status: passed
    type: essential
    points: 100
  • name: non_root_containers
    status: failed
    type: essential
    points: 0
  • name: cpu_limits
    status: passed
    type: essential
    points: 100
  • name: memory_limits
    status: passed
    type: essential
    points: 100
  • name: hostpath_mounts
    status: passed
    type: essential
    points: 100
  • name: container_sock_mounts
    status: passed
    type: essential
    points: 100
  • name: selinux_options
    status: na
    type: essential
    points: 0
  • name: hostport_not_used
    status: passed
    type: essential
    points: 100
  • name: hardcoded_ip_addresses_in_k8s_runtime_configuration
    status: passed
    type: essential
    points: 100
  • name: latest_tag
    status: passed
    type: essential
    points: 100
  • name: log_output
    status: passed
    type: essential
    points: 100
  • name: specialized_init_system
    status: failed
    type: essential
    points: 0
    `

@taylor
Copy link
Collaborator

taylor commented May 16, 2024

@daniel-wilmes please open a new issue for

However, I will say the cert command does not ever finish, which appears to be a seperate issue from node drain. It may pertain to either sig_term_handled, zombie_handled, or specialized_init_system. The logging doesn't appear to indicate where we are stuck.

You can try disabling those with ~testname (eg. ~sig_term_handled) and isolate which test needs to be investigated. Add that info to the new ticket you create.

cc: @Smitholi67

@taylor
Copy link
Collaborator

taylor commented May 16, 2024

@martin-mat good catch on the kind in github actions. #2024 (comment)

Copy link
Collaborator

@taylor taylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm and specs pass

@taylor taylor merged commit 56256e7 into cnti-testcatalog:main May 16, 2024
191 of 264 checks passed
@kosstennbl kosstennbl deleted the node-drain-fix branch May 27, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants