Node drain test not starting due to unable to get chaos resources (ChaosExperiment.litmuschaos.io "node-drain" not found) #2022

Closed
sysarch-repo opened this issue May 11, 2024 · 4 comments
sysarch-repo commented May 11, 2024

Node drain test not starting due to chaos resources not found

Steps to reproduce
Steps to reproduce the behavior:
$ cnf-testsuite version
CNF TestSuite version: v1.2.0

$ cnf-testsuite node_drain
🎬 Testing: [node_drain]
<not progressing>

$ kubectl get nodes

NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-116-213.ec2.internal   Ready                      <none>   39m   v1.28.8-eks-ae9a62a
ip-10-0-76-157.ec2.internal    Ready,SchedulingDisabled   <none>   39m   v1.28.8-eks-ae9a62a

$ kubectl describe chaosengine -n cnti dns-dserver-1c6b0d96

Name:         dns-dserver-1c6b0d96
Namespace:    cnti
Labels:       <none>
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosEngine
Metadata:
  Creation Timestamp:  2024-05-11T19:52:17Z
  Finalizers:
    chaosengine.litmuschaos.io/finalizer
  Generation:        2
  Resource Version:  4544
  UID:               e4bd93c5-290f-48f2-ae0f-d968f55b0590
Spec:
  Appinfo:
    Appkind:              deployment
    Applabel:             app.nti/pod-group=dns-dserver
    Appns:                cnti
  Chaos Service Account:  node-drain-sa
  Components:
    Runner:
      Resources:
  Engine State:  active
  Experiments:
    Name:  node-drain
    Spec:
      Components:
        Env:
          Name:   TOTAL_CHAOS_DURATION
          Value:  90
          Name:   TARGET_NODE
          Value:  ip-10-0-76-157.ec2.internal
        Resources:
        Status Check Timeouts:
  Job Clean Up Policy:  delete
Status:
  Engine Status:  initialized
  Experiments:    <nil>
Events:
  Type     Reason                         Age                  From            Message
  ----     ------                         ----                 ----            -------
  Normal   ChaosEngineInitialized         23m                  chaos-operator  Identifying app under test & launching dns-dserver-1c6b0d96-runner
  Warning  ChaosResourcesOperationFailed  116s (x19 over 23m)  chaos-operator  (chaos start) Unable to get chaos resources

Chaos operator logs:

2024-05-11T20:03:22.491Z	ERROR	controller.chaosengine	Reconciler error	{"reconciler group": "litmuschaos.io", "reconciler kind": "ChaosEngine", "name": "dns-dserver-1c6b0d96", "namespace": "cnti", "error": "ChaosExperiment.litmuschaos.io \"node-drain\" not found"}

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

2024-05-11T20:03:22.491Z	DEBUG	events	Warning	{"object": {"kind":"ChaosEngine","namespace":"cnti","name":"dns-dserver-1c6b0d96","uid":"e4bd93c5-290f-48f2-ae0f-d968f55b0590","apiVersion":"litmuschaos.io/v1alpha1","resourceVersion":"4544"}, "reason": "ChaosResourcesOperationFailed", "message": "(chaos start) Unable to get chaos resources"}
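
When triaging operator logs like the ones above, the field that names the missing resource is the `error` key in the trailing JSON object. The following is a minimal Python sketch for pulling it out, assuming the five tab-separated fields seen in the ERROR and DEBUG lines above (timestamp, level, logger, message, JSON payload). `parse_operator_log` is a hypothetical helper name, not part of Litmus or cnf-testsuite, and the sample line is an abbreviated copy of the ERROR line.

```python
import json
from typing import Any, Dict


def parse_operator_log(line: str) -> Dict[str, Any]:
    """Split one tab-separated operator log line into its parts."""
    parts = line.rstrip("\n").split("\t", 4)
    if len(parts) != 5:
        # Stack-trace lines and blank lines do not have this shape.
        raise ValueError("expected 5 tab-separated fields")
    timestamp, level, logger, message, payload = parts
    return {
        "timestamp": timestamp,
        "level": level,
        "logger": logger,
        "message": message,
        "fields": json.loads(payload),  # trailing structured JSON object
    }


# Abbreviated copy of the ERROR line from the operator logs above.
sample = "\t".join([
    "2024-05-11T20:03:22.491Z",
    "ERROR",
    "controller.chaosengine",
    "Reconciler error",
    r'{"name": "dns-dserver-1c6b0d96", "namespace": "cnti", '
    r'"error": "ChaosExperiment.litmuschaos.io \"node-drain\" not found"}',
])
entry = parse_operator_log(sample)
print(entry["fields"]["error"])
# prints: ChaosExperiment.litmuschaos.io "node-drain" not found
```

Stack-trace lines such as the `controller.go` frames raise `ValueError` here, so a caller would likely skip lines that fail to parse.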

Expected behavior
The AUT runner should start and the test should execute.
In cases like this (a broken external link), the testsuite should not loop endlessly; it should terminate with an error instead.
Release tests should be enhanced to maintain the quality of the released software.
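
One way to get the "terminate with an error" behavior requested above is to bound the wait with a deadline. This is a minimal Python sketch of that idea, not the testsuite's actual implementation. `wait_until`, its parameters, and the injectable `clock` and `sleep` hooks are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return
        if clock() >= deadline:
            # Give up with an error instead of waiting forever.
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        sleep(interval_s)
```

A caller could pass a predicate that checks whether the chaos engine has left the `initialized` state and report a `TimeoutError` as a failed test rather than hanging. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.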

Device (please complete the following information):

$ uname -a
Linux ip-10-0-33-96 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux


NOTE: You can enable a higher logging level via the command line or an environment variable to help with debugging.

# cmd line
./cnf-testsuite -l debug test
...

I, [2024-05-11 20:52:08 +00:00 #8052]  INFO -- cnf-testsuite: Cordoned node ip-10-0-76-157.ec2.internal successfully.

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: Workload Node Name: ip-10-0-76-157.ec2.internal

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: Litmus Node Name: ip-10-0-116-213.ec2.internal

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: download_template url, filename: https://raw.githubusercontent.com/litmuschaos/chaos-charts/3.6.0/charts/generic/node-drain/experiment.yaml, node_drain_experiment.yaml

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: chaos_manifests_path

I, [2024-05-11 20:52:09 +00:00 #8052]  INFO -- cnf-testsuite: filepath: /home/ubuntu/.cnf-testsuite/tools/chaos-experiments/node_drain_experiment.yaml

$ cat /home/ubuntu/.cnf-testsuite/tools/chaos-experiments/node_drain_experiment.yaml
404: Not Found --> The URL https://raw.githubusercontent.com/litmuschaos/chaos-charts/3.6.0/charts/generic/node-drain/experiment.yaml does not exist
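
The saved file above contains an HTTP error body rather than a manifest, which suggests the download result was stored without being checked. Below is a minimal Python sketch of a guard against that, under stated assumptions: `looks_like_chaos_experiment` and `download_manifest` are hypothetical helper names, and the check only looks for a top-level `kind: ChaosExperiment` line instead of parsing the YAML. It is not how cnf-testsuite downloads templates.

```python
import urllib.error
import urllib.request


def looks_like_chaos_experiment(text: str) -> bool:
    """Cheap sanity check: is there a top-level `kind: ChaosExperiment` line?"""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key == "kind" and value.strip() == "ChaosExperiment":
            return True
    return False


def download_manifest(url: str, timeout_s: float = 10.0) -> str:
    """Fetch a manifest and fail loudly instead of returning an error page."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for responses such as 404.
        raise RuntimeError(f"download failed: HTTP {err.code} for {url}") from err
    if not looks_like_chaos_experiment(body):
        raise RuntimeError(f"{url} did not return a ChaosExperiment manifest")
    return body
```

With a guard like this, the text `404: Not Found` would be rejected before anything is written to the chaos-experiments directory or applied to the cluster.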

@sysarch-repo sysarch-repo added the bug Something isn't working label May 11, 2024
@kosstennbl
Collaborator

It seems that node drain was overlooked in this commit.
I'll prepare a quick PR.

kosstennbl pushed a commit to kosstennbl/cnf-testsuite that referenced this issue May 13, 2024
In 69dedb3, Litmus version was updated
And all links for experiments were changed accordingly except for the node_drain.
This commit fixes that.

ref: cnti-testcatalog#2022
Signed-off-by: Konstantin Yarovoy <[email protected]>
@martin-mat
Collaborator

@kosstennbl @HashNuke @agentpoyo any idea why the issue was not detected by the spec tests in GitHub Actions?
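
One kind of check that could catch a stale experiment link before a release is a test that requests every template URL and reports the ones that do not answer with HTTP 200. The following Python sketch illustrates the idea only. `http_status` and `broken_links` are hypothetical names, the mapping of experiment names to URLs would have to come from the testsuite's own configuration, and it assumes the hosting server answers `HEAD` requests.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_status(url: str, timeout_s: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 arrive as exceptions carrying the code.
        return err.code


def broken_links(
    urls: Dict[str, str],
    status_of: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the sorted names whose URL does not answer with HTTP 200."""
    return sorted(name for name, url in urls.items() if status_of(url) != 200)
```

A spec could then assert that `broken_links(...)` returns an empty list. The `status_of` parameter lets the check run against canned responses without network access.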

@lixuna
Collaborator

lixuna commented May 14, 2024

@martin-mat please create a new bug issue for the node drain spec test

@daniel-wilmes

Results of running with compiled branch:

14:31:27.784 [pool-73-thread-1] INFO com.matrixx.konstruxx.tools.cnftestsuite.CnfTestsuiteResultParser - Test Score: 100
14:31:27.787 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:16:23.603-0400 (15m) Executed Command #1 in testCHF: CNF TestSuite
Creating cnf-testsuite.yml
Getting chart directory
Running cnf-testsuite setup
Running cnf-testsuite cnf_setup
Running cnf-testsuite node_drain
Parsing Results
Score Breakdown - default
Test Name,Received Points,Max Points,Status,Category
node_drain,100,100,passed,essential

Score Summary - default
Summary - default

Total Essential Tests Passed: 1
Total Essential Tests Failed: 0
Percentage for Essential: 100.0

Total Bonus Tests Passed: 0
Total Bonus Tests Failed: 0

Total Normal Tests Passed: 0
Total Normal Tests Failed: 0

14:31:27.787 [pool-73-thread-1] INFO com.matrixx.konstruxx.Konstruxx - Performing Captures...
14:31:27.832 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:31:27.787-0400 (45ms) Completed Capture for testCHF in default
14:31:28.050 [pool-73-thread-1] INFO com.matrixx.konstruxx.Context - EVENT: SUCCESS 2024-05-14T14:31:27.832-0400 (218ms) Completed Capture for testCHF in cnf-testsuite
capturing info for cluster-tools-dhg57
capturing info for cluster-tools-whbhz

14:31:28.050 [pool-73-thread-1] INFO com.matrixx.konstruxx.Konstruxx - Blueprint created

taylor pushed a commit that referenced this issue May 16, 2024
taylor added a commit that referenced this issue May 16, 2024
In 69dedb3, Litmus version was updated
And all links for experiments were changed accordingly except for the node_drain.
This commit fixes that.

ref: #2022

Signed-off-by: Konstantin Yarovoy <[email protected]>
Co-authored-by: Konstantin Yarovoy <[email protected]>
Co-authored-by: Konstantin <[email protected]>