Conversation

Can one of the admins verify this patch?

@paigerube14 please check the Travis CI failure - looks like a style issue: https://travis-ci.com/github/openshift-scale/kraken/builds/170789518

Hey @paigerube14, I worked on the stop kubelet and node crash scenarios since I was assigned the node chaos scenarios issue #8. However, I was not able to update my PR with that work because I got sidetracked by the issues in cerberus.

Hey @yashashreesuresh, ah, sorry about that. I had seen your pull request that covered the first couple of scenarios and figured you had not gotten to the other two yet, so I was going to try to help out. What do you want to do here? I would love to see what you were working on, and I can fold some of your work into this pull request to get these two scenarios merged. Or I can just drop this pull request and you can fully cover the issue in yours. @mffiedler @chaitanyaenr Would love some input here.

@chaitanyaenr @yashashreesuresh Thoughts on moving forward? Keep working in this PR, or move the work over to #10? (We should also assign issues to avoid a situation like this - feel free to self-assign when you start work.)

Agreed that we should assign the issues to ourselves to avoid any confusion; I totally forgot to do that, sorry, my bad. I think both PRs are valid and we can keep working on them in parallel since they cover different node scenarios: #10 covers the node reboot, stop, and terminate scenarios, while this PR covers stopping the kubelet and fork bombing a node. We might want to sync up on the code structure/design for consuming node scenarios before moving forward, though, to make sure they stay in sync. Thoughts?

Definitely agree that we need to sync up the code for all node scenarios. It sounded like the pull requests cover different scenarios, but Yashashree might already have been working on the two I covered as well, just not in her pull request yet.

Sorry I missed that; we can move the work to #10 then, if it's okay with everyone.

I have been working on the AWS node scenarios. I used boto to start/stop/restart/terminate the nodes. In all my node scenarios I execute the scenario and then check that the node comes back healthy. For the kubelet scenario, I used boto's restart functionality to restart the kubelet after it is stopped, so the entire stop-and-restart-kubelet scenario becomes cloud specific. However, if you could add the functions kubelet_action and crash_node under the same directory structure (kraken/node_actions/common_node_functions.py), I can use them in my functions, which handle the restart and the check that the node is back healthy. That way, both of us can contribute to the kubelet and node crash scenarios. :)
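The boto-based stop/restart flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual kraken code: the function name, parameters, and structure are assumptions, and in practice the caller would pass in a `boto3.client("ec2")`.

```python
# Hedged sketch of a boto-style node restart: stop an instance, wait for it
# to stop, then start it again and wait until it is running. The name
# restart_node is illustrative; ec2_client is expected to expose the boto3
# EC2 client interface (stop_instances, start_instances, get_waiter).

def restart_node(ec2_client, instance_id):
    """Stop the given instance, wait until stopped, then start it and
    wait until it is running again."""
    ec2_client.stop_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2_client.start_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

After a restart like this, a health check (e.g. polling the node's Ready condition via the Kubernetes API) would confirm the node came back, matching the "check the node is back healthy" step mentioned above.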

@paigerube14 @chaitanyaenr I think we can close this in favor of the current AWS node scenario support. Agree? We can continue Azure and GCP support in #40 and #41. If there are no objections, I will close this out.

Yes, I agree that this can be closed. It was partially included in Yashashree's PR.

+1, I agree that this can be closed as well.

This covers the stop kubelet and node crash scenarios Mike mentioned in issue #8.
I based a lot of my more general setup of the node kill scenario yaml file on the one below, and the two could be combined at some point: https://github.com/openshift-scale/kraken/pull/10/files
These two scenarios are not cloud specific.
This is a first pass at stopping the kubelet. I was hoping to add the ability to stop, wait, and then restart the kubelet, but I was not able to get it to work properly.
I also had to find a different command than the one Mike mentioned for the fork bomb scenario.
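What keeps these two scenarios cloud agnostic is that each one boils down to running a shell command on the target node (for example over SSH or `oc debug node/<name>`). A minimal sketch of that idea follows; the command strings and the helper name are assumptions for illustration, not the actual commands used in this PR.

```python
# Illustrative mapping from node chaos scenario to the shell command it would
# run on the node. The exact commands here are assumptions: stopping the
# kubelet via systemd, and crashing the node with a classic bash fork bomb.

NODE_COMMANDS = {
    "stop_kubelet": "sudo systemctl stop kubelet",
    "node_crash": ":(){ :|:& };:",  # classic bash fork bomb
}

def get_scenario_command(scenario):
    """Return the shell command for a given node chaos scenario, or raise
    ValueError for an unknown scenario name."""
    if scenario not in NODE_COMMANDS:
        raise ValueError("unknown scenario: %s" % scenario)
    return NODE_COMMANDS[scenario]
```

Because the command is executed on the node itself rather than through a cloud provider API, the same lookup works on AWS, Azure, or GCP nodes alike; only the restart/recovery step needs provider-specific code.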