Node failures 1 by paigerube14 · Pull Request #18 · krkn-chaos/krkn

paigerube14 · 2020-06-11T00:54:52Z

This covers the stop kubelet and node crash scenarios Mike mentioned in issue #8.

I based much of the general setup of the node kill scenario YAML file on the PR below, and the two could be combined at some point: https://github.com/openshift-scale/kraken/pull/10/files

These 2 scenarios are not cloud specific.
This is a first pass at stopping the kubelet. I was hoping to add the ability to stop, wait, and then restart the kubelet, but I was not able to get it to work properly.
I also had to find a different command than the one Mike mentioned for the fork bomb scenario.
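
For readers following along, here is a minimal sketch of how the two scenarios might be expressed as commands run on a node. It is not the code from this PR: the function name `build_node_command`, the use of `oc debug node/<name>` with `chroot /host`, a systemd-managed kubelet, and the classic bash fork bomb are all assumptions chosen for illustration. The sketch only builds the command and never runs it.

```python
import shlex

# Classic bash fork bomb. This is an assumption for illustration,
# not necessarily the command this PR ended up using.
FORK_BOMB = ":(){ :|:& };:"


def build_node_command(node_name, action):
    """Return the argv list for a disruptive action on one node.

    Hypothetical helper: assumes `oc debug node/<name>` access and a
    kubelet managed by systemd on the host.
    """
    host_commands = {
        "stop_kubelet": "systemctl stop kubelet",
        "crash_node": FORK_BOMB,
    }
    if action not in host_commands:
        raise ValueError(f"unsupported action: {action}")
    return [
        "oc", "debug", f"node/{node_name}", "--",
        "chroot", "/host", "bash", "-c", host_commands[action],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed against a cluster here.
    print(shlex.join(build_node_command("worker-0", "stop_kubelet")))
```

Keeping command construction separate from execution makes the scenario easy to review and unit test without touching a cluster.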

rht-perf-ci · 2020-06-11T01:51:02Z

Can one of the admins verify this patch?

mffiedler · 2020-06-11T13:16:45Z

@paigerube14 please check the travis-ci failure - looks like a style issue: https://travis-ci.com/github/openshift-scale/kraken/builds/170789518

yashashreesuresh · 2020-06-11T15:58:53Z

Hey @paigerube14, I had worked on the stop kubelet and node crash scenarios, as I was assigned the node chaos scenarios issue #8. However, I was not able to update my PR with them because I got sidetracked by the issues in Cerberus.

paigerube14 · 2020-06-11T16:27:13Z

Hey @yashashreesuresh, ah sorry about that. I had seen your pull request that covered the first couple of scenarios and figured you had not gotten to the other 2 yet so was going to try to help out. What do you want to do here? I would love to see what you were working on here and I can add in some things you had worked on to get this pull request merged for these 2 scenarios. Or I can just drop this pull request and you can fully cover the issue in your pull request. @mffiedler @chaitanyaenr Would love some input here.

mffiedler · 2020-06-11T16:28:51Z

@chaitanyaenr @yashashreesuresh Thoughts on moving forward? Keep working in this PR, or move the work over to #10?

(we should also assign issues to avoid a scenario like this - feel free to self-assign when you start work)

chaitanyaenr · 2020-06-11T16:56:45Z

Agreed that we should assign the issues to ourselves to avoid any confusion. I totally forgot to do that, sorry, my bad.

I think both PRs are valid and we can keep working on them in parallel since they cover different node scenarios. #10 covers the node reboot, stop, and terminate scenarios, while this PR covers stopping the kubelet and fork bombing a node. We might want to sync up on the code structure and design for consuming node scenarios before moving forward, though, to make sure they stay consistent.

Thoughts?

paigerube14 · 2020-06-11T17:01:33Z

Definitely agree that we need to try to sync up the code for all node scenarios. It sounded like the pull requests cover different scenarios, but Yashashree might have already been working on the 2 I covered as well, just not in the pull request.

chaitanyaenr · 2020-06-11T17:10:05Z

Sorry I missed that, we can move the work to #10 then if it's okay with everyone.

yashashreesuresh · 2020-06-12T14:54:48Z

I have been working on the AWS node scenarios, using boto to start, stop, restart, and terminate the nodes. In each of my node scenarios I execute the scenario and then check whether the node is back to a healthy state. For the kubelet scenario, I have used the restart module of boto to restart the kubelet after it is stopped, so the entire stop-and-restart kubelet scenario becomes cloud specific. However, if you could add the functions kubelet_action and crash_node under the same directory structure, in kraken/node_actions/common_node_functions.py, I can use them in my functions, which include the restart and the check that the node is back to healthy. This way, both of us can contribute to the kubelet and node crash scenarios. :)
There’s no option for me to add a WIP label to the PR or to assign the issue to myself; otherwise I would have done so, which would have avoided this confusion.
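
The split proposed in this comment (shared, cloud-agnostic actions plus a cloud-specific restart and health check) could look roughly like the sketch below. Only the name `kubelet_action` comes from the thread. Its signature, the helpers `wait_for_ready` and `stop_and_recover_kubelet`, and the injected `run`, `restart_node`, and `is_ready` callables are hypothetical, and no real boto or cluster calls are made.

```python
import time


def kubelet_action(action, node_name, run):
    """Hypothetical signature: start or stop the kubelet on a node.

    `run` is an injected callable that executes a command on the node.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported kubelet action: {action}")
    run(node_name, f"systemctl {action} kubelet")


def wait_for_ready(node_name, is_ready, timeout=300, interval=5,
                   sleep=time.sleep):
    """Poll is_ready(node_name) until it is True or timeout seconds pass."""
    waited = 0
    while waited <= timeout:
        if is_ready(node_name):
            return True
        sleep(interval)
        waited += interval
    return False


def stop_and_recover_kubelet(node_name, run, restart_node, is_ready,
                             **wait_kwargs):
    """Stop the kubelet, restart the node, then verify it is healthy.

    `restart_node` stands in for the cloud-specific step (for example an
    AWS instance reboot); it is an assumption, not the PR's code.
    """
    kubelet_action("stop", node_name, run)
    restart_node(node_name)
    return wait_for_ready(node_name, is_ready, **wait_kwargs)
```

Passing the cloud-specific pieces in as callables keeps `kubelet_action` reusable across providers, which is the point of placing it in a common module.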

mffiedler · 2020-11-13T02:02:46Z

@paigerube14 @chaitanyaenr I think we can close this in favor of the current AWS node scenario support. Agree? We can continue Azure and GCP support using #40 and #41. If there are no objections, I will close this out.

paigerube14 · 2020-11-13T15:31:45Z

Yes, I agree that this can be closed. This was partially included in Yashashree's PR.

chaitanyaenr · 2020-11-13T15:41:39Z

+1, I agree that this can be closed as well.

paigerube14 added 14 commits June 8, 2020 13:02

Adding start to node secenarios

6243ffc

Some updates to node actions

04b3b09

Adding start to node secenarios

d871d97

Some updates to node actions

c556cd7

Adding merge and prints

f994ed3

Fixing merge issues

4aa101b

Adding scenario and code updates

cf42f2d

Adding file to start and stop kubelet

97dd814

adding updtes to node crash

cb50cba

fork bomb additions

8eaef42

taking out start stop sh

b436d9e

Cleaning up files

d716ca7

Adding defaults

d2a1d95

renaming yaml and deleting second example

a73c8a9

Fix tox spacing

a8d9103

mffiedler mentioned this pull request Jun 18, 2020

Added node chaos scenarios #10

Merged

paigerube14 closed this Nov 13, 2020

paigerube14 wants to merge 15 commits into krkn-chaos:master from paigerube14:node_failures_1
