Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance the reconcile process to fix the potential inconsistency between etcd cluster and statefulset #26

Open
ahrtr opened this issue Jan 13, 2025 · 8 comments
Labels
help wanted Extra attention is needed
Milestone

Comments

@ahrtr
Copy link
Member

ahrtr commented Jan 13, 2025

Fix the following TODO item,

// TODO: finish the logic later
if int(targetReplica) != memberCnt {
// TODO: finish the logic later
if int(targetReplica) < memberCnt {
// a new added learner hasn't started yet
// re-generate configuration for the new learner member;
// increase statefulsets's replica by 1
} else {
// an already removed member hasn't stopped yet.
// Decrease the statefulsets's replica by 1
}
// return
}

Please also read #17 (comment)

We also need to add e2e tests to verify it. We can leverage the gofailpoint to intentionally crash the etcd-operator right after updating the etcd cluster but before updating the statefulset.

@ahrtr ahrtr added the help wanted Extra attention is needed label Jan 13, 2025
@ahrtr ahrtr added this to the v0.1.0 milestone Jan 13, 2025
@ahrtr
Copy link
Member Author

ahrtr commented Jan 13, 2025

cc @gdasson

@soma00333
Copy link
Contributor

Hi @ahrtr
Good day
I'm working here #50
Please assign to me

@soma00333
Copy link
Contributor

Hi @ahrtr
I hope you’re doing well.
Regarding the discussion in #19 about removing ginkgo, I have a question:
1. Limit the scope of #26 to etcdcluster_controller.go and create a separate issue for the corresponding e2e tests
2. Wait for #19 to be resolved before proceeding with #26

Which approach would be better?
The etcdcluster_controller.go has already been fixed.
https://github.com/etcd-io/etcd-operator/pull/50/files#diff-7e023adf1c245d768f95e3f174cab882db08efd47023a9ce81e29a3a7767c12a

@abdurrehman107
Copy link

@soma00333 we can have #19 PR by Sunday (26th January) eve.
cc @ahrtr

@soma00333
Copy link
Contributor

@abdurrehman107
I got it. I appreciate your hard work.
Take care, and have a good weekend

@abdurrehman107
Copy link

You too @soma00333

@soma00333
Copy link
Contributor

[note]
After the discussion in #50, we decided to create a new PR to change only internal/controller/etcdcluster_controller.go (To enhance the logic. No gofail this time.)
The PR is here #53

@soma00333
Copy link
Contributor

[note]
#53 was merged.
Our remaining task is to add E2E tests that appropriately trigger crashes utilizing gofail to verify the behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants