fix(operator): set ingester minimum available pods based on replication factor #19517
base: main
Conversation
Bayan Taani seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
```go
	}
}

func GetPDBMinAvailable(opts Options) (error, intstr.IntOrString) {
```
I think we could move this function inside `newIngesterPodDisruptionBudget`.
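Something like the following, as a rough sketch; `newIngesterPodDisruptionBudget`'s exact signature and the PodDisruptionBudget construction details are not shown in this diff, so they are assumptions here:

```go
// Hypothetical shape of the suggested refactor: compute minAvailable as a
// local step of newIngesterPodDisruptionBudget instead of exposing an
// exported helper. The size-specific branches from the PR are elided.
func newIngesterPodDisruptionBudget(opts Options) (*policyv1.PodDisruptionBudget, error) {
	if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
		return nil, lokiv1.ErrReplicationFactorTooHigh
	}
	minAvailable := intstr.FromInt32(opts.Stack.Replication.Factor)

	return &policyv1.PodDisruptionBudget{
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
		},
	}, nil
}
```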
JoaoBraveCoding left a comment:
Reminder to:
- squash the commits to comply with the CLA signature
- make sure to run `make tests` before pushing; some tests were failing
You have to run `make web-pre` in order to generate the new documentation for this API change.
```go
	ErrSummaryAnnotationMissing = errors.New("rule requires annotation: summary")
	// ErrDescriptionAnnotationMissing indicates that an alerting rule is missing the description annotation
	ErrDescriptionAnnotationMissing = errors.New("rule requires annotation: description")
	// ErrReplicationFactorTooHigh indicates that the lokistack's replication factor must always be less than the number of ingester replicas
```
Suggested change:

```diff
-// ErrReplicationFactorTooHigh indicates that the lokistack's replication factor must always be less than the number of ingester replicas
+// ErrReplicationFactorTooHigh indicates that the configured Loki replication factor must always be less than the number of ingester replicas
```
```go
	ReasonZoneAwareEmptyLabel LokiStackConditionReason = "ReasonZoneAwareEmptyLabel"
	// ReasonStorageNeedsSchemaUpdate when the object storage schema version is older than V13
	ReasonStorageNeedsSchemaUpdate LokiStackConditionReason = "StorageNeedsSchemaUpdate"
	// ReasonInvalidReplicationFactor when the replication factor is equal to or more than the component replicas
```
Suggested change:

```diff
-// ReasonInvalidReplicationFactor when the replication factor is equal to or more than the component replicas
+// ReasonInvalidReplicationFactor when the replication factor is equal to or greater than the ingester replicas
```
```go
	objects, err := manifests.BuildAll(opts)
	if err != nil {
		if errors.Is(err, lokiv1.ErrReplicationFactorTooHigh) {
			return nil, &status.DegradedError{
```
We can't return the `status.DegradedError` in operator/internal/manifests/ingester.go because of an import cycle, right? I wonder if we should just move the definition of `status.DegradedError` around to allow package manifests to use it.

cc @xperimental any thoughts?
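For reference, a hedged sketch of the pattern the truncated hunk above appears to use: package manifests only returns the sentinel `lokiv1.ErrReplicationFactorTooHigh`, and the caller, which can import both `manifests` and `status` without a cycle, translates it into a `status.DegradedError`. The `DegradedError` field names below are assumptions, not taken from this diff:

```go
// Sketch: translate the sentinel error from package manifests into a
// degraded status at the handler layer, avoiding a manifests -> status
// import edge. Field names on DegradedError are assumed for illustration.
objects, err := manifests.BuildAll(opts)
if err != nil {
	if errors.Is(err, lokiv1.ErrReplicationFactorTooHigh) {
		return nil, &status.DegradedError{
			Message: "Invalid replication factor: must be less than the number of ingester replicas",
			Reason:  lokiv1.ReasonInvalidReplicationFactor,
			Requeue: false,
		}
	}
	return nil, err
}
```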
```go
	pdbMinAvailable := intstr.FromInt32(int32(1))
	switch opts.Stack.Size {
	case lokiv1.SizeOneXPico, lokiv1.SizeOneXMedium:
		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 3
		if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
			return nil, lokiv1.ErrReplicationFactorTooHigh
		}
		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
	case lokiv1.SizeOneXExtraSmall, lokiv1.SizeOneXSmall:
		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 2
		// Therefore set the pdbMinAvailable to 1 to keep 1 spare pod for rolling updates
		if opts.Stack.Template.Ingester.Replicas > opts.Stack.Replication.Factor {
			pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor - 1)
		}
	}
```
In the current version we would always skip the validation for the sizes `lokiv1.SizeOneXExtraSmall` and `lokiv1.SizeOneXSmall`; this new way isolates the skip to just the default-values scenario:

Suggested change:

```diff
-	pdbMinAvailable := intstr.FromInt32(int32(1))
-	switch opts.Stack.Size {
-	case lokiv1.SizeOneXPico, lokiv1.SizeOneXMedium:
-		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 3
-		if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
-			return nil, lokiv1.ErrReplicationFactorTooHigh
-		}
-		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
-	case lokiv1.SizeOneXExtraSmall, lokiv1.SizeOneXSmall:
-		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 2
-		// Therefore set the pdbMinAvailable to 1 to keep 1 spare pod for rolling updates
-		if opts.Stack.Template.Ingester.Replicas > opts.Stack.Replication.Factor {
-			pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor - 1)
-		}
-	}
+	pdbMinAvailable := intstr.FromInt32(1)
+	// Because by default both 1x.extra-small and 1x.small have replication
+	// factor of 2 and ingester replicas of 2, they would always trip the error
+	// condition so until we reevaluate the default values, we skip the check and
+	// keep pdbMinAvailable at 1 to keep 1 spare pod for rolling updates.
+	if (opts.Stack.Size != lokiv1.SizeOneXExtraSmall && opts.Stack.Size != lokiv1.SizeOneXSmall) ||
+		opts.Stack.Template.Ingester.Replicas != 2 || opts.Stack.Replication.Factor != 2 {
+		if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
+			return nil, lokiv1.ErrReplicationFactorTooHigh
+		}
+		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
+	}
```
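If it helps, here is a hedged, table-driven sketch of how the three outcomes of the suggested logic could be pinned down in a test. It follows the error-first return order of `GetPDBMinAvailable` as quoted earlier in this diff; the `lokiv1` type shapes (pointer `Template`/`Replication` specs) and the testify assertions are assumptions based on the fields used above:

```go
func TestGetPDBMinAvailable(t *testing.T) {
	for _, tc := range []struct {
		name     string
		size     lokiv1.LokiStackSizeType
		replicas int32
		factor   int32
		want     int32
		wantErr  bool
	}{
		// Default 1x.small values: validation skipped, one pod kept available.
		{name: "small defaults skip check", size: lokiv1.SizeOneXSmall, replicas: 2, factor: 2, want: 1},
		// 1x.medium defaults: minAvailable follows the replication factor.
		{name: "medium defaults", size: lokiv1.SizeOneXMedium, replicas: 3, factor: 2, want: 2},
		// Non-default values on 1x.small are validated and rejected.
		{name: "small custom factor too high", size: lokiv1.SizeOneXSmall, replicas: 3, factor: 3, wantErr: true},
	} {
		t.Run(tc.name, func(t *testing.T) {
			opts := Options{
				Stack: lokiv1.LokiStackSpec{
					Size:        tc.size,
					Replication: &lokiv1.ReplicationSpec{Factor: tc.factor},
					Template: &lokiv1.LokiTemplateSpec{
						Ingester: &lokiv1.LokiComponentSpec{Replicas: tc.replicas},
					},
				},
			}
			err, got := GetPDBMinAvailable(opts)
			if tc.wantErr {
				require.ErrorIs(t, err, lokiv1.ErrReplicationFactorTooHigh)
				return
			}
			require.NoError(t, err)
			require.Equal(t, intstr.FromInt32(tc.want), got)
		})
	}
}
```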
What this PR does / why we need it:
Sets the default value of the ingester PodDisruptionBudget's minimum available pods (`minAvailable`) to the replication factor.
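As an illustration (not code from this PR), this is the kind of object the change produces, assuming the standard `policy/v1` API; the builder name, PDB name, and selector labels below are hypothetical:

```go
import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// ingesterPDB is a hypothetical builder showing the effect of the change:
// the PDB's minAvailable tracks the replication factor, so voluntary
// disruptions cannot take the available ingesters below what the
// replication factor requires.
func ingesterPDB(stackName, namespace string, replicationFactor int32) *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt32(replicationFactor)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      stackName + "-ingester-pdb", // assumed naming scheme
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				// Assumed component label; the operator's real selector may differ.
				MatchLabels: map[string]string{"app.kubernetes.io/component": "ingester"},
			},
		},
	}
}
```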
Which issue(s) this PR fixes:
Fixes LOG-6715
Special notes for your reviewer:
Checklist
- Reviewed the `CONTRIBUTING.md` guide (required)
- New `feat` PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. (Example PR)