@btaani (Contributor) commented Oct 16, 2025

What this PR does / why we need it:
Set the default value of the ingester's PodDisruptionBudget's minimum available pods to the replication factor
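
To make the effect concrete: a PodDisruptionBudget with a minimum-available value only permits a voluntary eviction while the number of healthy pods stays at or above that value. The sketch below is illustrative arithmetic, not operator code. `allowedDisruptions` is a hypothetical helper, and the replica and replication-factor figures are the defaults quoted in the review comments further down.

```go
package main

import "fmt"

// allowedDisruptions is a hypothetical helper (not part of the operator).
// It mirrors how a PodDisruptionBudget with minAvailable behaves: voluntary
// evictions are allowed only while healthy pods stay at or above minAvailable.
func allowedDisruptions(healthyPods, minAvailable int32) int32 {
	if healthyPods <= minAvailable {
		return 0
	}
	return healthyPods - minAvailable
}

func main() {
	// 3 ingesters with minAvailable set to a replication factor of 2.
	fmt.Println("3 replicas, minAvailable 2:", allowedDisruptions(3, 2))
	// 2 ingesters with minAvailable 2: nothing can be evicted voluntarily.
	fmt.Println("2 replicas, minAvailable 2:", allowedDisruptions(2, 2))
	// 2 ingesters with minAvailable 1: one pod is free for a rolling update.
	fmt.Println("2 replicas, minAvailable 1:", allowedDisruptions(2, 1))
}
```

This is why the two-replica default sizes need special handling: setting the minimum to the replication factor there would block every voluntary eviction.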

Which issue(s) this PR fixes:
Fixes LOG-6715

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR


CLAassistant commented Oct 16, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ btaani
❌ Bayan Taani


Bayan Taani does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@btaani changed the title from "operator: set ingester minimum available pods based on replication factor" to "fix(operator): set ingester minimum available pods based on replication factor" on Oct 20, 2025
}
}

func GetPDBMinAvailable(opts Options) (error, intstr.IntOrString) {
Collaborator

I think we could move this function inside newIngesterPodDisruptionBudget.

@pull-request-size bot added size/L and removed size/M labels on Oct 21, 2025
@JoaoBraveCoding (Collaborator) left a comment

Reminder to:

  • squash the commits to comply with the CLA signature
  • make sure to run the tests before pushing; some tests were failing

Collaborator

You have to run make web-pre in order to generate the new documentation for this API change

ErrSummaryAnnotationMissing = errors.New("rule requires annotation: summary")
// ErrDescriptionAnnotationMissing indicates that an alerting rule is missing the description annotation
ErrDescriptionAnnotationMissing = errors.New("rule requires annotation: description")
// ErrReplicationFactorTooHigh indicates that the lokistack's replication factor must always be less than the number of ingester replicas
Collaborator

Suggested change
- // ErrReplicationFactorTooHigh indicates that the lokistack's replication factor must always be less than the number of ingester replicas
+ // ErrReplicationFactorTooHigh indicates that the configured Loki replication factor must always be less than the number of ingester replicas

ReasonZoneAwareEmptyLabel LokiStackConditionReason = "ReasonZoneAwareEmptyLabel"
// ReasonStorageNeedsSchemaUpdate when the object storage schema version is older than V13
ReasonStorageNeedsSchemaUpdate LokiStackConditionReason = "StorageNeedsSchemaUpdate"
// ReasonInvalidReplicationFactor when the replication factor is equal to or more than the component replicas
Collaborator

Suggested change
- // ReasonInvalidReplicationFactor when the replication factor is equal to or more than the component replicas
+ // ReasonInvalidReplicationFactor when the replication factor is equal to or bigger than the ingester replicas

objects, err := manifests.BuildAll(opts)
if err != nil {
	if errors.Is(err, lokiv1.ErrReplicationFactorTooHigh) {
		return nil, &status.DegradedError{
Collaborator

We can't return the status.DegradedError in operator/internal/manifests/ingester.go because of an import cycle, right? I wonder if we should just move the definition of status.DegradedError around to allow package manifests to use it.

cc @xperimental any thoughts?
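
For readers following the thread, the pattern in the diff above (a sentinel error returned by the manifest builder and translated into a degraded status by the caller) can be sketched with the standard library alone. Everything here is a simplified stand-in: `DegradedError`, `buildAll`, `toStatusError`, and the reason string are hypothetical and only mimic the shape of the real types.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrReplicationFactorTooHigh stands in for the sentinel error from the diff.
var ErrReplicationFactorTooHigh = errors.New("replication factor too high")

// DegradedError is a simplified, hypothetical stand-in for status.DegradedError.
type DegradedError struct {
	Message string
	Reason  string
}

func (e *DegradedError) Error() string { return "cluster degraded: " + e.Message }

// buildAll is a hypothetical stand-in for manifests.BuildAll. It wraps the
// sentinel so the caller can still detect it with errors.Is.
func buildAll(replicas, factor int32) error {
	if replicas <= factor {
		return fmt.Errorf("building ingester PDB: %w", ErrReplicationFactorTooHigh)
	}
	return nil
}

// toStatusError translates the sentinel into a DegradedError at the call
// site, so the builder package never has to import the status package.
func toStatusError(err error) error {
	if errors.Is(err, ErrReplicationFactorTooHigh) {
		return &DegradedError{
			Message: err.Error(),
			Reason:  "InvalidReplicationFactor", // assumed value, for illustration only
		}
	}
	return err
}

func main() {
	err := toStatusError(buildAll(2, 3))
	var degraded *DegradedError
	if errors.As(err, &degraded) {
		fmt.Println("degraded:", degraded.Reason)
	}
	fmt.Println("valid config error:", toStatusError(buildAll(3, 2)))
}
```

Translating at the call site is what avoids the import cycle; moving the error type into a package both sides can import would be the alternative raised in the question above.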

Comment on lines +299 to 313
pdbMinAvailable := intstr.FromInt32(int32(1))
switch opts.Stack.Size {
case lokiv1.SizeOneXPico, lokiv1.SizeOneXMedium:
	// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 3
	if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
		return nil, lokiv1.ErrReplicationFactorTooHigh
	}
	pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
case lokiv1.SizeOneXExtraSmall, lokiv1.SizeOneXSmall:
	// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 2
	// Therefore set the pdbMinAvailable to 1 to keep 1 spare pod for rolling updates
	if opts.Stack.Template.Ingester.Replicas > opts.Stack.Replication.Factor {
		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor - 1)
	}
}
Collaborator

As currently written, we would always skip the validation for sizes lokiv1.SizeOneXExtraSmall and lokiv1.SizeOneXSmall. The suggestion below limits the skip to just the default-values scenario.

Suggested change
-	pdbMinAvailable := intstr.FromInt32(int32(1))
-	switch opts.Stack.Size {
-	case lokiv1.SizeOneXPico, lokiv1.SizeOneXMedium:
-		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 3
-		if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
-			return nil, lokiv1.ErrReplicationFactorTooHigh
-		}
-		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
-	case lokiv1.SizeOneXExtraSmall, lokiv1.SizeOneXSmall:
-		// For these sizes, default Replication.Factor = 2 and Ingester.Replicas = 2
-		// Therefore set the pdbMinAvailable to 1 to keep 1 spare pod for rolling updates
-		if opts.Stack.Template.Ingester.Replicas > opts.Stack.Replication.Factor {
-			pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor - 1)
-		}
-	}
+	pdbMinAvailable := intstr.FromInt32(1)
+	// Because by default both 1x.extra-small and 1x.small have replication
+	// factor of 2 and ingester replicas of 2, they would always trip the error
+	// condition so until we reevaluate the default values, we skip the check and
+	// keep pdbMinAvailable at 1 to keep 1 spare pod for rolling updates.
+	if (opts.Stack.Size != lokiv1.SizeOneXExtraSmall && opts.Stack.Size != lokiv1.SizeOneXSmall) ||
+		opts.Stack.Template.Ingester.Replicas != 2 || opts.Stack.Replication.Factor != 2 {
+		if opts.Stack.Template.Ingester.Replicas <= opts.Stack.Replication.Factor {
+			return nil, lokiv1.ErrReplicationFactorTooHigh
+		}
+		pdbMinAvailable = intstr.FromInt32(opts.Stack.Replication.Factor)
+	}
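
The decision logic in the suggested version can be checked in isolation with a standard-library-only sketch. This is an approximation under stated assumptions, not the operator's code: `pdbMinAvailable`, the plain `int32` return (instead of `intstr.IntOrString`), the local error value, and the size constants are all hypothetical stand-ins for the `lokiv1` and `opts` types in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errReplicationFactorTooHigh stands in for lokiv1.ErrReplicationFactorTooHigh.
var errReplicationFactorTooHigh = errors.New("replication factor must be less than ingester replicas")

// Placeholder size names standing in for the lokiv1 size constants.
const (
	sizeExtraSmall = "1x.extra-small"
	sizeSmall      = "1x.small"
	sizeMedium     = "1x.medium"
)

// pdbMinAvailable is a hypothetical, simplified version of the suggested logic.
func pdbMinAvailable(size string, replicas, factor int32) (int32, error) {
	// Default-values scenario for the two small sizes: skip validation and
	// keep one spare pod for rolling updates.
	smallSize := size == sizeExtraSmall || size == sizeSmall
	if smallSize && replicas == 2 && factor == 2 {
		return 1, nil
	}
	// Every other combination is validated, including customised small sizes.
	if replicas <= factor {
		return 0, errReplicationFactorTooHigh
	}
	return factor, nil
}

func main() {
	cases := []struct {
		size             string
		replicas, factor int32
	}{
		{sizeSmall, 2, 2},  // defaults: validation skipped
		{sizeSmall, 3, 2},  // customised small size: validated
		{sizeSmall, 2, 3},  // customised small size: rejected
		{sizeMedium, 3, 2}, // larger size: validated
	}
	for _, c := range cases {
		minAvailable, err := pdbMinAvailable(c.size, c.replicas, c.factor)
		fmt.Printf("%s replicas=%d factor=%d -> minAvailable=%d err=%v\n",
			c.size, c.replicas, c.factor, minAvailable, err)
	}
}
```

The early return is the logical negation of the compound condition in the suggestion, so both forms accept and reject the same inputs.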
