Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix topic operator loop for unmanaged topics #10451

Merged
merged 1 commit into from
Aug 31, 2024

Conversation

padilo
Copy link
Contributor

@padilo padilo commented Aug 14, 2024

Type of change

  • Bugfix

Description

Fixes a bug in the topic operator with unmanaged topics, that keeps updating the custom resources even isn't going to update the Kafka Topic

Closes #10380

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch 2 times, most recently from a4abdbf to 41330db Compare August 14, 2024 14:38
@ppatierno ppatierno requested review from fvaleri and a team August 14, 2024 14:42
@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch from 41330db to 48f1cfe Compare August 14, 2024 14:54
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is the right fix for this issue. Also, this issue would likely manifest also in other situations with managed topics.

It seems like the Topic operator keeps updating the .status withotu changing it whihc is wrong. It needs to use the StatusDiff class to detect if something other than the timestamp changed in the status and only if it did, it should update the custom resource status. That should be done regardless whether it is managed or not.

Aside from that, I would expect the topic to not be in the Ready state when it is not managed but be in some special state such as Unmanaged to indicate that the state has been reflected by the Topic Operator. Likely it should not have the topic ID set either as it will not know if it changed or not? But that is not the core of the issue here and can be discussed separately.

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the good direction. I left one comment. But the tests also seemed to have failed in the Topic Operator which seems to be related I guess?

if (statusChanged(reconcilableTopic.kt(), oldStatus)) {

StatusDiff statusDiff = new StatusDiff(oldStatus, reconcilableTopic.kt().getStatus());
if (!statusDiff.isEmpty() || differentConditions(oldStatus.getConditions(), reconcilableTopic.kt().getStatus().getConditions())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StatusDiff should include the conditions. So why do we need the second part of the condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we can remove it, I misslooked how StatusDiff evaluates the conditions path.

I'll remove it and have a look at the tests as well.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for working on this. I like the StatusDiff logic and the new Unmanaged condition type, but they require some additional changes.

The setObservedGeneration and setTopicName should be computed before the StatusDiff when reconciliation is enabled. This is why the following integration test is failing:

  • shouldUpdateTopicInKafkaWhenListConfigChangedInKube

You should also handle the corner case of "new AND unmanaged" KTs been created (the expected status should be "Unmanaged" instead of "Ready"). This is why the following integration tests are failing:

  • shouldNotCreateTopicInKafkaWhenUnmanagedTopicCreatedInKube
  • shouldRestoreFinalizerIfRemoved (unmanaged KTs only)

Likely it should not have the topic ID set either as it will not know if it changed or not?

I agree that we should reset the topicId when the KT is paused/unmanaged, and set the topicId when switching back to unpaused/managed. This is actually a separate bug, so I logged #10467 that we should address with a dedicated PR.

pom.xml Outdated
@@ -162,6 +162,7 @@
<jupiter.version>5.8.2</jupiter.version>
<strimzi-test-container.version>0.107.0</strimzi-test-container.version>
<mockserver.version>5.13.2</mockserver.version>
<awaitility.version>4.2.2</awaitility.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need an extra dependency when we have TopicOperatroTestUtils.waitUntilCondition, and a number of wait methods in TopicControllerIT that you can use or adapt?

@fvaleri fvaleri added this to the 0.43.0 milestone Aug 19, 2024
@scholzj scholzj modified the milestones: 0.43.0, 0.44.0 Aug 19, 2024
@fvaleri
Copy link
Contributor

fvaleri commented Aug 21, 2024

@padilo I opened #10481 to address the issue with unmanaged topics that are partially reconciled to add finalizers. This should make it easier to complete this bugfix.

@padilo
Copy link
Contributor Author

padilo commented Aug 22, 2024

Tests should be fixed, but I still need to have a look at the test and remove awaitability lib :)

Sorry for the delay on this.

@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch 2 times, most recently from 00c30b6 to 6a3e244 Compare August 22, 2024 17:13
@fvaleri
Copy link
Contributor

fvaleri commented Aug 23, 2024

Sorry for the delay on this.

No problem.

Can you please rebase and fix checkstyle errors?

@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch 5 times, most recently from 23fd983 to 75963bd Compare August 25, 2024 14:20
@padilo
Copy link
Contributor Author

padilo commented Aug 25, 2024

I think this is already done in a different PR, so maybe this should be closed already :)

@fvaleri
Copy link
Contributor

fvaleri commented Aug 26, 2024

I think this is already done in a different PR, so maybe this should be closed already :)

If you are referring to #10481, that addresses a different issue. There is some overlap (introduction of the "Unmanaged" status condition) but there is no fix for the tight reconciliation loop reported by you. The important bit to deliver here is the introduction of StatusDiff.

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Federico. This still fixes the main issue.

if (statusChanged(reconcilableTopic.kt(), oldStatus)) {

StatusDiff statusDiff = new StatusDiff(oldStatus, reconcilableTopic.kt().getStatus());
if (!statusDiff.isEmpty() || synchronizableAndDifferentGenerations(reconcilableTopic.kt(), oldStatus)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why the synchronizableAndDifferentGenerations. What exactly does it check that isn't covered by the StatusDiff? synchronizableAndDifferentGenerations seems to to return true for manage unpaused topics with different generations. But even StatusDiff should be able to handle that.

@fvaleri fvaleri mentioned this pull request Aug 27, 2024
4 tasks
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@padilo I tested your changes after removing synchronizableAndDifferentGenerations and they look good. If you remove that code, then we are good to go. Thanks.

@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch from 75963bd to ea813f6 Compare August 30, 2024 14:31
@padilo
Copy link
Contributor Author

padilo commented Aug 30, 2024

done :)

@fvaleri
Copy link
Contributor

fvaleri commented Aug 30, 2024

done :)

There is an unused import to fix.

@padilo padilo force-pushed the fix-unmanaged-infinite-loop branch from ea813f6 to 3bcb22a Compare August 30, 2024 15:48
@scholzj
Copy link
Member

scholzj commented Aug 30, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the system tests pass. Thanks for the PR.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@scholzj scholzj merged commit cb45f8d into strimzi:main Aug 31, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Infinite topic sync in topic operator
3 participants