cluster: unstuck cluster manager when update the initializing cluster #13875
mattklein123 merged 20 commits into envoyproxy:master
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
The new test AdsClusterV3Test.ClusterUpdateWhenWarming is failing on CI, but I cannot reproduce it locally. Is it another #11877? It seems the CDS and EDS updates on Envoy are executing concurrently.
The PR that is being reverted is causing production issues, so let's land that revert, and then work on re-applying with better tests? cc @Shikugawa
/wait
linux_x64 release test passed!
Sorry, is this still valid after reverting the other PR? Or should I wait until that issue is resolved?
/wait-any
I can try this one tomorrow; we should spot the problem immediately.
That is, on top of master with #13344 already reverted.
Oh, never mind, this is not for the crash.
mattklein123 left a comment
Thanks, in general this makes sense to me, with a few small comments.
/wait
…erfix Signed-off-by: Yuchen Dai <silentdai@gmail.com>
mattklein123 left a comment
Thanks LGTM pending potentially a better integration test for the primary case. Can you also merge main to pick up the CI fix?
/wait
    // The override cluster is added. Manually drop the previous cluster. In production flow this is
    // achieved by ClusterManagerImpl.
    sds.reset();
Is it possible to add a cluster manager test of this case that uses "real" DNS clusters? I think the main sequence that can actually happen here is:
- CDS adds DNS cluster
- DNS cluster begins init
- CDS updates DNS cluster?
I think you could also pretty easily test this in your integration test below by doing:
- Have CDS deliver a DNS cluster configured with health checking.
- DNS cluster won't initialize pending health checking.
- Update DNS cluster?
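The sequence above can be illustrated with a small toy model. This is a sketch only, not Envoy's ClusterManagerImpl: `ToyCluster`, `ToyClusterManager`, `addOrUpdate`, `finishInit`, and `runScenario` are hypothetical names, and the model assumes the manager counts as initialized only once no cluster is left pending. It shows why an update has to drop the predecessor that is still initializing.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these names are hypothetical and do not mirror Envoy's classes.
struct ToyCluster {
  ToyCluster(std::string n, int v) : name(std::move(n)), version(v) {}
  std::string name;
  int version;
};

class ToyClusterManager {
public:
  explicit ToyClusterManager(bool drop_on_update) : drop_on_update_(drop_on_update) {}

  // Adds a cluster, or replaces a cluster of the same name. With
  // drop_on_update_ == false the replaced instance stays in the pending list
  // even though nothing will ever finish its initialization.
  std::shared_ptr<ToyCluster> addOrUpdate(const std::string& name, int version) {
    if (drop_on_update_) {
      pending_.erase(std::remove_if(pending_.begin(), pending_.end(),
                                    [&name](const std::shared_ptr<ToyCluster>& c) {
                                      return c->name == name;
                                    }),
                     pending_.end());
    }
    auto cluster = std::make_shared<ToyCluster>(name, version);
    pending_.push_back(cluster);
    return cluster;
  }

  // Called when one specific cluster instance completes initialization.
  void finishInit(const std::shared_ptr<ToyCluster>& cluster) {
    assert(cluster != nullptr);
    pending_.erase(std::remove(pending_.begin(), pending_.end(), cluster), pending_.end());
  }

  // The manager is initialized once no cluster instance is pending.
  bool initialized() const { return pending_.empty(); }

private:
  const bool drop_on_update_;
  std::vector<std::shared_ptr<ToyCluster>> pending_;
};

// Replays the sequence: add, update while still initializing, then only the
// replacement finishes initialization. Returns whether the manager is done.
bool runScenario(bool drop_on_update) {
  ToyClusterManager cm(drop_on_update);
  cm.addOrUpdate("dns_cluster", 1);           // CDS adds the cluster; init begins.
  auto v2 = cm.addOrUpdate("dns_cluster", 2); // CDS updates it before init completes.
  cm.finishInit(v2);                          // Only the replacement ever completes.
  return cm.initialized();
}
```

In this model, `runScenario(false)` leaves the manager stuck because the replaced instance is never removed from the pending list, while `runScenario(true)` lets initialization complete.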
Thank you for the DNS with health check suggestion!
After a quick search I didn't find an existing example, but the test case
TEST_F(ClusterManagerImplTest, RemoveWarmingCluster) seems to put a defaultStaticCluster("fake_cluster") into the warming state.
Writing a test.
A slight difference is that a DNS cluster always depends on the resolver.
I changed the cluster to STATIC type so that it becomes ready immediately.
Thanks this all looks good, but can you add a real integration test for this in ADS integration test? I think it's really easy based on what you already have. Just use a static cluster with health checking, it will be stuck initializing, then you can update it?
/wait
I had the impression that a static cluster with a bad health check doesn't block initialization...
Yeah, I'll add an integration test to confirm the initialization and the health check behavior. Will update.
Added.
Also confirmed that a failed health check in a static cluster also turns the warming cluster ready.
And a STRICT DNS resolve failure has the same scary side effect.
I ended up accepting the TCP connection but not writing back, to hold the DNS request.
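The health check observation above can be illustrated with a toy model. This is a sketch under assumptions, not Envoy's implementation: `ToyWarmingCluster`, `onFirstCheck`, and `ready` are hypothetical names, and the model assumes a warming cluster becomes ready once every host has reported a first health check result, whether that result is a pass or a failure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; names are hypothetical and do not mirror Envoy's classes.
enum class FirstCheck { Pending, Passed, Failed };

class ToyWarmingCluster {
public:
  explicit ToyWarmingCluster(std::size_t host_count)
      : results_(host_count, FirstCheck::Pending) {}

  // Records the outcome of the first health check for one host.
  void onFirstCheck(std::size_t host, bool passed) {
    assert(host < results_.size());
    results_[host] = passed ? FirstCheck::Passed : FirstCheck::Failed;
  }

  // Ready once no host is still waiting for its first result. A failed check
  // counts as a result, so it does not keep the cluster warming.
  bool ready() const {
    for (FirstCheck r : results_) {
      if (r == FirstCheck::Pending) {
        return false;
      }
    }
    return true;
  }

private:
  std::vector<FirstCheck> results_;
};
```

Under this model, the only way to keep a cluster warming for the duration of a test is to make sure no result arrives at all, which is what holding a connection open without responding achieves.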
      }
    }

    AssertionResult compareSets(const std::set<std::string>& set1, const std::set<std::string>& set2,
Move this function up so that more functions defined in this file can call it.
…envoyproxy#13875) Signed-off-by: Yuchen Dai <silentdai@gmail.com>
* cluster manager: avoid immediate activation for dynamic inserted cluster when initialize (envoyproxy#12783)
  Signed-off-by: Shikugawa <rei@tetrate.io>
  Signed-off-by: Yuchen Dai <silentdai@gmail.com>
* cluster: unstuck cluster manager when update the initializing cluster (envoyproxy#13875)
  Signed-off-by: Yuchen Dai <silentdai@gmail.com>
  Co-authored-by: Rei Shimizu <rei@tetrate.io>
Commit Message:
Remove existing initializing cluster when updating secondary clusters.
the destroy of existing cluster.
the warming secondary cluster correctly.
Additional Description:
Risk Level: LOW
Testing: integration test
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
fix #13874