Bug fix: deadlocked worker pool threads during a controlplane connection failure. #3487

VirajSalaka · 2024-01-26T13:10:07Z

Purpose

Bug fix: when the retry count reaches its maximum limit, the worker threads that call the control plane get blocked.

Issues

Fixes #

Automation tests

Unit tests added: No
Integration tests added: No

Tested environments

Tested locally as follows:
Started choreo-product-apim locally.
Started choreo-connect locally, pointing to the local choreo-product-apim and the ASB connection string (choreo-connect was run in PDP mode).
Stopped choreo-product-apim to replicate a network-unreachable failure.
Sent deploy events via ASB.

Monitored the behavior with and without the fix. With the fix, the worker threads did not block.

Maintainers: Check before merge

Assigned 'Type' label
Assigned the project
Validated respective github issues
Assigned milestone to the github issue(s)

renuka-fernando · 2024-01-26T13:55:42Z

adapter/internal/synchronizer/apis_fetcher.go

+
+			// If API is not found (404), then there is no point in setting the control plane status as unhealthy.
+			if data.ErrorCode != 404 {
+				health.SetControlPlaneRestAPIStatus(false)


Do we set this back to healthy somewhere once this has happened and been resolved?

The idea here is that if the status code is 401 or 403, we don't retry; instead, we let the adapter be killed, as the error is unrecoverable. That was the initial thought. But now I'm wondering whether we should do such a thing in the first place. Maybe a log alert is the correct way to move forward.
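To make the options in this thread concrete, here is a rough sketch of how the status codes discussed could be mapped to handling decisions. It follows the diff in treating only non-404 errors as unhealthy and skips retries for 401 and 403. Whether a 404 is retried, and how other errors are handled, is not stated in the thread, so those branches are assumptions. The `action` type and `classify` function are hypothetical and do not exist in the adapter.

```go
package main

import "fmt"

// action is a hypothetical summary of how a failed control plane call is handled.
type action struct {
	MarkUnhealthy bool // whether to report the control plane REST API as unhealthy
	Retry         bool // whether retrying the call is worthwhile
}

// classify maps an HTTP status code to a handling decision.
func classify(statusCode int) action {
	switch {
	case statusCode == 404:
		// API not found: no point in marking the control plane unhealthy.
		// Not retrying here is an assumption.
		return action{MarkUnhealthy: false, Retry: false}
	case statusCode >= 400 && statusCode < 500:
		// For example 401 or 403: a retry cannot fix these, so log an alert
		// and let the adapter continue running in its current state.
		return action{MarkUnhealthy: true, Retry: false}
	default:
		// Anything else (for example 5xx): assumed to be transient.
		return action{MarkUnhealthy: true, Retry: true}
	}
}

func main() {
	for _, code := range []int{404, 401, 403, 503} {
		fmt.Printf("%d -> %+v\n", code, classify(code))
	}
}
```

Keeping the decision in one pure function makes it easy to change the policy later, for example if a 4xx should stop marking the control plane unhealthy.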

choreo-cicd · 2024-01-29T06:29:02Z

[succeeded] Dataplane(NorthEU) cluster : dev-deployment-v2 : 20240129.9

choreo-cicd · 2024-01-29T06:29:03Z

[succeeded] Dataplane(EastUS) cluster : dev-deployment-v2 : 20240129.9

choreo-cicd · 2024-01-29T06:29:03Z

[succeeded] Controlplane cluster : dev-deployment-v2 : 20240129.9

choreo-cicd · 2024-02-01T03:14:38Z

[succeeded] Dataplane(NorthEU) cluster : stage-deployment-v2 : 20240201.1

choreo-cicd · 2024-02-01T03:14:43Z

[succeeded] Dataplane(EastUS) cluster : stage-deployment-v2 : 20240201.1

choreo-cicd · 2024-02-01T03:14:51Z

[succeeded] Controlplane cluster : stage-deployment-v2 : 20240201.1

choreo-cicd · 2024-02-01T10:21:36Z

[succeeded] Controlplane cluster : prod-deployment-v2 : 20240201.3

choreo-cicd · 2024-02-01T10:22:03Z

[succeeded] Dataplane(EastUS) cluster : prod-deployment-v2 : 20240201.3

choreo-cicd · 2024-02-01T10:22:08Z

[succeeded] Dataplane(NorthEU) cluster : prod-deployment-v2 : 20240201.3

Bug Fix: When the retry limit reaches its maximum limit, the worker threads which calls the control plane gets blocked

c6d097d

renuka-fernando previously approved these changes Jan 26, 2024

renuka-fernando reviewed Jan 26, 2024

VirajSalaka dismissed renuka-fernando’s stale review via c235d22 January 29, 2024 04:16

Change behavior such that even after 4xx error code adapter would continue to run in its remaining state. If the adapter restarts in the middle, then there is a possibility that all the gateways could be down

1a0c256

VirajSalaka force-pushed the retry-fix branch from c235d22 to 1a0c256 January 29, 2024 04:24

renuka-fernando approved these changes Jan 29, 2024

VirajSalaka merged commit 895732a into wso2:choreo Jan 29, 2024
2 of 3 checks passed

choreo-cicd added the Status/DeployedToDev label Jan 29, 2024

choreo-cicd added the Status/DeployedToStage label Feb 1, 2024

choreo-cicd added the Status/DeployedToProd label Feb 1, 2024
