Improvement for distributed backend throttling #1791

Closed
18 tasks done
rmsamitha opened this issue May 3, 2023 · 10 comments

Comments

@rmsamitha
Member

rmsamitha commented May 3, 2023

Problem

Distributed backend throttling is not accurate because throttle counters are only synced periodically between the Redis server and the Gateway nodes. This needs to be improved.
The slippage (requests allowed beyond the configured limit) is considerable: in certain load tests it was consistently higher than ~30%, and the percentage grows with the number of GW nodes and the length of the periodic sync interval between the Redis server and the Gateway nodes.

This slippage needs to be reduced to provide better accuracy in high-load use cases, where conforming to the defined backend throttling limits is critical.
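To make the failure mode concrete, the following is a minimal sketch (not the actual APIM code; class names, field names, and the sync interval are assumptions) of a gateway that throttles against a global count refreshed only once per sync interval. Between syncs each node decides on stale data, so with N gateways the overshoot grows with N and with the sync interval.

```java
// Illustrative sketch only: each gateway admits requests against a global
// count that is refreshed every SYNC_INTERVAL_MS, so requests admitted on
// other nodes in between are invisible and can exceed the limit.
public class PeriodicSyncThrottler {
    private static final long SYNC_INTERVAL_MS = 500;   // assumed sync period
    private final long limit;                           // backend throttle limit per window
    private long lastKnownGlobalCount = 0;              // stale between syncs
    private long localDelta = 0;                        // requests admitted since last sync

    public PeriodicSyncThrottler(long limit) {
        this.limit = limit;
    }

    public synchronized boolean allowRequest() {
        // Decision uses a count that may be SYNC_INTERVAL_MS old; with N gateways,
        // up to N * (requests per node per interval) can slip past the limit.
        if (lastKnownGlobalCount + localDelta >= limit) {
            return false;
        }
        localDelta++;
        return true;
    }

    // Called by a scheduled task every SYNC_INTERVAL_MS.
    public synchronized void syncWithRedis(RedisCounter counter) {
        lastKnownGlobalCount = counter.incrementBy(localDelta); // returns new global total
        localDelta = 0;
    }

    /** Hypothetical wrapper over a Redis INCRBY on the shared window counter. */
    public interface RedisCounter {
        long incrementBy(long delta);
    }
}
```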

Sub tasks:

Affected Component

APIM

@rmsamitha
Member Author

Meeting 3/05/2023

  • Discussed the analysis of the current design, the design suggested by the CS team, the Choreo-APIM design, and other possible solution approaches.
  • Decided to proceed with the Sync-Async Hybrid approach.

@rmsamitha
Member Author

rmsamitha commented May 4, 2023

Meeting 04/05/2023

  • Discussed the analysis of the current design, the design suggested by the CS team, the Choreo-APIM design, and other possible solution approaches with the customer success team, considering customer use cases as well.
  • Decided and confirmed to proceed with the Sync-Async Hybrid approach.

@rmsamitha
Member Author

rmsamitha commented May 22, 2023

Meeting 19/05/2023

Participants: TharinduA, ThiliniS, Samitha, Nandika, IsuruU

Summary:
Decided to use an interface/abstraction or separate class/method based implementation, rather than if/else blocks, for the new implementation code. This keeps the code more readable and easier to understand and update later, and also makes it easier to drop the old behaviour in future releases if needed (a minimal sketch of this structure follows).
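As a rough illustration of that decision (interface and class names here are hypothetical, not the actual APIM code), the old and new behaviours could sit behind a common interface instead of being selected via if/else branches:

```java
// Illustrative sketch: one abstraction, two implementations, so the legacy
// behaviour can be removed cleanly in a future release.
public interface BackendThrottleCounter {
    boolean isThrottled(String throttleKey, long limit, long windowMillis);
}

// Existing periodic-sync behaviour (placeholder body).
class PeriodicSyncThrottleCounter implements BackendThrottleCounter {
    @Override
    public boolean isThrottled(String throttleKey, long limit, long windowMillis) {
        // ... existing periodic-sync logic ...
        return false;
    }
}

// New Sync-Async Hybrid behaviour (placeholder body).
class SyncAsyncHybridThrottleCounter implements BackendThrottleCounter {
    @Override
    public boolean isThrottled(String throttleKey, long limit, long windowMillis) {
        // ... new hybrid logic ...
        return false;
    }
}
```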

@rmsamitha
Member Author

rmsamitha commented May 29, 2023

Meeting: 26/05/2023

Participants: TharinduA, ThiliniS, Samitha, IsuruU

Summary:
Discussed which approach to use to get the gateway (GW) count and decided to use the channel subscription count based approach.

@rmsamitha
Member Author

Meeting: 14/06/2023

Participants: TharinduA, ThiliniS, Samitha, IsuruU, Nandika

Summary:
Discussed the current progress. Planned how to test in a K8s setup to simulate and test real-time use cases.
Discussed and got confirmation to use a separate channel to count the number of gateways, instead of the sync_mode_change channel. In this approach, when a new GW node is spawned, it sends a message to this specific channel at server startup conveying that a new node has joined. Via the pub/sub model, all other GW nodes are triggered and request the channel subscription count from Redis. This solution does not address server shutdown cases, where the number of GWs should be reduced. (A minimal sketch of this counting approach follows.)
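A minimal sketch of that subscription-count approach, assuming a Jedis 4.x client (where pubsubNumSub() wraps the Redis PUBSUB NUMSUB command) and a hypothetical channel name and host; illustrative only, not the actual APIM code:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

import java.util.Map;

public class GatewayCounter {

    private static final String GW_COUNT_CHANNEL = "gw_count_channel"; // assumed channel name
    private volatile long gatewayCount = 1;

    // Each gateway subscribes on startup, so the subscriber count on this channel
    // equals the number of live gateways. subscribe() blocks, so it runs on its
    // own thread and dedicated connection.
    public void startSubscriber(Jedis subscriberConnection) {
        new Thread(() -> subscriberConnection.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String message) {
                // Another node announced that it joined; refresh our count.
                refreshGatewayCount();
            }
        }, GW_COUNT_CHANNEL)).start();
    }

    // Called once at server startup, after subscribing, to announce this node.
    public void announceJoin(Jedis jedis) {
        jedis.publish(GW_COUNT_CHANNEL, "node-joined");
    }

    public void refreshGatewayCount() {
        try (Jedis jedis = new Jedis("localhost", 6379)) { // assumed Redis host/port
            Map<String, Long> counts = jedis.pubsubNumSub(GW_COUNT_CHANNEL);
            gatewayCount = counts.getOrDefault(GW_COUNT_CHANNEL, 1L);
        }
    }

    public long getGatewayCount() {
        return gatewayCount;
    }
}
```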

@rmsamitha
Member Author

Meeting: 29/06/2023

Participants: TharinduA, Samitha, IsuruU, Nandika

Summary:

  • Decided to use a request context object to carry the current time rather than passing the current time in method arguments. It is fine if currentTime is the only field in that context object for now.
  • Implement the locking mechanism on the Redis and client sides (a minimal sketch follows this list).
  • Discussed further the Redis call timeout and the importance of setting it.
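A minimal sketch of those two points, with assumed names and timeout values (not the actual APIM code): a request context carrying only currentTime, and a Redis-backed lock acquired via SET NX PX with a client-side acquire timeout, using the Jedis client:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// Per-request context: currently only carries the request's current time.
public class ThrottleRequestContext {
    private final long currentTime;

    public ThrottleRequestContext(long currentTime) {
        this.currentTime = currentTime;
    }

    public long getCurrentTime() {
        return currentTime;
    }
}

class RedisLock {
    private static final long LOCK_TTL_MS = 1000;        // assumed lock expiry
    private static final long ACQUIRE_TIMEOUT_MS = 200;  // assumed client-side wait limit

    // Tries to acquire lockKey for up to ACQUIRE_TIMEOUT_MS; the lock itself
    // expires after LOCK_TTL_MS so a crashed node cannot hold it forever.
    static boolean tryAcquire(Jedis jedis, String lockKey, String ownerId) throws InterruptedException {
        long deadline = System.currentTimeMillis() + ACQUIRE_TIMEOUT_MS;
        while (System.currentTimeMillis() < deadline) {
            String result = jedis.set(lockKey, ownerId, SetParams.setParams().nx().px(LOCK_TTL_MS));
            if ("OK".equals(result)) {
                return true;
            }
            Thread.sleep(10); // brief back-off before retrying
        }
        return false; // lock-acquiring timeout exceeded
    }
}
```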

@rmsamitha
Member Author

Meeting: 28/07/2023

Participants: TharinduA, Samitha, IsuruU, Nandika, ThiliniS

Summary:

  • Presented the current progress.
  • Presented the load test results from the AKS setup.
  • Highlighted the high latency issue and decided to investigate it further (i.e. measure the latency of each specific step and figure out the impact).
  • Suggested implementing a clock-time-window based window (instead of the current window, which is anchored to a recorded start timestamp rather than to clock time) as a later POC; the sketch after this list contrasts the two.
  • Decided to have a meeting with project stakeholders to communicate the state.
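For illustration only (assumed names, not the actual APIM code), the difference between the two window styles mentioned above:

```java
// A clock-aligned window derives its identifier purely from the current time,
// so every gateway agrees on the same window for the same instant; a
// start-timestamp window runs from the recorded start until start + window.
public final class ThrottleWindows {

    private ThrottleWindows() { }

    // Clock-time window: all nodes compute the same window id for the same instant.
    static long clockWindowId(long currentTimeMillis, long windowMillis) {
        return currentTimeMillis / windowMillis;
    }

    // Start-timestamp window: valid until windowMillis after the recorded start,
    // after which a new start timestamp is recorded.
    static boolean inStartTimestampWindow(long windowStartMillis, long currentTimeMillis, long windowMillis) {
        return currentTimeMillis < windowStartMillis + windowMillis;
    }
}
```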

@rmsamitha
Member Author

Meeting: 07/08/2023

Participants: Nandika, TharinduA, ThiliniS, SanjeewaM, Dinusha, Ruwan, Janith, Miyuru, Samitha

Summary:

  • Briefly described the design of the solution.
  • Presented the implementation and testing outcomes so far.
  • Avoid making requests beyond the maximum (throttle) quota wait to acquire the lock; those additional requests should be rejected directly without waiting.
  • Add a configuration for whether to reject or allow API requests that exceed the Redis lock_acquiring_timeout (the suggested default behavior was to reject).
  • Investigate the premature throttling scenario seen in the test results.
  • Communicate an ETA from the product team once the currently identified obvious issues are fixed.
  • Implement the clock-time-window based implementation only once the current implementation is finalized.
  • The CS team will communicate the current implementation progress to the customer.

@rmsamitha
Member Author

Meeting Date: 15/08/2023
Participants: SanjeewaM, Nandika, TharinduA, ThiliniS, Dinusha, IsuruU, Samitha

Summary:

  • Samitha clarified in detail the two root causes of the premature throttling scenario (as described in the previous email) and explained why the issue is not high impact and need not be a critical concern.
  • Sanjeewa described similar concerns that were encountered and addressed in the older Hazelcast based implementations.
  • After further discussion, decided that this premature throttling scenario is not a major concern and is not a blocker to proceed.
  • The feature docs may mention that some amount of slippage is possible and acceptable.
  • Discussed further what should be done with requests that exceed the Redis lock-acquiring timeout (a minimal sketch of the decided handling follows this list).
    • The main concern was that customers may report intermittent request failures even in non-throttle-out situations.
    • That would be a critical concern to defend, and since the root cause is a limitation in our implementation, it is not correct to fail a request because of it.
    • So the decision was to allow such requests rather than rejecting them.
    • Also decided not to add a configuration to change this behavior, since the implementation should be consistent and additional configurations are not ideal at this level, as suggested by Sanjeewa. A configuration may be introduced in the future if a client requests it.
  • Decided to proceed to complete the testing effort with load tests on an auto-scaling deployment (with up to 10 gateways). After that, the U2 update creation will be started and the delivery will be done.
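A minimal sketch of the decided handling, with hypothetical names (not the actual APIM code): requests already over the quota are rejected immediately without waiting on the lock, while requests that hit the lock-acquiring timeout are allowed through rather than failed:

```java
public class ThrottleDecision {

    static final boolean ALLOW = true;
    static final boolean THROTTLE = false;

    static boolean decide(long observedCount, long quota, LockClient lock, String lockKey)
            throws InterruptedException {
        if (observedCount >= quota) {
            // Already over the throttle quota: reject directly, do not wait for the lock.
            return THROTTLE;
        }
        if (!lock.tryAcquire(lockKey)) {
            // lock_acquiring_timeout exceeded: allow the request (decided default, not configurable).
            return ALLOW;
        }
        try {
            // Holding the lock: update the shared counter and decide accurately.
            long updatedCount = lock.incrementSharedCounter(lockKey);
            return updatedCount <= quota ? ALLOW : THROTTLE;
        } finally {
            lock.release(lockKey);
        }
    }

    /** Hypothetical abstraction over the Redis lock and counter operations. */
    interface LockClient {
        boolean tryAcquire(String lockKey) throws InterruptedException;
        long incrementSharedCounter(String lockKey);
        void release(String lockKey);
    }
}
```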

@rmsamitha
Member Author

Closing as completed

@npamudika added this to the 4.3.0-M1 milestone Jan 3, 2024
@npamudika added the 4.3.0 and 4.3.0-M1 labels Jan 3, 2024