[ProjectTracking]: congestion control #48
This tool lets us model different workloads and congestion control strategies. See the added README.md for how it works. This is part of the first milestone in [near-one-project-tracking/issues/48](near/near-one-project-tracking#48)
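For illustration, here is a minimal sketch of how such a round-based model could be structured. The names (Receipt, Workload, CongestionStrategy, simulate) are invented for this sketch and are not necessarily the ones used in the actual tool or its README.

```rust
// Minimal sketch of a round-based congestion model; names are illustrative only.

/// A unit of cross-shard work queued on a receiving shard.
#[derive(Clone)]
struct Receipt {
    size_bytes: u64,
}

/// A workload decides which receipts each shard produces in a round.
trait Workload {
    fn produce(&mut self, round: u64, shard: usize) -> Vec<Receipt>;
}

/// A congestion control strategy decides how many queued receipts a shard
/// may execute in a round, given its current queue length.
trait CongestionStrategy {
    fn allowed_this_round(&self, queue_len: usize) -> usize;
}

/// Run the model for a fixed number of rounds and report, per shard, how
/// many bytes of receipts are still queued at the end.
fn simulate(
    shards: usize,
    rounds: u64,
    workload: &mut dyn Workload,
    strategy: &dyn CongestionStrategy,
) -> Vec<u64> {
    let mut queues: Vec<Vec<Receipt>> = vec![Vec::new(); shards];
    for round in 0..rounds {
        for shard in 0..shards {
            let queue = &mut queues[shard];
            // New receipts arrive according to the workload...
            queue.extend(workload.produce(round, shard));
            // ...and the strategy bounds how much of the queue is drained.
            let executed = strategy.allowed_this_round(queue.len()).min(queue.len());
            queue.drain(..executed);
        }
    }
    queues
        .iter()
        .map(|q| q.iter().map(|r| r.size_bytes).sum())
        .collect()
}
```

Concrete workloads and strategies are then just small types implementing these traits, which makes it cheap to compare strategies against the same traffic pattern.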
Maybe I am not super clear on what you meant by 'Either' and 'Or' here.
I just wanted to make it clear that the two categories of solutions discussed so far are split. One set of solutions involves dropping receipts on the go. The other involves backpressure. Either of those can work independently of the other. But I guess it makes it sound as if they are incompatible. That's not true; indeed, we could combine the two approaches for the final solution.
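To make the split concrete, here is a rough sketch of the two mechanisms side by side and how they could be layered. The thresholds, names, and the order in which the mechanisms kick in are invented purely for illustration and do not describe any actual proposal.

```rust
// Illustrative only: thresholds and names are invented for this sketch.

const BACKPRESSURE_THRESHOLD: usize = 1_000; // stop accepting new inbound work above this
const DROP_THRESHOLD: usize = 10_000;        // discard incoming receipts above this

enum Action {
    /// No congestion: business as usual.
    Accept,
    /// Backpressure: keep the receipt, but signal upstream shards to stop
    /// sending new work until the queue drains.
    AcceptAndThrottleUpstream,
    /// "Dropping receipts on the go": discard the receipt; the protocol
    /// would have to define how the sender is refunded or retries.
    Drop,
}

fn decide(queue_len: usize) -> Action {
    // The two mechanisms are independent; a combined design could, for
    // example, apply backpressure first and drop only as a last resort.
    if queue_len >= DROP_THRESHOLD {
        Action::Drop
    } else if queue_len >= BACKPRESSURE_THRESHOLD {
        Action::AcceptAndThrottleUpstream
    } else {
        Action::Accept
    }
}
```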
Quick status update:

Done
A few sample workloads and strategies are also already included, but those are more of a demo of the model. The workloads are too simple to give a complete picture, and the strategies are mostly demos exploring specific ideas in isolation. None of the strategies would be a suitable proposal.

Ongoing work
Along the way, we will improve the model and the output as necessary.

Progress vs Time Estimate
Status update:

Done
Based on several workloads and strategies we simulated, we collected ideas and evaluated which of those are good and which are bad or useless.
This has led to two main strategies we've looked at in more detail:
We then compared the two in this document: https://docs.google.com/document/d/1wVQIF0cgilO9m-iI_P5HK6MVc0b6RAxsxtyTZ1nMnBs/edit?usp=sharing

The final result is a merge of the two ideas and results in:
Ongoing work
Progress vs Time Estimate:
Projected Solution Quality

Initially we defined a set of must-have properties and a set of aspirational properties. Let's check in on which of them we think we can achieve.

Must-have:
We will have limits in place. But they won't be explicit limits in bytes that we can guarantee. So this requirement will not be fulfilled as cleanly as we hoped for.
It looks like we will hit those in time.

Aspirational Properties
Again, we don't have hard guarantees in our solution. But we believe this trade-off is necessary and will still ensure that in all but the most targeted malicious cases, it will fit into memory. And certainly, it will be a strict improvement over today's system in malicious cases.
We will satisfy this.
We will satisfy this and can still decide what we want to guarantee to be, trading it against utilization in marginal cases.
Satisfied.
Partially satisfied. Backpressure means every shard that is on the path of congesting flows will become congested and experience negative consequences, even if that shard on its own wouldn't be congested. But all other shards are completely unaffected.
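As a rough illustration of that effect (the data structures and function name are invented here), congestion under backpressure spreads to exactly those shards that transitively send receipts into a congested shard, and to no others:

```rust
use std::collections::{HashMap, HashSet};

/// Returns every shard that ends up congested under backpressure: the
/// initially congested shards plus every shard that (transitively) sends
/// receipts to one of them. Shards outside these flows stay unaffected.
fn congested_under_backpressure(
    initially_congested: &[usize],
    sends_to: &HashMap<usize, Vec<usize>>, // shard -> shards it sends receipts to
) -> HashSet<usize> {
    let mut congested: HashSet<usize> = initially_congested.iter().copied().collect();
    loop {
        let mut changed = false;
        for (&sender, receivers) in sends_to {
            // A shard whose receiver is congested must hold back its outgoing
            // receipts and eventually becomes congested itself.
            if !congested.contains(&sender)
                && receivers.iter().any(|r| congested.contains(r))
            {
                congested.insert(sender);
                changed = true;
            }
        }
        if !changed {
            break congested;
        }
    }
}
```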
This seems fulfilled about as well as we can expect it to.
Status update:
Overdue update:
(Supersedes #11)
Goals
We want to guarantee that the Near Protocol blockchain operates stably even during congestion. This is currently impeded by a lack of cross-shard congestion control. Specifically, the delayed receipt queues of each shard may grow indefinitely during congestion, which is a problem since those queues are part of the state tries.
With this project, we want to ensure all queues of receipts have a fixed limit in size. The queues in question are:
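Whatever the exact set of queues, the bound we are after could be sketched roughly as follows. The type name, the bound, and the overflow handling below are placeholders for illustration, not the actual design.

```rust
use std::collections::VecDeque;

/// Illustrative sketch: a receipt queue whose total size in bytes can never
/// exceed a fixed bound.
struct BoundedReceiptQueue {
    receipts: VecDeque<Vec<u8>>, // serialized receipts
    total_bytes: usize,
    max_bytes: usize,
}

impl BoundedReceiptQueue {
    fn new(max_bytes: usize) -> Self {
        Self { receipts: VecDeque::new(), total_bytes: 0, max_bytes }
    }

    /// Accept a receipt only if it keeps the queue under the byte limit;
    /// otherwise the caller has to apply backpressure or reject the work.
    fn try_push(&mut self, receipt: Vec<u8>) -> Result<(), Vec<u8>> {
        if self.total_bytes + receipt.len() > self.max_bytes {
            return Err(receipt);
        }
        self.total_bytes += receipt.len();
        self.receipts.push_back(receipt);
        Ok(())
    }

    fn pop(&mut self) -> Option<Vec<u8>> {
        let receipt = self.receipts.pop_front()?;
        self.total_bytes -= receipt.len();
        Some(receipt)
    }
}
```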
Current status: Local Congestion Control
Right now we have fully implemented Local Congestion Control, meaning that shard validators and RPC nodes will not be overloaded by transactions coming into their local transaction pools, nor by receipts generated from local transactions.
Technically, this is achieved by introducing two limits:
For the exact implemented features see https://github.com/near/nearcore/milestone/26?closed=1
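As a rough illustration only, such local limits might look like the checks below. The names, numbers, and the exact pair of limits are placeholders here; the authoritative list of implemented features is the linked milestone.

```rust
// Placeholders for illustration; not the actual limits or values.

const MAX_TX_POOL_BYTES: u64 = 100 * 1024 * 1024;      // cap on the local transaction pool
const MAX_DELAYED_RECEIPTS_FOR_NEW_TXS: u64 = 20_000;  // stop taking in new transactions above this

/// Limit 1: an incoming transaction is rejected if the local pool is already full.
fn accept_into_pool(pool_bytes: u64, tx_bytes: u64) -> bool {
    pool_bytes + tx_bytes <= MAX_TX_POOL_BYTES
}

/// Limit 2: new transactions are not converted to receipts while the local
/// delayed receipt queue is already above the threshold.
fn convert_transactions_this_chunk(delayed_receipts: u64) -> bool {
    delayed_receipts < MAX_DELAYED_RECEIPTS_FOR_NEW_TXS
}
```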
Next steps: Cross-Shard Congestion Control
To achieve our goal of bounded queues, we need global congestion control. The solutions currently under consideration fall into two categories.
For both categories, many ideas have been discussed already but there has been no clear winner so far.
To move this issue forward, we plan the following steps:
Likely, this will require protocol changes, so we should also add this step:
Links to external documentation and discussions
NEP-539 Cross-Shard Congestion Control
Congestion control design proposal documentation, March 2024.
Kick-off for global congestion control February 2024
State of congestion control in September 2023. This document also provides links to additional docs.
Current Zulip thread (feel free to drop any questions or comments there or here in GitHub)
Zulip thread September 2023
High-level overview of final proposal, as accepted in NEP-539: Slides
Estimated effort
We aim to complete most of the engineering work by 31 May 2024, with @wacban and @jakmeier working on it 50% each.
This is unlikely to be the perfect solution (see Assumptions and Out of scope below) but it should guarantee bounded queues.
Taking NEP approval and the release schedule into account, we expect this to be live in mainnet some time around July or August 2024.
Assumptions
Pre-requisites
Work starts immediately, no pre-requisites needed.
Out of scope