
[ProjectTracking]: congestion control #48

Open
14 tasks done
jakmeier opened this issue Feb 28, 2024 · 6 comments
jakmeier commented Feb 28, 2024

(Supersedes #11)

Goals

We want to guarantee that the Near Protocol blockchain operates stably even during congestion. This is currently impeded by a lack of cross-shard congestion control. Specifically, the delayed receipt queue of each shard may grow indefinitely during congestion, which is a problem because these queues are part of the state tries.

With this project, we want to ensure all receipt queues have a fixed size limit. The queues in question are:

  1. Delayed receipts queue
  2. Postponed receipts queue
  3. Any new queues introduced by the congestion control solution itself
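The bounding idea can be sketched as a queue that tracks its own size in bytes and rejects receipts past a limit. This is a hypothetical sketch; the type and field names are illustrative, not nearcore's actual implementation:

```rust
// Hypothetical sketch of a size-bounded receipt queue. Names and the
// byte-based limit are illustrative only.
struct BoundedQueue {
    receipts: Vec<Vec<u8>>, // serialized receipts
    total_bytes: usize,
    max_bytes: usize,
}

impl BoundedQueue {
    fn new(max_bytes: usize) -> Self {
        BoundedQueue { receipts: Vec::new(), total_bytes: 0, max_bytes }
    }

    /// Accept a receipt only if it keeps the queue under the byte limit.
    /// On `false`, the caller must drop the receipt or apply backpressure.
    fn try_push(&mut self, receipt: Vec<u8>) -> bool {
        if self.total_bytes + receipt.len() > self.max_bytes {
            return false;
        }
        self.total_bytes += receipt.len();
        self.receipts.push(receipt);
        true
    }
}
```

The key point is that the decision of what to do on rejection (drop vs. backpressure) is exactly the design question discussed below.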

Current status: Local Congestion Control

Right now, Local Congestion Control is fully implemented, meaning that shard validators and RPC nodes will not be overloaded by transactions entering their local transaction pools, nor by receipts generated from local transactions.

Technically, this is achieved by introducing two limits:

  • The size of the local transaction pool of each shard is limited to 100 MB; once the limit is reached, the node stops accepting new transactions. This limit is configurable.
  • The size of the delayed receipts queue is limited to 20,000 receipts; once the limit is reached, the chunk producer stops including new receipts in the chunk. This limit is not configurable and is not yet enforced by consensus.
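The two local limits boil down to simple admission checks. A minimal sketch (the constants mirror the numbers above; the function names are hypothetical, and the real nearcore limits live in config and protocol code):

```rust
// Illustrative constants mirroring the limits described above.
const TX_POOL_SIZE_LIMIT: u64 = 100 * 1024 * 1024; // 100 MB, configurable
const DELAYED_RECEIPT_COUNT_LIMIT: u64 = 20_000;   // not configurable

/// Should the node accept a new transaction into its local pool?
fn accept_transaction(pool_size_bytes: u64, tx_size: u64) -> bool {
    pool_size_bytes + tx_size <= TX_POOL_SIZE_LIMIT
}

/// Should the chunk producer include another new receipt in the chunk?
fn include_new_receipt(delayed_receipt_count: u64) -> bool {
    delayed_receipt_count < DELAYED_RECEIPT_COUNT_LIMIT
}
```

Note the asymmetry: the first limit is in bytes, the second in receipt count, which is part of why byte-exact global bounds are hard to guarantee (see "Projected Solution Quality" below).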

For the exact implemented features see https://github.com/near/nearcore/milestone/26?closed=1

Next steps: Cross-Shard Congestion Control

To achieve our goal of bounded queues, we need global congestion control. The solutions currently in consideration fall into two categories.

  • Either: We drop messages that do not fit in the fixed sizes of the queues. Then we provide a solution to deal with dropped (failed) receipts to contract developers.
  • Or: We apply backpressure from congested shards to other shards. Shards experiencing backpressure need to throttle the amount of outgoing receipts to the congested shards.
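The two categories can be contrasted in a small decision sketch. All names are illustrative and neither branch reflects the final design; the point is only that, once a receiver's queue is full, a sender must either drop or throttle:

```rust
// Hypothetical sketch contrasting the two solution categories.
enum CongestionAction {
    Forward,      // receiver's queue has capacity
    Drop,         // category 1: reject; developers must handle failed receipts
    Backpressure, // category 2: sender throttles outgoing receipts
}

fn decide(receiver_queue_bytes: u64, limit: u64, use_backpressure: bool) -> CongestionAction {
    if receiver_queue_bytes < limit {
        CongestionAction::Forward
    } else if use_backpressure {
        CongestionAction::Backpressure
    } else {
        CongestionAction::Drop
    }
}
```

As the discussion below notes, the two branches are not mutually exclusive; a final solution could combine them.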

For both categories, many ideas have been discussed already but there has been no clear winner so far.

To move this issue forward, we plan the following steps. This will likely require protocol changes, so the NEP process is included:

  • Create an NEP draft describing the intentions well enough to receive feedback from engineers (Cross-Shard Congestion Control NEPs#539)
  • Make NEP for cross-shard congestion control ready for SME reviewers
  • Get protocol working group meeting scheduled
  • Get NEP approved at protocol working group meeting

Links to external documentations and discussions

NEP-539 Cross-Shard Congestion Control

Congestion control design proposal documentation, March 2024.

Kick-off for global congestion control February 2024

State of congestion control in September 2023. This document also provides links to additional docs.

Current Zulip thread (feel free to drop any questions or comments there or here in GitHub)

Zulip thread September 2023

High-level overview of final proposal, as accepted in NEP-539: Slides

Estimated effort

We aim to solve most of the engineering work by 31st of May, 2024, with @wacban and @jakmeier working on it 50% each.
This is unlikely to be the perfect solution (see Assumptions and Out of scope below) but it should guarantee bounded queues.

Taking NEP approval and the release schedule into account, we expect this to be live in mainnet some time around July or August 2024.

Assumptions

  • We assume all validators can spend a substantial amount of memory on receipts queues. (100s of MBs)
  • We can make changes to the protocol that negatively affect the total throughput of shards, as long as the cost is reasonable and can be justified by the stability guarantees we gain from it.
  • The number of total shards stays below 100.

Pre-requisites

Work starts immediately, no pre-requisites needed.

Out of scope

  • Per-contract fairness
  • Infinitely scalable congestion control in terms of number of shards
  • Problems around inefficient receipt queue handling in the current nearcore implementation which may lead to unnecessarily large memory consumption.
  • Reducing receipt size limits.
  • Client-side implementations for retrying rejected transactions.
  • Transaction Priority (this will be follow-up work that depends on completion of global congestion control)
  • Cleaning up how gas pricing works
jakmeier added a commit to jakmeier/nearcore that referenced this issue Mar 4, 2024
This tool lets us model different workloads and congestion control
strategies. See the added README.md for how it works.

This is part of the first milestone in
[near-one-project-tracking/issues/48](near/near-one-project-tracking#48)
github-merge-queue bot pushed a commit to near/nearcore that referenced this issue Mar 5, 2024
This tool lets us model different workloads and congestion control
strategies. See the added README.md for how it works.

This is part of the first milestone in
[near-one-project-tracking/issues/48](near/near-one-project-tracking#48)
@walnut-the-cat

Either: We drop messages that do not fit in the fixed sizes of the queues. Then we provide a solution to deal with dropped (failed) receipts to contract developers.
Or: We apply backpressure from congested shards to other shards. Shards experiencing backpressure need to throttle the amount of outgoing receipts to the congested shards.

Maybe I am not super clear on what you meant by 'Either' and 'Or' here.

jakmeier commented Mar 6, 2024

I just wanted to make it clear that the two categories of solutions discussed so far are split. One set of solutions involves dropping receipts on the go. The other involves backpressure. Either of those can work independently of the other.

But I guess it makes it sound as if they are incompatible. That's not true; indeed, we could combine the two approaches in the final solution.

@jakmeier

Quick status update:

Done
We have a basic model (nearcore/10695) and even a local Grafana dashboard for looking at the results (nearcore/10719).

A few sample workloads and strategies are also already included, but those are more of a demo of the model. The workloads are too simple to give a complete picture, and the strategies mostly just demonstrate or explore specific ideas in isolation. None of the strategies would be a suitable proposal on its own.

Ongoing work
This week we are looking into specific strategies and workloads. @wacban and I discussed general ideas and what exactly we want to try this week. We will share relevant results from the experiments when we have them.

And on the way, we will improve the model and the output as necessary.

Progress vs Time Estimate
The original plan was to come up with a proposed strategy based on model output by next week. I think we are just in time to hit that mark. Then we can start working on a design document in the form of an NEP and start gathering feedback from everyone else.

@jakmeier

Status update:

Done

Based on several workloads and strategies we simulated, we collected ideas and evaluated which of those are good and which are bad or useless.

This has led to two main strategies, which we've looked at in more detail:

We then compared the two in this document: https://docs.google.com/document/d/1wVQIF0cgilO9m-iI_P5HK6MVc0b6RAxsxtyTZ1nMnBs/edit?usp=sharing

The final result is a merge of the two ideas:

Ongoing work

Progress vs Time Estimate:

  • The first milestone has been completed ~3 days after schedule. ("Decide on the high level design with the help of queueing-theory simulations.")
  • We are now one week into the three-week second milestone: "Write down the detailed design (NEP) and a reference implementation in nearcore". Given that we already have a first draft of the NEP, the timeline looks okay, but we need to start working on the reference implementation immediately.

Projected Solution Quality

Initially we defined a set of must-have properties and a set of aspirational properties. Let's check in on which of them we think we can achieve.

Must-have:

Queues in the system are bounded in how many bytes they require. (This is per chunk)

We will have limits in place, but they won't be explicit limits in bytes that we can guarantee. So this requirement will not be fulfilled as cleanly as we hoped.

The NEP for this is approved and merged.

The implementation is merged to the nearcore master branch, stabilized, and ready for testnet deployment.

It looks like we will hit those in time.

Aspirational Properties

Queues are small enough to be kept entirely in memory

Again, we don't have hard guarantees in our solution. But we believe this trade-off is necessary, and queues will still fit into memory in all but the most targeted malicious cases. And even in malicious cases, it will be a strict improvement over today's system.

Once accepted, transactions and all following receipts will not fail.

We will satisfy this.

Bounded latency to resolve a receipt.

We will satisfy this, and we can still decide what we want the guarantee to be, trading it off against utilization in marginal cases.

In a non-congested setting, every shard can run at full speed.

Satisfied.

Transactions that only touch non-congested shards are not affected by congested shards.

Partially satisfied. Backpressure means every shard on the path of a congesting flow will become congested and experience negative consequences, even if that shard on its own wouldn't be congested. But all other shards are completely unaffected.

The same strategy can be used for any number of shards.

This seems fulfilled about as well as we can expect it to.

  • We require 4 bytes in the chunk header; this puts no practical scaling limit on the number of shards beyond what the chunk header already imposes (it is already >200 bytes).
  • The buffer space from one shard to all others is shared.
    • positive: Memory requirements do not scale by number of shards, so it can be applied to any number of shards.
    • negative: Utilization might suffer with many shards, as the allocatable buffer space per outgoing shard becomes smaller. (Needs further investigation)
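The shared-buffer trade-off above can be illustrated with a sketch in which a single byte budget covers receipts buffered for all outgoing shards. This is hypothetical code, not nearcore's implementation; all names are illustrative:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a buffer shared across all outgoing shards:
// memory stays fixed regardless of shard count, but shards compete
// for the same budget.
struct SharedOutgoingBuffer {
    per_shard_bytes: HashMap<u64, u64>, // shard id -> buffered bytes
    total_bytes: u64,
    max_total_bytes: u64, // fixed, independent of the number of shards
}

impl SharedOutgoingBuffer {
    fn new(max_total_bytes: u64) -> Self {
        SharedOutgoingBuffer {
            per_shard_bytes: HashMap::new(),
            total_bytes: 0,
            max_total_bytes,
        }
    }

    /// Buffer `bytes` destined for `shard` if the shared limit allows it.
    /// With many shards, each shard's achievable share shrinks.
    fn try_buffer(&mut self, shard: u64, bytes: u64) -> bool {
        if self.total_bytes + bytes > self.max_total_bytes {
            return false;
        }
        *self.per_shard_bytes.entry(shard).or_insert(0) += bytes;
        self.total_bytes += bytes;
        true
    }
}
```

This makes the positive/negative points concrete: the memory bound is a single constant, but one busy destination can consume most of the budget and starve others.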

@jakmeier

Status update:

  • SME reviewers have been assigned (@robin-near and @Akashin) a few days ago
  • To avoid being blocked on approval, we are already implementing the complete feature in nearcore and have started merging PRs with the changes behind a feature flag


jakmeier commented Jul 5, 2024

Overdue update:

Projects
Status: In Progress