
[ProjectTracking]: congestion control #48

Open
14 tasks done
jakmeier opened this issue Feb 28, 2024 · 6 comments
jakmeier commented Feb 28, 2024

(Supersedes #11)

Goals

We want to guarantee that the Near Protocol blockchain operates stably even during congestion. This is currently impeded by a lack of cross-shard congestion control. Specifically, the delayed receipt queue of each shard may grow indefinitely during congestion, which is a problem because these queues are part of the state tries.

With this project, we want to ensure all receipt queues have a fixed size limit. The queues in question are:

  1. Delayed receipts queue
  2. Postponed receipts queue
  3. Any new queues introduced by the congestion control solution itself
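The bounding idea can be sketched as a queue that tracks its own size in bytes and rejects receipts past a limit. This is a hypothetical sketch; the type and field names are illustrative, not nearcore's actual implementation:

```rust
// Hypothetical sketch of a size-bounded receipt queue. Names and the
// byte-based limit are illustrative only.
struct BoundedQueue {
    receipts: Vec<Vec<u8>>, // serialized receipts
    total_bytes: usize,
    max_bytes: usize,
}

impl BoundedQueue {
    fn new(max_bytes: usize) -> Self {
        BoundedQueue { receipts: Vec::new(), total_bytes: 0, max_bytes }
    }

    /// Accept a receipt only if it keeps the queue under the byte limit.
    /// On `false`, the caller must drop the receipt or apply backpressure.
    fn try_push(&mut self, receipt: Vec<u8>) -> bool {
        if self.total_bytes + receipt.len() > self.max_bytes {
            return false;
        }
        self.total_bytes += receipt.len();
        self.receipts.push(receipt);
        true
    }
}
```

The key point is that the decision of what to do on rejection (drop vs. backpressure) is exactly the design question discussed below.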

Current status: Local Congestion Control

Right now, Local Congestion Control is fully implemented, meaning that shard validators and RPC nodes will not be overloaded by transactions entering their local transaction pools, nor by receipts generated from local transactions.

Technically, this is achieved by introducing two limits:

  • The size of the local transaction pool of each shard is limited to 100 MB; once the limit is reached, the node stops accepting new transactions. This limit is configurable.
  • The size of the delayed receipts queue is limited to 20,000 receipts; once the limit is reached, the chunk producer stops including new receipts in the chunk. This limit is not configurable and is not yet enforced by consensus.
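The two local limits boil down to simple admission checks. A minimal sketch (the constants mirror the numbers above; the function names are hypothetical, and the real nearcore limits live in config and protocol code):

```rust
// Illustrative constants mirroring the limits described above.
const TX_POOL_SIZE_LIMIT: u64 = 100 * 1024 * 1024; // 100 MB, configurable
const DELAYED_RECEIPT_COUNT_LIMIT: u64 = 20_000;   // not configurable

/// Should the node accept a new transaction into its local pool?
fn accept_transaction(pool_size_bytes: u64, tx_size: u64) -> bool {
    pool_size_bytes + tx_size <= TX_POOL_SIZE_LIMIT
}

/// Should the chunk producer include another new receipt in the chunk?
fn include_new_receipt(delayed_receipt_count: u64) -> bool {
    delayed_receipt_count < DELAYED_RECEIPT_COUNT_LIMIT
}
```

Note the asymmetry: the first limit is in bytes, the second in receipt count, which is part of why byte-exact global bounds are hard to guarantee (see "Projected Solution Quality" below).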

For the exact implemented features see https://github.com/near/nearcore/milestone/26?closed=1

Next steps: Cross-Shard Congestion Control

To achieve our goal of bounded queues, we need global congestion control. The solutions currently in consideration fall into two categories.

  • Either: We drop messages that do not fit in the fixed sizes of the queues. Then we provide a solution to deal with dropped (failed) receipts to contract developers.
  • Or: We apply backpressure from congested shards to other shards. Shards experiencing backpressure need to throttle the amount of outgoing receipts to the congested shards.
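The two categories can be contrasted in a small decision sketch. All names are illustrative and neither branch reflects the final design; the point is only that, once a receiver's queue is full, a sender must either drop or throttle:

```rust
// Hypothetical sketch contrasting the two solution categories.
enum CongestionAction {
    Forward,      // receiver's queue has capacity
    Drop,         // category 1: reject; developers must handle failed receipts
    Backpressure, // category 2: sender throttles outgoing receipts
}

fn decide(receiver_queue_bytes: u64, limit: u64, use_backpressure: bool) -> CongestionAction {
    if receiver_queue_bytes < limit {
        CongestionAction::Forward
    } else if use_backpressure {
        CongestionAction::Backpressure
    } else {
        CongestionAction::Drop
    }
}
```

As the discussion below notes, the two branches are not mutually exclusive; a final solution could combine them.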

For both categories, many ideas have been discussed already but there has been no clear winner so far.

To move this issue forward, we plan the following steps. This will likely require protocol changes, so the NEP process is included:

  • Create an NEP draft describing the intentions well enough to receive feedback from engineers (Cross-Shard Congestion Control NEPs#539)
  • Make NEP for cross-shard congestion control ready for SME reviewers
  • Get protocol working group meeting scheduled
  • Get NEP approved at protocol working group meeting

Links to external documentations and discussions

NEP-539 Cross-Shard Congestion Control

Congestion control design proposal documentation, March 2024.

Kick-off for global congestion control February 2024

State of congestion control in September 2023. This document also provides links to additional docs.

Current Zulip thread (feel free to drop any questions or comments there or here in GitHub)

Zulip thread September 2023

High-level overview of final proposal, as accepted in NEP-539: Slides

Estimated effort

We aim to solve most of the engineering work by 31st of May, 2024, with @wacban and @jakmeier working on it 50% each.
This is unlikely to be the perfect solution (see Assumptions and Out of scope below) but it should guarantee bounded queues.

Taking NEP approval and the release schedule into account, we expect this to be live in mainnet some time around July or August 2024.

Assumptions

  • We assume all validators can spend a substantial amount of memory on receipts queues. (100s of MBs)
  • We can make changes to the protocol that negatively affect the total throughput of shards, as long as the cost is reasonable and can be justified by the stability guarantees we gain from it.
  • The number of total shards stays below 100.

Pre-requisites

Work starts immediately, no pre-requisites needed.

Out of scope

  • Per-contract fairness
  • Infinitely scalable congestion control in terms of number of shards
  • Problems around inefficient receipt queue handling in the current nearcore implementation which may lead to unnecessarily large memory consumption.
  • Reducing receipt size limits.
  • Client-side implementations for retrying rejected transactions.
  • Transaction Priority (this will be follow-up work that depends on completion of global congestion control)
  • Cleaning up how gas pricing works
jakmeier added a commit to jakmeier/nearcore that referenced this issue Mar 4, 2024
This tool lets us model different workloads and congestion control
strategies. See the added README.md for how it works.

This is part of the first milestone in
[near-one-project-tracking/issues/48](near/near-one-project-tracking#48)
github-merge-queue bot pushed a commit to near/nearcore that referenced this issue Mar 5, 2024
This tool lets us model different workloads and congestion control
strategies. See the added README.md for how it works.

This is part of the first milestone in
[near-one-project-tracking/issues/48](near/near-one-project-tracking#48)
@walnut-the-cat

Either: We drop messages that do not fit in the fixed sizes of the queues. Then we provide a solution to deal with dropped (failed) receipts to contract developers.
Or: We apply backpressure from congested shards to other shards. Shards experiencing backpressure need to throttle the amount of outgoing receipts to the congested shards.

Maybe I am not super clear on what you meant by 'Either' and 'Or' here.

jakmeier commented Mar 6, 2024

I just wanted to make it clear that the two categories of solutions discussed so far are split. One set of solutions involves dropping receipts on the go. The other involves backpressure. Either of those can work independently of the other.

But I guess it makes it sound as if they are incompatible. That's not true; indeed, we could combine the two approaches in the final solution.

@jakmeier

Quick status update:

Done
We have a basic model (nearcore/10695) and even a local Grafana dashboard for looking at the results (nearcore/10719).

A few sample workloads and strategies are also already included, but those are more of a demo of the model. The workloads are too simple to give a complete picture, and the strategies mostly just demonstrate or explore specific ideas in isolation. None of the strategies would be a suitable proposal on its own.

Ongoing work
This week we are looking into specific strategies and workloads. @wacban and I discussed general ideas and what exactly we want to try this week. We will share relevant results from the experiments when we have them.

And on the way, we will improve the model and the output as necessary.

Progress vs Time Estimate
The original plan was to come up with a proposed strategy based on model output by next week. I think we are just in time to hit that mark. Then we can start working on a design document in the form of an NEP and start gathering feedback from everyone else.

@jakmeier

Status update:

Done

Based on several workloads and strategies we simulated, we collected ideas and evaluated which of those are good and which are bad or useless.

This has led to two main strategies, which we've looked at in more detail:

We then compared the two in this document: https://docs.google.com/document/d/1wVQIF0cgilO9m-iI_P5HK6MVc0b6RAxsxtyTZ1nMnBs/edit?usp=sharing

The final result is a merge of the two ideas:

Ongoing work

Progress vs Time Estimate:

  • The first milestone has been completed ~3 days after schedule. ("Decide on the high level design with the help of queueing-theory simulations.")
  • We are now one week into the three-week second milestone: "Write down the detailed design (NEP) and a reference implementation in nearcore". Given that we already have a first draft of the NEP, the timeline looks okay, but we need to start working on the reference implementation immediately.

Projected Solution Quality

Initially we defined a set of must-have properties and a set of aspirational properties. Let's check in on which of them we think we can achieve.

Must-have:

Queues in the system are bounded in how many bytes they require. (This is per chunk)

We will have limits in place, but they won't be explicit limits in bytes that we can guarantee. So this requirement will not be fulfilled as cleanly as we hoped.

The NEP for this is approved and merged.

The implementation is merged to the nearcore master branch, stabilized, and ready for testnet deployment.

It looks like we will hit those in time.

Aspirational Properties

Queues are small enough to be kept entirely in memory

Again, we don't have hard guarantees in our solution. But we believe this trade-off is necessary, and queues will still fit into memory in all but the most targeted malicious cases. And even in malicious cases, it will be a strict improvement over today's system.

Once accepted, transactions and all following receipts will not fail.

We will satisfy this.

Bounded latency to resolve a receipt.

We will satisfy this, and we can still decide what we want the guarantee to be, trading it off against utilization in marginal cases.

In a non-congested setting, every shard can run at full speed.

Satisfied.

Transactions that only touch non-congested shards are not affected by congested shards.

Partially satisfied. Backpressure means every shard on the path of a congesting flow will become congested and experience negative consequences, even if that shard on its own wouldn't be congested. But all other shards are completely unaffected.

The same strategy can be used for any number of shards.

This seems fulfilled about as well as we can expect it to.

  • We require 4 bytes in the chunk header; this puts no practical scaling limit on the number of shards beyond what the chunk header already imposes (it is already >200 bytes).
  • The buffer space from one shard to all others is shared.
    • positive: Memory requirements do not scale by number of shards, so it can be applied to any number of shards.
    • negative: Utilization might suffer with many shards, as the allocatable buffer space per outgoing shard becomes smaller. (Needs further investigation)
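The shared-buffer trade-off above can be illustrated with a sketch in which a single byte budget covers receipts buffered for all outgoing shards. This is hypothetical code, not nearcore's implementation; all names are illustrative:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a buffer shared across all outgoing shards:
// memory stays fixed regardless of shard count, but shards compete
// for the same budget.
struct SharedOutgoingBuffer {
    per_shard_bytes: HashMap<u64, u64>, // shard id -> buffered bytes
    total_bytes: u64,
    max_total_bytes: u64, // fixed, independent of the number of shards
}

impl SharedOutgoingBuffer {
    fn new(max_total_bytes: u64) -> Self {
        SharedOutgoingBuffer {
            per_shard_bytes: HashMap::new(),
            total_bytes: 0,
            max_total_bytes,
        }
    }

    /// Buffer `bytes` destined for `shard` if the shared limit allows it.
    /// With many shards, each shard's achievable share shrinks.
    fn try_buffer(&mut self, shard: u64, bytes: u64) -> bool {
        if self.total_bytes + bytes > self.max_total_bytes {
            return false;
        }
        *self.per_shard_bytes.entry(shard).or_insert(0) += bytes;
        self.total_bytes += bytes;
        true
    }
}
```

This makes the positive/negative points concrete: the memory bound is a single constant, but one busy destination can consume most of the budget and starve others.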

@jakmeier

Status update:

  • SME reviewers have been assigned (@robin-near and @Akashin) a few days ago
  • To avoid being blocked on approval, we are already implementing the complete feature in nearcore and have started merging PRs with the changes behind a feature flag


jakmeier commented Jul 5, 2024

Overdue update:

Projects
Status: In Progress