Skip to content

v2.0: scheduler opt-in forwarding (backport of #1801)#2285

Merged
apfitzge merged 2 commits intov2.0from
mergify/bp/v2.0/pr-1801
Sep 12, 2024
Merged

v2.0: scheduler opt-in forwarding (backport of #1801)#2285
apfitzge merged 2 commits intov2.0from
mergify/bp/v2.0/pr-1801

Conversation

@mergify
Copy link
Copy Markdown

@mergify mergify Bot commented Jul 25, 2024

Problem

  • In scheduler forward packets #898 we re-added forwarding to support those with swqos side-deals to use the central scheduler.
  • This behavior is useless for any operators using the central scheduler if they do not have such side-deals

Summary of Changes

  • SchedulerController is now configurable and can enable forwarding or not.
  • Add CLI argument enable-block-production-forwarding
    • this makes forwarding opt-in. Any operators with side-deals should enable this if they want it to actually do forwarding.

Fixes #


This is an automatic backport of pull request #1801 done by [Mergify](https://mergify.com).

@mergify mergify Bot requested a review from a team as a code owner July 25, 2024 15:29
(cherry picked from commit 61d8be0)
@apfitzge apfitzge force-pushed the mergify/bp/v2.0/pr-1801 branch from 9db45fa to 6b3b5fa Compare July 25, 2024 16:22
@apfitzge
Copy link
Copy Markdown

Rebased to pull in #2235

@apfitzge apfitzge requested review from bw-solana and tao-stones July 25, 2024 20:46
@bw-solana
Copy link
Copy Markdown

bw-solana commented Jul 26, 2024

Making sure my understanding is correct.

Current State

Version Scheduler Forward Behavior Configurability
v1.18 Legacy Scheduler (Default) Forward None
v1.18 Central Scheduler No Forward None
v2.0 Central Scheduler (Default) Forward None
master Central Scheduler (Default) No Forward Can opt into fwd

State after this PR

Version Scheduler Forward Behavior Configurability
v1.18 Legacy Scheduler (Default) Forward None
v1.18 Central Scheduler No Forward None
v2.0 Central Scheduler (Default) No Forward Can opt into fwd
master Central Scheduler (Default) No Forward Can opt into fwd

The configurability works like this:

  • Default is no forwarding
  • Can opt into forwarding by providing a staked overrides yml file (presumably to pretend an RPC node is staked for QUIC connection bandwidth purposes)
  • Can opt back out of forwarding by setting disable-block-production-forwarding. Not sure what the use case is for having staked overrides but no forwarding, but flexibility is good

tao-stones
tao-stones previously approved these changes Jul 26, 2024
Copy link
Copy Markdown

@tao-stones tao-stones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean backport, approve as original PR reviewer.

Comment thread core/src/banking_stage/transaction_scheduler/scheduler_controller.rs Outdated
Comment thread core/src/validator.rs
@apfitzge
Copy link
Copy Markdown

Making sure my understanding is correct.

Current State

Version Scheduler Forward Behavior Configurability
v1.18 Legacy Scheduler (Default) Forward None
v1.18 Central Scheduler No Forward None
v2.0 Central Scheduler (Default) Forward None
master Central Scheduler (Default) No Forward Can opt into fwd

State after this PR

Version Scheduler Forward Behavior Configurability
v1.18 Legacy Scheduler (Default) Forward None
v1.18 Central Scheduler No Forward None
v2.0 Central Scheduler (Default) No Forward Can opt into fwd
master Central Scheduler (Default) No Forward Can opt into fwd

The configurability works like this:

* Default is no forwarding

* Can opt into forwarding by providing a staked overrides yml file (presumably to pretend an RPC node is staked for QUIC connection bandwidth purposes)

* Can opt back out of forwarding by setting `disable-block-production-forwarding`. Not sure what the use case is for having staked overrides but no forwarding, but flexibility is good

@bw-solana looks correct to me.

@apfitzge
Copy link
Copy Markdown

@bw-solana I manually pulled in the bugfix in #2305 rather than creating 2 separate backports.

Copy link
Copy Markdown

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jstarry
Copy link
Copy Markdown

jstarry commented Aug 21, 2024

I don't really understand how doing swqos side deals is related to wanting to enable forwarding or not. Can you explain the reasoning?

@apfitzge
Copy link
Copy Markdown

I don't really understand how doing swqos side deals is related to wanting to enable forwarding or not. Can you explain the reasoning?

Forwarding (of non-vote) transactions is typically redundant for un-partnered folks sending you (validator) transactions.

graph LR;
user -->|time0| RPC -->|time1| validator1;
RPC -->|time1| validator2;
RPC -->|time1| validator3;
validator1 -->|forward - time2| validator2;
Loading

See above diagram. The RPCs forward to next N leaders. If validator1 is leader now or will be very soon, they will not forward the user's transactions yet. They will wait until after their slots, then start the forwarding process.
By the time validator1 actually forwards to validator2, they will have already received it from the RPC, in the vast majority of cases.

Some time ago, I added metrics to a staked box to do a test. >90% (may have been >95%) of forwarded packets were already received.

SWQOS side-deals are there because people make deals to specifically sell their stake-weighted bandwidth to partners and it was built on top of this forwarding mechanism. Many operators do not have such deals, so the forwarding is still useless for them.

@jstarry
Copy link
Copy Markdown

jstarry commented Aug 22, 2024

Some time ago, I added metrics to a staked box to do a test. >90% (may have been >95%) of forwarded packets were already received.

Was this done before swqos though? In post-swqos, wouldn't RPC fanout have a lower likelihood of success and therefore we should be doing more validator forwards?

@apfitzge
Copy link
Copy Markdown

Was this done before swqos though? In post-swqos, wouldn't RPC fanout have a lower likelihood of success and therefore we should be doing more validator forwards?

I believe that test was run before swqos was used, and the data would look significantly different for many reasons.

  1. swqos side-deals. Validators forwarding from these side deals would skew towards positive forwarding stats since their partners are likely not doing fan out.
  2. Any validator without a side-deal is not forwarding any non-votes (see BankingStage Forwarding Filter #685 - this was to prevent spammers who spammed their transaction to every node in order to get them to forward it to leader). This also skews towards positive forwarding stats since now the only forwarded non-votes are for swqos side-dealing validators.

Basically, post-#685 the forwarding loop for non-vote threads is useless unless there's a side-deal. The loop still runs, but we filter them out and don't actually forward because they are not from staked nodes over TPU (this is how the swqos side-deal partners come through).

@apfitzge
Copy link
Copy Markdown

As discussed in the backport call - #2687 should be backported to remove the CLI flag once this is backported.
cc/ @steviez @t-nelson

@steviez
Copy link
Copy Markdown

steviez commented Aug 26, 2024

As discussed in the backport call - #2687 should be backported to remove the CLI flag once this is backported.

Should we just cherry-pick #2687 into this PR ? I imagine we could land the BP PR's in quick succession, but combining would also avoid the flag from ever existing in v2.0 branch

@apfitzge
Copy link
Copy Markdown

As discussed in the backport call - #2687 should be backported to remove the CLI flag once this is backported.

Should we just cherry-pick #2687 into this PR ? I imagine we could land the BP PR's in quick succession, but combining would also avoid the flag from ever existing in v2.0 branch

In the call @t-nelson said to not do that so that we could revert more easily.

@steviez
Copy link
Copy Markdown

steviez commented Aug 26, 2024

in the call @t-nelson said to not do that so that we could revert more easily.

Hmm ok, I think missed or don't remember that portion. I don't fully follow, but separate commits should be just fine too

@willhickey
Copy link
Copy Markdown

Closing this as stale. If you think this is relevant please re-open and let @anza-xyz/backport-reviewers know why it is important

@willhickey willhickey closed this Sep 4, 2024
@mergify mergify Bot deleted the mergify/bp/v2.0/pr-1801 branch September 4, 2024 16:25
@apfitzge
Copy link
Copy Markdown

apfitzge commented Sep 4, 2024

Closing this as stale. If you think this is relevant please re-open and let @anza-xyz/backport-reviewers know why it is important

This was discussed in last weeks backport meeting. @anza-xyz/backport-reviewers please review.
I can't re-open because branch is now deleted and the restore button doesn't work, not sure if perms.

@willhickey willhickey restored the mergify/bp/v2.0/pr-1801 branch September 4, 2024 16:31
@apfitzge
Copy link
Copy Markdown

apfitzge commented Sep 4, 2024

As discussed there, the additional CLI is removed from master in #2687. That cannot be backported until this is.

@apfitzge
Copy link
Copy Markdown

apfitzge commented Sep 4, 2024

see above comments - reopening.

Copy link
Copy Markdown

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please try to have #2687 ready for bp before merging

@steviez
Copy link
Copy Markdown

steviez commented Sep 12, 2024

please try to have #2687 ready for bp before merging

Don't think that we need to BP that one. The flag that 2687 removes was introduced in #1801, merged on July 13 whereas v2.0 branch was cut in late June

@apfitzge apfitzge merged commit c01560e into v2.0 Sep 12, 2024
@apfitzge apfitzge deleted the mergify/bp/v2.0/pr-1801 branch September 12, 2024 15:43
@steviez
Copy link
Copy Markdown

steviez commented Sep 12, 2024

please try to have #2687 ready for bp before merging

Don't think that we need to BP that one. The flag that 2687 removes was introduced in #1801, merged on July 13 whereas v2.0 branch was cut in late June

Nevermind, I'm an idiot ... the flag came about in THIS pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants