Pipeline broadcast socket transmit and blocktree record #7481
solana-grimes merged 22 commits into solana-labs:master
@mvines i added multiple broadcast sockets, and now i see this:
@aeyakovenko Record and Transmit should be strictly ordered right? Otherwise we run the risk of transmitting something we didn't record. It can cause the leader to make double transmissions (and get slashed) if it crashes after tx but before record.
Codecov Report
@@            Coverage Diff             @@
##           master    #7481      +/-   ##
==========================================
+ Coverage    65.8%    80.7%    +14.9%
==========================================
  Files         245      244        -1
  Lines       60342    48911    -11431
==========================================
- Hits        39718    39512      -206
+ Misses      20624     9399    -11225
@sagar-solana
How would they know that they've already generated this block? Right now they use blocktree as a signal to determine whether or not a block was made.
@sagar-solana the validator boot sequence forces them to wait for a live tx to be acknowledged by the network, which means that they have to skip their half built block anyways.
@aeyakovenko cool, that should avoid this. Has that already been merged? We'll have to make sure that's done before slashing comes in.
@carllin how is that different if today we corrupted rocks on reset? There is no guarantee of data availability.
Pull request has been modified.
    } else {
        socket_sender.send((stakes.clone(), data_shreds.clone()))?;
        blocktree_sender.send(data_shreds.clone())?;
    }
    //Insert the first shred so blocktree stores that the leader started this block
    //This must be done before the blocks are sent out over the wire.
    if data_shreds.len() > 0 && data_shreds[0].index() == 0 {
        let first = vec![data_shreds[0].clone()];
Wouldn't this cause the first shred to be inserted twice? First time here, and then thru blocktree_sender.send()? Even though, the second insert will eventually fail.
I think right now b/c the is_trusted flag is set, it will just overwrite it the second time XD, it's ok, shreds are small :D
Sounds like a problem waiting to happen..
@pgarg66 why? I would expect the database to be sound if the same data is inserted twice. that seems like a simple requirement.
I think it's less about whether the db can handle it and more about a redundant call to insert the same data. Nbd I think since we'll hopefully get rid of this later.
@sagar-solana what is the concern? I can't imagine this ever being noticeable for performance.
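The point in this thread, that inserting the same shred twice should leave the database sound, can be illustrated with a minimal sketch. `Shred` and `ShredStore` here are hypothetical stand-ins and not the actual blocktree types; the only assumption carried over from the discussion is that a second insert of the same shred overwrites the first.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a shred; the real type carries much more.
#[derive(Clone, Debug, PartialEq)]
pub struct Shred {
    pub slot: u64,
    pub index: u32,
    pub payload: Vec<u8>,
}

/// Hypothetical in-memory stand-in for blocktree, keyed by (slot, index).
#[derive(Default)]
pub struct ShredStore {
    shreds: HashMap<(u64, u32), Shred>,
}

impl ShredStore {
    /// Inserts a shred, overwriting any entry with the same (slot, index).
    /// Returns true only when the key was not present before.
    pub fn insert(&mut self, shred: Shred) -> bool {
        self.shreds
            .insert((shred.slot, shred.index), shred)
            .is_none()
    }

    /// Looks up a shred by slot and index.
    pub fn get(&self, slot: u64, index: u32) -> Option<&Shred> {
        self.shreds.get(&(slot, index))
    }

    /// Number of distinct shreds stored.
    pub fn len(&self) -> usize {
        self.shreds.len()
    }
}
```

With a keyed store like this, the synchronous insert of the first shred followed by the later batch insert leaves exactly one copy, so the cost is one redundant write rather than a correctness problem.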
💔 Unable to automerge due to CI failure
dba7d97 to e4d38cb
…serts of that shred
1a861ea to 0049703
StandardBroadcastRun::insert skips the 1st shred with index zero because the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339
solana-labs#7481, which added this code, was not inserting coding shreds into blockstore. Starting with solana-labs#8899, coding shreds are inserted into blockstore as well as data shreds, but the insert logic erroneously skips the first coding shred because it does not check whether the shred is code or data.
(Commit message referenced from #25916, #26005 and #26006; cherry picked from commit eacb918. Co-authored-by: behzad nouri <behzadnouri@gmail.com>)
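The bug described in that commit message can be sketched as follows. This is an illustration under assumptions, not the code from the linked file: `ShredKind`, `Shred`, and both function names are hypothetical, and the shred is reduced to its kind and index.

```rust
/// Hypothetical shred kind; the real code distinguishes data and coding shreds.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ShredKind {
    Data,
    Code,
}

/// Hypothetical, simplified shred: only the fields the skip logic looks at.
#[derive(Clone, Debug, PartialEq)]
pub struct Shred {
    pub kind: ShredKind,
    pub index: u32,
}

/// Behavior as described in the commit message: the first shred is skipped
/// whenever its index is zero, even if it is a coding shred.
pub fn shreds_to_insert_buggy(shreds: &[Shred]) -> &[Shred] {
    match shreds.first() {
        Some(s) if s.index == 0 => &shreds[1..],
        _ => shreds,
    }
}

/// Sketch of the fix: skip only when the first shred is a *data* shred with
/// index zero, since that is the one already inserted synchronously.
pub fn shreds_to_insert_fixed(shreds: &[Shred]) -> &[Shred] {
    match shreds.first() {
        Some(s) if s.kind == ShredKind::Data && s.index == 0 => &shreds[1..],
        _ => shreds,
    }
}
```

Given a batch of coding shreds starting at index zero, the first function drops one shred that was never inserted anywhere else, while the second keeps the whole batch.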
Problem
Broadcast stage does a sequential
Summary of Changes
Split out record and send into their own threads. Sockets, blocktree, and signing touch 3 different pieces of hardware.
new pipeline looks like
Broadcast spikes are gone!!! Confirmation time is 40% better. Seems like we lost 7% of average perf, but peaks are 20% higher. Seems like a win. I am also seeing 200+ shreds to sign, so it might start making sense to go to the GPU.
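As a rough illustration of splitting record and send into their own threads, here is a minimal sketch using standard library channels. It is not the PR's implementation: shreds are reduced to `u32` indices, `spawn_worker` and `run_pipeline` are hypothetical names, and the sinks are in-memory sets standing in for the sockets and blocktree. The one detail taken from the diff in the review thread is that the first shred of a block is recorded synchronously before the batch is handed to either worker.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical batch type: shreds reduced to their indices.
type Shreds = Arc<Vec<u32>>;
type Sink = Arc<Mutex<BTreeSet<u32>>>;

/// Spawns a worker that drains its channel and stores every index in `sink`.
fn spawn_worker(sink: Sink) -> (Sender<Shreds>, JoinHandle<()>) {
    let (sender, receiver) = channel::<Shreds>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for batch in receiver {
            sink.lock().unwrap().extend(batch.iter().copied());
        }
    });
    (sender, handle)
}

/// Feeds batches to separate "transmit" and "record" workers and returns
/// the indices each of them ended up with, in sorted order.
pub fn run_pipeline(batches: Vec<Vec<u32>>) -> (Vec<u32>, Vec<u32>) {
    let transmitted: Sink = Arc::new(Mutex::new(BTreeSet::new()));
    let recorded: Sink = Arc::new(Mutex::new(BTreeSet::new()));
    let (socket_sender, socket_thread) = spawn_worker(transmitted.clone());
    let (blocktree_sender, blocktree_thread) = spawn_worker(recorded.clone());

    for batch in batches {
        let batch: Shreds = Arc::new(batch);
        // Record the first shred of the block before anything goes on the wire.
        if batch.first() == Some(&0) {
            recorded.lock().unwrap().insert(0);
        }
        socket_sender.send(batch.clone()).unwrap();
        blocktree_sender.send(batch).unwrap();
    }

    // Dropping the senders lets both workers finish.
    drop(socket_sender);
    drop(blocktree_sender);
    socket_thread.join().unwrap();
    blocktree_thread.join().unwrap();

    let sent: Vec<u32> = transmitted.lock().unwrap().iter().copied().collect();
    let stored: Vec<u32> = recorded.lock().unwrap().iter().copied().collect();
    (sent, stored)
}
```

Calling `run_pipeline(vec![vec![0, 1, 2], vec![3, 4]])` hands both batches to each worker; because the record sink is a set, the early insert of index 0 and the later batch insert collapse into one entry.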
Before:
After:

