Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Pipeline broadcast socket transmit and blocktree record#7481

Merged
solana-grimes merged 22 commits intosolana-labs:masterfrom
aeyakovenko:broadcast_socket_thread
Dec 17, 2019
Merged

Pipeline broadcast socket transmit and blocktree record#7481
solana-grimes merged 22 commits intosolana-labs:masterfrom
aeyakovenko:broadcast_socket_thread

Conversation

@aeyakovenko
Copy link
Copy Markdown
Member

@aeyakovenko aeyakovenko commented Dec 14, 2019

Problem

Broadcast stage does a sequential

  1. generate shreds (a bunch of steps)
  2. send data shreds
  3. record blocktree shrds
  4. send coding shreds

Summary of Changes

split out record and send Into their own threads. Sockets, blocktree, signing touch 3 different pieces of hardware.

new pipeline looks like

1. generate data shreds
2. async send data shreds
3. async record data shreds
4. generate coding shreds
5. async send coding shreds

Broadcast spikes are gone!!! Confirmation time is 40% better. Seems like we lost 7% of average perf, but peaks are 20% higher. Seems like a win. I am also seeing 200+ shreds to sign, so it might start making sense to go to the GPU.

Before:

Screen Shot 2019-12-14 at 1 17 37 PM

Screen Shot 2019-12-14 at 1 17 48 PM

After:
Screen Shot 2019-12-14 at 12 41 07 PM
Screen Shot 2019-12-14 at 12 41 23 PM

Fixes #

@aeyakovenko aeyakovenko marked this pull request as ready for review December 14, 2019 09:17
@aeyakovenko
Copy link
Copy Markdown
Member Author

@mvines i added multiple broadcast sockets, and now i see this:

panicked at 'assertion failed: udp_ports.len() <= msg.udp_ports.len()', net-utils/src/ip_echo_server.rs:26:9
--

@sagar-solana
Copy link
Copy Markdown
Contributor

@aeyakovenko Record and Transmit should be strictly ordered right? Otherwise we run the risk of transmitting something we didn't record. It can cause the leader to make double transmissions (and get slashed) if it crashes after tx but before record.

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 16, 2019

Codecov Report

Merging #7481 into master will increase coverage by 14.9%.
The diff coverage is 73.2%.

@@            Coverage Diff            @@
##           master   #7481      +/-   ##
=========================================
+ Coverage    65.8%   80.7%   +14.9%     
=========================================
  Files         245     244       -1     
  Lines       60342   48911   -11431     
=========================================
- Hits        39718   39512     -206     
+ Misses      20624    9399   -11225

@aeyakovenko
Copy link
Copy Markdown
Member Author

@sagar-solana
Leaders shouldn't try to generate the same block. If they reset, they should wait until the next block.

@sagar-solana
Copy link
Copy Markdown
Contributor

@sagar-solana
Leaders shouldn't try to generate the same block. If they reset, they should wait until the next block.

How would they know that they've already generated this block? Right now they use blocktree as a signal to determine whether or not a block was made.

@aeyakovenko
Copy link
Copy Markdown
Member Author

@sagar-solana the validator boot sequence forces them to wait for a live tx to be acknowledged by the network, which means that they have to skip their half built block anyways.

@sagar-solana
Copy link
Copy Markdown
Contributor

@aeyakovenko cool, that should avoid this. Has that already been merged? We'll have to make sure that's done before slashing comes in.

@aeyakovenko
Copy link
Copy Markdown
Member Author

@carllin how is that different if today we corrupted rocks on reset? There is no guarantee of data availability.

@mergify mergify Bot dismissed sakridge’s stale review December 16, 2019 21:02

Pull request has been modified.

} else {
socket_sender.send((stakes.clone(), data_shreds.clone()))?;
blocktree_sender.send(data_shreds.clone())?;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carllin
carllin previously approved these changes Dec 16, 2019
//Insert the first shred so blocktree stores that the leader started this block
//This must be done before the blocks are sent out over the wire.
if data_shreds.len() > 0 && data_shreds[0].index() == 0 {
let first = vec![data_shreds[0].clone()];
Copy link
Copy Markdown
Contributor

@pgarg66 pgarg66 Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this cause the first shred to be inserted twice? First time here, and then thru blocktree_sender.send()? Even though, the second insert will eventually fail.

Copy link
Copy Markdown
Contributor

@carllin carllin Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thnk right now b/c the is_trusted flag iis set, it will jsut overwrite it the seccond tiime XD, it's ok, shreds are small :D

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a problem waiting to happen..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgarg66 why? I would expect the database to be sound if the same data is inserted twice. that seems like a simple requirement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's less about whether the db can handle it and more about a redundant call to insert the same data. Nbd I think since we'll hopefully get rid of this later.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sagar-solana what is the concern? I can't imagine this ever being noticeable for performance.

@aeyakovenko aeyakovenko added the automerge Merge this Pull Request automatically once CI passes label Dec 16, 2019
@solana-grimes solana-grimes removed the automerge Merge this Pull Request automatically once CI passes label Dec 16, 2019
@solana-grimes
Copy link
Copy Markdown
Contributor

💔 Unable to automerge due to CI failure

@aeyakovenko aeyakovenko added the automerge Merge this Pull Request automatically once CI passes label Dec 16, 2019
@mergify mergify Bot dismissed carllin’s stale review December 16, 2019 21:54

Pull request has been modified.

@solana-grimes solana-grimes removed the automerge Merge this Pull Request automatically once CI passes label Dec 16, 2019
@solana-grimes
Copy link
Copy Markdown
Contributor

💔 Unable to automerge due to CI failure

@aeyakovenko aeyakovenko force-pushed the broadcast_socket_thread branch from dba7d97 to e4d38cb Compare December 16, 2019 23:39
@aeyakovenko aeyakovenko force-pushed the broadcast_socket_thread branch from 1a861ea to 0049703 Compare December 17, 2019 00:09
@aeyakovenko aeyakovenko added the automerge Merge this Pull Request automatically once CI passes label Dec 17, 2019
@solana-grimes solana-grimes merged commit 97589f7 into solana-labs:master Dec 17, 2019
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 11, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 12, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 12, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 12, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 13, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Jun 13, 2022
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

solana-labs#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
solana-labs#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
behzadnouri added a commit that referenced this pull request Jun 16, 2022
#25916)

StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
mergify Bot pushed a commit that referenced this pull request Jun 16, 2022
#25916)

StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.

(cherry picked from commit eacb918)
mergify Bot pushed a commit that referenced this pull request Jun 16, 2022
#25916)

StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.

(cherry picked from commit eacb918)
mergify Bot added a commit that referenced this pull request Jun 16, 2022
#25916) (#26005)

StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.

(cherry picked from commit eacb918)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
mergify Bot added a commit that referenced this pull request Jun 16, 2022
#25916) (#26006)

StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

#7481
which added this code was not inserting coding shreds into blockstore.
Starting with
#8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.

(cherry picked from commit eacb918)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

automerge Merge this Pull Request automatically once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants