-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ivan/http transport upgrades #1732
Conversation
feat: add multistream http download * Add a feature to split http download into multiple chunks. Chunks are downloaded into temporary files, which then get merged into one; * Downnloads over libp2p are always done in one chunk only; * Add NChunks configuration parameter; * Fix tests to handle not just open-ended range requests.
* add configuration option to disable http download from private IPs * don't follow http redirects --------- Co-authored-by: Anton Evangelatov <[email protected]>
fix: add multistream performance test verifies that multistream download is faster than single stream by introducing artificial tcp-level latency
Add control file * Control file captures the number of chunks that download has been started with and ensures that its the same across restarts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Nice work! 🙌
// this test runs a tcp proxy in front of the http web server that adds a latency to each tcp packet. | ||
// This is done to simulate network latency and ensure that multistream transfer is reasonably faster than singlestream. | ||
// rawSize and latency need to be big enough so that data transfer takes a reasonable proportion of time comparing to writing to the disk. | ||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest we add t.Skip() at the top of this test as we will only use it occasionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was to have that test running for each build so that we know that nothing occasionally broke concurrent transfers. For example if the code gets changed to download chunks serially, then all existing tests would just pass while the functionality would be broken. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok seems fair, let's leave it as-is 👍
This branch has a series of reviewed PRs merged into it. The code introduces the following changes:
NChunks
property that defines the number of concurrent streams for each http download with the minimum of 1, maximum of 15 and the default of 5. That will speed up how fast car files get downloaded up to the factor of theNChunks
value. Libp2p over http is always defaulted to 1 stream only;Multistream downloads have been tested on the main net. With
NChunks=5
the overall download speed averaged at 448MBps, while withNChunks=1
at 169MBps (approx 2.6x improvement). It's expected that the speed-up should be proportional to theNChunks
value if enough bandwidth / IOPs is available as the chunks are downloaded concurrently without blocking each other.After download is complete chunks need to be glued up into the main output file. In our testing that process took approx. 4 minutes to complete for a 32GB car file on HDD. We expect that to be much faster for SSDs. However, even with 4 minutes of extra overhead the performance boost of parallelising HTTP download is still significant.
Fixes #1593