Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ivan/http transport upgrades #1732

Merged
merged 4 commits into from
Oct 3, 2023
Merged

Ivan/http transport upgrades #1732

merged 4 commits into from
Oct 3, 2023

Conversation

ischasny
Copy link
Collaborator

@ischasny ischasny commented Oct 3, 2023

This branch has a series of reviewed PRs merged into it. The code introduces the following changes:

  • Feature: Add NChunks property that defines the number of concurrent streams for each http download with the minimum of 1, maximum of 15 and the default of 5. That will speed up how fast car files get downloaded up to the factor of the NChunks value. Libp2p over http is always defaulted to 1 stream only;
  • Feature: Http downloads persist parameters that they have been started with into the control file. That will allow boost to consistently resume downloads even if the configuration changes between node restarts;
  • Test: Add performance test that simulates network latency and ensures that multi-stream http downloads are reasonably faster than single-stream;

Multistream downloads have been tested on the main net. With NChunks=5 the overall download speed averaged at 448MBps, while with NChunks=1 at 169MBps (approx 2.6x improvement). It's expected that the speed-up should be proportional to the NChunks value if enough bandwidth / IOPs is available as the chunks are downloaded concurrently without blocking each other.

After download is complete chunks need to be glued up into the main output file. In our testing that process took approx. 4 minutes to complete for a 32GB car file on HDD. We expect that to be much faster for SSDs. However, even with 4 minutes of extra overhead the performance boost of parallelising HTTP download is still significant.

Fixes #1593

ischasny and others added 4 commits September 28, 2023 14:14
feat: add multistream http download

* Add a feature to split http download into multiple chunks. Chunks are downloaded into temporary files, which then get merged into one;
* Downnloads over libp2p are always done in one chunk only;
* Add NChunks configuration parameter;
* Fix tests to handle not just open-ended range requests.
* add configuration option to disable http download from private IPs
* don't follow http redirects
---------

Co-authored-by: Anton Evangelatov <[email protected]>
fix: add multistream performance test

verifies that multistream download is faster than single stream by introducing artificial tcp-level latency
Add control file

* Control file captures the number of chunks that download has been started with and ensures that its the same across restarts
Copy link
Contributor

@dirkmc dirkmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Nice work! 🙌

// this test runs a tcp proxy in front of the http web server that adds a latency to each tcp packet.
// This is done to simulate network latency and ensure that multistream transfer is reasonably faster than singlestream.
// rawSize and latency need to be big enough so that data transfer takes a reasonable proportion of time comparing to writing to the disk.
ctx := context.Background()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we add t.Skip() at the top of this test as we will only use it occasionally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to have that test running for each build so that we know that nothing occasionally broke concurrent transfers. For example if the code gets changed to download chunks serially, then all existing tests would just pass while the functionality would be broken. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok seems fair, let's leave it as-is 👍

@ischasny ischasny merged commit 6f4c05f into main Oct 3, 2023
21 checks passed
@ischasny ischasny deleted the ivan/http-transport-upgrades branch October 3, 2023 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Multi Stream http download for online deals
4 participants