Skip to content

Conversation

@lexnv
Copy link
Collaborator

@lexnv lexnv commented Feb 13, 2025

This PR avoids an allocation by clearing out the write buffer elements.

While at it, have increased the buffer size from 2000 to 8192 for extra space and cache alignment. This is the default value used by std/tokio implementation for buffered operations.

@lexnv lexnv merged commit f6d02ff into master Feb 18, 2025
8 checks passed
@lexnv lexnv deleted the lexnv/optimize-stream branch February 18, 2025 14:04
lexnv added a commit that referenced this pull request Feb 20, 2025
## [0.9.1] - 2025-01-19

This release enhances compatibility between litep2p and libp2p by
backporting the latest Yamux updates. Additionally, it includes various
improvements and fixes to boost the stability and performance of the
WebSocket stream and the multistream-select protocol.

### Changed

- yamux: Switch to upstream implementation while keeping the controller
API ([#320](#320))
- req-resp: Replace SubstreamSet with FuturesStream
([#321](#321))
- cargo: Bring up to date multiple dependencies
([#324](#324))
- build(deps): bump hickory-proto from 0.24.1 to 0.24.3
([#323](#323))
- build(deps): bump openssl from 0.10.66 to 0.10.70
([#322](#322))

### Fixed

- websocket/stream: Fix unexpected EOF on `Poll::Pending` state
poisoning ([#327](#327))
- websocket/stream: Avoid memory allocations on flushing
([#325](#325))
- multistream-select: Enforce `io::error` instead of empty protocols
([#318](#318))
- multistream: Do not wait for negotiation in poll_close
([#319](#319))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Feb 20, 2025
This PR updates litep2p to version 0.9.1. The yamux config is entirely
removed to mirror the libp2p yamux upstream version.
While at it, I had to bump indexmap and URL as well. 


## [0.9.1] - 2025-01-19

This release enhances compatibility between litep2p and libp2p by using
the latest Yamux upstream version. Additionally, it includes various
improvements and fixes to boost the stability and performance of the
WebSocket stream and the multistream-select protocol.

### Changed

- yamux: Switch to upstream implementation while keeping the controller
API ([#320](paritytech/litep2p#320))
- req-resp: Replace SubstreamSet with FuturesStream
([#321](paritytech/litep2p#321))
- cargo: Bring up to date multiple dependencies
([#324](paritytech/litep2p#324))
- build(deps): bump hickory-proto from 0.24.1 to 0.24.3
([#323](paritytech/litep2p#323))
- build(deps): bump openssl from 0.10.66 to 0.10.70
([#322](paritytech/litep2p#322))

### Fixed

- websocket/stream: Fix unexpected EOF on `Poll::Pending` state
poisoning ([#327](paritytech/litep2p#327))
- websocket/stream: Avoid memory allocations on flushing
([#325](paritytech/litep2p#325))
- multistream-select: Enforce `io::error` instead of empty protocols
([#318](paritytech/litep2p#318))
- multistream: Do not wait for negotiation in poll_close
([#319](paritytech/litep2p#319))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
github-actions bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Feb 20, 2025
This PR updates litep2p to version 0.9.1. The yamux config is entirely
removed to mirror the libp2p yamux upstream version.
While at it, I had to bump indexmap and URL as well.

## [0.9.1] - 2025-01-19

This release enhances compatibility between litep2p and libp2p by using
the latest Yamux upstream version. Additionally, it includes various
improvements and fixes to boost the stability and performance of the
WebSocket stream and the multistream-select protocol.

### Changed

- yamux: Switch to upstream implementation while keeping the controller
API ([#320](paritytech/litep2p#320))
- req-resp: Replace SubstreamSet with FuturesStream
([#321](paritytech/litep2p#321))
- cargo: Bring up to date multiple dependencies
([#324](paritytech/litep2p#324))
- build(deps): bump hickory-proto from 0.24.1 to 0.24.3
([#323](paritytech/litep2p#323))
- build(deps): bump openssl from 0.10.66 to 0.10.70
([#322](paritytech/litep2p#322))

### Fixed

- websocket/stream: Fix unexpected EOF on `Poll::Pending` state
poisoning ([#327](paritytech/litep2p#327))
- websocket/stream: Avoid memory allocations on flushing
([#325](paritytech/litep2p#325))
- multistream-select: Enforce `io::error` instead of empty protocols
([#318](paritytech/litep2p#318))
- multistream: Do not wait for negotiation in poll_close
([#319](paritytech/litep2p#319))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
(cherry picked from commit 42e9de7)
teor2345 pushed a commit to autonomys/polkadot-sdk that referenced this pull request Mar 24, 2025
…ech#7640) - conflicts resolved using --strategy-option=theirs

This PR updates litep2p to version 0.9.1. The yamux config is entirely
removed to mirror the libp2p yamux upstream version.
While at it, I had to bump indexmap and URL as well.

This release enhances compatibility between litep2p and libp2p by using
the latest Yamux upstream version. Additionally, it includes various
improvements and fixes to boost the stability and performance of the
WebSocket stream and the multistream-select protocol.

- yamux: Switch to upstream implementation while keeping the controller
API ([paritytech#320](paritytech/litep2p#320))
- req-resp: Replace SubstreamSet with FuturesStream
([paritytech#321](paritytech/litep2p#321))
- cargo: Bring up to date multiple dependencies
([paritytech#324](paritytech/litep2p#324))
- build(deps): bump hickory-proto from 0.24.1 to 0.24.3
([paritytech#323](paritytech/litep2p#323))
- build(deps): bump openssl from 0.10.66 to 0.10.70
([paritytech#322](paritytech/litep2p#322))

- websocket/stream: Fix unexpected EOF on `Poll::Pending` state
poisoning ([paritytech#327](paritytech/litep2p#327))
- websocket/stream: Avoid memory allocations on flushing
([paritytech#325](paritytech/litep2p#325))
- multistream-select: Enforce `io::error` instead of empty protocols
([paritytech#318](paritytech/litep2p#318))
- multistream: Do not wait for negotiation in poll_close
([paritytech#319](paritytech/litep2p#319))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added a commit that referenced this pull request May 22, 2025
This PR greatly improves the WebSocket connection stability by relying
on the interval buffers of tungstenite instead of buffering at a higher
level. The fix passes through the messages to the tungstenite socket
directly.

This is a long-lasting issue (reproducible on all older versions
silently with IO errors) that manifested as a decryption error after the
state fixes:
- #325
- #327

Issue context:
- node is under stress due to handling multiple substreams
- the issue affected only long running WebSocket substreams and
manifested as an IO error from crypto/noise decoding
- tungstenite `WebSocketStream` already has a 128KiB buffer for writing
- litep2p has a **redundant** 8 KiB buffer for writing 
- litep2p buffered internally multiple packets, tunstenite accepted the
batch. I expect this creates a wrongly framed packet that fails to
decode at the crypto/noise level


## Investigation

We have noted several errors that manifested as crypto/nosie decoding
failures on our Kusama validators:
- paritytech/polkadot-sdk#8525

```rust
litep2p::crypto::noise: failed to decrypt message error=Decrypt
```

Upon further investigation, the errors affected only WebSocket
connections. The issue could be reproduced by running a local node in
Kusama with more than 500 peers in and out. As well as running
subp2p-explorer with adjusted protocols:

```yaml
2025-05-15T14:58:08.095961Z ERROR {peer_id=peer_id=12D3KooWGsDvWrbApFTCpF8h7YCKHuvJbok6HAq5ZnPgE9LGWnsv}:
litep2p::crypto::noise: failed to decrypt message for bigger buffers error=Decrypt peer=PeerId("12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu")

2025-05-15T14:58:08.096419Z DEBUG 
{peer_id=peer_id=12D3KooWGsDvWrbApFTCpF8h7YCKHuvJbok6HAq5ZnPgE9LGWnsv}:
litep2p::websocket::connection: connection closed with error peer=PeerId("12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu") error=Decode(Io(Custom { kind: Other, error: "failed to decrypt message bigger buffers: decrypt error 12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu" }))
```



The issue also reproduced on the zombinet PR, which uses litep2p:
- paritytech/polkadot-sdk#8461

```yaml
2025-05-14 09:37:30.805  INFO tokio-runtime-worker sync: Warp sync is complete, continuing with state sync.    

2025-05-14 09:37:33.189 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt
2025-05-14 09:37:33.283 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt
2025-05-14 09:37:34.764 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt
	
2025-05-14 09:37:35.656  INFO tokio-runtime-worker substrate: ⚙️  State sync, Downloading state, 22%, 2.21 Mib (0 peers), best: #0 (0xc5e7…d059), finalized #0 (0xc5e7…d059), ⬇ 707.8kiB/s ⬆ 0.5kiB/s    
	
2025-05-14 09:37:40.657  INFO tokio-runtime-worker substrate: ⚙️  State sync, Downloading state, 22%, 2.21 Mib (3 peers), best: #0 (0xc5e7…d059), finalized #0 (0xc5e7…d059), ⬇ 1.0kiB/s ⬆ 1.0kiB/s    
```

## Testing Done

### Performance

Tested the performance with litep2p-perf using the following branch:
-
https://github.com/lexnv/litep2p-perf/compare/lexnv/websocket-tests?expand=1

| Status     | Data Size | Time (s) | Bandwidth (Mbit/s) |
|------------|-----------|----------|-------------------|
| **Before** |           |          |                   |
| Uploaded   | 256.00 MiB| 15.1152  | 135.49            |
| Downloaded | 256.00 MiB| 13.2296  | 154.80            |
| **After**  |           |          |                   |
| Uploaded   | 256.00 MiB| 15.7178  | 130.30            |
| Downloaded | 256.00 MiB| 13.2435  | 154.64            |

From the performance table, we are within 3% of the original buggy
implementation. I would lean towards a normal variation in our results.
Therefore, the performance remains unimpacted.

### Repro Case

Have added a custom user protocol as part of our testing to filter out
these errors.

- The protocol opens 16 outbound substreams on the connection
established event. Therefore, it will handle 16 outbound substreams and
16 inbound substreams
- The outbound substreams will push a configurable number of packets,
each of size 128 bytes, to the remote peer. While the inbound substreams
will read the same number of packets from the remote peer.
 
Before this PR, the TCP was unaffected and the websocket reproduces the
decrypt failure. After this PR, the test passes.


Closes: paritytech/polkadot-sdk#8525

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants