network: Investigate high memory consumption for long-running node #4927

Open
lexnv opened this issue Jul 2, 2024 · 2 comments

@lexnv
Contributor

lexnv commented Jul 2, 2024

Two Kusama nodes were started on 23 April and left running.

{__name__="substrate_build_info", chain="ksmcc3", instance="localhost:9615", job="substrate_node", name="gray-vase-1131", version="1.10.0-1a45bd88348"}

The commit was based on a2a049d, from branch:

These are the same nodes as in #4925.

Extracted metrics: metrics.txt

ps -eo size,pid,user,start,command --sort -size | grep polka

37124040 283194 ubuntu   Apr 23 ./target/release/polkadot -d /home/ubuntu/workspace/Kusama-db-full --chain kusama --port 30355 --pruning=1000 --network-backend litep2p --detailed-log-output

32125892 283378 ubuntu   Apr 23 ./target/release/polkadot -d /home/ubuntu/workspace/kusama-db --chain kusama --pruning=1000 --sync=warp --network-backend litep2p --detailed-log-output

# Comparing to a freshly started node
4649592 3017641 ubuntu 10:38:35 ./target/release/polkadot --prometheus-port 9616 --port 30344

The long-running nodes are consuming roughly 36253.95 MiB and 31372.94 MiB, compared to 4540.62 MiB for a freshly started node.
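`ps` reports the `size` column in KiB (procps documents it as a rough estimate of the swap space needed if all writable pages were dirtied, not as resident memory), so the memory figures quoted above are the raw values divided by 1024, i.e. mebibytes. A small Python sketch reproducing that conversion from the `ps` output above; the dictionary labels are just descriptive names, not anything reported by the node:

```python
# Values of the `size` column (KiB) copied from the ps output above.
SIZES_KIB = {
    "Kusama-db-full (pid 283194)": 37124040,
    "kusama-db, warp sync (pid 283378)": 32125892,
    "fresh node (pid 3017641)": 4649592,
}
FRESH_KIB = SIZES_KIB["fresh node (pid 3017641)"]


def kib_to_mib(kib: int) -> float:
    """Convert KiB (as reported by ps) to MiB."""
    return kib / 1024


for name, kib in SIZES_KIB.items():
    # Absolute size in MiB and how many times larger than the fresh node.
    print(f"{name}: {kib_to_mib(kib):.2f} MiB ({kib / FRESH_KIB:.1f}x fresh node)")
```

The MiB values printed this way match the 36253.95, 31372.94 and 4540.62 figures, which puts the long-running nodes at roughly 7 to 8 times the size of the fresh one.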

Metrics

CPU usage is attributed almost entirely to the "network-worker" and "libp2p-node" metrics (note that this node uses the litep2p backend).

The "libp2p" metric shows almost 38/40 tasks running at a time.

Total network inbound: 2.6 TiB
Total network outbound: 806 GiB
The node oscillated between 0 and 1 syncing peers.

mpsc_import_notification_stream is the only channel with queued messages, 160 in total (the dashboard might be wrong).
chain_sync and network_worker account for most of the messages sent, with occasional messages on peer-set, network-gossip and transactions-handler-sync.

@lexnv
Contributor Author

lexnv commented Jul 2, 2024

Triaging local logs

Count      | Triage report
160155     | Notification block pinning limit reached. Unpinning block with hash = .*
2843       | 🥩 Error: .*. Restarting voter.
775        | .* banned, disconnecting, reason: .*
770        | 💔 Error importing block .*: .*
273        | \(offchain call\) Error submitting a transaction to the pool: .*
171        | Detected prevote equivocation in the finality worker: .*
102        | Detected precommit equivocation in the finality worker: .*
95         | ❌ Error while dialing .*: .*
42         | 🥩 ran out of peers to request justif #.* from
20         | Re-finalized block #.* \(.*\) in the canonical chain, current best finalized is #.*
10         | 💔 Called `on_validated_block_announce` with a bad peer ID .*
2          | Block import error: .*

Unknown
140  |  litep2p::ipfs::identify: inbound identify substream opened for peer who doesn't exist peer=PeerId(\"12D3KooWF3PWbXdGEuT35nBh3MgECtxnHng3s5c5QKapoDZMy38z\") protocol=/ipfs/id/1.0.0
4 | sync: 💔 Ignored block (#22873601 -- 0x649e…eab2) announcement from 12D3KooWBDbBuoE4umuzJnZcUouT4GY6n31BRWHXdAFsThjTKrug because all validation slots for this peer are occupied.
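The report above groups log lines by regular expression and counts the hits per pattern, with anything unmatched listed separately. A minimal Python sketch of that kind of triage, assuming a plain-text log with one entry per line; `triage`, `format_report` and the three sample patterns are illustrative and not the tool that produced this report:

```python
import re
from collections import Counter

# Hypothetical subset of the patterns from the report above.
PATTERNS = [
    r"Notification block pinning limit reached. Unpinning block with hash = .*",
    r".* banned, disconnecting, reason: .*",
    r"\(offchain call\) Error submitting a transaction to the pool: .*",
]


def triage(lines, patterns=PATTERNS):
    """Count lines per pattern; lines matching no pattern go to `unknown`."""
    compiled = [(raw, re.compile(raw)) for raw in patterns]
    counts = Counter()
    unknown = Counter()
    for line in lines:
        line = line.rstrip("\n")
        for raw, rx in compiled:
            if rx.search(line):
                counts[raw] += 1
                break  # first matching pattern wins
        else:
            unknown[line] += 1
    return counts, unknown


def format_report(counts):
    """Render counts in the same `Count | Triage report` layout, largest first."""
    rows = ["Count      | Triage report"]
    for pattern, n in counts.most_common():
        rows.append(f"{n:<11}| {pattern}")
    return "\n".join(rows)
```

Passing an open log file (`with open(path, encoding="utf-8") as f: counts, unknown = triage(f)`) works because file objects iterate line by line; `format_report(counts)` then prints a table shaped like the one above.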

@lexnv
Contributor Author

lexnv commented Jul 9, 2024

Similar behavior can be seen with the libp2p backend:

  • libp2p backend: 27889.92 MiB reported by ps
  • litep2p backend: 20625.88 MiB reported by ps
ps -eo size,pid,user,start,command --sort -size | grep polka
Tue Jul  9 10:43:29 2024
28559276 473683 ubuntu   Jul 05 ./target/release/polkadot -d /home/ubuntu/workspace/kusama-db-libp2p --chain kusama --in-peers 50 --out-peers 50 --pruning=1000 --sync=warp --network-backend libp2p --prometheus-port 9616 --detailed-log-output
21120904 472124 ubuntu   Jul 05 ./target/release/polkadot -d /home/ubuntu/workspace/kusama-db-litep2p --chain kusama --pruning=1000 --in-peers 50 --out-peers 50 --sync=warp --network-backend litep2p --detailed-log-output
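Both nodes in this comparison were started on Jul 05, so their uptimes are comparable. The gap between the backends can be computed directly from the `ps` values; a minimal Python sketch (the helper name `relative_reduction` is illustrative):

```python
# `size` values (KiB) from the ps output above; both nodes started on Jul 05.
LIBP2P_KIB = 28559276
LITEP2P_KIB = 21120904


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is smaller than `baseline`."""
    return (baseline - candidate) / baseline


saved_mib = (LIBP2P_KIB - LITEP2P_KIB) / 1024
reduction = relative_reduction(LIBP2P_KIB, LITEP2P_KIB)
print(f"litep2p: {saved_mib:.2f} MiB smaller than libp2p ({reduction:.1%})")
```

With these values the difference is about 7264 MiB, roughly 26% of the libp2p node's size.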

Considering that the node was not terminated by the OOM killer after ~77 days (it has been running since 23 April), and that the litep2p backend consumes less memory than libp2p, I would treat this issue as lower priority for now.

@lexnv lexnv changed the title network/litep2p: Investigate high memory consumption for long-running node network: Investigate high memory consumption for long-running node Jul 9, 2024