Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitswap/client: memory leak on BlockPresenceManager #574

Open
Dreamacro opened this issue Feb 1, 2024 · 5 comments
Open

bitswap/client: memory leak on BlockPresenceManager #574

Dreamacro opened this issue Feb 1, 2024 · 5 comments
Assignees
Labels
kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding P1 High: Likely tackled by core team if no one steps up

Comments

@Dreamacro
Copy link
Contributor

Recently, I observed that when using bitswap for fetching data, there is a large amount of memory consumption and no release after downloading.

I use pprof and find most heap is occupied here.

bpm.presence[c] = make(map[peer.ID]bool)

I try to print stat after bpm.presence[c] = make(map[peer.ID]bool) and I found bpm.presence would delete on BlockPresenceManager.RemoveKeys, so I added some logs.

I found that there is a race condition problem here.

  1. BlockPresenceManager.RemoveKeys has been called and remove the CidA
  2. BlockPresenceManager.ReceiveFrom has been called and created bpm.presence[CidA]
  3. bpm.presence[CidA] is no longer used or removed, and the map is getting bigger and bigger.
@Dreamacro Dreamacro added the need/triage Needs initial labeling and prioritization label Feb 1, 2024
Copy link

welcome bot commented Feb 1, 2024

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@aschmahmann aschmahmann added need/analysis Needs further analysis before proceeding kind/bug A bug in existing code (including security flaws) and removed need/triage Needs initial labeling and prioritization labels Feb 6, 2024
@aschmahmann
Copy link
Contributor

@Dreamacro can you attach or link to a pprof dump for analysis? Also, which versions of boxo and go-libp2p are you using?

@aschmahmann aschmahmann added the P1 High: Likely tackled by core team if no one steps up label Feb 6, 2024
@hacdias hacdias moved this to 🥞 Todo in IPFS Shipyard Team Feb 6, 2024
@Dreamacro
Copy link
Contributor Author

Dreamacro commented Feb 7, 2024

@aschmahmann

I first provide a few fragments of pprof/heap. I can confirm a memory leak here because the leakage disappears when I replace map[cid.Cid]map[peer.ID]bool with a cache with time. Dreamacro@df0a587

heap fragments: (Although boxo 0.15 is shown below, it has been upgraded to boxo 0.17)

258: 1387008 [2230: 11988480] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

109: 1185920 [3482: 37884160] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

180: 483840 [1333: 3583104] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

@hacdias hacdias removed their assignment May 27, 2024
@gammazero gammazero self-assigned this Jun 22, 2024
gammazero added a commit that referenced this issue Jul 8, 2024
Limit BlockPresenceManager map growth be doing the following:

- Use nil map when BlockPresenceManager CID map is empty.
- Delete peer map when from CID map when peer map is empty.
- Remove peers from BlockPresenceManager when peers are pruned from session.

This allows GC to free memory when maps in BlockPresenceManager become empty.

Additional improvement: Do not look for lists of keys or peers in empty maps; return early instead.

Fix for issue #574
gammazero added a commit that referenced this issue Jul 8, 2024
Limit BlockPresenceManager map growth be doing the following:

- Use nil map when BlockPresenceManager CID map is empty.
- Delete peer map when from CID map when peer map is empty.
- Remove peers from BlockPresenceManager when peers are pruned from session.

This allows GC to free memory when maps in BlockPresenceManager become empty.

Additional improvement: Do not look for lists of keys or peers in empty maps; return early instead.

Fix for issue #574
@gammazero
Copy link
Contributor

gammazero commented Jul 8, 2024

@Dreamacro PR #636 is a simple code modification that should help with unbounded map growth observed in this issue. The change allows GC to free memory when the BlockPresenceManager map becomes empty, and removes peers from BlockPresenceManager that have sent repeated DONT_HAVEs. This is a smaller scoped change than using an LRU cache as you have done in the example fix you provided, and I would be interested in getting your feedback as to whether it is sufficient for your test case.

@Dreamacro
Copy link
Contributor Author

@gammazero I can't quite confirm if this PR fixes the memory leak or not. The root cause of the problem is that after SessionManager.cancelWants is called, there may still be nodes to send HAVE or DONT_HAVE in the future, and it is handled by ReceiveFrom finally.

I'm not sure if the changes here can fix the root cause of the memory leak, if HAVE/DONT_HAVE is no longer processed after the prune node, the problem should be fixed

lidel pushed a commit that referenced this issue Jul 26, 2024
* Fix memory leak on BlockPresenceManager

Limit BlockPresenceManager map growth be doing the following:

- Use nil map when BlockPresenceManager CID map is empty.
- Delete peer map when from CID map when peer map is empty.
- Remove peers from BlockPresenceManager when peers are pruned from session.

This allows GC to free memory when maps in BlockPresenceManager become empty.

Additional improvement: Do not look for lists of keys or peers in empty maps; return early instead.

Fix for issue #574
wenyue pushed a commit to wenyue/boxo that referenced this issue Oct 17, 2024
* Fix memory leak on BlockPresenceManager

Limit BlockPresenceManager map growth be doing the following:

- Use nil map when BlockPresenceManager CID map is empty.
- Delete peer map when from CID map when peer map is empty.
- Remove peers from BlockPresenceManager when peers are pruned from session.

This allows GC to free memory when maps in BlockPresenceManager become empty.

Additional improvement: Do not look for lists of keys or peers in empty maps; return early instead.

Fix for issue ipfs#574
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding P1 High: Likely tackled by core team if no one steps up
Projects
No open projects
Status: 🥞 Todo
Development

No branches or pull requests

4 participants