hotfix(core): don't hold on Data #3926
I guess this PR can be closed in favor of https://github.com/celestiaorg/celestia-node/pull/3915/files#r1834055837
The idea is to make a hotfix for this in the nearest patch, as releasing grpc in core and here is gonna take a bunch of time.
Makes sense 👍 Then I guess applying the change in #3926 (comment) is better, to avoid messing with memory ourselves. Instead, make the copy and let the GC do its job without us forcing the freeing of memory. Also, the GC might collect additional objects that are linked to SignedBlock instead of collecting just the Data.
As per #3926 (comment), I am okay with this hotfix until @rach-id's solution lands.
I am also okay with the explicit approach in #3926 (comment)
@rach-id, I applied your suggestion to get you as a coauthor on the PR. It doesn't matter which solution to go with for the hotfix.
`ValidatorSet` and `Commit` to not hold on `Data`
Well, the suggestion has a bug. It does not copy the Copying every field of |
You don't need to copy the raw header. Did you try running it to see the memory usage?
You need to copy it as well, because it is also a value that we create a pointer to, keeping the
If I remember correctly, I tried that change and ran a bridge node, and it didn't have the issue. Could you double-check, please, in case I missed something when running?
Well, this is how memory and pointers work. We don't need to test that.
But I also simply don't have time to run the test during the devcon, and we should go with the solution we know works well, which is setting empty data.
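For readers following the disagreement above, here is a minimal Go sketch of the three options being discussed: pointing into the block, copying the value, and clearing the data. All type and function names (`signedBlock`, `header`, `extendedHeader`, and the three constructors) are hypothetical stand-ins, not the actual celestia-node or core types. The sketch demonstrates aliasing rather than measuring GC behaviour directly.

```go
package main

import "fmt"

// header is a hypothetical stand-in for the raw header value.
type header struct {
	Height int64
}

// signedBlock is a hypothetical stand-in for core.ResultSignedBlock.
type signedBlock struct {
	Header header
	Data   []byte // large block payload
}

// extendedHeader is a hypothetical stand-in for the header that gets cached.
type extendedHeader struct {
	Raw *header
}

// aliasingHeader takes a pointer to a field of the block. An interior
// pointer keeps the whole signedBlock reachable, including Data.
func aliasingHeader(b *signedBlock) *extendedHeader {
	return &extendedHeader{Raw: &b.Header}
}

// copyingHeader copies the field value into a fresh allocation, so the
// result no longer references the block at all.
func copyingHeader(b *signedBlock) *extendedHeader {
	h := b.Header // value copy
	return &extendedHeader{Raw: &h}
}

// droppingDataHeader still points into the block, but clears Data first,
// so the block that stays reachable no longer references the payload.
func droppingDataHeader(b *signedBlock) *extendedHeader {
	b.Data = nil
	return &extendedHeader{Raw: &b.Header}
}

func main() {
	b := &signedBlock{Header: header{Height: 1}, Data: make([]byte, 8<<20)}

	alias := aliasingHeader(b)
	cp := copyingHeader(b)

	// Writing through the block is visible via the alias but not via the copy.
	b.Header.Height = 2
	fmt.Println(alias.Raw.Height, cp.Raw.Height) // prints "2 1"

	_ = droppingDataHeader(b)
	fmt.Println(b.Data == nil) // prints "true"
}
```

The copy only breaks the link for fields held by value; a field that is itself a pointer into the block would need the same treatment, which is why clearing the data is the simpler change to reason about.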
The `Data` field of `core.ResultSignedBlock` was retained even after we constructed the respective header and moved on. The header consisted of pointers to fields of `core.ResultSignedBlock`, which kept the whole structure reachable, including the `Data` field. A BN (bridge node), by default, has a header store cache of size 4096; thus, holding on to `Data` with 8 MB blocks took around 32 GiB of RAM.

Initially, we believed that the issue was with JSON unmarshalling. However, that was not the whole story: while profiles did confirm that the allocations originated there, the allocated data wasn't cleaned up. Kudos to @rach-id for helping us figure this out!

This is a hotfix and is meant to be replaced with a better solution described on TODO. Also, the origin of this bug is yet to be confirmed by @rach-id by testing RAM usage with the header cache disabled.

---

🎱mb certified
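The 32 GiB figure in the description can be checked with a quick calculation. The sketch below assumes that "8 MB" means 8 MiB and that every one of the 4096 cached headers pins a full-size block; `retainedBytes` is a hypothetical helper, not part of the codebase.

```go
package main

import "fmt"

// retainedBytes estimates the memory pinned when each cached header
// keeps one block payload of blockBytes alive (worst case).
func retainedBytes(cacheSize, blockBytes uint64) uint64 {
	return cacheSize * blockBytes
}

func main() {
	const cacheSize = 4096     // default header store cache size from the description
	const blockBytes = 8 << 20 // assumption: 8 MiB per block

	total := retainedBytes(cacheSize, blockBytes)
	fmt.Printf("%d GiB\n", total>>30) // prints "32 GiB"
}
```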