-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: Refactor base/random to avoid memory leak #1945
Conversation
Two things:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this does not specifically involve a problem with the GPU, but rather VEGA is just being used to demonstrate a specific problem. So I'll let @powerjg lead the review, but let me know if you need something from me.
if (instances == nullptr) | ||
instances = new Instances(); | ||
|
||
auto randpoint = r ? std::shared_ptr<Random>(r, [](Random*){}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a call to the constructor with a deleter? I think this line deserves some documentation of why we provide a nop deleter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BobbyRBruce make sure you address this concern :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR as it currently stands doesn't provide a nop deleter (I did in a previous iteration but I simplified it). However I have added a comment explaining this if-else setting of randpoint
to alleviate any confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is purpose of the second argument to the constructor of shared_ptr
here? It looks like an empty deleter, but maybe I'm not understanding correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lambda function which servers as the deletion operator pointer. The purpose here is for the case where the deletion of the pointer is managed elsewhere, which in this case it is as this is.
@BobbyRBruce can we get this in soon? |
There was a memory leak occuring prior to this fix which can be reproduced with: ```sh scons build/VEGA_X86/gem5.opt -j12 `nproc` /usr/bin/time -l ./build/VEGA_X86/gem5.opt configs/example/ruby_gpu_random_test.py --WB_L2 --test-length 5000000 --num-dmas 0 ``` The `/usr/bin/time -l` will output the maximum memory consumption of this process. On my machines It was getting to 27GB. With this change gem5 now peaks at around 6GB of memory usage. This incorporates two changes: 1) Instances of `static RandomPtr` were not creating singular RNG instances like assumed. Ergo in casees where this was depended upin to stop the over creation of RNGs, multiple were being creates. The `RandomPtr` object was static but the RNG it pointed towards was always changing on every use. 2) The RNG instance list has been refactored to avoid accidental inclusion of the same RNG more than once and better checks have been added as to explicitly remove RNGs from this instance list when they are deleted. Tests updated accordingly.
This commit attempts to fix the compilation failure by restoring an overloaded function that was previously deleted.
b542425
to
42cd6da
Compare
42cd6da
to
61cb45f
Compare
if (instances == nullptr) | ||
instances = new Instances(); | ||
|
||
auto randpoint = r ? std::shared_ptr<Random>(r, [](Random*){}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BobbyRBruce make sure you address this concern :)
I think i addressed all the comments and consider this PR ready for final review and merging. This should fix the Daily Tests. |
There was a memory leak occurring prior to this fix which can be reproduced with: ```sh scons build/VEGA_X86/gem5.opt -j `nproc` /usr/bin/time -l ./build/VEGA_X86/gem5.opt configs/example/ruby_gpu_random_test.py --WB_L2 --test-length 5000000 --num-dmas 0 ``` The `/usr/bin/time -l` will output the maximum memory consumption of this process. On my machines It was getting to 27GB. With this change gem5 now peaks at around 6GB of memory usage. This incorporates two changes: 1) Instances of `static RandomPtr` were not creating singular RNG instances like assumed. Ergo in casees where this was depended upon to stop the over creation of RNGs, multiple were being created. The `RandomPtr` object was static but the RNG it pointed towards was always changing on every use. This has been fixed. 2) The RNG instance list has been refactored to avoid accidental inclusion of the same RNG more than once and better checks have been added as to explicitly remove RNGs from this instance list when they are deleted. Tests updated accordingly. --------- Co-authored-by: Erin Le <ejle@ucdavis.edu>
There was a memory leak occurring prior to this fix which can be reproduced with:
scons build/VEGA_X86/gem5.opt -j `nproc` /usr/bin/time -l ./build/VEGA_X86/gem5.opt configs/example/ruby_gpu_random_test.py --WB_L2 --test-length 5000000 --num-dmas 0
The
/usr/bin/time -l
will output the maximum memory consumption of this process. On my machines It was getting to 27GB.With this change gem5 now peaks at around 6GB of memory usage.
This incorporates two changes:
static RandomPtr
were not creating singular RNGinstances like assumed. Ergo in casees where this was depended
upon to stop the over creation of RNGs, multiple were being
created. The
RandomPtr
object was static but the RNG it pointedtowards was always changing on every use. This has been fixed.
inclusion of the same RNG more than once and better checks have
been added as to explicitly remove RNGs from this instance list
when they are deleted.
Tests updated accordingly.