Conversation
Hi, thanks a lot for your input. We already have an ASAN setup ready in CI, but it is set as non-blocking: https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L409 I'd recommend making it blocking for memory leaks and then working incrementally in your PR to address these leaks until CI is green. What do you think?
Yep, agree with the assessment here. If we can ensure a proper shutdown, we can re-enable these frees and fix these ASAN reports. I also wonder whether we could use smart pointers to manage the lifecycle of the pools, so that we don't have to be so careful when shutting down threadpools / engine threads.
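The smart-pointer idea can be pictured with a minimal sketch. It assumes a hypothetical `Pool` class (not MXNet's actual `ObjectPool`), is not thread-safe, and pools plain `int` slots for brevity. Each object handed out carries a `std::shared_ptr` to the pool inside its deleter, so the pool cannot be destroyed while any object is still alive, whatever order the owners shut down in.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical pool for illustration only; not MXNet's ObjectPool.
class Pool : public std::enable_shared_from_this<Pool> {
 public:
  // The pool must be owned by a shared_ptr for shared_from_this() to work.
  static std::shared_ptr<Pool> Create() {
    return std::shared_ptr<Pool>(new Pool());
  }

  // Hand out a slot. The deleter captures a shared_ptr to the pool, which
  // keeps the pool alive until the last outstanding object is released.
  std::shared_ptr<int> New(int value) {
    int* raw = nullptr;
    if (free_list_.empty()) {
      raw = new int(value);
    } else {
      raw = free_list_.back();
      free_list_.pop_back();
      *raw = value;
    }
    std::shared_ptr<Pool> self = shared_from_this();
    return std::shared_ptr<int>(raw, [self](int* p) {
      assert(p != nullptr);
      self->free_list_.push_back(p);  // recycle the slot instead of freeing it
    });
  }

  std::size_t free_count() const { return free_list_.size(); }

  // Runs only after every outstanding object has been returned, so freeing
  // the recycled slots here cannot leave a live object dangling.
  ~Pool() {
    for (int* p : free_list_) delete p;
  }

 private:
  Pool() = default;
  std::vector<int*> free_list_;
};
```

With this shape, an object obtained from `pool->New(7)` stays valid even after the caller's own `shared_ptr<Pool>` goes out of scope; the pool's memory is released when the last object is dropped. The cost is a reference-count update per allocation, which may or may not be acceptable on a hot path.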
@arcadiaphy Thanks for the contribution! @mxnet-label-bot add [pr-awaiting-review, C API]
@leleamol Ping for review!
@marcoabreu Yep, let me have a look at the ASAN tests.
@marcoabreu @KellenSunderland
Fully addressing the 1st leak is hard: much of the memory is held in global/thread_local singletons, so fixing it means enforcing a correct destruction order on static variables (the engine is also a static variable) and making the engine wait for unfinished operations. Actually, I can't find any mechanism governing static variable destruction order in mxnet, so I wonder why this hasn't caused problems in practice. An easy workaround is to use the naive engine in ASAN tests: if leaks still exist, then they definitely belong to the 2nd leak and we must fix them. Correct me if I'm wrong, especially on the static variable part.
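For reference on the static-variable point: C++ destroys objects with static storage duration in the reverse order in which their constructors completed. The sketch below relies on that rule and uses hypothetical `Pool` and `Engine` stand-ins (not MXNet's real classes). Because `Engine`'s constructor touches `Pool::Get()` before it finishes, `Pool` finishes construction first and is therefore destroyed after `Engine`.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for illustration; not MXNet's real classes.

// Records construction order. A plain int has no destructor, so it is safe
// to touch at any point during shutdown.
inline int NextSequence() {
  static int counter = 0;
  return ++counter;
}

class Pool {
 public:
  static Pool& Get() {
    static Pool instance;  // constructed on first call
    return instance;
  }
  ~Pool() { std::fputs("Pool destroyed\n", stderr); }
  void Acquire() { ++live_; }
  void Release() { --live_; }
  int live() const { return live_; }
  int constructed_at() const { return constructed_at_; }

 private:
  Pool() : constructed_at_(NextSequence()) {}
  int live_ = 0;
  int constructed_at_;
};

class Engine {
 public:
  static Engine& Get() {
    static Engine instance;
    return instance;
  }
  ~Engine() {
    // Pool finished construction before Engine did, so it is still alive.
    assert(Pool::Get().live() > 0);
    Pool::Get().Release();
    std::fputs("Engine destroyed\n", stderr);
  }
  int constructed_at() const { return constructed_at_; }

 private:
  Engine() {
    // Touch the dependency first: this forces Pool's constructor to
    // complete before Engine's constructor completes.
    Pool::Get().Acquire();
    constructed_at_ = NextSequence();
  }
  int constructed_at_ = 0;
};
```

A program that calls `Engine::Get()` and then returns from `main` writes "Engine destroyed" followed by "Pool destroyed" to stderr. The guarantee only covers dependencies that are touched during construction; a singleton first accessed from a destructor or a worker thread falls outside it, which is the harder case raised above.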
023e1d8 to 3faa584
Description
When detecting memory leaks in C++ inference code using ASAN, I found that almost all of the leaks in the ASAN reports come from unreleased memory in the object pool. The free code was deliberately commented out to avoid crashes caused by accessing objects in global singletons that had already been destructed.
The main problem with the object pool was fixed in #312, though there may still be some underlying issues. I have re-added the free operation and have experienced no problems in several weeks of usage.
I think the correct approach is to just let the problems surface; then we can fix them and move toward leak-free code.
Since issues like #13265 have been reported, I suggest adding some ASAN tests to CI to ensure there are absolutely no leaks in the C++ interface.
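To make the free operation described above concrete, here is a minimal sketch of a page-based pool whose destructor returns its pages to the allocator. `PagedPool`, `kSlotsPerPage`, and the slot layout are assumptions made for illustration and are not MXNet's actual `ObjectPool`; the sketch is not thread-safe and assumes `T` needs no more alignment than `malloc` provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>
#include <vector>

// Hypothetical, simplified page-based pool; not MXNet's ObjectPool.
template <typename T>
class PagedPool {
 public:
  PagedPool() = default;
  PagedPool(const PagedPool&) = delete;
  PagedPool& operator=(const PagedPool&) = delete;

  // The step under discussion: release every page. Without this loop a leak
  // checker reports each page as leaked; with it, any object still in use
  // after the pool is destroyed becomes a dangling pointer.
  ~PagedPool() {
    for (void* page : pages_) std::free(page);
  }

  // Construct a T in the next free slot, allocating a new page if needed.
  template <typename... Args>
  T* New(Args&&... args) {
    if (head_ == nullptr) AllocatePage();
    Slot* slot = head_;
    head_ = slot->next;
    return new (slot->storage) T(std::forward<Args>(args)...);
  }

  // Destroy the object and push its slot back onto the free list.
  void Delete(T* obj) {
    assert(obj != nullptr);
    obj->~T();
    Slot* slot = reinterpret_cast<Slot*>(obj);
    slot->next = head_;
    head_ = slot;
  }

  std::size_t page_count() const { return pages_.size(); }

 private:
  // A slot is either a free-list link or storage for one T.
  union Slot {
    Slot* next;
    alignas(T) unsigned char storage[sizeof(T)];
  };
  static constexpr std::size_t kSlotsPerPage = 64;

  void AllocatePage() {
    pages_.reserve(pages_.size() + 1);  // so push_back below cannot throw
    void* raw = std::malloc(sizeof(Slot) * kSlotsPerPage);
    if (raw == nullptr) throw std::bad_alloc();
    pages_.push_back(raw);
    Slot* slots = static_cast<Slot*>(raw);
    for (std::size_t i = 0; i < kSlotsPerPage; ++i) {
      slots[i].next = head_;
      head_ = &slots[i];
    }
  }

  Slot* head_ = nullptr;
  std::vector<void*> pages_;
};
```

The trade-off is visible in the destructor: freeing the pages removes the leak reports, but only stays safe if nothing dereferences a pooled object after the pool itself has been destroyed.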