Huge RAM consumption in DSLX interpreter #1897
Comments
Looks like the elements array is uselessly copied instead of moved.
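For illustration only, here is a minimal C++ sketch of the copy-versus-move distinction being pointed out; the names `Value`, `TupleValue`, and `MakeTuple` are hypothetical stand-ins, not the actual XLS types:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-ins; the real types live in the XLS codebase.
struct Value {
  std::vector<uint8_t> payload;  // heap-backed, expensive to copy
};

struct TupleValue {
  std::vector<Value> elements;
};

// Copying overload: duplicates every element's heap allocation.
TupleValue MakeTuple(const std::vector<Value>& elements) {
  return TupleValue{elements};
}

// Moving overload: transfers the existing buffers; no per-element copies.
TupleValue MakeTuple(std::vector<Value>&& elements) {
  return TupleValue{std::move(elements)};
}
```

If call sites hold the only reference to the elements, passing them as an rvalue (the second overload) avoids the per-element re-allocations entirely.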
Thank you @allight for reducing the number of re-allocations. However, the overall RAM consumption is similar to the previous values. Correct me if I'm wrong, but I wonder why the memory is growing linearly, and why it's not freed between running different test cases.
Sorry, I was on vacation so I didn't see this. I think this is just WAI (or at least working-as-implemented); at a minimum, I don't see any obvious places where we are leaking things. The interpreter needs to make copies of values for many operations to avoid having to implement a full GC, and as long as the values are relatively small, the overhead of a GC would be much greater than the overhead of copying. What I think is happening here is simply that the zstd decoder ends up creating a lot of values which, due to the nature of the bytecode interpreter, are not deduped and so exist all over the place. Future work could be to implement a real reference-counting or GC system in the interpreter, though again the overhead this would add might make it not worth it.
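A minimal sketch of the reference-counting direction mentioned above, assuming a hypothetical value wrapper rather than the real XLS interpreter API:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical shared payload: every owner points at one allocation,
// which is freed automatically when the last reference drops.
struct ValueData {
  std::vector<uint8_t> bits;
};

class RefCountedValue {
 public:
  explicit RefCountedValue(std::vector<uint8_t> bits)
      : data_(std::make_shared<const ValueData>(ValueData{std::move(bits)})) {}

  // "Copying" the value now only bumps a reference count; the
  // underlying bits are shared rather than duplicated.
  RefCountedValue(const RefCountedValue&) = default;
  RefCountedValue& operator=(const RefCountedValue&) = default;

  const std::vector<uint8_t>& bits() const { return data_->bits; }

 private:
  std::shared_ptr<const ValueData> data_;
};
```

As the comment notes, the reference-count update on every copy can itself cost more than copying small values outright, which is why this trade-off is not an obvious win.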
Describe the bug
Running a DSLX interpreter can consume a huge amount of RAM for larger designs like ZSTD decoder.
Additionally, we noticed that the resources are not released between different test cases, and RAM consumption increases steadily over time.
To Reproduce
Steps to reproduce the behavior:
1. Run `bazel run -- //xls/modules/zstd:zstd_dec_dslx_test --logtostderr`
2. Observe RAM consumption over time with a system monitor (`top`, `htop`)
Expected behavior
The interpreter should not consume that much RAM on larger designs.
Ideally, for a correct design that reads data from all its channel queues, it should be possible to run the DSLX interpreter simulation indefinitely with (more or less) constant RAM consumption.
I will try to provide more debug/profiling information and append to this issue.