Huge RAM consumption in DSLX interpreter #1897

Open
rw1nkler opened this issue Jan 30, 2025 · 5 comments

@rw1nkler
Contributor

Describe the bug

Running the DSLX interpreter can consume a huge amount of RAM for larger designs like the ZSTD decoder.
Additionally, we noticed that resources are not released between different test cases, and RAM consumption increases steadily over time.

To Reproduce

Steps to reproduce the behavior:

  1. Check out the zstd_compressed_block_dec branch
  2. Run a program for monitoring RAM usage (e.g. top, htop)
  3. Run bazel run -- //xls/modules/zstd:zstd_dec_dslx_test --logtostderr
  4. Observe growing RAM consumption

Expected behavior

The interpreter should not consume that much RAM on larger designs.
Ideally, for a correct design that reads data from all of its channel queues, it should be possible to run the DSLX interpreter simulation indefinitely with (more or less) constant RAM consumption.

I will try to provide more debug/profiling information and append it to this issue.

@rw1nkler
Contributor Author

rw1nkler commented Jan 31, 2025

It seems that BytecodeInterpreter::EvalCreateTuple is responsible for the huge memory allocations.

(profiler screenshot)

It can be seen that the interpreter's allocations grow linearly:
(allocation timeline screenshot)

I ran the code built with sanitizers (--config=asan), but they didn't report any additional information.

@allight
Collaborator

allight commented Jan 31, 2025

Looks like the elements array is uselessly copied instead of moved.
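
For illustration, a minimal self-contained sketch of the pattern being described (the types and helper below are simplified stand-ins, not the actual XLS code): the element vector is assembled locally and discarded right after the call, yet it gets deep-copied into the new tuple.

```cpp
#include <vector>

// Simplified stand-in for the interpreter's value type; the real value
// type can hold large nested aggregates, so element copies are expensive.
struct Value {
  std::vector<Value> members;
};

// Hypothetical tuple-creation helper: `elements` is a temporary that is
// discarded right after the call, but its contents are copied anyway.
Value MakeTuple(const std::vector<Value>& elements) {
  Value tuple;
  tuple.members = elements;  // deep copy of every element
  return tuple;
}
```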

@allight
Collaborator

allight commented Jan 31, 2025

Yeah, making the vector and its elements be moved seems to fix this:

(profiler screenshot after the fix)
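
Roughly, the fix being described amounts to taking the temporary by rvalue reference (or by value) and moving it, so the vector's buffer is reused instead of deep-copied. Again a sketch with stand-in names, not the actual patch:

```cpp
#include <utility>
#include <vector>

struct Value {
  std::vector<Value> members;
};

// Same hypothetical helper, now moving the temporary: the vector's buffer
// is stolen and no per-element deep copy happens during tuple creation.
Value MakeTuple(std::vector<Value>&& elements) {
  Value tuple;
  tuple.members = std::move(elements);
  return tuple;
}
```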

@rw1nkler
Contributor Author

rw1nkler commented Feb 3, 2025

Thank you @allight for reducing the number of re-allocations. However, the overall RAM consumption is similar to the previous values. Correct me if I'm wrong, but to my understanding, the move prevents copies from being created when operating on the elements vector, but previously the temporary vector was freed anyway and should not have contributed to the overall RAM consumption.

I wonder why the memory is growing linearly, and why it's not freed between running different test cases.

@ericastor reopened this Feb 3, 2025
@allight
Collaborator

allight commented Feb 10, 2025

Sorry, I was on vacation, so I didn't see this.

I think this is just WAI (or at least working-as-implemented). At a minimum, I don't see any obvious places where we are leaking memory.

The interpreter needs to make copies of values for many operations to avoid having to implement a full GC. As long as the values are relatively small, the overhead of a GC would be much greater than the overhead of copying.

What I think is happening here is simply that the ZSTD decoder ends up creating a lot of values which, due to the nature of the bytecode interpreter, are not deduped and so exist all over the place.

Future work could be to implement a real ref-counting or GC system in the interpreter, though again the overhead this could introduce might make it not worth it.
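
For what it's worth, a hedged sketch of what such a ref-counted representation could look like (names are hypothetical, not the XLS API): aggregate payloads live behind a shared_ptr, so "copying" a value bumps a reference count instead of duplicating the element vector, at the cost of atomic refcount traffic on every copy/destroy.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical ref-counted value: scalars stay inline, aggregates are shared.
struct RcValue {
  uint64_t bits = 0;                                      // scalar payload
  std::shared_ptr<const std::vector<RcValue>> aggregate;  // shared tuple/array payload
};

// Tuple creation wraps the element vector once; later copies of the tuple
// only copy the shared_ptr (a refcount increment), not the elements.
RcValue MakeRcTuple(std::vector<RcValue> elements) {
  RcValue v;
  v.aggregate =
      std::make_shared<const std::vector<RcValue>>(std::move(elements));
  return v;
}
```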
