Huge RAM consumption in DSLX interpreter #1897

Open
rw1nkler opened this issue Jan 30, 2025 · 5 comments

@rw1nkler
Contributor

Describe the bug

Running the DSLX interpreter can consume a huge amount of RAM for larger designs like the ZSTD decoder.
Additionally, we noticed that resources are not released between different test cases, and RAM consumption increases steadily over time.

To Reproduce

Steps to reproduce the behavior:

  1. Check out the zstd_compressed_block_dec branch
  2. Run a program for monitoring RAM usage (e.g. top, htop)
  3. Run bazel run -- //xls/modules/zstd:zstd_dec_dslx_test --logtostderr
  4. Observe growing RAM consumption

Expected behavior

The interpreter should not consume that much RAM on larger designs.
Ideally, for a correct design that reads data from all of its channel queues, it should be possible to run the DSLX interpreter simulation indefinitely with (more or less) constant RAM consumption.

I will try to provide more debug/profiling information and append it to this issue.

@rw1nkler
Contributor Author

rw1nkler commented Jan 31, 2025

It seems that BytecodeInterpreter::EvalCreateTuple is responsible for the huge memory allocations.

(profiler screenshot)

It can be seen that the interpreter's allocations grow linearly:
(allocation timeline screenshot)

I ran the code built with sanitizers (--config=asan), but they didn't report any additional information.

@allight
Collaborator

allight commented Jan 31, 2025

Looks like the elements array is uselessly copied instead of moved.
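
For illustration, a minimal self-contained sketch of the pattern being described (the types and helper below are simplified stand-ins, not the actual XLS code): the element vector is assembled locally and discarded right after the call, yet it gets deep-copied into the new tuple.

```cpp
#include <vector>

// Simplified stand-in for the interpreter's value type; the real value
// type can hold large nested aggregates, so element copies are expensive.
struct Value {
  std::vector<Value> members;
};

// Hypothetical tuple-creation helper: `elements` is a temporary that is
// discarded right after the call, but its contents are copied anyway.
Value MakeTuple(const std::vector<Value>& elements) {
  Value tuple;
  tuple.members = elements;  // deep copy of every element
  return tuple;
}
```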

@allight
Collaborator

allight commented Jan 31, 2025

Yeah, making the vector and its elements be moved seems to fix this:

(profiler screenshot after the fix)
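
Roughly, the fix being described amounts to taking the temporary by rvalue reference (or by value) and moving it, so the vector's buffer is reused instead of deep-copied. Again a sketch with stand-in names, not the actual patch:

```cpp
#include <utility>
#include <vector>

struct Value {
  std::vector<Value> members;
};

// Same hypothetical helper, now moving the temporary: the vector's buffer
// is stolen and no per-element deep copy happens during tuple creation.
Value MakeTuple(std::vector<Value>&& elements) {
  Value tuple;
  tuple.members = std::move(elements);
  return tuple;
}
```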

@rw1nkler
Contributor Author

rw1nkler commented Feb 3, 2025

Thank you @allight for reducing the number of re-allocations. However, the overall RAM consumption is similar to the previous values. Correct me if I'm wrong, but to my understanding, the move prevents copies from being created when operating on the elements vector, but previously the temporary vector was freed anyway and should not have contributed to the overall RAM consumption.

I wonder why the memory is growing linearly, and why it's not freed between running different test cases.

@ericastor reopened this Feb 3, 2025
@allight
Collaborator

allight commented Feb 10, 2025

Sorry, I was on vacation, so I didn't see this.

I think this is just WAI (or at least working-as-implemented). At a minimum, I don't see any obvious places where we are leaking memory.

The interpreter needs to make copies of values for many operations to avoid having to implement a full GC. As long as the values are relatively small, the overhead of a GC would be much greater than the overhead of copying.

What I think is happening here is simply that the ZSTD decoder ends up creating a lot of values which, due to the nature of the bytecode interpreter, are not deduped and so exist all over the place.

Future work could be to implement a real ref-counting or GC system in the interpreter, though again the overhead this could introduce might make it not worth it.
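
For what it's worth, a hedged sketch of what such a ref-counted representation could look like (names are hypothetical, not the XLS API): aggregate payloads live behind a shared_ptr, so "copying" a value bumps a reference count instead of duplicating the element vector, at the cost of atomic refcount traffic on every copy/destroy.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical ref-counted value: scalars stay inline, aggregates are shared.
struct RcValue {
  uint64_t bits = 0;                                      // scalar payload
  std::shared_ptr<const std::vector<RcValue>> aggregate;  // shared tuple/array payload
};

// Tuple creation wraps the element vector once; later copies of the tuple
// only copy the shared_ptr (a refcount increment), not the elements.
RcValue MakeRcTuple(std::vector<RcValue> elements) {
  RcValue v;
  v.aggregate =
      std::make_shared<const std::vector<RcValue>>(std::move(elements));
  return v;
}
```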
