PARQUET-1533: TestSnappy() throws OOM exception with Parquet-1485 change #622
Conversation
@liupc, could you take a look?
… like it was removed from the decompressor
@gszadovszky
The memory consumption of parquet-mr can be estimated quite precisely based on the page and row-group sizes. For reading/writing, parquet-mr has to keep the current row-group in memory, so the memory consumption is around the row-group size, which defaults to 128MiB. Parquet compression works at page level, and the default page size is 1MiB. It is also my mistake that I did not discover this problem during the code review.
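To make the estimate above concrete, here is a minimal back-of-envelope sketch. The ParquetWriter default constants (DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE) do come from parquet-mr, but the per-file formula is only an illustration of the reasoning in this thread, not an official API or guarantee.

```java
// Rough estimate only: one buffered row group per open file, plus roughly a
// page-sized input/output buffer pair per active (de)compressor.
import org.apache.parquet.hadoop.ParquetWriter;

public class MemoryEstimate {
  public static void main(String[] args) {
    long rowGroupSize = ParquetWriter.DEFAULT_BLOCK_SIZE; // 128MiB by default
    long pageSize = ParquetWriter.DEFAULT_PAGE_SIZE;      // 1MiB by default
    long perFileEstimate = rowGroupSize + 2 * pageSize;   // illustrative formula, not parquet-mr API
    System.out.printf("~%d MiB per open file%n", perFileEstimate / (1 << 20));
  }
}
```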
@gszadovszky Thanks for explaining. I didn't notice that the extra 128MiB would cause OOMs, because in our Spark cluster the default memoryOverhead (i.e. the direct memory) minimum is 384M. Yes, it can cause such memory issues when the environment is limited for direct memory; as far as I know, the JVM maximum size of direct memory is 0.8 * XmxSize if no -XX:MaxDirectMemorySize is set. The original idea of these codes was to preallocate memory for compression/decompression. So I think maybe we can keep the reset logic and remove the preallocation code.
@Override
public synchronized void reset() {
  if (inputBuffer.capacity() > initialBufferSize) {
Maybe we can keep this code in the reset call and lower the initialBufferSize to 4MiB or so, as even without the preallocation of 128MiB, the inputBuffer and writeBuffer can grow very large judging from the API alone, because setInput can be called many times.
I cannot see why the buffers would grow that much. We compress/decompress one page at a time, so a buffer should not grow much larger than the largest page, which is usually much smaller than 64M; by default it is 1M.
I still don't know whether 4M is a good number. If multiple compressors/decompressors are instantiated in parallel, the total can go up: say 20 decompressors, each initialized with 2*4M, which results in 160M of direct memory. That could be a problem for some applications. I agree with liupc to "keep the reset logic and remove the preallocation codes" unless we see a huge penalty for not doing that.
If you instantiate 20 decompressors then you are probably trying to read 20 parquet files at the same time in the same JVM. That means you'll need tons of memory to keep 20 row-groups in memory; compared to that, 160M would not be too much, I guess.
Meanwhile, I am not against keeping the reset logic, but what would be the size to reset to? 0?
How about the vectorized parquet reader? Does it instantiate the decompressor per column? I am not an expert on Parquet, so I could be wrong.
@shangxinli
I think it's up to the implementation; it can work either with or without instantiating the decompressor per column. If you keep that many column compressors/decompressors in memory, then the memory consumed by the column data itself is much larger than 4MiB.
What's more, I suggest 4MiB here because the compress/decompress buffer size is 4MiB now, so I think it's OK.
If the default page size is 1M (assuming typical usage keeps the default), 4M still seems too large. The buffer will be grown later anyway if it is smaller than needed. I would prefer low over high.
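For readers following along, here is a minimal sketch (not the actual parquet-mr code) of the kind of reset logic being debated above: direct buffers that have grown past the initial size are released and re-allocated at the smaller size, so a reused codec does not keep large off-heap buffers pinned between files. The field names follow the diff context above; cleanDirectBuffer is a hypothetical helper (e.g. one that invokes the buffer's cleaner via reflection), not a parquet-mr API.

```java
import java.nio.ByteBuffer;

// Illustrative sketch of a (de)compressor whose reset() shrinks oversized
// direct buffers back to initialBufferSize.
public abstract class ResettableCodec {
  private final int initialBufferSize;
  private ByteBuffer inputBuffer;
  private ByteBuffer outputBuffer;

  protected ResettableCodec(int initialBufferSize) {
    this.initialBufferSize = initialBufferSize;
    this.inputBuffer = ByteBuffer.allocateDirect(0);   // grow lazily on first setInput
    this.outputBuffer = ByteBuffer.allocateDirect(0);
  }

  public synchronized void reset() {
    if (inputBuffer.capacity() > initialBufferSize) {
      cleanDirectBuffer(inputBuffer);                  // hypothetical helper, frees off-heap memory
      inputBuffer = ByteBuffer.allocateDirect(initialBufferSize);
    }
    inputBuffer.clear();
    if (outputBuffer.capacity() > initialBufferSize) {
      cleanDirectBuffer(outputBuffer);
      outputBuffer = ByteBuffer.allocateDirect(initialBufferSize);
    }
    outputBuffer.clear();
  }

  /** How the direct buffer is released is left open in this sketch. */
  protected abstract void cleanDirectBuffer(ByteBuffer buffer);
}
```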
@Override
public synchronized void reset() {
  if (inputBuffer.capacity() > initialBufferSize) {
Same as above comments.
@liupc, while re-thinking your code modification (and mine as well) I think there is another potential memory leak. This is when the …
A small correction: my OOM is on a MacBook Pro with OSX 10.14.1. Regarding the relation of -Xmx and direct memory size: I think it is initialized as 64m arbitrarily and then reset, but the reset is tricky. If -XX:MaxDirectMemorySize is set, then it is reset to this size; otherwise it will be Runtime.getRuntime().maxMemory(), which is impacted by -Xmx a lot. Here is the JDK code: https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/misc/VM.java#L186. You can check the comments in the code between lines 180-188. In production, I have only seen one out of 50+ Java applications/services that explicitly sets MaxDirectMemorySize. In most cases, developers don't set it in the arguments.
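For context, the behavior described above can be sketched roughly as follows. This is only a paraphrase of the linked JDK 8 code for illustration; in the real JDK the value comes from an internal property that the VM sets from -XX:MaxDirectMemorySize and then removes, so the flagValue parameter here is just a stand-in.

```java
// Paraphrase of the JDK 8 logic in sun.misc.VM (linked above), illustrating why
// the direct memory limit effectively follows -Xmx when -XX:MaxDirectMemorySize is absent.
public final class DirectMemoryDefault {

  static long effectiveDirectMemoryLimit(String flagValue) {
    long directMemory = 64L * 1024 * 1024;               // arbitrary initial value, later overwritten
    if (flagValue != null) {
      if (flagValue.equals("-1")) {
        // -XX:MaxDirectMemorySize was not given: fall back to roughly the -Xmx value
        directMemory = Runtime.getRuntime().maxMemory();
      } else {
        long l = Long.parseLong(flagValue);
        if (l > -1) {
          directMemory = l;
        }
      }
    }
    return directMemory;
  }

  public static void main(String[] args) {
    System.out.println(effectiveDirectMemoryLimit("-1"));        // default case
    System.out.println(effectiveDirectMemoryLimit("402653184")); // -XX:MaxDirectMemorySize=384m
  }
}
```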
Thanks, @shangxinli, for the explanation. However, whether you set the maximum direct memory size or leave it at the default, I think …
PS. I've no idea where I got "Windows" :)
Yes. I agree.
@gszadovszky
@gszadovszky Yes, I think it's a potential leak. If the …
About the initial size: we are handling pages here, meaning that the buffer size should never be much larger than the largest page (reading or writing). A page is usually about 1MiB.
About the reset: it might make sense to have a maximum value, and if the buffers have grown larger than this value we might reset them back to some value. Based on the logic above this value might be 0 as well. However, it would not solve every possible OOM: if we need several compressors/decompressors at the same time (e.g. writing several parquet files at the same time, or vectorization, which I don't have deep knowledge about as it is implemented outside of parquet-mr), we would not reset them until all the reading/writing is done. So, we have to consume enough memory at least to fit the largest pages of the files. What do you think? (I'll add the clean to end() as well.)
Yes, I agree to add the clean to end().
I am concerned about the suggestion of resetting the size to 0 every time, because it leads to a guaranteed allocation the next time the buffer needs to be used. This is radically different from how things used to work, as originally an allocation only happened when the buffer needed to grow. This change would mean a lot more allocation, which may affect performance badly. It also doesn't seem necessary to me, because this change would not reduce the peak memory usage of the buffer; it would only shorten its duration to a single use of the buffer instead of a single use of the compressor, which does not seem to be a large gain. My suggestion would be to remove the deallocation from …
I agree with @zivanfi.
I agree with 1) initialize to 0; 2) add clean to end().
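A minimal sketch of what that agreement could look like, under assumed names (this is not the actual parquet-mr patch): the direct buffers start at zero capacity, grow lazily in setInput, are not shrunk in reset(), and are released only in end(). freeDirectBuffer is a hypothetical helper for releasing a direct buffer, not a parquet-mr API.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: lazy off-heap buffer growth, reuse across reset(), cleanup in end().
public abstract class LazyDirectBufferCodec {
  private ByteBuffer inputBuffer = ByteBuffer.allocateDirect(0);   // 1) initialize to 0

  public synchronized void setInput(byte[] data, int off, int len) {
    if (inputBuffer.remaining() < len) {
      // Grow only when needed, keeping whatever was already buffered.
      ByteBuffer grown = ByteBuffer.allocateDirect(inputBuffer.position() + len);
      inputBuffer.flip();
      grown.put(inputBuffer);
      freeDirectBuffer(inputBuffer);                               // hypothetical helper
      inputBuffer = grown;
    }
    inputBuffer.put(data, off, len);
  }

  public synchronized void reset() {
    inputBuffer.clear();            // reuse the existing capacity; no deallocation here
  }

  public synchronized void end() {
    freeDirectBuffer(inputBuffer);  // 2) clean in end(), when the codec is truly done
    inputBuffer = ByteBuffer.allocateDirect(0);
  }

  /** How the direct buffer is released is left open in this sketch. */
  protected abstract void freeDirectBuffer(ByteBuffer buffer);
}
```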
We are investigating a memory leak issue in our environment where the Snappy off-heap consumption has grown to 3-4 GB in size. Even after using the version of parquet-mr that has this fix we see similar growth. We suspect there is still an open leak in here. We do not see …
Summary: Merge two commits from upstream
Revert "PARQUET-1381: Add merge blocks command to parquet-tools (apache#512)" apache#621
PARQUET-1533: TestSnappy() throws OOM exception with Parquet-1485 change apache#622
Reviewers: pavi, leisun
Reviewed By: leisun
Differential Revision: https://code.uberinternal.com/D2544359
No description provided.