Fix int overflow error when spilling large page #15403
rschlussel merged 1 commit into prestodb:master
sachdevs
left a comment
Nice one, I had recently started seeing this in my tests as well.
Was this a query that spilled during a join, or during an aggregation? Are we concerned about memory usage in `page.getRegion()` during `splitPage()`? I wonder if we are temporarily doubling memory usage by splitting the page.
I think we could split into much more moderately sized pages, e.g. something like 8 MB?
Query had multiple joins and aggregations, so I'm not sure which was the culprit.
That's a great question. I think you're right. I'm also not sure that memory is accounted for anywhere. I wonder if it can cause GC issues for the cluster (not just for spill, but any time we use `splitPage()`).
Page serialization requires page size to fit in an integer, so larger pages could hit an int overflow error.
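To make the failure mode concrete, here is a small, self-contained illustration (not Presto code). It assumes, per the review comment on the diff, that the serialized length is the page size plus `Integer.BYTES`. A size held in a `long` that exceeds `Integer.MAX_VALUE - Integer.BYTES` silently wraps when narrowed to `int`, while `Math.toIntExact` turns the same condition into an explicit error. The helper names and the 2,200,000,000-byte example size are hypothetical.

```java
public class PageSizeOverflow {
    // Hypothetical helper: serialized length = page payload + a 4-byte length field,
    // mirroring "page.getSizeInBytes() + Integer.BYTES" from the review comment.
    static int serializedLengthUnchecked(long pageSizeInBytes) {
        // Narrowing cast: silently wraps around for values above Integer.MAX_VALUE
        return (int) (pageSizeInBytes + Integer.BYTES);
    }

    // Hypothetical helper: same computation, but fails loudly instead of wrapping
    static int serializedLengthChecked(long pageSizeInBytes) {
        return Math.toIntExact(pageSizeInBytes + Integer.BYTES); // throws ArithmeticException on overflow
    }

    // True if the page plus its length field still fits in an int
    static boolean fitsInInt(long pageSizeInBytes) {
        return pageSizeInBytes <= Integer.MAX_VALUE - Integer.BYTES;
    }

    public static void main(String[] args) {
        long largePage = 2_200_000_000L; // ~2.2 GB, above Integer.MAX_VALUE (2,147,483,647)
        System.out.println(fitsInInt(largePage));                 // false
        System.out.println(serializedLengthUnchecked(largePage)); // -2094967292
        try {
            serializedLengthChecked(largePage);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The unchecked version produces a negative length rather than an error, which is why splitting the page before serialization is needed rather than just widening a local variable.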
Force-pushed from 3ae5faa to e69e066
sachdevs
left a comment
Yeah, in my experience testing on T10s, this kind of thing can cause JVM OOMs: for that one moment memory usage doubles, and nothing along the way yields for memory. I'm fine with merging this for now and dealing with the problem when it comes up, though. I believe the solution will just be to make smaller pages when spilling!
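Making smaller pages in the first place could look like an accumulator that flushes whenever the buffered size reaches a target, so no page ever approaches the 2 GB serialization limit and no large page has to be split afterwards. This is a hypothetical sketch, not how Presto's spiller builds pages: `CappedPageAccumulator` is an illustrative name, rows are modeled only by their byte sizes, and the 8 MB target is borrowed from the earlier review suggestion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator: rows are represented only by their size in bytes.
public class CappedPageAccumulator {
    private final long targetBytes;
    private final List<List<Long>> flushedPages = new ArrayList<>();
    private List<Long> current = new ArrayList<>();
    private long currentBytes;

    CappedPageAccumulator(long targetBytes) {
        this.targetBytes = targetBytes;
    }

    // Appends one row and flushes once the buffered page reaches the target.
    // A flushed page can exceed the target by less than one row's size.
    void addRow(long rowBytes) {
        current.add(rowBytes);
        currentBytes += rowBytes;
        if (currentBytes >= targetBytes) {
            flush();
        }
    }

    // Emits whatever is buffered (e.g. at end of input); no-op if empty.
    void flush() {
        if (!current.isEmpty()) {
            flushedPages.add(current);
            current = new ArrayList<>();
            currentBytes = 0;
        }
    }

    List<List<Long>> pages() {
        return flushedPages;
    }

    public static void main(String[] args) {
        long target = 8L * 1024 * 1024; // 8 MB target
        CappedPageAccumulator acc = new CappedPageAccumulator(target);
        for (int i = 0; i < 10; i++) {
            acc.addRow(3L * 1024 * 1024); // ten 3 MB rows
        }
        acc.flush();
        System.out.println(acc.pages().size()); // 4 (three pages of 3 rows, then 1 row)
    }
}
```

The trade-off is more, smaller pages (more per-page overhead) in exchange for a bounded page size and no transient copy during a later split.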
arhimondr
left a comment
Interesting. How are we ending up with 2GB+ pages?
```java
spillerStats.addToTotalSpilledBytes(pageSize);
writeSerializedPage(output, serializedPage);
// page serialization requires page.getSizeInBytes() + Integer.BYTES to fit in an integer
splitPage(page, DEFAULT_MAX_PAGE_SIZE_IN_BYTES).stream()
```
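One way to picture the split is recursive halving: keep cutting a page's positions in half until every piece is under the size limit, or is a single position that cannot be split further. The sketch below models a page as an array of per-row byte sizes; `splitBySize` and the row-size model are illustrative assumptions, not Presto's actual `splitPage` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecursivePageSplit {
    // Hypothetical page model: rowSizes[i] is the size in bytes of position i.
    static long sizeInBytes(long[] rowSizes, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += rowSizes[i];
        }
        return total;
    }

    // Returns [from, to) position ranges whose total size is <= maxBytes,
    // except single positions that are larger than maxBytes on their own.
    static List<int[]> splitBySize(long[] rowSizes, long maxBytes) {
        List<int[]> out = new ArrayList<>();
        split(rowSizes, 0, rowSizes.length, maxBytes, out);
        return out;
    }

    private static void split(long[] rowSizes, int from, int to, long maxBytes, List<int[]> out) {
        int count = to - from;
        // Stop when the range fits, or when it is a single position that cannot be split
        if (count <= 1 || sizeInBytes(rowSizes, from, to) <= maxBytes) {
            if (count > 0) {
                out.add(new int[] {from, to});
            }
            return;
        }
        int mid = from + count / 2; // halve the positions and recurse on each half
        split(rowSizes, from, mid, maxBytes, out);
        split(rowSizes, mid, to, maxBytes, out);
    }

    public static void main(String[] args) {
        long[] rows = {3, 3, 3, 3, 3, 3, 3, 3}; // 24 bytes in total
        for (int[] r : splitBySize(rows, 10)) {
            // prints [0, 2] -> 6 bytes, [2, 4] -> 6 bytes, [4, 6] -> 6 bytes, [6, 8] -> 6 bytes
            System.out.println(Arrays.toString(r) + " -> " + sizeInBytes(rows, r[0], r[1]) + " bytes");
        }
    }
}
```

Note that a single position larger than the limit is emitted as-is, since it cannot be split any further.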
We should probably add a similar fix to the storage-based spiller.
CC: @wenleix
I'm not sure. The follow-up task is to look into that and stop whatever is causing it from generating such large pages.
Page serialization requires page size to fit in an integer, so larger
pages could hit an int overflow error.
Test plan: I'm not sure how to write a test, because creating a page that large runs out of Java heap space. I plan to verify the fix with a production query that encountered this issue, but need to wait for other testing on that cluster to finish first.