Skip to content

Fix int overflow error when spilling large page#15403

Merged
rschlussel merged 1 commit intoprestodb:masterfrom
rschlussel:spill-large-page
Nov 6, 2020
Merged

Fix int overflow error when spilling large page#15403
rschlussel merged 1 commit intoprestodb:masterfrom
rschlussel:spill-large-page

Conversation

@rschlussel
Copy link
Contributor

@rschlussel rschlussel commented Nov 5, 2020

Page serialization requires page size to fit in an integer, so larger
pages could hit an int overflow error.

Test plan - Not sure how to write a test because if I create a page that big I run out of java heapspace. I plan to verify with a production query that encountered this issue to check if it fixes it, but need to wait for other testing to finish on that cluster first.

== RELEASE NOTES ==

General Changes
* Fix an issue where queries could encounter an int overflow error during spilling

Copy link
Contributor

@sachdevs sachdevs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, I had recently started seeing this in my tests as well.

Was this a join spilling query or aggregation? Are we concerned about memory usage in page.getRegion() during splitPage()? I wonder if we are temporarily doubling memory usage by doing the page split.

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I guess I also need to "backport" it to TempStorageSingleStreamSpiller 😂 -- I will take care of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can split with much moderate size? -- e.g. something like 8M?

@rschlussel
Copy link
Contributor Author

rschlussel commented Nov 5, 2020

Was this a join spilling query or aggregation?

Query had multiple joins and aggregations, so I'm not sure which was the culprit.

Are we concerned about memory usage in page.getRegion() during splitPage()? I wonder if we are temporarily doubling memory usage by doing the page split.

That's a great question. I think you're right. I'm also not sure that memory is accounted for anywhere. I wonder if it can cause GC issues for the cluster (not just for spill, but anytime we use splitPage()).

Page serialization requires page size to fin in an integer, so larger
pages could hit an int overflow error.
Copy link
Contributor

@sachdevs sachdevs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in my experience testing on T10s, this kinda stuff can cause JVM OOMs since in that one momentary instance the memory doubles and it's not yielding for memory anywhere. I'm good for merging this for now and then deal with the problem when it comes up though. I believe the solution will just be to try to make smaller pages when spilling!

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. How are we ending up with 2GB+ pages?

spillerStats.addToTotalSpilledBytes(pageSize);
writeSerializedPage(output, serializedPage);
// page serialization requires page.getSizeInBytes() + Integer.BYTES to fit in an integer
splitPage(page, DEFAULT_MAX_PAGE_SIZE_IN_BYTES).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a similar fix to the storage based spiller

CC: @wenleix

@rschlussel
Copy link
Contributor Author

Interesting. How are we ending up with 2GB+ pages?

I'm not sure. Follow up task is to look into that and prevent whatever is causing it from generating such large pages.

@rschlussel rschlussel merged commit f61df1b into prestodb:master Nov 6, 2020
@caithagoras caithagoras mentioned this pull request Nov 12, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants