Combine spill strategies #16069

Merged
rschlussel merged 4 commits into prestodb:master from
rschlussel:combine-spill-strategies
May 13, 2021

@rschlussel (Contributor) commented May 10, 2021

Test plan - unit tests

== RELEASE NOTES ==

General Changes
* Remove the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy and add the configuration property ``experimental.query-limit-spill-enabled`` and the session property ``query_limit_spill_enabled`` in its place. When the property is set to ``true`` and the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``, spilling is triggered whenever a query's combined revocable and non-revocable memory exceeds the per-node total memory limit, in addition to whenever the memory pool exceeds the spill threshold. This fixes an issue where the ``PER_QUERY_MEMORY_LIMIT`` spilling strategy could prevent the OOM killer from running when the memory pool was full. The issue is still present for the ``PER_TASK_MEMORY_THRESHOLD`` spilling strategy.
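
For readers skimming the release note, the combined trigger can be sketched roughly as below. This is a minimal illustration, not Presto's actual code: the class, method, and parameter names are all hypothetical, the thresholds are modeled as plain byte counts, and the sketch assumes the spill strategy is not ``PER_TASK_MEMORY_THRESHOLD``.

```java
// Hypothetical sketch of the rule in the release note; not Presto's implementation.
// Assumes the configured spill strategy is NOT PER_TASK_MEMORY_THRESHOLD.
final class SpillDecisionSketch
{
    private SpillDecisionSketch() {}

    // Pool-level trigger: the memory pool has gone over its spill threshold.
    static boolean poolOverThreshold(long poolReservedBytes, long poolSpillThresholdBytes)
    {
        return poolReservedBytes > poolSpillThresholdBytes;
    }

    // Query-level trigger: only active when query_limit_spill_enabled is true.
    // Revocable and non-revocable memory are counted together.
    static boolean queryOverLimit(
            boolean queryLimitSpillEnabled,
            long revocableBytes,
            long nonRevocableBytes,
            long perNodeTotalLimitBytes)
    {
        return queryLimitSpillEnabled
                && revocableBytes + nonRevocableBytes > perNodeTotalLimitBytes;
    }

    // The query-level trigger is additive: the pool-level trigger always applies.
    static boolean shouldSpill(
            boolean queryLimitSpillEnabled,
            long revocableBytes,
            long nonRevocableBytes,
            long perNodeTotalLimitBytes,
            long poolReservedBytes,
            long poolSpillThresholdBytes)
    {
        return poolOverThreshold(poolReservedBytes, poolSpillThresholdBytes)
                || queryOverLimit(queryLimitSpillEnabled, revocableBytes, nonRevocableBytes, perNodeTotalLimitBytes);
    }
}
```

The point of the sketch is that the pool-level check no longer gets switched off when query-level spilling is on, which is what lets a full pool still trigger revocation.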

It's not needed since memory revocation is called on every reservation.
@rschlussel rschlussel force-pushed the combine-spill-strategies branch from db394f4 to ce2f3fc Compare May 10, 2021 19:55
@pettyjamesm (Contributor) left a comment

Looks good to me modulo a few minor comments added inline. I can't think of a better option to avoid concurrent memory pool and query revoking other than the single threaded executor approach taken here, which should be ok as long as expensive operations don't somehow migrate into the caller thread (potential risk: OperatorContext#memoryRevocationRequestListener and associated ListenableFuture<?> callbacks on Driver#driverBlockedFuture registered in Driver#initialize()).
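
To illustrate why a single threaded executor rules out concurrent pool-triggered and query-triggered revocation, here is a small self-contained sketch. It is not taken from the PR: the class name, the two "trigger" threads, and the counters are hypothetical stand-ins, and the revocation work is reduced to bumping a counter.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two callers hand work to one single-threaded executor.
final class SerializedRevocationSketch
{
    private SerializedRevocationSketch() {}

    // Returns the highest number of revocation tasks seen running at the same time.
    static int maxConcurrentRevocations(int requestsPerSource)
    {
        ExecutorService revocationExecutor = Executors.newSingleThreadExecutor();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();

        // Stand-in for the real revocation work.
        Runnable revoke = () -> {
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            Thread.yield();
            running.decrementAndGet();
        };

        Queue<Future<?>> pending = new ConcurrentLinkedQueue<>();
        Runnable source = () -> {
            for (int i = 0; i < requestsPerSource; i++) {
                // Callers only enqueue; the work itself runs on the executor thread.
                pending.add(revocationExecutor.submit(revoke));
            }
        };
        Thread poolTrigger = new Thread(source, "pool-trigger");
        Thread queryTrigger = new Thread(source, "query-trigger");
        try {
            poolTrigger.start();
            queryTrigger.start();
            poolTrigger.join();
            queryTrigger.join();
            for (Future<?> future : pending) {
                future.get();
            }
            return maxRunning.get();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        finally {
            revocationExecutor.shutdownNow();
        }
    }
}
```

The guarantee only covers work that actually runs on the executor thread, which is why callbacks that execute in the caller's thread are flagged as a risk above.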

Contributor

Why not newSingleThreadExecutor(threadsNamed("memory-revocation"))? Seems like this is serving to enable the .getCorePoolCount() check in the primary constructor, but seems acceptable to omit the assertion if the executor creation is hard coded (maybe just move the comment to the creation site?).

The added benefit is that single threaded executors are wrapped in a FinalizableDelegatedExecutorService, which will call shutdown if the executor is leaked without being shut down first (a likely scenario in tests that might fail to call stop()).
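
A rough sketch of what this suggestion amounts to, using only the JDK. The `namedThreads` helper below is a hand-written stand-in for `threadsNamed(...)` and is an assumption for illustration, as are the class and method names. The sketch also shows why the pool-size assertion has to go: the executor returned by `Executors.newSingleThreadExecutor` is a wrapper, not a `ThreadPoolExecutor`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch; only JDK classes are used.
final class NamedSingleThreadExecutorSketch
{
    private NamedSingleThreadExecutorSketch() {}

    // Hand-written stand-in for threadsNamed(...): every thread gets the given name.
    static ThreadFactory namedThreads(String name)
    {
        return runnable -> new Thread(runnable, name);
    }

    static ExecutorService newRevocationExecutor()
    {
        return Executors.newSingleThreadExecutor(namedThreads("memory-revocation"));
    }

    // The returned wrapper hides the pool configuration, so a core-pool-size
    // check is not possible (and not needed) on it.
    static boolean exposesPoolConfiguration(ExecutorService executor)
    {
        return executor instanceof ThreadPoolExecutor;
    }

    // Runs a trivial task to observe which thread the executor uses.
    static String workerThreadName(ExecutorService executor)
    {
        Callable<String> currentName = () -> Thread.currentThread().getName();
        try {
            return executor.submit(currentName).get();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whoever creates the executor is still expected to shut it down explicitly; the wrapper's cleanup is only a safety net.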

Contributor Author

Yeah, it was entirely to be able to get the corePoolCount. I'll change it.

Contributor

Now that this instance is created in the constructor, it should be shut down in the stop() method to avoid leaking it.

Contributor Author

It wasn't created in the constructor for the tests, but I'll actually just change the tests to get the executor from here, and that will remove any need to do any checking about the pool size.

@rschlussel rschlussel force-pushed the combine-spill-strategies branch 3 times, most recently from 00db8b0 to e436416 Compare May 12, 2021 14:09
@rschlussel (Contributor Author)

@pettyjamesm comments addressed and all tests passing

Contributor

Clever, I didn't realize that invokeAll() blocked until completion like this.
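
For anyone else surprised by this: `ExecutorService.invokeAll()` returns only after every submitted task has completed, so each returned future is already done. A minimal sketch using only the JDK follows; the class and method names are hypothetical and the tasks are placeholders, not the PR's test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of invokeAll() acting as a "run and wait" barrier.
final class InvokeAllSketch
{
    private InvokeAllSketch() {}

    // Counts how many futures are already done at the moment invokeAll() returns.
    static int completedWhenInvokeAllReturns(int taskCount)
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                int id = i;
                tasks.add(() -> {
                    Thread.sleep(10); // placeholder for real work
                    return id;
                });
            }
            // Blocks the calling thread until all tasks have finished.
            List<Future<Integer>> futures = executor.invokeAll(tasks);
            int done = 0;
            for (Future<Integer> future : futures) {
                if (future.isDone()) {
                    done++;
                }
            }
            return done;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        finally {
            executor.shutdownNow();
        }
    }
}
```

On a single threaded executor this also means that anything queued before the `invokeAll()` call has run by the time it returns.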

Contributor

Missing MemoryRevokingScheduler#stop(), which would now leak the single threaded executor until GC finalization. Minor, but worth avoiding. Maybe a withMemoryRevokingScheduler(<params>, Consumer<MemoryRevokingScheduler>) helper could help with the boilerplate of registerPoolListeners() and stop(), but it would be a problem with the way that InterruptedException is thrown from awaitAsynchronousCallbacksRun()...

Open to your thoughts about the best way to resolve the tension between test sanity and ensuring the revoking executor remains single threaded even with future refactoring changes.

Contributor Author

Ended up going with a simple try/finally pattern in the tests to start and stop the memory revoking scheduler.
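
The shape of that try/finally pattern is roughly as follows. This is a sketch under stated assumptions, not the PR's test code: `SchedulerStandIn` is a hypothetical stand-in that models only the lifecycle of the scheduler (the `registerPoolListeners()` and `stop()` names come from the discussion above; everything else is invented for illustration).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of start/stop handling in a test.
final class SchedulerLifecycleSketch
{
    private SchedulerLifecycleSketch() {}

    // Stand-in for the scheduler: owns a single-threaded executor that stop() must release.
    static final class SchedulerStandIn
    {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        void registerPoolListeners()
        {
            // The real scheduler would subscribe to memory pool events here.
        }

        void stop()
        {
            executor.shutdownNow();
        }

        boolean isStopped()
        {
            return executor.isShutdown();
        }
    }

    // Runs a test body with the scheduler started, and always stops it afterwards,
    // even when the body throws.
    static void runWithScheduler(SchedulerStandIn scheduler, Runnable testBody)
    {
        try {
            scheduler.registerPoolListeners();
            testBody.run();
        }
        finally {
            scheduler.stop();
        }
    }
}
```

Writing the try/finally inline in each test, rather than behind a `Consumer`-based helper, keeps checked exceptions such as `InterruptedException` on the test method's own `throws` clause.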

Always have memory-pool spilling running whenever per-query spilling is
enabled to prevent a situation where the memory pool could fill up
without spill being triggered. This is a problem because the OOM killer
won't kick in if there is any revocable memory allocated.

The problem exists for the PER_TASK_MEMORY_THRESHOLD spill strategy as
well, but we don't address it here as the fix is more complicated, and
we expect to remove that strategy in the future.
TestJdbcClient.testAlterColumns failed in a CI run.
See prestodb#16081.
@rschlussel rschlussel force-pushed the combine-spill-strategies branch from d475e99 to 06aebfd Compare May 13, 2021 13:14
@rschlussel rschlussel merged commit f434ea1 into prestodb:master May 13, 2021
@sujay-jain sujay-jain mentioned this pull request May 21, 2021