Support for killing individual task if cluster is out of memory by losipiuk · Pull Request #11129 · trinodb/trino

losipiuk · 2022-02-21T22:28:14Z

Description

Adds support for killing individual tasks when the cluster is out of memory.
Individual tasks are killed only for queries that run with task-level retries.
Currently this is supported only when the default low memory killer policy is used (total-reservation-on-blocked-nodes).
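
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TaskKillSketch`, `TaskMemory`, `KillTarget`, and `chooseTarget` are hypothetical names, and the fallback to the query with the largest total reservation is an assumption based on the policy name.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical types for illustration only; these are not Trino's classes.
final class TaskKillSketch {
    // Memory reserved by one task on a node that is blocked on memory.
    record TaskMemory(String taskId, String queryId, long reservedBytes) {}

    // The killer's decision: either a single task or a whole query.
    record KillTarget(boolean isTask, String id) {}

    static Optional<KillTarget> chooseTarget(
            List<TaskMemory> tasksOnBlockedNodes,
            Set<String> queriesWithTaskRetries)
    {
        // Prefer the largest task that belongs to a query with task-level retries,
        // because that task can be rescheduled without failing the query.
        Optional<TaskMemory> largestRetryableTask = tasksOnBlockedNodes.stream()
                .filter(task -> queriesWithTaskRetries.contains(task.queryId()))
                .max(Comparator.comparingLong(TaskMemory::reservedBytes));
        if (largestRetryableTask.isPresent()) {
            return Optional.of(new KillTarget(true, largestRetryableTask.get().taskId()));
        }

        // Assumed fallback: kill the query with the largest total reservation
        // on the blocked nodes.
        Map<String, Long> reservationByQuery = tasksOnBlockedNodes.stream()
                .collect(Collectors.groupingBy(
                        TaskMemory::queryId,
                        Collectors.summingLong(TaskMemory::reservedBytes)));
        return reservationByQuery.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(entry -> new KillTarget(false, entry.getKey()));
    }
}
```

With an empty task list the sketch returns `Optional.empty()`, which corresponds to the killer taking no action.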

Is this change a fix, improvement, new feature, refactoring, or other?

improvement to new feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

N/A

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Add support for the low memory killer to kill individual tasks of queries run with ``retry-mode`` set to ``TASK``.
  Requires the ``query.low-memory-killer.policy`` config option to be set to ``total-reservation-on-blocked-nodes``. ({issue}`11129`)

arhimondr

Looks good to me % comments

core/trino-main/src/main/java/io/trino/execution/scheduler/SqlQueryScheduler.java

core/trino-main/src/main/java/io/trino/memory/MemoryResource.java

core/trino-main/src/main/java/io/trino/execution/RemoteTask.java

core/trino-main/src/main/java/io/trino/execution/scheduler/SqlQueryScheduler.java

core/trino-main/src/main/java/io/trino/memory/ClusterMemoryManager.java

losipiuk · 2022-03-02T00:44:18Z

AC

losipiuk · 2022-03-02T19:28:44Z

@martint could you maybe take a look?

losipiuk · 2022-03-03T19:09:18Z

Major changes:

Rebased on top of Support for killing individual task if cluster is out of memory #11129.

Dropped the per-task memory limit mechanism. We are not enforcing limits right now. We estimate task sizes and bin-pack based on those estimates, but we allow a task to go above its estimate. If the memory pool is exhausted, the memory killer will intervene and kill one of the tasks.

I dropped the integration test. With the memory-killer approach I was not able to make it work (deterministically enforce that one of the tasks would require full node allocation).

@arhimondr PTAL once again

NVM - comment was meant for different PR

core/trino-main/src/main/java/io/trino/memory/ClusterMemoryManager.java

...no-main/src/test/java/io/trino/memory/TestTotalReservationOnBlockedNodesLowMemoryKiller.java

Extend the MemoryInfo data structure passed from workers to the coordinator via the `v1/memory` resource with memory reservations for all tasks that run as part of queries with task-level retries enabled. The extra information is needed to implement a task-aware cluster out-of-memory killer.
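
As a rough sketch of what such a per-node report might carry, assuming a hypothetical shape: `MemoryReportSketch`, `RunningTask`, `NodeMemoryReport`, and `buildReport` are illustrative names and do not reflect the actual fields of `MemoryInfo`.

```java
import java.util.*;

// Hypothetical types for illustration only; the real MemoryInfo layout may differ.
final class MemoryReportSketch {
    // A task running on a worker, flagged by whether its query uses task-level retries.
    record RunningTask(String taskId, long reservedBytes, boolean taskRetriesEnabled) {}

    // What a worker could send to the coordinator: the node-wide reservation plus
    // per-task reservations for tasks that may be killed individually.
    record NodeMemoryReport(long reservedBytes, Map<String, Long> taskReservations) {}

    static NodeMemoryReport buildReport(List<RunningTask> tasks) {
        long totalReserved = 0;
        Map<String, Long> perTask = new TreeMap<>();
        for (RunningTask task : tasks) {
            totalReserved += task.reservedBytes();
            // Only tasks of queries with task-level retries are reported individually.
            if (task.taskRetriesEnabled()) {
                perTask.put(task.taskId(), task.reservedBytes());
            }
        }
        return new NodeMemoryReport(totalReserved, Collections.unmodifiableMap(perTask));
    }
}
```

Reporting only retryable tasks keeps the payload small and gives the coordinator exactly the candidates a task-aware killer can choose from.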

losipiuk · 2022-03-08T13:20:54Z

CI: #10631 (Table '...' not found)

cla-bot bot added the cla-signed label Feb 21, 2022

losipiuk requested review from arhimondr and findepi February 22, 2022 14:00

linzebing self-requested a review March 1, 2022 18:04

losipiuk force-pushed the lo/kill-by-task branch 2 times, most recently from c818b5a to d868290 Compare March 1, 2022 22:33

losipiuk marked this pull request as ready for review March 1, 2022 22:33

arhimondr reviewed Mar 1, 2022

losipiuk force-pushed the lo/kill-by-task branch from d868290 to 8095e10 Compare March 2, 2022 00:44

arhimondr approved these changes Mar 2, 2022

losipiuk force-pushed the lo/kill-by-task branch 2 times, most recently from e0545dd to b9797dd Compare March 2, 2022 15:09

losipiuk requested a review from martint March 2, 2022 19:28

losipiuk mentioned this pull request Mar 3, 2022

Allow assigning exclusive node for task execution #10432

Merged

linzebing approved these changes Mar 4, 2022

losipiuk force-pushed the lo/kill-by-task branch from b9797dd to e90a233 Compare March 7, 2022 12:52

losipiuk added 5 commits March 7, 2022 13:52

Allow for killing individual query tasks

3f24635

Pass task memory allocation to LowMemoryKiller

d17153a

Support for killing individual task if cluster is out of memory

3b132b6

Prefer killing tasks in TOTAL_RESERVATION_ON_BLOCKED_NODES OOM killer

2cb6895

losipiuk force-pushed the lo/kill-by-task branch from e90a233 to 2cb6895 Compare March 7, 2022 12:52

losipiuk merged commit 14f01f2 into trinodb:master Mar 8, 2022

github-actions bot added this to the 373 milestone Mar 8, 2022

mosabua mentioned this pull request Mar 8, 2022

Add Trino 373 release notes #11290

Merged

losipiuk merged 5 commits into trinodb:master from losipiuk:lo/kill-by-task
