Skip to content

Add LEAST_WASTED task memory killer #12242

Merged
losipiuk merged 7 commits intotrinodb:masterfrom
losipiuk:lo/smarter-memory-killer
May 12, 2022
Merged

Add LEAST_WASTED task memory killer #12242
losipiuk merged 7 commits intotrinodb:masterfrom
losipiuk:lo/smarter-memory-killer

Conversation

@losipiuk
Copy link
Copy Markdown
Member

@losipiuk losipiuk commented May 4, 2022

Fixes #11952

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core

Documentation

( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# General
* Rename configuration property used for selecting a policy for low memory query killer from
  `query.low-memory-killer.policy` to `query.low-memory-query-killer.policy`. ({issue}`12242`)
* Introduce specific configuration property `query.low-memory-task-killer.policy` for selecting a policy 
  for low memory task killer. The policy is used if task level retries are enabled (`retry-policy=TASK`). ({issue}`12242`)
* Introduce `least-wasted` low memory task killer policy. This policy avoids killing tasks that are already
  executing for a long time, so a significant amount of work is not wasted.  ({issue}`12242`)

@cla-bot cla-bot bot added the cla-signed label May 4, 2022
@losipiuk losipiuk requested review from arhimondr and linzebing May 4, 2022 14:21
@github-actions github-actions bot added the docs label May 4, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird to have multiple low memory killers, unless we plan to support pipelined execution and fault tolerant execution in the same cluster.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if properly used cluster only one should trigger. But currently, we are not forcibly forbidding users from running both types of workloads in the same cluster, so we need to handle that in code I think.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to remove the session property for retry-policy (or at least don't allow hybrid mode) right now? cc @arhimondr

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't mind removing the session property since currently we don't support mixed workloads. Then based on the retry-policy value we may inject the killer we need to keep it simple. However we would still probably need two configuration properties, one that defines what killer to use for killing tasks and the other one for queries as they are going to have different defaults. We can pick one based on the retry-policy then. It feels like it would make things simpler.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be done - but we would need to rework testing. In FTE tests we currently run query with and without retries. We switch the mode using session property. We would need to migrate approach to have two query runners - can be done but tests will be a bit longer and would require more resources (we need two query runners up and running).
I will see how easy it is to change it in a separate PR.

@losipiuk losipiuk force-pushed the lo/smarter-memory-killer branch from f33410e to 576bc64 Compare May 6, 2022 16:30
@losipiuk
Copy link
Copy Markdown
Member Author

losipiuk commented May 6, 2022

AC

@losipiuk losipiuk requested a review from linzebing May 6, 2022 16:33
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't mind removing the session property since currently we don't support mixed workloads. Then based on the retry-policy value we may inject the killer we need to keep it simple. However we would still probably need two configuration properties, one that defines what killer to use for killing tasks and the other one for queries as they are going to have different defaults. We can pick one based on the retry-policy then. It feels like it would make things simpler.

@losipiuk losipiuk force-pushed the lo/smarter-memory-killer branch 2 times, most recently from b1fb8ed to e31b684 Compare May 9, 2022 11:39
@losipiuk losipiuk requested a review from arhimondr May 9, 2022 11:50
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no indeed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TestLeastWastedEffortTaskLowMemoryKiller (same for other tests)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx; others were fine

@losipiuk losipiuk force-pushed the lo/smarter-memory-killer branch from e31b684 to a088ee2 Compare May 10, 2022 10:28
@losipiuk
Copy link
Copy Markdown
Member Author

@jhlodin could you PTAL at doc changes?

Copy link
Copy Markdown
Contributor

@jhlodin jhlodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested some changes to the docs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be "Set the query.low-memory-task-killer.policy". It's in the context of "recommended changes for a TASK retry policy" so we can be specific about which one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this situation, are any changes recommended to the query task killer policy?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not care about query.low-memory-query-killer.policy after we split properties. Fixed the typo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the interaction between this and the other low memory killer property if both are set?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From context it seems like this property is only used when retry-policy is set to TASK. If that's the case, should add a sentence to this description saying that, and whether the query low memory killer does anything in that case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we should be sayint to users is:
query.low-memory-task-killer.policy is used whenretry_policy is TASK.
query.low-memory-query-killer.policy is used otherwise.

In reality it is that:
query.low-memory-task-killer.policy is used for queries which were run with retry_policy set to tasks and query.low-memory-query-killer.policy for other queries - but we officially do not support mixed workloads so we should not word it like that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O - actualy we use such wording already:

    Does not apply for queries with task level retries enabled (``retry-policy=TASK``)

I will keep it consistent for now then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a value different than NONE

@jhlodin
Copy link
Copy Markdown
Contributor

jhlodin commented May 10, 2022

What will happen to clusters that currently have an explicit configuration of the now-legacy query.low-memory-killer.policy property? Is this a breaking change for them until they specify either the query or task config?

@losipiuk
Copy link
Copy Markdown
Member Author

What will happen to clusters that currently have an explicit configuration of the now-legacy query.low-memory-killer.policy property? Is this a breaking change for them until they specify either the query or task config?

Nope - you can still use legacy config keys - you will just get a warning during startup in logs.

@losipiuk losipiuk force-pushed the lo/smarter-memory-killer branch from a088ee2 to 2c1025e Compare May 11, 2022 14:22
@losipiuk losipiuk merged commit a58fc50 into trinodb:master May 12, 2022
@github-actions github-actions bot added this to the 381 milestone May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Rethink which task to kill on cluster OOM with Tardigrade

4 participants