
Implement maxDriversPerQuery #10942

Closed
JunhyungSong wants to merge 1 commit into trinodb:master from
JunhyungSong:max-drivers-per-query


Conversation

@JunhyungSong
Member

Some queries use too many drivers for their leaf splits, which causes
issues like ExceededMemoryLimit. This needs to be controllable.
@cla-bot cla-bot bot added the cla-signed label Feb 3, 2022
@electrum electrum requested a review from dain February 4, 2022 05:49
@sopel39
Member

sopel39 commented Feb 7, 2022

Some notes:

  1. Does the issue still happen with the phased execution policy? It should prevent too many (unproductive) concurrent table scans for a query.
  2. Are you aiming for latency or peak memory utilization? Did you try the legacy-phased policy (it should limit the number of scans even more)?
  3. It would be great for the concurrency controller to look at memory utilization rather than just the number of running splits, because, for example, readers (ORC, Parquet) might have different memory usage. Also, the table scan stage might run partial aggregation.
  4. I wonder whether, for running queries in a low-memory environment, the holistic approach wouldn't be to use the resiliency effort (cc @arhimondr ). There, if needed, you could limit the number of tasks to one based on resource utilization.
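For context, the policies discussed in this thread are selected via the `query.execution-policy` config property (or the `execution_policy` session property). A minimal config.properties sketch — the choice of values here is illustrative, not from this PR:

```properties
# Schedule only stages whose output will actually be consumed.
query.execution-policy=phased

# Alternative discussed above: roughly one table scan at a time.
# query.execution-policy=legacy-phased
```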

@JunhyungSong
Member Author

JunhyungSong commented Feb 7, 2022

Some notes:

  1. Does the issue still happen with the phased execution policy? It should prevent too many (unproductive) concurrent table scans for a query.
  2. Are you aiming for latency or peak memory utilization? Did you try the legacy-phased policy (it should limit the number of scans even more)?
  3. It would be great for the concurrency controller to look at memory utilization rather than just the number of running splits, because, for example, readers (ORC, Parquet) might have different memory usage. Also, the table scan stage might run partial aggregation.
  4. I wonder whether, for running queries in a low-memory environment, the holistic approach wouldn't be to use the resiliency effort (cc @arhimondr ). There, if needed, you could limit the number of tasks to one based on resource utilization.

  1. I haven't checked the new phased execution policy since it was implemented recently. Let me check whether it solves the issues or not.
  2. The legacy-phased execution policy showed some performance regression, while this PR doesn't (it actually improves TPC-DS performance). Also, some test queries still failed intermittently with the legacy-phased execution policy, while there were no failures with this PR.
  3. & 4. I had a version that did that. But controlling leaf split concurrency based on current memory utilization data became very complicated and caused side effects. I found that the approach in this PR is more effective, with much simpler code.

@sopel39
Member

sopel39 commented Feb 8, 2022

I haven't checked the new phased execution policy since it was implemented recently. Let me check whether it solves the issues or not.

phased is the new default policy; see #10350 or https://docs.starburst.io/blog/2022-02-03-cpu-utilization.html.

Essentially, both phased and legacy-phased limit the number of table scans. phased allows only productive table scans (whose data will be consumed), but legacy-phased tries to schedule a single table scan.

Also, some test queries still failed intermittently with the legacy-phased execution policy while there was no failure with this PR.

I would think that with both phased and legacy-phased you can probably control concurrency better with just task.max-drivers-per-task, since there are fewer running table scans.

Questions:

  1. Why don't you increase query-max-memory-per-node to improve query concurrency if there are available driver slots?
  2. Will query still fail if you set task.min-drivers-per-task=0?
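The knobs mentioned in these questions are standard Trino task/memory properties set in the config files. A sketch with illustrative values (the property names are real; the numbers are assumptions for the example):

```properties
# Per-task driver concurrency bounds discussed above.
task.max-drivers-per-task=32
task.min-drivers-per-task=1

# Per-node memory cap for a single query.
query.max-memory-per-node=10GB
```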

@JunhyungSong
Member Author

I would think that with both phased and legacy-phased you can probably control concurrency better with just task.max-drivers-per-task, since there are fewer running table scans.

With legacy-phased, I agree that we can get a similar effect from task.max-drivers-per-task as from task.max-drivers-per-query, since there will be only a single task running for the corresponding query. But some performance degradation was found. With the new phased execution policy, we still don't know how many tasks will be running at the same time, so the number of tasks can still be large. As I said in the previous comment, I will check whether the new phased execution policy covers all the problematic cases I hit, without noticeable performance degradation.

Why don't you increase query-max-memory-per-node to improve query concurrency if there are available driver slots?

If query-max-memory-per-node has already reached the node's memory cap, it cannot be increased.

Will query still fail if you set task.min-drivers-per-task=0?

Is the min value 1? When I reduced that number below 4, I noticed that many deadlocks began to occur.

@sopel39
Member

sopel39 commented Feb 9, 2022

If query-max-memory-per-node has already reached the node's memory cap, it cannot be increased.

If you still have driver slots left, there can always be another query that pushes memory utilization over the node's total memory.

Why not reduce task.min-drivers instead?

@JunhyungSong
Member Author

If you still have driver slots left, there can always be another query that pushes memory utilization over the node's total memory.

We can confine certain heavy queries to lower concurrency. That way, other light queries can run with better performance and stability.

@sopel39
Member

sopel39 commented Feb 9, 2022

We can confine certain heavy queries to have less concurrency.

What prevents a user from issuing 2 heavy queries that would push the node beyond the memory limit?

@JunhyungSong
Member Author

We can confine certain heavy queries to lower concurrency.

What prevents a user from issuing 2 heavy queries that would push the node beyond the memory limit?

Some heavy queries can exceed the memory limit due to memory accumulated in ScanFilterAndProjectOperator. That can be mitigated by capping maxDriversPerQuery, particularly when the filter is highly selective, so the output buffer fills up slowly.

@sopel39
Member

sopel39 commented Feb 9, 2022

Some heavy queries can exceed the memory limit due to memory accumulated in ScanFilterAndProjectOperator. That can be mitigated by capping maxDriversPerQuery, particularly when the filter is highly selective, so the output buffer fills up slowly.

Yes. However, you mentioned that

If query-max-memory-per-node already reaches to the node's memory cap, it cannot be increased.

so what prevents two such heavy queries from running on a node and exceeding the node's memory cap (one of the queries will fail anyway)?

}
}

public static int getMaxDriversPerQuery(Session session)
Member


You should only be able to reduce max_drivers_per_query, not increase it above the value in FeaturesConfig.
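The clamping the reviewer asks for could look like the following. This is a hypothetical sketch, not the PR's actual code — the method name and the idea of passing the session value and the FeaturesConfig value as plain ints are assumptions for illustration:

```java
public final class MaxDriversClamp
{
    private MaxDriversClamp() {}

    // Hypothetical helper illustrating the review comment: a session-level
    // max_drivers_per_query may only lower the limit, never raise it above
    // the value configured in FeaturesConfig.
    public static int clampMaxDriversPerQuery(int sessionValue, int configuredValue)
    {
        return Math.min(sessionValue, configuredValue);
    }
}
```

With this shape, a session asking for 1000 drivers while the config allows 64 would still be capped at 64, while a session asking for 16 is honored.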

@bitsondatadev
Member

👋 @JunhyungSong - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
