
Implement maxDriversPerQuery #10942

Closed
JunhyungSong wants to merge 1 commit into trinodb:master from
JunhyungSong:max-drivers-per-query


Conversation

@JunhyungSong
Member

Some queries use too many drivers for their leaf splits, which causes
issues like ExceededMemoryLimit. This needs to be controllable.
@cla-bot cla-bot bot added the cla-signed label Feb 3, 2022
@electrum electrum requested a review from dain February 4, 2022 05:49
@sopel39
Member

sopel39 commented Feb 7, 2022

Some notes:

  1. Does the issue still happen with the phased execution policy? It should prevent too many (unproductive) concurrent table scans for a query.
  2. Are you aiming for latency or peak memory utilization? Did you try the legacy-phased policy (it should limit the number of scans even more)?
  3. It would be great for the concurrency controller to look at memory utilization rather than just the number of running splits, because, for example, readers (ORC, Parquet) might have different memory usage. Also, the table scan stage might run partial aggregation.
  4. I wonder whether, for running queries in a low-memory environment, the holistic approach wouldn't be to use the resiliency effort (cc @arhimondr ). There, if needed, you could limit the number of tasks to one based on resource utilization.
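For context, the policies discussed in this thread are selected via the `query.execution-policy` config property (or the `execution_policy` session property). A minimal config.properties sketch — the choice of values here is illustrative, not from this PR:

```properties
# Schedule only stages whose output will actually be consumed.
query.execution-policy=phased

# Alternative discussed above: roughly one table scan at a time.
# query.execution-policy=legacy-phased
```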

@JunhyungSong
Member Author

JunhyungSong commented Feb 7, 2022

Some notes:

  1. Does the issue still happen with the phased execution policy? It should prevent too many (unproductive) concurrent table scans for a query.
  2. Are you aiming for latency or peak memory utilization? Did you try the legacy-phased policy (it should limit the number of scans even more)?
  3. It would be great for the concurrency controller to look at memory utilization rather than just the number of running splits, because, for example, readers (ORC, Parquet) might have different memory usage. Also, the table scan stage might run partial aggregation.
  4. I wonder whether, for running queries in a low-memory environment, the holistic approach wouldn't be to use the resiliency effort (cc @arhimondr ). There, if needed, you could limit the number of tasks to one based on resource utilization.

  1. I haven't checked the new phased execution policy since it was implemented recently. Let me check whether it solves the issues or not.
  2. The legacy-phased execution policy showed some performance regression, while this PR doesn't (it actually improves TPC-DS performance). Also, some test queries still failed intermittently with the legacy-phased execution policy, while there were no failures with this PR.
  3. & 4. I had a version that did that. But controlling leaf split concurrency based on current memory utilization data became very complicated and caused side effects. I found that the approach in this PR is more effective, with much simpler code.

@sopel39
Member

sopel39 commented Feb 8, 2022

I haven't checked the new phased execution policy since it was implemented recently. Let me check whether it solves the issues or not.

phased is the new default policy; see #10350 or https://docs.starburst.io/blog/2022-02-03-cpu-utilization.html.

Essentially, both phased and legacy-phased limit the number of table scans. phased allows only productive table scans (whose data will be consumed), but legacy-phased tries to schedule a single table scan.

Also, some test queries still failed intermittently with the legacy-phased execution policy while there was no failure with this PR.

I would think that with both phased and legacy-phased you can probably control concurrency better with just task.max-drivers-per-task, since there are fewer running table scans.

Questions:

  1. Why don't you increase query-max-memory-per-node to improve query concurrency if there are available driver slots?
  2. Will query still fail if you set task.min-drivers-per-task=0?
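The knobs mentioned in these questions are standard Trino task/memory properties set in the config files. A sketch with illustrative values (the property names are real; the numbers are assumptions for the example):

```properties
# Per-task driver concurrency bounds discussed above.
task.max-drivers-per-task=32
task.min-drivers-per-task=1

# Per-node memory cap for a single query.
query.max-memory-per-node=10GB
```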

@JunhyungSong
Member Author

I would think that with both phased and legacy-phased you can probably control concurrency better with just task.max-drivers-per-task, since there are fewer running table scans.

With legacy-phased, I agree that we can get a similar effect from task.max-drivers-per-task as from task.max-drivers-per-query, since there will be only a single task running for the corresponding query. But some performance degradation was found. With the new phased execution policy, we still don't know how many tasks will be running at the same time, so the number of tasks can still be large. As I said in the previous comment, I will check whether the new phased execution policy covers all the problematic cases I hit, without noticeable performance degradation.

Why don't you increase query-max-memory-per-node to improve query concurrency if there are available driver slots?

If query-max-memory-per-node has already reached the node's memory cap, it cannot be increased.

Will query still fail if you set task.min-drivers-per-task=0?

Is the min value 1? When I reduced that number below 4, I noticed that many deadlocks began to occur.

@sopel39
Member

sopel39 commented Feb 9, 2022

If query-max-memory-per-node has already reached the node's memory cap, it cannot be increased.

If you still have driver slots left, there can always be another query that pushes memory utilization over the node's total memory.

Why not reduce task.min-drivers instead?

@JunhyungSong
Member Author

If you still have driver slots left, there can always be another query that pushes memory utilization over the node's total memory.

We can confine certain heavy queries to lower concurrency. That way, other light queries can run with better performance and stability.

@sopel39
Member

sopel39 commented Feb 9, 2022

We can confine certain heavy queries to have less concurrency.

What prevents a user from issuing 2 heavy queries that would push the node beyond the memory limit?

@JunhyungSong
Member Author

We can confine certain heavy queries to lower concurrency.

What prevents a user from issuing 2 heavy queries that would push the node beyond the memory limit?

Some heavy queries can exceed the memory limit due to memory accumulated in ScanFilterAndProjectOperator. That can be mitigated by capping maxDriversPerQuery, particularly when the filter is highly selective, so the output buffer fills up slowly.

@sopel39
Member

sopel39 commented Feb 9, 2022

Some heavy queries can exceed the memory limit due to memory accumulated in ScanFilterAndProjectOperator. That can be mitigated by capping maxDriversPerQuery, particularly when the filter is highly selective, so the output buffer fills up slowly.

Yes. However, you mentioned that

If query-max-memory-per-node already reaches to the node's memory cap, it cannot be increased.

so what prevents two such heavy queries from running on a node and exceeding the node's memory cap (one of the queries will fail anyway)?

}
}

public static int getMaxDriversPerQuery(Session session)
Member


You should only be able to reduce max_drivers_per_query, not increase it above the value in FeaturesConfig.
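The clamping the reviewer asks for could look like the following. This is a hypothetical sketch, not the PR's actual code — the method name and the idea of passing the session value and the FeaturesConfig value as plain ints are assumptions for illustration:

```java
public final class MaxDriversClamp
{
    private MaxDriversClamp() {}

    // Hypothetical helper illustrating the review comment: a session-level
    // max_drivers_per_query may only lower the limit, never raise it above
    // the value configured in FeaturesConfig.
    public static int clampMaxDriversPerQuery(int sessionValue, int configuredValue)
    {
        return Math.min(sessionValue, configuredValue);
    }
}
```

With this shape, a session asking for 1000 drivers while the config allows 64 would still be capped at 64, while a session asking for 16 is honored.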

@bitsondatadev
Member

👋 @JunhyungSong - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
