Limit number of nodes that execute writing stages #15987
radek-kondziolka wants to merge 3 commits into trinodb:master
Conversation
Close #15877?
Yes, #15877 was closed. Let's wait with merging until the documentation PR is prepared.
Instead of having a separate rule, you can do this directly inside AddExchange#visitTableWriter
Yes, it could. I do it here because:
- Most probably in the future we can make it adaptive based on stats.
- There is a similar rule that also sets the PartitioningScheme: ApplyPreferredTableExecutePartitioning. That rule is very simple, just based on a configuration toggle, and could be part of AddExchanges as well.
This is why I decided to wrap it in a separate rule.
TBH, I don't think we need a separate rule for this since we are just applying the configuration. In ApplyPreferredTableExecutePartitioning we are making a decision based on estimates whether to use preferred partitioning or not, so that one kinda makes sense.
And in the future, if we ever make it adaptive based on stats, we can always add a new rule and remove this from AddExchanges. It shouldn't be a big change.
I think the easier approach is preferred. If it's just a few lines of code, then AddExchange#visitTableWriter is good enough.
Be sure to have proper test coverage. With AddExchange I think you have to have BasePlanTest kind of tests.
@gaurav8297, the rule LimitMaxWriterNodesCount is now much more complicated. In particular, we skip the rule in some cases that would not be skipped in the AddExchange rule. I do not think that we should merge them; it is more complicated than it seemed at the beginning.
core/trino-main/src/test/java/io/trino/execution/scheduler/TestScaledWriterScheduler.java
Added an option query.max-writer-nodes-count to QueryManagerConfig, and a matching session property, that limits the number of nodes that take part in writing tasks.
The option query.max-writer-nodes-count is used as the maximum number of nodes that take part in writing stages when ScaledWriterScheduler is used.
The optimizer rule LimitMaxWriterNodesCount was added to limit the number of nodes that take part in executing writer stages.
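As a rough illustration of what such a limit does (this is a hypothetical sketch, not Trino's actual scheduler code; the class and method names below are invented), capping the writer node count amounts to clamping the candidate node list a writer stage may use:

```java
import java.util.List;

// Hypothetical sketch: cap the candidate node list for a writer stage.
// NodePicker and limitWriterNodes are illustrative names, not Trino's API.
public class NodePicker
{
    public static <T> List<T> limitWriterNodes(List<T> candidateNodes, int maxWriterNodesCount)
    {
        if (maxWriterNodesCount <= 0) {
            throw new IllegalArgumentException("maxWriterNodesCount must be positive");
        }
        // Use at most maxWriterNodesCount of the available nodes for writing tasks
        return candidateNodes.subList(0, Math.min(candidateNodes.size(), maxWriterNodesCount));
    }
}
```

If fewer nodes are available than the configured limit, all of them are used; the limit is an upper bound, not a target.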
Have we also considered making the target catalog's connector participate in deciding how many writer tasks to use? Clusters can be connected to very heterogeneous systems, so for example one might want all nodes to write to object storage or Kafka but only a limited number of writers to databases.
You mean round-robin writers or partitioned? For partitioned, the connector can always assign a fixed partition->node mapping.
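A fixed partition->node mapping over a limited writer set can be sketched as a simple deterministic assignment (illustrative only; this is not Trino's actual bucket-to-node mapping code):

```java
// Hypothetical sketch of a fixed partition -> node assignment over a limited
// writer node set; illustrative only, not Trino's bucket-to-node mapping.
public class FixedPartitionMapping
{
    public static int nodeForPartition(int partitionId, int writerNodeCount)
    {
        if (writerNodeCount <= 0) {
            throw new IllegalArgumentException("writerNodeCount must be positive");
        }
        // Each partition is deterministically owned by exactly one writer node
        return Math.floorMod(partitionId, writerNodeCount);
    }
}
```

Because the mapping is deterministic, every row of a given partition always lands on the same writer node, so limiting writerNodeCount limits write parallelism for partitioned data.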
For round-robin writers. EDIT: Is it possible to partition writes on some column which is actually not part of the output? E.g. if I wanted to limit writes to 2 nodes, maybe I could partition the output on some generated column and assign it to two nodes. That would also achieve what I'm trying to think of.
@hashhar, for now it is not possible.
Description
The option query.max-writer-node-count was added to limit the number of nodes that take part in executing writer stages. It was implemented by some changes in ScaledWriterScheduler (for unpartitioned data) and by adding the new rule LimitMaxWriterNodesCount to the optimizer (for partitioned data).
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Documentation
A documentation update is needed and will be done soon.