Skip to content

Limit number of nodes that execute writing stages#15987

Closed
radek-kondziolka wants to merge 3 commits intotrinodb:masterfrom
radek-kondziolka:rk/add_upper_limit_on_writer_scaling_backup
Closed

Limit number of nodes that execute writing stages#15987
radek-kondziolka wants to merge 3 commits intotrinodb:masterfrom
radek-kondziolka:rk/add_upper_limit_on_writer_scaling_backup

Conversation

@radek-kondziolka
Copy link
Copy Markdown
Contributor

@radek-kondziolka radek-kondziolka commented Feb 6, 2023

Description

The option query.max-writer-node-count was added to limit number of nodes that take part in executing writer stages.
It was implemented by some changes in ScaledWriterScheduler (for unpartitioned data) and by adding the new rule LimitMaxWriterNodesCount to the optimizer (for partitioned data).

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( *) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Documentation

Documentation update is need and will be done soon.

@cla-bot cla-bot bot added the cla-signed label Feb 6, 2023
@radek-kondziolka radek-kondziolka force-pushed the rk/add_upper_limit_on_writer_scaling_backup branch from 1152188 to 213b5b6 Compare February 6, 2023 13:23
@radek-kondziolka radek-kondziolka changed the title Rk/add upper limit on writer scaling backup Limit number of nodes that execute writing stages Feb 6, 2023
@radek-kondziolka radek-kondziolka force-pushed the rk/add_upper_limit_on_writer_scaling_backup branch 2 times, most recently from 5cd907f to 27c9024 Compare February 6, 2023 13:39
@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Feb 6, 2023

Close #15877?

@radek-kondziolka radek-kondziolka force-pushed the rk/add_upper_limit_on_writer_scaling_backup branch 4 times, most recently from 059c5e6 to 8864c30 Compare February 6, 2023 14:16
@radek-kondziolka
Copy link
Copy Markdown
Contributor Author

Yes, #15877 was closed. Let's wait with merging until preparing PR to documentation staff.

@radek-kondziolka radek-kondziolka marked this pull request as ready for review February 6, 2023 14:22
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having a separate rule, you can do this directly inside AddExchange#visitTableWriter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could. I do it here because:

  1. Most probably in the future we can make it adaptive basing on stats.
  2. There is a rule that is similar in the sense of setting PartitioningScheme - ApplyPreferredTableExecutePartitioning. This rule is very simple, just basing on configuration toggle and could be part of AddExchanges as well.

This is why I decided to wrap it within another rule.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't think we need a separate rule for this since we are just applying the configuration. In ApplyPreferredTableExecutePartitioning we are taking a decision based on estimates to use preferred partitioning or not. So, Its kinda makes sense.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in future if we ever make it adaptive based on stats we can always add a new rule and remove from AddExchanges. It shouldn't be a big change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT? @sopel39

Copy link
Copy Markdown
Member

@sopel39 sopel39 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easier approach is preferred. If it's just few lines of code, then AddExchange#visitTableWriter is good enough.

Be sure to have proper testing coverage. With AddExchange I think you have to have BasePlanTest kind of tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaurav8297 , the rule LimitMaxWriterNodesCount is now much complicated. Especially, we skip the rule in some cases that is not skipped in AddExchange rule. I do not think that we should merge them. It is complicated it seems to be at the beginning.

@radek-kondziolka radek-kondziolka force-pushed the rk/add_upper_limit_on_writer_scaling_backup branch 2 times, most recently from 0b6408f to 128f1db Compare February 7, 2023 13:58
Added an option query.max-writer-nodes-count to QueryManagerConfig and
a session option that limits number of nodes that take part in writing tasks.
The option query.max-writer-nodes-count was used as a maximal number
of nodes that take part in writing stages when ScaledWriterScheduler is used.
The optimizer rule LimitMaxWriterNodesCount was added to limit number of nodes
that take part in executing writer stages.
@radek-kondziolka radek-kondziolka force-pushed the rk/add_upper_limit_on_writer_scaling_backup branch from 128f1db to 687ddde Compare February 9, 2023 13:26
@hashhar
Copy link
Copy Markdown
Member

hashhar commented Feb 13, 2023

Have we also considered making the target catalog's connector participate in deciding how many writer tasks to use? Clusters can be connected to very homogenous systems so for example one might want all nodes to write to object storage or Kafka but only limited number of writers to databases.

@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Feb 14, 2023

Have we also considered making the target catalog's connector participate in deciding how many writer tasks to use?

You mean round-robin writers or partitioned? For partitioned connector can always assigned fixed partition->node mapping

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Feb 14, 2023

For round-robin writers.

EDIT: Is it possible to partition writes on some column which is actually not part of the output? e.g. If I wanted to limit writes to 2 nodes maybe I can partition the output on some generated column and assign to two nodes. That would also achieve what I'm trying to think of.

@radek-kondziolka
Copy link
Copy Markdown
Contributor Author

radek-kondziolka commented Feb 20, 2023

@hashhar ,

Is it possible to partition writes on some column which is actually not part of the output?

For now it is not possible.

@radek-kondziolka
Copy link
Copy Markdown
Contributor Author

@hashhar , I am closing this one and I've opened this one: #16238
Beyond to that, I've added possibility for connectors to decide on number of tasks / nodes that take part in writing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants