Skip to content

Remove minimum_assigned_split_weight Hive and Iceberg session config#12656

Closed
findepi wants to merge 1 commit intotrinodb:masterfrom
findepi:findepi/remove-minimum-assigned-split-weight-hive-session-config-9916d2
Closed

Remove minimum_assigned_split_weight Hive and Iceberg session config#12656
findepi wants to merge 1 commit intotrinodb:masterfrom
findepi:findepi/remove-minimum-assigned-split-weight-hive-session-config-9916d2

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Jun 2, 2022

The idea of having some minimum split weight is to prevent very tiny
splits from overloading the cluster, or pumping up task scheduling
request sizes. A safety toggle like this should not be configurable at
session level.

This remains configurable at the catalog property level.

@cla-bot cla-bot bot added the cla-signed label Jun 2, 2022
@findepi findepi marked this pull request as ready for review June 2, 2022 11:44
Copy link
Copy Markdown
Member

@pettyjamesm pettyjamesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this in an offline conversation with @findepi, but having this session property is extremely convenient for exploratory analysis to identify better weight minimums in any given environment and workload. There is also an additional guard-rail in place to reduce the potential for huge task update requests: node-scheduler.max-unacknowledged-splits-per-task (default: 500) sets an absolute maximum on the number of splits, regardless of their weights, that can be "pending and not delivered" to tasks that the scheduler respects.

I propose that instead of removing the session property, we could consider either or both of the following alternatives:

  • Marking the session property as hidden to discourage users from tweaking the value, but leaving it there as a convenience for development and power user's convenience.
  • Adding a validation on the session property to not allow it to bet set to a value lower than the cluster configured value

Ultimately, if you think the risk is still there and insist on removing it then I'm ok with that choice- just pointing out the alternative options that preserve the convenience it provides.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to remove the session property, you could just pass a HiveSplitWeightProvider directly instead of the double value here instead.

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jun 3, 2022

  • Marking the session property as hidden to discourage users from tweaking the value, but leaving it there as a convenience for development and power user's convenience.

split assignment tuning parameters should be configurable on the session level, should they?

also, per my understanding, the "min weight" is a safety measure so shouldn't be configurable by the end user, should it?

Adding a validation on the session property to not allow it to bet set to a value lower than the cluster configured value

that would address the safety issue, but then again, why would we want to keep it?
(we would need to identity the why and document)

@pettyjamesm
Copy link
Copy Markdown
Member

split assignment tuning parameters should be configurable on the session level, should they?

I don't see why they shouldn't be, at least to a trusted power user. The parameter controls behavior that attempts to be adaptive to characteristics in the workload, so if someone knows something specific to their workload or wants to experiment with the setting in the context of that specific workload I think the ability to control that without restarting the cluster is valid.

It's not ideal that someone could use it to impact the cluster health, but I would bet that malicious users with sufficient privileges to submit queries with or access to other session property overrides already have more than a few different ways to DoS a cluster. For instance, you could create similar payload bombs by creating a table with lots of columns and very long comments on each, and that's ignoring all the system session properties that can specify overrides on the per query stage limit, etc.

The idea of having some minimum split weight is to prevent very tiny
splits from overloading the cluster, or pumping up task scheduling
request sizes. A safety toggle like this should not be configurable at
session level.

This remains configurable at the catalog property level for both Hive
and Iceberg.
@findepi findepi force-pushed the findepi/remove-minimum-assigned-split-weight-hive-session-config-9916d2 branch from 982f359 to 9c96f92 Compare June 4, 2022 19:08
@findepi findepi changed the title Remove minimum_assigned_split_weight Hive session config Remove minimum_assigned_split_weight Hive and Iceberg session config Jun 6, 2022
@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented Jun 12, 2022

I agree that it is useful to have session properties to configure behaviors that are not necessarily commonly modified by non-power users. Actually we have plenty of those related to Tardigrade project (fault-tolerant execution). They often come helpful in prototyping, or when diagnosing performance issues with some specific query (it is often trial and error changing of configuration via session properties; doing that via config is PITA (and also not possible in some environments).

Maybe instead of dropping the session property, we should provide some mechanism that would allow hiding specific properties from end-users?
Also is access control not enough to handle that? Can we disallow users to set specific session properties?

@findepi findepi closed this Jun 13, 2022
@findepi findepi deleted the findepi/remove-minimum-assigned-split-weight-hive-session-config-9916d2 branch June 13, 2022 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants