Skip to content

Separate CTE materialization session Properties from exchange Materialization#21625

Merged
jaystarshot merged 2 commits intoprestodb:masterfrom
jaystarshot:cte-hash-partition-property
Jan 23, 2024
Merged

Separate CTE materialization session Properties from exchange Materialization#21625
jaystarshot merged 2 commits intoprestodb:masterfrom
jaystarshot:cte-hash-partition-property

Conversation

@jaystarshot
Copy link
Member

@jaystarshot jaystarshot commented Jan 4, 2024

  1. Enchances control of cte materialization hash partition buckets. Previously same property as exchanges were used

Current testing has shown that recommended value is 4x but further performance testing might be needed.

Meta recommends setting hash partition for materialized exchange as 5X - 10X the cluster size in exchange materialization and we do not want that as the hash partition count of non materialized exchanges

https://prestodb.io/docs/current/admin/exchange-materialization.html#using-exchange-materialization

  1. Separates cte_partitioning_provider_catalog from exchange materialization

This is needed since its good separation between features since these are cross used

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Added CTE hash partition count session property

@jaystarshot jaystarshot changed the title Control number of writers for CTE materialization effectively Control number of writers and bucket size for CTE materialization effectively Jan 4, 2024
@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from 61e6304 to d9b0043 Compare January 4, 2024 02:03
@jaystarshot jaystarshot marked this pull request as ready for review January 4, 2024 02:51
@jaystarshot jaystarshot requested a review from a team as a code owner January 4, 2024 02:51
@jaystarshot jaystarshot changed the title Control number of writers and bucket size for CTE materialization effectively Separate CTE materialization session Properties from exchange Materialization Jan 9, 2024
@tdcmeehan tdcmeehan requested a review from arhimondr January 9, 2024 17:46
@tdcmeehan tdcmeehan self-assigned this Jan 9, 2024
@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from a91ee90 to d00e266 Compare January 12, 2024 19:12
@jaystarshot
Copy link
Member Author

Updated CTE tests to use the new property

@steveburnett
Copy link
Contributor

Could documentation for this new session property be added? Possibly in Properties Reference.

@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from d00e266 to 9165e6f Compare January 17, 2024 06:53
@jaystarshot jaystarshot requested a review from tdcmeehan January 17, 2024 06:53
@github-actions
Copy link

github-actions bot commented Jan 17, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 00faa91...ab406a4.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/admin/properties.rst

@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from 9165e6f to 179a9b0 Compare January 17, 2024 06:55
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! One nit only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This setting determines how many buckets/writers should be used when materializing the CTEs, potentially affecting the performance of queries involving CTE materialization.
This setting determines how many buckets or writers should be used when materializing the CTEs, potentially affecting the performance of queries involving CTE materialization.

Avoid slashes. See GitLab doc recommended word list entry for slashes for advice and suggestions.

@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from 179a9b0 to b43067a Compare January 17, 2024 18:38
@jaystarshot
Copy link
Member Author

Added the docs, but isn't this page redundant since we already have @ConfigDescription? Does it make more sense to add @ConfigDescription to all new session properties and pull them in the docs page during releases?

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM@ (docs)

Pull branch, local build, docs look great. Thanks!

@steveburnett
Copy link
Contributor

Added the docs, but isn't this page redundant since we already have @ConfigDescription? Does it make more sense to add @ConfigDescription to all new session properties and pull them in the docs page during releases?

That's a great idea! I'll look into this possibility for generating the docs. Thanks very much for bringing it up.

@jaystarshot
Copy link
Member Author

@steveburnett Added documentation for cte-materialization-strategy property, please review

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local build of docs, looks good. Thanks!

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nit on the first commit, and please also add session property documentation.

Please make sure it follows our guidelines. Suggested:

Add cte_hash_partition_count and config property
This commit adds the `cte_hash_partition_count`, and corresponding config property `query.cte-hash-partition-count`, which controls number of writers for CTE materialization effectively

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another look, two minor suggestions.

@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from 4827f25 to a4b68e6 Compare January 22, 2024 21:39
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tiniest of nits.

This commit adds the `cte_hash_partition_count`, and corresponding config property `query.cte-hash-partition-count`, which controls number of writers for CTE materialization effectively
Also adds `cte_partitioning_provider_catalog`, and corresponding config `query.cte-partitioning-provider-catalog` which specifies which catalog should be used for CTE materialization and specifies partitioning
@jaystarshot jaystarshot force-pushed the cte-hash-partition-property branch from a4b68e6 to ab406a4 Compare January 22, 2024 22:31
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local build of docs.

@jaystarshot jaystarshot merged commit 7f17913 into prestodb:master Jan 23, 2024
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants