Use stable grouping set symbol orderings #18721
would it be enough to say toImmutableSet here?
At this particular usage site it would be, but the same logic for producing distinct grouping set symbols is used in a few other places inside GroupIdNode, so I think it makes sense to expose it as a public method.
i think it would be nice to separate "fix stability" from "reduce code duplication". maybe two commits?
Unfortunately, all of these changes are necessary to avoid using Set implementations with unstable ordering, so splitting the commit in two would only introduce temporary changes that the code de-duplication refactor would immediately delete. I think it's simpler to leave it as a single commit.
Previously, GroupIdNode grouping set and output symbol orderings were potentially unstable for the same logical sub-plan because they were built with set constructions that were not order-preserving. While this did not affect correctness, it could result in different symbol orderings within grouping sets in different branches of a UNION ALL operation with the same query on both sides. For example, a query like:

    (SELECT shippriority, custkey, sum(totalprice) FROM orders GROUP BY ROLLUP (shippriority, custkey))
    UNION ALL
    (SELECT shippriority, custkey, sum(totalprice) FROM orders GROUP BY ROLLUP (shippriority, custkey))

could result in logical plans with GroupIdNode grouping sets of either:

    1. [[],["tpch:shippriority$gid"],["tpch:shippriority$gid","tpch:custkey$gid"]]
    2. [[],["tpch:shippriority$gid"],["tpch:custkey$gid","tpch:shippriority$gid"]]

This does not affect correctness, but it makes the logical plan harder than necessary for a human to interpret.
Description
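Previously, GroupIdNode grouping set and output symbol orderings were potentially unstable for the same logical sub-plan due to the use of set constructions that were not order-preserving. While this did not affect correctness, it could result in different symbol orderings within grouping sets in different branches of UNION ALL operations with the same query on both sides. For example, a query like:

    (SELECT shippriority, custkey, sum(totalprice) FROM orders GROUP BY ROLLUP (shippriority, custkey))
    UNION ALL
    (SELECT shippriority, custkey, sum(totalprice) FROM orders GROUP BY ROLLUP (shippriority, custkey))

could result in logical plans with GroupIdNode grouping sets of either:

    1. [[],["tpch:shippriority$gid"],["tpch:shippriority$gid","tpch:custkey$gid"]]
    2. [[],["tpch:shippriority$gid"],["tpch:custkey$gid","tpch:shippriority$gid"]]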
This unnecessary variation could make the logical plan harder than necessary for a human to interpret.
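To illustrate the general idea of order-preserving de-duplication, here is a minimal sketch using only the Java standard library. It is not the code from this PR: the class name `StableGroupingSymbols` and the helper `distinctSymbols` are hypothetical, and symbols are modeled as plain strings rather than the planner's symbol type. It assumes that "stable" means first-encounter order across the grouping sets.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StableGroupingSymbols {
    // Hypothetical helper (not the PR's actual method): collects the distinct
    // symbols across all grouping sets, keeping first-encounter order.
    static List<String> distinctSymbols(List<List<String>> groupingSets) {
        // LinkedHashSet iterates in insertion order, unlike HashSet,
        // whose iteration order is unspecified.
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> groupingSet : groupingSets) {
            seen.addAll(groupingSet);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Grouping sets shaped like ROLLUP (shippriority, custkey).
        List<List<String>> rollup = List.of(
                List.of(),
                List.of("shippriority$gid"),
                List.of("shippriority$gid", "custkey$gid"));
        // Prints [shippriority$gid, custkey$gid]
        System.out.println(distinctSymbols(rollup));
    }
}
```

Because the result depends only on the order in which symbols are first seen, two identical sub-plans processed this way would yield the same symbol ordering.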
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: