Skip to content

Use Proper query scope in CTE Materialization#21580

Merged
jaystarshot merged 1 commit intoprestodb:masterfrom
jaystarshot:presto-21563
Jan 5, 2024
Merged

Use Proper query scope in CTE Materialization#21580
jaystarshot merged 1 commit intoprestodb:masterfrom
jaystarshot:presto-21563

Conversation

@jaystarshot
Copy link
Member

@jaystarshot jaystarshot commented Dec 20, 2023

Fixes #21563.

Motivation and Context

Fixes an issue with cte refereneces referencing wrong ctes (sample examples mentioned in the issue). This was causing incorrect plans.
Each reference of the cte expands the same namedQuery, hence we can fix all these issues with normalization of that named query object

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • Final Prod Shadow test

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fixed a bug in cte materialization which was causing incorrect plan generation in some cases

@jaystarshot jaystarshot force-pushed the presto-21563 branch 3 times, most recently from 675369f to af48e75 Compare December 22, 2023 02:56
@jaystarshot
Copy link
Member Author

This same issue will happen in CteInformationCollector (wrong reference count will increase), thinking about adding a new field called actualCtePath in CTEInformation. What do you think?

@jaystarshot jaystarshot marked this pull request as ready for review December 22, 2023 02:59
@jaystarshot jaystarshot requested a review from a team as a code owner December 22, 2023 02:59
@jaystarshot jaystarshot force-pushed the presto-21563 branch 2 times, most recently from ba7d1e4 to 3129ea3 Compare December 22, 2023 16:27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use putIfAbsent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping same since its better that the ++ happens only if its new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why do we convert the scope id to string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so that it is easier to debug and add tests, we can use hashCode but it becomes complicated

@tdcmeehan tdcmeehan self-assigned this Jan 2, 2024
@jaystarshot
Copy link
Member Author

jaystarshot commented Jan 2, 2024

SELECT (WITH test_base AS
       (
              SELECT colb
              FROM   (VALUES
                     (
                            1
                     )
                     ) AS temptable(colb)), test_cte AS
       (
              SELECT *
              FROM   test_base)SELECT *
FROM   test_cte ),
       (WITH test_base AS
       (
              SELECT text_column
              FROM   (VALUES
                     (
                            'Some Text',
                            9
                     )
                     ) AS t (text_column, number_column) ), test_cte AS
       (
              SELECT *
              FROM   test_base )SELECT *
FROM   test_cte )

Observed just one more extra edge case in prod where the definition of test_cte stays constant, but test_base undergoes changes, such as referencing a different column. In this case we are still considering both ctes to be the same.
Thinking how to fix this, because seems non trivial. Seems like just adding the set of referenced queries will work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: Defination -> Definition

@jaystarshot jaystarshot merged commit 5a70da7 into prestodb:master Jan 5, 2024
@jaystarshot jaystarshot deleted the presto-21563 branch January 5, 2024 03:41
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CTE Materialization creating wrong plans for similar named ctes in completely different scope

4 participants