Skip to content

Fix CTE reference in analyzer#22515

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:fix_cte_ref
Apr 16, 2024
Merged

Fix CTE reference in analyzer#22515
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:fix_cte_ref

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Apr 13, 2024

Description

Fix #22514

Currently when creating CTE Reference node, the NamedQuery used in the CTE is used to uniquely identify the CTE, which can lead to conflicts, which can be demonstrated with example in the linked issue.

Instead of trying to build the scope manually, we can simply rely on the underlying Query object linked to each CTE definition and reference, which refers to the same object if are referring to the same CTE.

Another advantage of this approach is that, it's much faster. I've seen query planning timeout due to the recursive query reference visit when dealing with large queries with nested ctes.

Motivation and Context

Fix a bug in CTE materialization

Impact

Bug fix.

Test Plan

Add unit test.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix a bug in CTE reference node creation, where different CTEs may be incorrectly considered as the same CTE.

@feilong-liu feilong-liu requested review from a team and jaystarshot as code owners April 13, 2024 05:14
@feilong-liu feilong-liu requested a review from presto-oss April 13, 2024 05:14
}

@Test
public void testNesteCteWithSameName()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fail without change here.

else {
// cte considered for materialization
String normalizedCteId = context.getCteInfo().normalize(analysis, namedQuery.getQuery(), cteName);
String normalizedCteId = context.getCteInfo().normalize(NodeRef.of(namedQuery.getQuery()), cteName);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use NodeRef, because we want to use == rather than equals comparison.

public TreeSet<Query> getReferencedQuerySet()
{
return referencedQuerySet;
String identityString = queryNodeRef.hashCode() + delimiter + cteName;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use hash code to uniquely identify a CTE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if repeated runs of the query will have the same hashcode.
If they do then please ignore.
If they don't, is it better to normalize this hashcode by ID too like done previously?
The goal is to make debugging easier in prod since its better for repeated query runs to have the same id of the cte, also the test framework changes might not be required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Instead of using hash code, now I use an incremental prefix for the mapping purpose, which should be fixed now, and get rid of the massive change of test framework.

String cteName = ((CteConsumerNode) node).getCteId();

return match();
if (cteNameMapping.containsKey(expectedCteName) && cteNameMapping.inverse().containsKey(cteName) && cteNameMapping.get(expectedCteName).equals(cteName)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One to one mapping exists and match.

@feilong-liu feilong-liu force-pushed the fix_cte_ref branch 5 times, most recently from 42f07d3 to 172098f Compare April 13, 2024 05:57
jaystarshot
jaystarshot previously approved these changes Apr 13, 2024
Copy link
Member

@jaystarshot jaystarshot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, just added couple of comments

public TreeSet<Query> getReferencedQuerySet()
{
return referencedQuerySet;
String identityString = queryNodeRef.hashCode() + delimiter + cteName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if repeated runs of the query will have the same hashcode.
If they do then please ignore.
If they don't, is it better to normalize this hashcode by ID too like done previously?
The goal is to make debugging easier in prod since its better for repeated query runs to have the same id of the cte, also the test framework changes might not be required

@jaystarshot
Copy link
Member

LGTM, however I only have codeowner access for presto main so you will need other reviews.

jaystarshot
jaystarshot previously approved these changes Apr 14, 2024
@feilong-liu feilong-liu merged commit 29fd544 into prestodb:master Apr 16, 2024
@feilong-liu feilong-liu deleted the fix_cte_ref branch April 16, 2024 18:48
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CTE materialization creates the same reference for different CTE

5 participants