fix(optimizer): Fix infinite loop in UnaliasSymbolReferences.canonicalize()#27428
Merged
feilong-liu merged 1 commit intoprestodb:masterfrom Mar 25, 2026
Merged
Conversation
…ze() (prestodb#27427) The canonicalize() method follows alias chains via a while loop but has no cycle detection. When multiple variables map to the same constant expression across different ProjectNodes, a cycle can form in the mapping (e.g., ds → report_date → ds), causing the loop to spin forever. Since UnaliasSymbolReferences is a non-iterative optimizer with no timeout protection, this hangs the query indefinitely during planning. Fix: add a visited set to detect cycles. When a cycle is found, remove the cyclic mapping entry and break.
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds cycle detection to UnaliasSymbolReferences.canonicalize() to prevent infinite loops when the alias mapping contains cycles by tracking visited symbols and breaking/removing the cyclic mapping entry. Class diagram for updated canonicalize logic in UnaliasSymbolReferencesclassDiagram
class UnaliasSymbolReferences {
}
class VariableReferenceExpression {
+getName() String
+getSourceLocation() SourceLocation
}
class SymbolReference {
<<value object>>
}
class SourceLocation {
}
class Map_String_String {
+containsKey(key String) boolean
+get(key String) String
+remove(key String) void
}
class Map_SymbolReference_Type {
+get(key SymbolReference) Type
}
class Set_String {
+add(value String) boolean
}
class Type {
}
UnaliasSymbolReferences o-- Map_String_String : mapping
UnaliasSymbolReferences o-- Map_SymbolReference_Type : types
UnaliasSymbolReferences : -mapping Map_String_String
UnaliasSymbolReferences : -types Map_SymbolReference_Type
UnaliasSymbolReferences : +canonicalize(variable VariableReferenceExpression) VariableReferenceExpression
UnaliasSymbolReferences ..> Set_String : uses for visited
UnaliasSymbolReferences ..> VariableReferenceExpression
UnaliasSymbolReferences ..> SymbolReference
UnaliasSymbolReferences ..> Type
UnaliasSymbolReferences ..> SourceLocation
Flow diagram for cycle detection in canonicalize methodflowchart TD
A["Start canonicalize(variable)"] --> B["canonical = variable.getName()"]
B --> C["visited = new HashSet"]
C --> D["visited.add(canonical)"]
D --> E{"mapping.containsKey(canonical)?"}
E -- "No" --> F["Create SymbolReference from canonical and source location"]
F --> G["Lookup Type from types map"]
G --> H["Return new VariableReferenceExpression"]
E -- "Yes" --> I["canonical = mapping.get(canonical)"]
I --> J{"visited.add(canonical) succeeds?"}
J -- "Yes (not seen)" --> E
J -- "No (already seen)" --> K["Cycle detected"]
K --> L["mapping.remove(canonical)"]
L --> F
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new
visitedset is allocated on everycanonicalizecall even for variables that don’t follow any alias chain; consider creating it lazily only after the first successfulmapping.get(...)to avoid overhead in the common case. - When a cycle is detected you mutate
mappingby removingcanonical, which breaks the cycle but isn’t actually the edge that closed the loop; consider clarifying the comment or restructuring this so the specific back-edge is removed (e.g., by tracking the predecessor) to better match the intention.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `visited` set is allocated on every `canonicalize` call even for variables that don’t follow any alias chain; consider creating it lazily only after the first successful `mapping.get(...)` to avoid overhead in the common case.
- When a cycle is detected you mutate `mapping` by removing `canonical`, which breaks the cycle but isn’t actually the edge that closed the loop; consider clarifying the comment or restructuring this so the specific back-edge is removed (e.g., by tracking the predecessor) to better match the intention.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
feilong-liu
approved these changes
Mar 25, 2026
Contributor
|
Saved that user @kaikalur is from Meta |
This was referenced Mar 31, 2026
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
UnaliasSymbolReferences.canonicalize()when the alias mapping contains a cycleGROUP BYover constant expressions hang indefinitely during planning because this non-iterative optimizer has no timeout protectionFixes #27427
Root Cause
The
canonicalize()method follows alias chains via awhileloop:When multiple variables map to the same constant expression across different ProjectNodes (e.g.,
ds = '2026-01-01'andreport_date = '2026-01-01'), a cycle can form (ds → report_date → ds), causing infinite looping.Reproduction (TPC-H)
Test plan
TestUnaliasSymbolReferences: 2/2 passTestLocalQueries: 523/523 pass, 0 regressionsRelease Notes