Skip to content

Refactor how we manage outer scopes in join iteration#3428

Merged
nicktobey merged 19 commits intomainfrom
nicktobey/outer_scopes
Feb 13, 2026
Merged

Refactor how we manage outer scopes in join iteration#3428
nicktobey merged 19 commits intomainfrom
nicktobey/outer_scopes

Conversation

@nicktobey
Copy link
Copy Markdown
Contributor

There were several problems with the previous join iterator implementation:

  • Each type of iterator was implemented separately, even though 90% of the logic was identical. But slight variations in how they were written led to bugs that only existed in some but not others.
  • The merge join iterator and the full outer join iterator did not correctly handle joins within subqueries.
  • Some iterators would not always close child iterators.

The behavior with subqueries is the main motivation for this PR.

Previously, in order to expose values from outer scopes to iterators, we would dynamically inject PrependNodes into subquery build plans. These nodes would insert values into the beginning of returned rows, allowing parent iterators to read them and use them in expressions. To compensate for this, we would also inject StripRowNodes into joins. StripRowNodes are the opposite of PrependNodes, removing columns from their iterators.

This logic was incredibly difficult to reason about correctly:

  • Injecting values from the outer scope this way inherently complicates iterator logic, particularly for joins.
  • StripRowNodes were inserted prior to plan execution (during the assignExecIndexes pass), while PrependNodes were inserted dynamically during plan execution. Both parts of the code had to agree on how many values were being inserted/removed, which required two different packages to understand each other's inner logic. Changes to one would require changes to the other to prevent subtle bugs.
  • The current implementation had several bugs in the case of multiple nested scopes:
    -- Lateral joins would re-include all the values from the outermost scope in the next scope, effectively doubling the number of columns with each nesting level.
    -- StripRowNodes would only be generated based on the innermost scope, resulting in some injected values not being removed.
    -- StripRowNodes would be generated under each join node, including between join nodes in a multi-table join. Join nodes would thus would need to re-insert these values in order to compensate... but were expected to not re-insert columns corresponding to values defined by a parent join node, except for lateral joins... reasoning about this correctly quickly becomes untenable.

Ultimately, there's no reason why join nodes can't handle this directly. And removing the StripRowNodes type and replacing it with logic in the join iterators actually makes the logic much more consistent: Parent iterators should assume that all child iterators contain prepended values for outer scopes, and values determined by the node's schema, and nothing else. And the parent iterator returns rows that also have this property.

Copy link
Copy Markdown
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly a big improvement on what came before. Can't shake the feeling that we need to allocate result rows once at a top level and pin all field indexes at plan construction time.

for _, joinParent := range joinParents {
if sqa.OuterScopeVisibility && joinParent != nil {
if stripChild, ok := joinParent.Right().(*plan.StripRowNode); ok && stripChild.Child == sqa {
// if joinParent.Right() == sqa {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return i.fullRow[i.parentLen : i.parentLen+i.leftLen]
}

// leftColumns returns the values most recently yielded from the secondary/right child node.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

i.secondary = nil
if err != nil {
return nil, err
// Close cleans up the iterator by recusrively closing the children iterators.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@nicktobey nicktobey merged commit 011f655 into main Feb 13, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants