Copy source before modifying it in GroupBy #972
Summary
Copies the sources passed to a GroupBy (GB) before applying modifications (normalization and sanitization).
Join sources get special handling: the copy must keep the reference to the original Join object so that the
object name can still be set correctly from GC referrers.

Why / Goal
To prevent unexpected modifications to sources that are shared across Joins/GBs.
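A minimal sketch of the copy-then-sanitize pattern described in the summary. It assumes simplified, hypothetical `Source`, `Query`, and `Join` dataclasses and a hypothetical `sanitize_columns` helper standing in for `_sanitize_columns`; the real source objects and sanitizer differ. The join special case is illustrated by pre-seeding `copy.deepcopy`'s memo so the copy reuses the original Join object instead of cloning it.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Join:
    # Hypothetical stand-in for a Join; only its object identity matters here.
    name: Optional[str] = None


@dataclass
class Query:
    # Hypothetical query: output column name -> select expression.
    selects: Dict[str, str] = field(default_factory=dict)


@dataclass
class Source:
    query: Query
    join: Optional[Join] = None  # set only for (hypothetical) join sources


def sanitize_columns(source: Source, input_cols: List[str]) -> Source:
    """Hypothetical sanitizer: adds any missing input column to the selects, in place."""
    for col in input_cols:
        source.query.selects.setdefault(col, col)
    return source


def copy_source(source: Source) -> Source:
    """Deep-copy a source, but keep the original Join object for join sources.

    Pre-seeding deepcopy's memo with id(join) -> join makes deepcopy return the
    original Join wherever it appears, so anything that later looks the Join up
    (e.g. via GC referrers) still finds the object the user declared.
    """
    memo = {}
    if source.join is not None:
        memo[id(source.join)] = source.join
    return copy.deepcopy(source, memo)


def build_group_by(source: Source, input_cols: List[str]) -> Source:
    # Copy first, then sanitize the copy; the caller's source is never mutated.
    return sanitize_columns(copy_source(source), input_cols)


# Usage: two GroupBys share source1; only the first needs input_col1.
source1 = Source(query=Query(selects={"user_id": "user_id"}))
gb1_source = build_group_by(source1, ["input_col1"])
gb2_source = build_group_by(source1, [])
# gb2_source.query.selects == {"user_id": "user_id"}; source1 is unchanged.
```

Without `copy_source`, both GroupBys would share one `Query` object, so `gb2_source` would pick up `input_col1` whenever `gb1` was built first. The memo trick keeps the copy deep for everything except the Join, which is the one object whose identity must be preserved.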
Sources may be reused across Joins and GBs. The sanitize code here modifies the source in place, which can cause unintended behavior for GBs that reuse the source and don't expect those modifications. One specific example:
group_by1 uses source1. One of its aggregations has input_col1 as the input column. If that column is not present in source1, _sanitize_columns will add it to the query selects. group_by2 also uses source1, but none of its aggregations use input_col1. If group_by1 was loaded before group_by2, group_by2's source would unexpectedly include that column.

Test Plan
Checklist