Validate no dangling columns in TableScanNode.enforcedConstraint #6963
findepi wants to merge 2 commits into trinodb:master
|
cc @martint @sopel39 @kokosing @losipiuk @erichwang I think we should have a contract about the while it could be handled by defensive coding in that particular place, it feels like the wrong thing to do. I am fixing the immediate obstacle in #6959. Questions
|
bc03f4b to 904fd5d
|
Why do we keep dangling columns there? |
|
@sopel39 that's exactly my question. I don't think there is a reason, but wanted to hear some confirmation. |
|
I don't think we should keep dangling columns. What would be their purpose if they cannot be referenced anywhere from the plan? |
|
The enforced constraint caching was a side effect of previously not having an easy mechanism to retain prior plan information after a given predicate pushdown optimization. For example, in this contrived demonstration: The hope was that the new iterative optimizer would allow secondary information (such as predicate relationships) to be retained alongside plans (but not as part of them). If this is possible now, then we should totally do that there. If not, then we should see if there is somewhere else to cache this type of information (since it has been some time since the original discussion). If the dangling columns are the main issue, it's possible they can be pruned in the current optimizer order (since I don't think we ever add a column back after it has been pruned), but adding a column back is not an impossibility in the future, which we need to think about. |
In the case of your (contrived) example query, I would expect the optimizer to push the predicate into the Join's sources without retaining it, so it becomes once it's a Cross Join, it no longer matters whether Anyway, if there is a case where a Filter would be applied and re-applied, this would be concerning, as it could lead to overhead or an optimizer loop. |
|
@findepi subsequent PPD executions might apply the same predicate over and over again. For example, partitioning predicate can be derived from join sides. See my old PR that tries to address that: prestodb/presto#11537. IIRC the problem there was getting expression equivalence for expressions like |
|
I think we should remove dangling columns from |
|
I think we should remove dangling columns from |
I believe so, but someone would need to go over the code and check the usages.
Thanks @sopel39. Anyway, this leads us to the second question:
|
Yes, we still need this until we can come up with a better way to represent plan metadata that is not directly part of the plan.
This is more of a philosophical question than a practical one. Practically speaking, I don't think the dangling columns are needed because pruned columns are not currently brought back in any way in our planner. Philosophically, doing so would result in some lost information that in theory could be used again if old pruned columns were added back. However, in later planner developments, it looks like we aren't preserving the full context anyways, so I have no problems just trimming out the dangling columns at this point. If we do so, please remember to clean up the assignments field too (which was intentionally not pruned before to handle the coverage of the unenforcedConstraints). |
Technically we could use |
|
@erichwang thanks for your feedback!
Interesting, I didn't know that.
@martint can you please comment ACK here that I am not removing something dear to you? |
Actually, let me walk that back about the assignments. It looks like the usage has expanded since I last saw this. The assignments are used in EffectivePredicateExtractor now to convert arbitrary table ColumnHandles returned by metadata (which don't know about what projections are needed): https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/planner/EffectivePredicateExtractor.java#L255 |
No, dangling columns should be removed. Once they are not present in the table scan they are, for all intents and purposes, "gone" from the plan. |
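A minimal sketch of what removing dangling columns could look like, using plain Java collections rather than Trino's types. The class name, the method name, and the use of strings for symbols, column handles, and domains are all assumptions made for illustration; this is not the code from this PR. It also assumes that a column counts as "visible" when some assignment still maps a symbol to it.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the planner types discussed above.
final class DanglingColumnPruning
{
    private DanglingColumnPruning() {}

    // enforcedDomains: column handle -> domain (both modeled as strings here)
    // assignments:     symbol -> column handle
    static Map<String, String> pruneDanglingColumns(
            Map<String, String> enforcedDomains,
            Map<String, String> assignments)
    {
        // Assumption: a column is visible if any assignment still refers to it.
        Set<String> visibleColumns = new HashSet<>(assignments.values());

        Map<String, String> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : enforcedDomains.entrySet()) {
            if (visibleColumns.contains(entry.getKey())) {
                pruned.put(entry.getKey(), entry.getValue());
            }
        }
        return pruned;
    }
}
```

With assignments that only mention `orderkey`, a constraint over both `orderkey` and `ds` would keep only the `orderkey` entry; the `ds` entry is the dangling one.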
|
Thanks @erichwang @sopel39 @martint for your comments. I will work on getting the tests to succeed with the check and then ask for a proper review. |
904fd5d to 8705994
8705994 to 07d94ea
nit: can we move this to a checkNoDanglingColumns function?
I prefer not to call functions within a constructor (in general at least), so if there isn't a compelling reason to do so, I would prefer to keep it here.
I mean to make it a static function. We call plenty of those here (like checkArg, checkState, etc). The reason is that it (A) conveniently gives a name to the operation for documentation purposes and (B) limits the scope of what this block of code is doing so that it doesn't creep in the future.
Sure, I can do this, if you feel strongly about it.
I'd prefer to keep it as is, though.
Yeah, let's do it. I usually prefer that to having to leave more comments (which this would need without that function context).
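For illustration, a sketch of the shape being asked for: the validation pulled out of the constructor body into a named static helper. `ScanNodeSketch` and its string-based fields are hypothetical simplifications, and the error message wording is an assumption; only the helper name `checkNoDanglingColumns` comes from the review comment above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a table scan plan node.
final class ScanNodeSketch
{
    private final List<String> outputs;                   // output symbols
    private final Map<String, String> assignments;        // symbol -> column handle
    private final Set<String> enforcedConstraintColumns;  // columns named by the constraint

    ScanNodeSketch(List<String> outputs, Map<String, String> assignments, Set<String> enforcedConstraintColumns)
    {
        this.outputs = List.copyOf(outputs);
        this.assignments = Map.copyOf(assignments);
        this.enforcedConstraintColumns = Set.copyOf(enforcedConstraintColumns);
        // A static helper touches no instance state, so calling it from the
        // constructor avoids the usual concern about partially built objects.
        checkNoDanglingColumns(this.assignments, this.enforcedConstraintColumns);
    }

    private static void checkNoDanglingColumns(Map<String, String> assignments, Set<String> constrainedColumns)
    {
        Set<String> visibleColumns = new HashSet<>(assignments.values());
        for (String column : constrainedColumns) {
            if (!visibleColumns.contains(column)) {
                throw new IllegalArgumentException(
                        "enforcedConstraint references a column that is not part of the plan: " + column);
            }
        }
    }
}
```

The helper's name documents the intent at the call site, which is the point (A) made above, and its parameter list bounds what the check can look at, which is point (B).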
let's also add a comment here that assignments can contain dangling columns (for now)
What do you mean? Also, is it related to checkArgument(assignments.keySet().containsAll(outputs), "assignments does not cover all of outputs"); check above?
That check verifies that assignments contains outputs, but assignments can still have more symbols in it than what output produces. So in that sense assignments can have symbol to column handle mappings that are no longer referenced. But as I mentioned in a comment above, this is needed for some obscure reasons in the EffectivePredicateExtractor.
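The distinction can be shown with a small sketch using plain Java collections. `AssignmentCoverage` and both method names are hypothetical; only the `containsAll` condition mirrors the `checkArgument` quoted above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating "covers outputs" versus "equals outputs".
final class AssignmentCoverage
{
    private AssignmentCoverage() {}

    // Same condition as the quoted check: every output symbol has an assignment.
    static boolean coversOutputs(Map<String, String> assignments, List<String> outputs)
    {
        return assignments.keySet().containsAll(outputs);
    }

    // Symbols that are assigned but not produced as outputs. The coverage
    // check above does not reject these.
    static Set<String> unreferencedSymbols(Map<String, String> assignments, List<String> outputs)
    {
        Set<String> extra = new TreeSet<>(assignments.keySet());
        extra.removeAll(outputs);
        return extra;
    }
}
```

For assignments `{a -> col_a, b -> col_b}` and outputs `[a]`, `coversOutputs` holds while `unreferencedSymbols` still reports `b`, which is the situation described in the comment above.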
07d94ea to 68475a0
The test was constructing `TableScan` with a `Domain` on a non-projected column, but such columns should be removed from the plan together with associated constraints. The test input is fixed to adhere to this.
68475a0 to 6bc1821