Indicate progress only if limit is applied #618
Merged
It is strange that when the given limit is supported we return `Optional.empty()`, just like in the case where the limit is not supported. It is not a practical case, but if you have subsequent limits, only the first of them would be pushed down to the connector. This example becomes more practical in the case of predicate pushdown.
IMO we should handle that in optimizer (engine) code, instead of requiring all connectors to have an `if` like that.
`Optional.empty()` just means "applying the provided limit has no effect (so don't try to do it again)". I don't think we need to distinguish between supported and not supported. It makes for awkward implementations that never return `Optional.empty()` and instead indicate no progress with a special flag in the object they return.

In the long term (when connectors can provide custom `Rule` instances), this will be replaced by a `Rule`. Connectors would indicate they support limit pushdown by providing a rule, and the rule would return `Rule.Result.empty()` to indicate it had no effect or couldn't be applied in that particular case.
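The contract described above can be sketched with a simplified, hypothetical connector. None of these types are the actual SPI — `TableHandle`, `LimitResult`, and `ExampleMetadata` are illustrative stand-ins for the idea that an empty result means "no effect, don't try again":

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins for the SPI types under discussion.
record TableHandle(String table, Optional<Long> limit) {}
record LimitResult(TableHandle newHandle) {}

class ExampleMetadata {
    // Returns a new handle when applying the limit makes progress.
    // Optional.empty() means "applying this limit has no effect, don't retry".
    Optional<LimitResult> applyLimit(TableHandle handle, long limit) {
        if (handle.limit().isPresent() && handle.limit().get() <= limit) {
            return Optional.empty(); // existing limit is at least as restrictive
        }
        return Optional.of(new LimitResult(
                new TableHandle(handle.table(), Optional.of(limit))));
    }
}
```

Under this contract a connector never needs a separate "supported but no progress" flag: a smaller limit produces a new handle, and an equal or larger one simply yields `Optional.empty()`.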
Subsequent limits will be pushed if they are smaller than the current limit. If, for some reason, you end up with a larger limit on top of a table scan that has a smaller limit applied, the right way to fix this is for the engine to eliminate the Limit after determining that the max number of rows produced by the TableScan is going to be smaller, similar to how we do it in #441
How do you suggest we do that? There has to be a way for the connector to signal "applying the provided limit has no effect (so don't try to do it again)", or the optimizer will loop forever. An alternative is to introduce machinery like we have for `Rule` (patterns, etc.), but that would be a lot of work and would probably require duplicating a lot of code due to limitations of what can go in the SPI.
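The looping concern can be illustrated generically: the optimizer keeps re-firing a pushdown action until it reports no progress, so an implementation that always returns a result (even when nothing changed) would never let the loop terminate. This is a hypothetical sketch with illustrative names, not the real optimizer:

```java
import java.util.Optional;
import java.util.function.Function;

class FixedPointDriver {
    // Applies `step` until it returns Optional.empty() ("no progress").
    // A step that claims progress without actually changing the state
    // would make this loop run forever -- hence the empty() signal.
    static <T> T runToFixedPoint(T state, Function<T, Optional<T>> step) {
        while (true) {
            Optional<T> next = step.apply(state);
            if (next.isEmpty()) {
                return state;
            }
            state = next.get();
        }
    }
}
```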
I think the rule could get the table property before and after `applyLimit`. Then it could derive that the returned limit is the same as before. However, currently there is no such thing as a `LimitProperty`, so it might just not be worth it.
I agree that it would be nice if the engine could handle this; otherwise it's more difficult and error-prone to write connectors. But it sounds like we don't have a feasible way to do that.
`LimitProperty` sounds good to me.

Consider an example where the connector knows a lot about its tables — things like table cardinality and partitioning. For a table with a lower cardinality than the limit, the connector would have to return `Optional.empty()` (saying no progress) on the first rule application, but to the engine this looks like the connector is not capable of pushing down the limit. For LIMIT this is not a big deal, but for partitioning or other SQL fragments it might affect query latency.

In my opinion, if we are going to push down some property to the connector, the connector needs a way to say what the actual properties are. Then there is no need to even call `applyLimit` when the required properties are already satisfied by the actual properties.

We do not need such machinery. We can handle this manually in `Rule` code, like patterns that are too complicated for pattern matching.
Let's have a test. It might not catch all of the cases, but it still may catch some.
`LimitProperty`, or rather "max row count" or "max cardinality", would work for the case where the limit is guaranteed by the connector. However, if the connector can't guarantee a limit, such as when pushing the limit to a parallel database, it wouldn't be able to describe the "max cardinality" of the derived table, so the engine would have no way of knowing whether it's safe to call `applyLimit` again.

The other downside of relying on that property is that it requires connectors to implement two seemingly disconnected APIs.
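One way to capture the guaranteed-vs-best-effort distinction is a flag on the pushdown result itself. This is a hypothetical sketch (all names illustrative): a parallel database might only apply the limit per node, so its result would carry `limitGuaranteed = false` and the engine would keep its own `Limit` node:

```java
import java.util.Optional;

// Hypothetical result object: `limitGuaranteed` distinguishes a connector
// that enforces the limit exactly from one (e.g. a parallel database) that
// applies the limit per node and may still return more rows overall.
record LimitPushdownResult(String newHandle, boolean limitGuaranteed) {}

class LimitPlanner {
    // The engine may only drop its own Limit node when the connector
    // guarantees the row count; otherwise the Limit stays as a safety net.
    static boolean canRemoveLimitNode(Optional<LimitPushdownResult> result) {
        return result.isPresent() && result.get().limitGuaranteed();
    }
}
```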
There's already a test that would catch this if connectors implemented it wrong: `AbstractTestIntegrationSmokeTest.testLimit()`.