Conversation
martint
left a comment
This is a good start. Let's add the corresponding AST node and a semantic check that fails with "unsupported" error. Otherwise, it will parse queries successfully and ignore the clause, which amounts to producing incorrect results.
This is a semantic check that should be deferred until analysis.
The semantic difference between LIMIT and FETCH is whether they allow row count equal to 0.
In my approach, FETCH and LIMIT are distinguishable only up to the AstBuilder while the query is being processed. That is why I put the semantic check (row count > 0) there.
To avoid that, LIMIT and FETCH could be carried as distinct constructs until analysis, which would enable a FETCH-specific check there, and they could be merged afterwards. I consider this over-complicated, as they are essentially the same.
Or I could define a new token in the grammar, like POSITIVE_INTEGER, and let the parser do the check.
I guess that works until we add percentage and WITH TIES.
Also, technically, the number can be arbitrarily large. The spec requires it to be "an exact numeric type with scale 0" (for rows -- for percent it can be any numeric type). So Long.parseLong might not be sufficient. Incidentally, specifying a larger value will fail with a NumberFormatException. Validating this in the analyzer allows us to provide better error messages and error codes.
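As a quick illustration of the parsing concern above, here is a minimal, hypothetical helper (`RowCountParsing` is not a class from this PR): `Long.parseLong` rejects values beyond the 64-bit range with a `NumberFormatException`, even though the spec allows arbitrarily large exact numerics with scale 0, which is why the analyzer is a better place to validate and report the error.

```java
// Hypothetical helper, not part of the PR: demonstrates that Long.parseLong
// throws NumberFormatException for values outside the 64-bit signed range.
class RowCountParsing {
    static boolean fitsInLong(String rowCount) {
        try {
            Long.parseLong(rowCount);
            return true;
        }
        catch (NumberFormatException e) {
            return false;
        }
    }
}
```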
Another issue is that parsing/formatting doesn't round-trip. If you have a query with "FETCH FIRST x ROWS", it will be printed as "LIMIT x" by the SqlFormatter.
OK, I see that there is more to analyse than just row count greater than zero or not. So the right approach is to keep LIMIT and FETCH distinct until analysis.
I think I'll keep them separate until QueryPlanner and there produce a LimitNode for a query with fetch.
There are no incorrect results, as I'm using the existing support for LIMIT so that it takes care of FETCH too.
Ah, that's right. I overlooked the assignment to the
Applied distinct
The feature is officially called "FETCH FIRST" in the specification, so let's refer to it that way in error messages, etc.
A side note: a user's query can use the FETCH NEXT syntax. Would mentioning FETCH FIRST in the error message be helpful?
Maybe FETCH FIRST/NEXT?
I think @martint's comment was about complying with the official naming, not about trying to recall in the error message how the actual query looked.
The spec mentions "fetch first clause" and "fetch first quantity", so the full name is probably "fetch first".
A better way to do this would be to have the analyzer record the inferred or provided limit in the query Analysis object. The planner would just use that to formulate the LimitNode and not have to deal with distinguishing between these options.
lower case "t1" for consistency with other usages.
This approach makes it theoretically possible to construct syntactically illegal queries (i.e., with both LIMIT and FETCH FIRST clauses, potentially conflicting). A better approach is to replace the Optional<String> limit with an Optional<Node> limit and fill it in with either a Limit or a FetchFirst AST node (both of which need to be introduced).
This would also make the analysis more regular -- it just needs to delegate to method to analyze the corresponding node type (there'd be two of them), each of which would be responsible for ensuring the values are legal, etc., and record the inferred limit in the Analysis object to be used later by the planner.
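A minimal sketch of the shape being suggested. All names here (`Node`, `Limit`, `FetchFirst`, `LimitAnalyzer`) are illustrative stand-ins for the discussion, not Presto's actual AST or analyzer classes:

```java
// Illustrative only -- not Presto's real types. A shared Optional<Node> slot
// holds either a Limit or a FetchFirst, and analysis dispatches per node type.
interface Node {}

final class Limit implements Node {
    private final String rowCount;   // non-negative integer; 0 is allowed

    Limit(String rowCount) { this.rowCount = rowCount; }

    String getRowCount() { return rowCount; }
}

final class FetchFirst implements Node {
    private final String rowCount;   // must be a positive integer

    FetchFirst(String rowCount) { this.rowCount = rowCount; }

    String getRowCount() { return rowCount; }
}

final class LimitAnalyzer {
    // Each node type validates its own rules; the inferred limit would then
    // be recorded in the Analysis object for the planner to pick up.
    static long analyze(Node node) {
        if (node instanceof Limit) {
            long value = Long.parseLong(((Limit) node).getRowCount());
            if (value < 0) {
                throw new IllegalArgumentException("LIMIT row count must be >= 0");
            }
            return value;
        }
        if (node instanceof FetchFirst) {
            long value = Long.parseLong(((FetchFirst) node).getRowCount());
            if (value <= 0) {
                throw new IllegalArgumentException("FETCH FIRST row count must be > 0");
            }
            return value;
        }
        throw new IllegalArgumentException("unexpected node: " + node.getClass().getName());
    }
}
```

Because both node types implement the same interface, a query can never carry both clauses at once, and the planner never needs to distinguish them.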
Please note that a query mixing LIMIT and FETCH FIRST clauses wouldn't pass through the parser. Despite that, I agree that they should share a field in Query / QuerySpecification, as they are mutually exclusive.
Should I introduce a common superclass for Limit and FetchFirst nodes? Or maybe for future Offset node too?
"Should I introduce a common superclass for Limit and FetchFirst"
If we can come up with a reasonable name for it, yes -- there may not be a good generic concept that describes both. Otherwise, it's fine to keep as Node for now and maybe do a runtime check to ensure it's one of the supported types.
Force-pushed 11353a1 to 3a6d651
Applied comments.
Force-pushed bc5e7dc to 0a94d2c
No more formatting issues.
martint
left a comment
A few more comments. It's getting there. Good stuff!
Return a primitive long, and maybe check that an entry exists in the limit map, to avoid an NPE in the off chance that there's a bug in the analyzer.
I can't figure out how to store LIMIT ALL with a primitive long.
With Long I used a null value, but I suppose Optional<Long> would be the neatest solution, with Optional.empty() for LIMIT ALL.
That seems reasonable, but I'd use OptionalLong, instead.
@martint I was under the impression that we prefer Optional<Long> over OptionalLong unless there is a compelling (a.k.a. performance) reason to use the latter.
prestodb/presto#11302 (comment) & prestodb/presto#11302 (comment)
I guess it depends on how it’s being used. This is very localized. No strong preference either way in this case, so Optional is ok, too.
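A small sketch of the representation being discussed, using `OptionalLong` as suggested. The class and method names (`LimitValue`, `parseLimit`) are hypothetical, not from the PR:

```java
import java.util.OptionalLong;

// Hypothetical sketch: OptionalLong.empty() stands for LIMIT ALL,
// a present value for an explicit row count.
class LimitValue {
    static OptionalLong parseLimit(String limit) {
        if (limit.equalsIgnoreCase("ALL")) {
            return OptionalLong.empty();   // LIMIT ALL: no row cap
        }
        return OptionalLong.of(Long.parseLong(limit));
    }
}
```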
These two methods are unnecessary. Aside from delegating to analyzeLimit(node, clause) they only check to make sure the limit is present. If you inline them, the check is unnecessary, as that's ensured by the if condition in the methods above.
Use fetchFirstOrLimit.getClass().getName(). getClass() returns a Class object, whose toString() method outputs something like "class x.y.Z".
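To make the difference concrete, a tiny demo (the `ClassNameDemo` helper is illustrative only):

```java
// getClass().toString() prefixes the kind ("class " / "interface "),
// while getClass().getName() returns just the binary class name,
// which reads better in error messages.
class ClassNameDemo {
    static String toStringForm(Object o) {
        return o.getClass().toString();
    }

    static String nameForm(Object o) {
        return o.getClass().getName();
    }
}
```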
limit should never be empty(), right? That would be syntactically invalid.
Yes, and at first I thought of private final String limit;, but I decided to use Optional, because originally there was Optional<String> limit field in Query and QuerySpecification.
Shall I add a check in the constructor?
Let's just make it a private final String limit;. It was Optional<String> in Query/QuerySpecification to indicate that the clause could be missing. If there's a Limit node, the clause is present, so no need to have another level of Optional that will never be empty. The constructor should not take an Optional, either. It should be up to the caller to pass in the required value.
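A sketch of the constructor shape being suggested: the clause is present whenever the node exists, so the field is a plain String and the constructor requires a non-null value (a simplified stand-in, not the PR's actual Limit node):

```java
import static java.util.Objects.requireNonNull;

// Simplified stand-in for the suggested node shape: no Optional wrapper,
// because the clause is always present when the node exists.
class Limit {
    private final String limit;

    Limit(String limit) {
        this.limit = requireNonNull(limit, "limit is null");
    }

    String getLimit() {
        return limit;
    }
}
```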
The node in the exceptions should be the fetchFirst (or limit in the next method). That's used for generating the appropriate location in the error message.
If you associate the limit value in the analysis with the node for the clause (FetchFirst or Limit) instead of the Query/QuerySpecification, this method can take an Optional<Node> limit and the code below can just say:
if (limit.isPresent()) {
    planNode = new TopNNode(..., analysis.getLimit(limit.get()), ...);
}
Same comment as for the sort(...) method above
Instead of generating a LIMIT ALL, use an Optional<Node> limit = Optional.empty() and assign Optional.of(new Limit("0")) in the if () block below.
Same comment as in visitDescribeInput above
Applied comments.
Merged, thanks!
Relates to #1