Add TypeSignature Parser using Flex and Bison #7590
majetideepak wants to merge 4 commits into facebookincubator:main
This PR works on macOS. I will fix the build problem on Ubuntu after #7568 lands.
mbasmanova left a comment
@majetideepak Deepak, this parser is specific to Presto, right? It can parse type strings generated by Presto and supports Presto-specific types like TIMESTAMP WITH TIME ZONE and JSON, right? If that's the case, perhaps it is better placed in a Presto-specific directory, maybe somewhere under functions/prestosql.
CC: @pedroerp
@mbasmanova This PR parses TypeSignatures; the parser is invoked when binding function signatures in Velox. It extends the Presto-specific parser you are referring to, which is in #7568.
Currently, only the flex file is reused. But we could decouple the two and move #7568 under functions/prestosql.
namespace facebook::velox::exec {

void toAppend(
Where is this used? Should this be in an anonymous namespace?
This is required by folly::join. I moved it as is from FunctionSignature.*. I will try an anonymous namespace.
An anonymous namespace does not work here.
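For context on the toAppend question: as far as I understand, folly::join formats each element through an unqualified toAppend call, which the compiler resolves by argument-dependent lookup (ADL) in the element type's namespace. The sketch below imitates that mechanism without folly. demo::TypeSig and util::join are hypothetical stand-ins, not Velox or folly code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace demo {
// Hypothetical stand-in for a signature type.
struct TypeSig {
  std::string name;
};

// Lives in the same namespace as TypeSig so that ADL can find it.
void toAppend(const TypeSig& sig, std::string* out) {
  out->append(sig.name);
}
} // namespace demo

namespace util {
// Hypothetical join() that mimics an unqualified toAppend() call.
template <typename T>
std::string join(const std::string& delimiter, const std::vector<T>& items) {
  std::string out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) {
      out += delimiter;
    }
    toAppend(items[i], &out); // unqualified call, resolved via ADL
  }
  return out;
}
} // namespace util
```

An overload moved into an unnamed namespace is no longer a direct member of the type's own namespace, which may be why that attempt failed; the thread does not state the exact reason.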
namespace facebook::velox::exec {

TypeSignaturePtr inferTypeWithSpaces(
    std::vector<std::string>& words,
    bool cannotHaveFieldName) {
  VELOX_CHECK_GE(words.size(), 2);
  std::string fieldName = words[0];

/// the remaining words are a Velox type.
TypeSignaturePtr inferTypeWithSpaces(
    std::vector<std::string>& words,
    bool cannotHaveFieldName = true);
nit: can we flip the flag to canHaveFieldName instead? That makes it a little easier to understand than the double negative.
#include "velox/expression/signature_parser/SignatureParser.h"

facebook::velox::exec::TypeSignaturePtr facebook::velox::exec::parseTypeSignature(const std::string& signatureText)
Scanner takes a string_view. Could we also take a string_view here?
The std::istringstream is(signatureText); call below does not accept a string_view.
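To illustrate the constraint: the std::istringstream constructor used here takes a std::string, so a function that accepts a string_view has to make an explicit copy before building the stream. The helper below is a hypothetical sketch and not part of this PR.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: accepts a string_view, but must copy it into a
// std::string because the istringstream constructor takes a std::string.
std::vector<std::string> splitWords(std::string_view text) {
  const std::string owned(text); // explicit copy of the viewed characters
  std::istringstream is(owned);
  std::vector<std::string> words;
  std::string word;
  while (is >> word) {
    words.push_back(word);
  }
  return words;
}
```

Taking a string_view at the API boundary would therefore not avoid the copy; it would only move it inside the function.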
%%

type_spec : type { scanner->setTypeSignature($1); }
It's a bummer we need to repeat the type grammar here. Is there no way to refactor and include that part?
Each rule returns a TypeSignature here vs. a TypePtr in the Type Parser.
To unify both, we would have to make each rule return strings and finally return a parse tree of strings. Each parser would then have to recurse the tree again to convert the strings to a TypeSignature or TypePtr.
I am planning to use the TypeParser in Prestissimo to replace Antlr here.
I feel the double-pass approach would be inefficient.
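A rough sketch of what the double-pass approach described above might look like: the grammar would first build an untyped tree of strings, and each consumer would then walk that tree again to build its own representation. ParseNode and SimpleSignature are hypothetical, simplified stand-ins; they are not the Velox TypeSignature or TypePtr classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pass 1 output (hypothetical): an untyped tree of strings.
struct ParseNode {
  std::string name;
  std::vector<ParseNode> children;
};

// Hypothetical, simplified stand-in for a type signature.
struct SimpleSignature {
  std::string baseName;
  std::vector<SimpleSignature> parameters;
};

// Pass 2: recurse over the string tree to build the target representation.
SimpleSignature toSignature(const ParseNode& node) {
  SimpleSignature signature{node.name, {}};
  signature.parameters.reserve(node.children.size());
  for (const auto& child : node.children) {
    signature.parameters.push_back(toSignature(child));
  }
  return signature;
}
```

A second converter producing the other representation would repeat the same traversal, which is the extra pass the comment above considers inefficient.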
I looked at this further.
flex/bison are not designed for grammar reuse.
We could use a union/std::variant to encapsulate TypeSignature and TypePtr in a single variable and make this work via virtual functions.
I started this and it turned out to be more complex. Some observations below:
- The lex file (.ll) cannot be reused since it implements the parseType or parseSignature APIs. This cannot be moved out. This means only the Bison parser file (.yy) can be reused, if at all.
- There are some differences in the decimal grammar. E.g., DECIMAL(a_precision, b_precision) is a valid function signature but not a valid type.
- The row field layout is different for TypeSignature vs. RowType. This would introduce more customization using virtual functions.
The parsing grammar is not going to change in the future, so duplicating the Bison file should be okay.
Ok, thanks for looking into this!
  ASSERT_EQ(signature.baseName(), "row");
  ASSERT_EQ(signature.parameters().size(), 1);
}
The feature missing from the existing code was support for quoted named row fields. Could you add a few tests to ensure that works now?
I added a test for this.
FYI - The signature check job seems to be failing because 'DECIMAL' changed to 'decimal', and right now that is taken as a different type :). I will make the signature check job case insensitive.
@kgpai thank you for looking. I resolved the case-insensitivity issue in my PR.
pedroerp left a comment
Thank you! Just a small last comment but looks like it's almost ready to go :)
  const auto& fieldName = words[0];
  auto typeName = words[1];
  for (int i = 2; i < words.size(); ++i) {
    typeName = fmt::format("{} {}", typeName, words[i]);
I guess this will take quadratic time. Could we use folly::join() instead?
    typeName = fmt::format("{} {}", typeName, words[i]);
  }
  const auto allWords = fmt::format("{} {}", fieldName, typeName);
  if (hasType(allWords) || !canHaveFieldName) {
Maybe you could just concatenate all of them and use a std::string_view to skip the first token? (assuming hasToken() can take a string_view)
I did this for both TypeParser and SignatureParser. For TypeParser, we will need a substring if there is a field name.
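The suggestion above could be sketched roughly as follows: join all words once in a single linear pass, then use a std::string_view to look at everything after the first word without building a second string. joinWords and skipFirstWord are hypothetical helpers written for illustration; the actual change may use folly::join and differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: joins words with single spaces in one linear pass.
std::string joinWords(const std::vector<std::string>& words) {
  std::size_t total = words.empty() ? 0 : words.size() - 1; // separators
  for (const auto& word : words) {
    total += word.size();
  }
  std::string out;
  out.reserve(total); // one allocation instead of one per iteration
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0) {
      out += ' ';
    }
    out += words[i];
  }
  return out;
}

// Hypothetical helper: view of everything after the first word and the
// space that follows it. Returns an empty view if nothing follows.
std::string_view skipFirstWord(
    std::string_view joined,
    std::size_t firstWordSize) {
  if (joined.size() <= firstWordSize) {
    return {};
  }
  return joined.substr(firstWordSize + 1);
}
```

The returned view points into the joined string, so the joined string has to outlive it.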
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: Pull Request resolved: facebookincubator#7590
Reviewed By: kgpai
Differential Revision: D51989804
Pulled By: pedroerp
fbshipit-source-id: 82ac7a26fa8619ba13288c2fc15d70a6781c0911