ARROW-15582: [C++] Add support for registering tricky functions with the Substrait consumer (or add a bunch of substrait meta functions) #13285
westonpace
left a comment
I think we can make a pass at simplifying some things. A lot of these lambdas seem to follow a consistent pattern. This is a good start, though! Excited to see it.
Should we return an invalid status as an else clause here?
Returning an AlreadyExist status.
Same, perhaps the else should be an invalid status.
Returning an AlreadyExist status here; would that be better?
Force-pushed from a56a556 to b28e5b1.
Force-pushed from 41fc8fe to 7e27a7f.
westonpace
left a comment
Some suggestions from a quick scan.
    return compute::call(func_name, std::move(arguments), std::move(cast_options));
    }
    case substrait::Expression::kEnum: {
    auto enum_expr = expr.enum_();
Does this convert to the string value of the enum? Can you add a small comment here explaining that.
    }

    Status FunctionMapping::AddArrowToSubstrait(std::string arrow_function_name, ArrowToSubstrait conversion_func){
    if (arrow_to_substrait.find(arrow_function_name) != arrow_to_substrait.end()){
This logic seems backwards to me...wouldn't umap.find(...) != umap.end() mean the item already existed?
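To make the point concrete, here is a minimal sketch of an add-if-absent registration. `FunctionMappingSketch`, `SketchStatus`, and `ConversionSketch` are hypothetical stand-ins (for the PR's `FunctionMapping`, `arrow::Status`, and `ArrowToSubstrait`), not the actual Arrow types. It relies on `emplace()` inserting only when the key is absent, so the "already exists" branch is the one where `find(...) != end()` would have been true.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for arrow::Status so the sketch compiles on its own.
enum class SketchStatus { kOk, kAlreadyExists };

// Hypothetical stand-in for the PR's ArrowToSubstrait callback type.
using ConversionSketch = std::function<std::string(const std::string&)>;

class FunctionMappingSketch {
 public:
  // Registers `conversion` under `name`; refuses to overwrite an existing entry.
  SketchStatus AddArrowToSubstrait(std::string name, ConversionSketch conversion) {
    // emplace() inserts only when the key is absent. Its `second` member is
    // false exactly when find(name) != end() would have been true.
    const bool inserted =
        arrow_to_substrait_.emplace(std::move(name), std::move(conversion)).second;
    return inserted ? SketchStatus::kOk : SketchStatus::kAlreadyExists;
  }

 private:
  std::unordered_map<std::string, ConversionSketch> arrow_to_substrait_;
};

// Usage: the first registration succeeds, the duplicate is rejected.
inline void ExampleDuplicateRegistration() {
  FunctionMappingSketch mapping;
  ConversionSketch identity = [](const std::string& s) { return s; };
  assert(mapping.AddArrowToSubstrait("add", identity) == SketchStatus::kOk);
  assert(mapping.AddArrowToSubstrait("add", identity) == SketchStatus::kAlreadyExists);
}
```

In the real code the two outcomes would presumably map to `Status::OK()` and an error status; which error status to use is the question raised earlier in this thread.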
    }

    Status FunctionMapping::AddSubstraitToArrow(std::string substrait_function_name, SubstraitToArrow conversion_func){
    if (substrait_to_arrow.find(substrait_function_name) != substrait_to_arrow.end()){
Same as above. This seems backwards (but maybe I'm just not thinking right)
    }
    }

    std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
    - std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
    + std::vector<arrow::compute::Expression> ConvertSubstraitArguments(const substrait::Expression::ScalarFunction& call){

    std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
    substrait::Expression value;
    ExtensionSet ext_set_;
    - ExtensionSet ext_set_;
    + ExtensionSet ext_set;
This seems strange. Wouldn't this function take in an extension set as an argument?
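A rough sketch of what passing the extension set in could look like. Everything here is a heavily simplified, hypothetical stand-in: `AnchorTableSketch` plays the role of `ExtensionSet`, `ScalarCallSketch` the role of `substrait::Expression::ScalarFunction`, and arguments are reduced to plain anchors. The only point is that the helper resolves against the set built while reading the plan, rather than against an empty default-constructed local.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for the consumer's ExtensionSet:
// it maps anchors found in the plan to the names they refer to.
struct AnchorTableSketch {
  std::unordered_map<uint32_t, std::string> names;
};

// Hypothetical stand-in for substrait::Expression::ScalarFunction whose
// arguments are reduced to plain anchors.
struct ScalarCallSketch {
  std::vector<uint32_t> argument_anchors;
};

// The extension set arrives as a parameter, so anchors are resolved against
// the set populated while reading the plan, not an empty local instance.
inline std::vector<std::string> ConvertSubstraitArguments(
    const ScalarCallSketch& call, const AnchorTableSketch& ext_set) {
  std::vector<std::string> converted;
  converted.reserve(call.argument_anchors.size());
  for (uint32_t anchor : call.argument_anchors) {
    auto it = ext_set.names.find(anchor);
    // Unknown anchors become a placeholder here; real code would return an
    // error Status instead.
    converted.push_back(it == ext_set.names.end() ? std::string("<unresolved>")
                                                  : it->second);
  }
  return converted;
}

// Usage: anchor 7 is known to the caller's set, anchor 9 is not.
inline void ExampleResolveAgainstCallerSet() {
  AnchorTableSketch ext_set;
  ext_set.names[7] = "add";
  ScalarCallSketch call;
  call.argument_anchors = {7, 9};
  std::vector<std::string> out = ConvertSubstraitArguments(call, ext_set);
  assert(out[0] == "add");
  assert(out[1] == "<unresolved>");
}
```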

    substrait::Expression::ScalarFunction arrow_convert_enum_arguments(const arrow::compute::Expression::Call& call, substrait::Expression::ScalarFunction& substrait_call, ExtensionSet* ext_set_, std::string overflow_handling){
    substrait::Expression::Enum options;
    options.set_specified(overflow_handling);
overflow_handling seems like an odd name given this is a generic function
    return arrow::compute::call("abs", substrait_convert_arguments(call));
    };

    ArrowToSubstrait arrow_add_to_substrait = [] (const arrow::compute::Expression::Call& call, ExtensionSet* ext_set_) -> Result<substrait::Expression::ScalarFunction> {
There are a lot of places where you have ext_set_ and it should probably be ext_set. For the sake of brevity I'm not going to mark them all.
    substrait::Expression::ScalarFunction substrait_call;
    ARROW_ASSIGN_OR_RAISE(auto function_reference, ext_set_->EncodeFunction("extract"));
    substrait_call.set_function_reference(function_reference);
All of these calls to EncodeFunction seem pretty repetitive. Is there any way we can move this into the part that calls GetArrowToSubstrait? Also, I don't see anything today that calls GetArrowToSubstrait
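One possible shape for this, as a sketch only: each mapping entry names its Substrait function and supplies a callback that translates arguments and options, while a single caller does the `EncodeFunction` step. All types below (`EncodingSetSketch`, `ArrowCallSketch`, `SubstraitCallSketch`, `ArrowToSubstraitEntry`) and the `ConvertCall` helper are hypothetical stand-ins, and the `"YEAR"` option is purely illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ExtensionSet: hands out one anchor per function name.
class EncodingSetSketch {
 public:
  uint32_t EncodeFunction(const std::string& name) {
    auto it = anchors_.find(name);
    if (it != anchors_.end()) return it->second;
    const uint32_t anchor = static_cast<uint32_t>(anchors_.size());
    anchors_.emplace(name, anchor);
    return anchor;
  }

 private:
  std::unordered_map<std::string, uint32_t> anchors_;
};

// Hypothetical stand-ins for compute::Expression::Call and
// substrait::Expression::ScalarFunction.
struct ArrowCallSketch {
  std::string function_name;
  std::vector<std::string> arguments;
};
struct SubstraitCallSketch {
  uint32_t function_reference = 0;
  std::vector<std::string> arguments;
};

// A mapping entry names the Substrait function and supplies a callback that
// only translates arguments and options.
struct ArrowToSubstraitEntry {
  std::string substrait_name;
  std::function<void(const ArrowCallSketch&, SubstraitCallSketch*)> fill_arguments;
};

// The one place that encodes the function reference, shared by every mapping.
inline SubstraitCallSketch ConvertCall(const ArrowToSubstraitEntry& entry,
                                       const ArrowCallSketch& call,
                                       EncodingSetSketch* ext_set) {
  SubstraitCallSketch out;
  out.function_reference = ext_set->EncodeFunction(entry.substrait_name);
  entry.fill_arguments(call, &out);
  return out;
}

// Usage: the "year" mapping no longer calls EncodeFunction itself.
inline void ExampleSharedEncoding() {
  EncodingSetSketch ext_set;
  ArrowToSubstraitEntry year{
      "extract", [](const ArrowCallSketch& call, SubstraitCallSketch* out) {
        out->arguments.push_back("YEAR");  // illustrative enum-style option
        out->arguments.insert(out->arguments.end(), call.arguments.begin(),
                              call.arguments.end());
      }};
  SubstraitCallSketch converted = ConvertCall(year, {"year", {"ts_column"}}, &ext_set);
  assert(converted.arguments.size() == 2);
  assert(converted.arguments[0] == "YEAR");
}
```

With this split, the per-function lambdas shrink to the part that actually differs between functions, which also speaks to the earlier remark that many of them follow the same pattern.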
    }
    };

    ArrowToSubstrait arrow_year_to_arrow = [] (const arrow::compute::Expression::Call& call, ExtensionSet* ext_set_) -> Result<substrait::Expression::ScalarFunction> {

    DCHECK_OK(functions_map.AddSubstraitToArrow(id.name.to_string(), conversion_func));
    return RegisterFunction(id, id.name.to_string());
It seems a little odd that we need two maps. What happens if two functions exist with the same name but different URIs? Thinking on this longer, maybe substrait_to_arrow should replace the map in the extension id registry (that gets updated by the call to RegisterFunction?)
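To illustrate the name-collision concern, here is a small sketch of a single map keyed by the full (URI, name) id instead of the name alone. `UriAwareRegistrySketch` is hypothetical and stores Arrow function names rather than conversion callbacks for brevity; the URIs are made up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry keyed by (uri, name) rather than by name alone.
class UriAwareRegistrySketch {
 public:
  using Key = std::pair<std::string, std::string>;  // (extension URI, function name)

  // Returns false when this exact (uri, name) pair is already registered.
  bool RegisterFunction(const std::string& uri, const std::string& name,
                        const std::string& arrow_function_name) {
    return substrait_to_arrow_.emplace(Key(uri, name), arrow_function_name).second;
  }

  // Returns the mapped Arrow function name, or nullptr if the id is unknown.
  const std::string* Lookup(const std::string& uri, const std::string& name) const {
    auto it = substrait_to_arrow_.find(Key(uri, name));
    return it == substrait_to_arrow_.end() ? nullptr : &it->second;
  }

 private:
  std::map<Key, std::string> substrait_to_arrow_;
};

// Usage: two functions share the name "add" but live under different URIs.
inline void ExampleSameNameDifferentUris() {
  UriAwareRegistrySketch registry;
  // Both URIs are invented for illustration.
  assert(registry.RegisterFunction("urn:example:arithmetic", "add", "add_checked"));
  assert(registry.RegisterFunction("urn:example:decimal", "add", "add"));
  assert(!registry.RegisterFunction("urn:example:decimal", "add", "add"));
  assert(*registry.Lookup("urn:example:arithmetic", "add") == "add_checked");
}
```

A name-only map would have to either reject the second `add` or silently overwrite the first, which is the ambiguity the comment above points at.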
…ctions (#13613)

This picks up where #13285 has left off. It mostly focuses on the Substrait->Arrow direction at the moment. In addition, basic support is added for named tables. This makes it possible to create unit tests that read from in-memory tables instead of requiring unit tests to do a scan.

The PR creates some utilities in `test_plan_builder.h` which allow for the construction of simple Substrait plans programmatically. This is used to create unit tests for the function mapping.

The PR extracts id "ownership" out of the `ExtensionIdRegistry` and into its own `IdStorage` class.

The PR gets rid of `NestedExtensionIdRegistryImpl` and instead makes `ExtensionIdRegistryImpl` nested if `parent_ != nullptr`.

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer.
[WIP]
This PR adds function mappings for compute functions from Substrait to Arrow and vice versa. It introduces a `FunctionMapping` class that registers and stores the mappings and supplies them when required. Registering a function includes encoding the various options and arguments in the respective mapping function's definition.