Skip to content

Coral-Spark: Build awareness in Coral for coalsce_struct UDF#193

Merged
ljfgem merged 12 commits intolinkedin:masterfrom
autumnust:udf_swap
Dec 8, 2021
Merged

Coral-Spark: Build awareness in Coral for coalsce_struct UDF#193
ljfgem merged 12 commits intolinkedin:masterfrom
autumnust:udf_swap

Conversation

@autumnust
Copy link
Copy Markdown
Contributor

@autumnust autumnust commented Nov 10, 2021

This patch includes change to swap extract_union UDF with coalesce_struct UDF in Coral-Spark during Coral RelNode -> Spark RelNode.

In #192, we have changed the interpretation of union field in IR to a struct that aligns with Trino's after-explosion schema. The output struct which contains tag field and names each member field as fieldN is backward incompatible with the schema from extract_union UDF. This patch is need to bridge the schema difference between both.


Note that coalesce_struct UDF will be statically registered.

@wmoustafa
Copy link
Copy Markdown
Contributor

Should the coalesce_struct UDF be part of Coral?

Copy link
Copy Markdown
Contributor

@wmoustafa wmoustafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should both Spark and Trino be handled in the same PR?

@autumnust
Copy link
Copy Markdown
Contributor Author

autumnust commented Nov 15, 2021

Should the coalesce_struct UDF be part of Coral?

Why is that if that is going to be a Hive UDF, or do you think it should be statically defined in Coral and being loaded into engines when being used ?

@autumnust
Copy link
Copy Markdown
Contributor Author

Should both Spark and Trino be handled in the same PR?

Before the change, only Spark IR Transformer (during IR -> Spark RelNode) handles the extract_union UDF (by getting rid of it). Trino path simply cannot execute SQL statements with extract_union as part of it. Do we want to preserve that behavior or you believe we should also get rid of extract_union in the trino SQL whenever appearing ?

@wmoustafa
Copy link
Copy Markdown
Contributor

Should the coalesce_struct UDF be part of Coral?

Why is that if that is going to be a Hive UDF, or do you think it should be statically defined in Coral and being loaded into engines when being used ?

I meant the GenericUDF definition, yes. But, I think it is okay if the function is not part of Coral.

@wmoustafa
Copy link
Copy Markdown
Contributor

Should both Spark and Trino be handled in the same PR?

Before the change, only Spark IR Transformer (during IR -> Spark RelNode) handles the extract_union UDF (by getting rid of it). Trino path simply cannot execute SQL statements with extract_union as part of it. Do we want to preserve that behavior or you believe we should also get rid of extract_union in the trino SQL whenever appearing ?

Ok, makes sense. We can restrict extract_union usage to Spark.

@autumnust
Copy link
Copy Markdown
Contributor Author

@wmoustafa Can you take a second look ? thanks !

@autumnust autumnust changed the title [WIP: Don't merge] Coral-Spark: Build awareness in Coral for coalsce_struct UDF Coral-Spark: Build awareness in Coral for coalsce_struct UDF Nov 17, 2021
Copy link
Copy Markdown
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @autumnust for this PR!

@autumnust
Copy link
Copy Markdown
Contributor Author

autumnust commented Nov 23, 2021

Looks like the Github action is failing with weird reason:


> Task :coral-hive:compileTestJava FAILED
    ToRelConverterTestUtils.setup();
                           ^
  required: HiveConf
  found: no arguments
  reason: actual and formal argument lists differ in length
Note: /home/runner/work/coral/coral/coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/parsetree/ParseTreeBuilderTest.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error

@ljfgem have you seen this ? this failure couldn't be reproduced locally and the error message doesn't seem to make sense since there's only one setup method without no arguments. Binding issue ?

@wmoustafa
Copy link
Copy Markdown
Contributor

@ljfgem have you seen this ? this failure couldn't be reproduced locally and the error message doesn't seem to make sense since there's only one setup method without no arguments. Binding issue ?

You probably need to rebase. This method has changed in #198.

Copy link
Copy Markdown
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @autumnust for this patch!

Copy link
Copy Markdown
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks
@wmoustafa @funcheetah PTAL if you get a chance.

… seems to require the PR concerning on IR type first
…on in type conversion within HiveReturnTypes.java
…into coral-hive package;

Addressed the comments by adding nested union unit test cases and verifying the RelDataType of coalesce_struct's output
Copy link
Copy Markdown
Contributor

@funcheetah funcheetah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @autumnust ! LGTM

@ljfgem ljfgem merged commit 04d92ac into linkedin:master Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants