Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-schema] Generalize operand schema inference on ordinal return type UDF calls #548

Merged
merged 4 commits into from
Feb 10, 2025

Conversation

KevinGe00
Copy link
Contributor

What changes are proposed in this pull request, and why are they necessary?

Currently, coral-schema is only set up to correctly infer schemas for UDFs with an ordinal return type, only if the ordinal operand is a reference to an input column. However, this is limiting and we have since encountered views with nested calls on UDFs with an ordinal return type. For example, UDF_A('foo', UDF_A('foo', colA)), where the return type of UDF_A is whatever the return type of it's second operand is.

We correct this by using the SchemaRexShuttle to visit the ordinal operand, which will infer the schema of the operand and add it to the final schema as a field. This way, we leverage the shuttle to categorize and process the operand for us so we don't have to manually categorize the operand.

How was this patch tested?

  • New unit test to schema inference on nested UDF calls with ordinal return types
  • ./gradlew clean build
  • regression test passes
  • tested schema produced on spark-shell for a view with previously incorrect schema (due to incorrect nullabilites)

@simbadzina
Copy link

Did the regression tests introduce any changes to view text?

@KevinGe00
Copy link
Contributor Author

KevinGe00 commented Feb 3, 2025

Did the regression tests introduce any changes to view text?

The regression tests did not introduce any changes. FYI I only ran the regression for avro schema translations since that is the only possible translation path that this change could introduce changes in.

return rexCall;
}
operand.accept(this);
return rexCall;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earlier return here causes the view doc string to no longer be produce. Discussed this with @KevinGe00 offline and it is a reasonable change given it is now more complex to construct the doc string when we recurse into the UDF.

appendRexInputRefField((RexInputRef) operand);
return rexCall;
}
operand.accept(this);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a change in the capitalization in the field name. This is because the string are now being fetched from the underlying schema instead of being generated programmatically by Coral.

Copy link

@simbadzina simbadzina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Secondary changes to the schema noted in other comments.

@KevinGe00 KevinGe00 merged commit f6ce2be into linkedin:master Feb 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants