Generate Coral's RelNode for views from base table schema #409
…ess attempted from single union type
Thank you @aastha25 for this PR!
Could you please add more detail to the PR description and the code comments to clarify why the uniontype changes are necessary, i.e. what fails if we don't make them, preferably with a simple example?
...hive/src/main/java/com/linkedin/coral/transformers/SingleUnionFieldReferenceTransformer.java
@ljfgem I have added an example clarifying the use case in the PR description.
LGTM, thank you @aastha25 !
Should not this PR change the behavior of |
This PR sets the return type of |
Discussed offline. We will create a follow-up PR to address this since the current approach: 1- breaks |
…nkedin#409)" This reverts commit 76789bf.
…nkedin#409) (linkedin#489) * Revert "Generate Coral's RelNode for views from base table schema (linkedin#409)" This reverts commit 76789bf. * ./gradlew spotlessApply * modify workflow action
…hema (linkedin#409) (linkedin#489)" This reverts commit 40f6958. # Conflicts: # .github/workflows/ci.yml
What changes are proposed in this pull request, and why are they necessary?
This PR includes the following fixes:
(1) Includes the changes from #394, which aimed to remove the CAST in HiveViewTable.toRel to solve the nested view stale schema issue.
(2) Enables Avro schema generation for a dataset that has a map field with a NULL key.
(3) When a field's schema is a UnionType of a single data type, such as `uniontype<array<string>>`, surface the underlying field and its schema, such as `array<string>`, in Coral's SqlNode and RelNode representation. `SingleUnionFieldReferenceTransformer` has been introduced for backward compatibility.

For a table `t1`: (f1: uniontype<int>) and a view `v1` defined on top of it as
SELECT custom_udf_to_remove_single_type_union(f1) AS f2 FROM t1
`v1` has the schema: (f2: int)

The RelNode representation, prior to dropping the CAST operator in the RelNode, would force the derived data type of f2 to be `int`, as expected. Hence, the generated Avro schema could be analyzed by the Spark engine. However, after dropping the CAST, as documented here, the derived type is `Struct<tag_0:int>`. The Avro schema is then generated incorrectly and cannot be analyzed by the Spark engine.

The resolution is to derive the data type of single uniontype fields like `f1` as the underlying data type (int in this case).

How was this patch tested?
./gradlew spotlessApply
./gradlew build
Updated UTs
Thoroughly tested production views across all supported engines.
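
To make fix (3) from the description concrete, here is a minimal sketch of the unwrapping rule: a uniontype with exactly one member is treated as that member's type, and everything else is left alone. This is not Coral's implementation. The class `SingleUnionUnwrap` and its methods are hypothetical names, and it works on Hive type strings purely for illustration, whereas the PR operates on Coral's SqlNode and RelNode types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only (hypothetical helper, not part of Coral). */
final class SingleUnionUnwrap {
  private static final String PREFIX = "uniontype<";

  /** Splits a member list on commas that are not nested inside angle brackets. */
  static List<String> splitTopLevel(String members) {
    List<String> parts = new ArrayList<>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i < members.length(); i++) {
      char c = members.charAt(i);
      if (c == '<') {
        depth++;
      } else if (c == '>') {
        depth--;
      } else if (c == ',' && depth == 0) {
        parts.add(members.substring(start, i).trim());
        start = i + 1;
      }
    }
    parts.add(members.substring(start).trim());
    return parts;
  }

  /**
   * Returns the member type when the input is a uniontype with exactly one
   * member; otherwise returns the (trimmed) input unchanged.
   * Only the outermost type is inspected; nested unions are not rewritten.
   */
  static String unwrapSingleUnion(String type) {
    String t = type.trim();
    if (!t.toLowerCase(Locale.ROOT).startsWith(PREFIX) || !t.endsWith(">")) {
      return t;
    }
    String inner = t.substring(PREFIX.length(), t.length() - 1).trim();
    if (inner.isEmpty()) {
      return t; // malformed "uniontype<>": leave as-is
    }
    List<String> members = splitTopLevel(inner);
    return members.size() == 1 ? members.get(0) : t;
  }

  public static void main(String[] args) {
    System.out.println(unwrapSingleUnion("uniontype<int>"));
    System.out.println(unwrapSingleUnion("uniontype<array<string>>"));
    System.out.println(unwrapSingleUnion("uniontype<int,string>"));
  }
}
```

With this rule, `f1` in the example table `t1` would be seen as `int`, so a view selecting it would derive `int` for `f2` without relying on a CAST. Multi-member unions pass through untouched, which keeps the change scoped to the single-type case the PR describes.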