-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema. #22619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #96881 has finished for PR 22619 at commit
|
|
Any behavior change? Test cases? |
|
Maybe this is related to #22448. |
|
@gatorsmile There should not be any behaviour change. I was thinking that existing test cases should suffice. Basically we used to duplicate the code of TypeCoercion.findTightestCommonType in here. Here i am just reusing the common function. This is tested in CSVInferSchemaSuite |
| } else { | ||
| Some(DecimalType(range + scale, scale)) | ||
| } | ||
| case (_, _) => None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case _ => None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon Thanks. Will change.
| Some(DecimalType(range + scale, scale)) | ||
| def compatibleType(t1: DataType, t2: DataType): Option[DataType] = { | ||
| TypeCoercion.findTightestCommonType(t1, t2).orElse { | ||
| (t1, t2) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave this out as a private val like the previous and leave a comment that this pattern matching is CSV specific? That will reduce the diff and makes the review easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, let's keep the comments in the original place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon Did you have any preference or suggestion on the name of the val ? findCommonTypeExtended ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure. maybe just findCompatibleTypeForCSV
|
Looks okay - I checked a case one by one but it needs another look. |
|
Let's just file a JIRA @dilipbiswal BTW. |
|
@HyukjinKwon Okay. |
| findTightestCommonType(t1, DecimalType.forType(t2)) | ||
|
|
||
| // Double support larger range than fixed decimal, DecimalType.Maximum should be enough | ||
| // in most case, also have better precision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments here are ignored in the change. Shall we keep them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya Yeah.. we should keep.. sorry.. got dropped inadvertently.
| Some(DoubleType) | ||
| } else { | ||
| Some(DecimalType(range + scale, scale)) | ||
| def compatibleType(t1: DataType, t2: DataType): Option[DataType] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: findCompatibleType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya i kept the same name used in JsonInferSchema. Change that as well ? Or only change this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compatibleType is also fine if it is consistent with JsonInferSchema.
|
|
||
| case _ => None | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get rid of new lines changes.
| * is compatible with both input data types. | ||
| */ | ||
| private def compatibleType(t1: DataType, t2: DataType): Option[DataType] = { | ||
| TypeCoercion.findTightestCommonType(t1, t2).orElse (findCompatibleTypeForCSV(t1, t2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: e ( -> e(
|
Test build #96890 has finished for PR 22619 at commit
|
|
Test build #96897 has finished for PR 22619 at commit
|
|
@HyukjinKwon Does this look okay now ? |
|
Yup. Let me leave this open few more days in case. |
|
@HyukjinKwon Sure :-) |
|
Merged to master. |
…Type while inferring CSV schema. ## What changes were proposed in this pull request? Current the CSV's infer schema code inlines `TypeCoercion.findTightestCommonType`. This is a minor refactor to make use of the common type coercion code when applicable. This way we can take advantage of any improvement to the base method. Thanks to MaxGekk for finding this while reviewing another PR. ## How was this patch tested? This is a minor refactor. Existing tests are used to verify the change. Closes apache#22619 from dilipbiswal/csv_minor. Authored-by: Dilip Biswal <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
What changes were proposed in this pull request?
Current the CSV's infer schema code inlines
TypeCoercion.findTightestCommonType. This is a minor refactor to make use of the common type coercion code when applicable. This way we can take advantage of any improvement to the base method.Thanks to @MaxGekk for finding this while reviewing another PR.
How was this patch tested?
This is a minor refactor. Existing tests are used to verify the change.