-
Notifications
You must be signed in to change notification settings - Fork 29k
[MINOR][SQL] Deduplicate huge if statements in get between specialized getters #24016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR][SQL] Deduplicate huge if statements in get between specialized getters #24016
Conversation
| this.handleUserDefinedType = handleUserDefinedType; | ||
| } | ||
|
|
||
| public Object read(SpecializedGetters obj, int ordinal, DataType dataType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even make the class as utility class and no need to make object: just change this to static, and add two boolean parameters. If we prefer not to have object even it's only one for each class, I'll make a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree, a new object isn't needed.
It could plausibly live in the SpecializedGetters interface in Java 8, but that might be weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I tried to add it to interface via default method, but looked weird so decided to have utility class for this.
|
Test build #103178 has finished for PR 24016 at commit
|
|
Test build #103180 has finished for PR 24016 at commit
|
| } else { | ||
| throw new UnsupportedOperationException("Unsupported data type " + dataType.simpleString()); | ||
| } | ||
| return reader.read(this, ordinal, dataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the first glance it looks odd to pass the this pointer with the same type. Maybe it should be a static function only? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unavoidable unless we add the method to the interface. I tried it but looked weird.
| public Object read(SpecializedGetters obj, int ordinal, DataType dataType) { | ||
| if (handleNull && (obj.isNullAt(ordinal) || dataType instanceof NullType)) { | ||
| return null; | ||
| } else if (dataType instanceof BooleanType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: all the 'else's are redundant. It won't make a difference in the bytecode and isn't a big deal, just feels more readable to me if there are going to be 20 of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I imagine this is brought to reduce lines, but given we return every statement, removing else looks OK to me.
|
|
||
| import org.apache.spark.sql.types.*; | ||
|
|
||
| public class SpecializedGettersReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final?
| this.handleUserDefinedType = handleUserDefinedType; | ||
| } | ||
|
|
||
| public Object read(SpecializedGetters obj, int ordinal, DataType dataType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree, a new object isn't needed.
It could plausibly live in the SpecializedGetters interface in Java 8, but that might be weird.
| return obj.getInt(ordinal); | ||
| } else if (dataType instanceof TimestampType) { | ||
| return obj.getLong(ordinal); | ||
| } else if (dataType instanceof BinaryType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of evaluating Binary/String/DecimalType varied in the code this replaces. That shouldn't matter right -- none of these types are subtypes of the other.
I wonder, while we're here, is it sensible to put checks for more common types further up? for example, 'short' is pretty rare. String is probably more common than binary or decimal. Those are all maybe more common that date types. Etc. I understand for readability we might not want to move short, but might be worth rearranging the others for some performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, for performance, yes we can move commonly used types to top of huge if statements (though I don't have any statistic for this - we all know string should be placed to one of top), and for readability, this looks to be thoughtfully placed - primitive types first, dates, binary, string, complex types (CalendarInterval could be moved to dates group).
Either is fine for me, and if we want to rearrange to gain performance, we may need to pick some commonly used things according to actual usages. Do you have something to refer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I have nothing but intuition here. I wouldn't make major changes. I think we can probably make the obvious ones, like, String should probably follow primitives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. That's also intuitive to me. Will address. Please let me know if there're other obvious things.
d049c0f to
b7535df
Compare
|
Test build #103348 has finished for PR 24016 at commit
|
|
Test build #103352 has finished for PR 24016 at commit
|
|
Merged to master |
|
Thanks all for reviewing and merging! |
What changes were proposed in this pull request?
This patch deduplicates the huge if statements regarding getting value between specialized getters.
How was this patch tested?
Existing UT.