Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This patch deduplicates the huge if statements regarding getting value between specialized getters.

How was this patch tested?

Existing UT.

this.handleUserDefinedType = handleUserDefinedType;
}

public Object read(SpecializedGetters obj, int ordinal, DataType dataType) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could even make the class as utility class and no need to make object: just change this to static, and add two boolean parameters. If we prefer not to have object even it's only one for each class, I'll make a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree, a new object isn't needed.
It could plausibly live in the SpecializedGetters interface in Java 8, but that might be weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tried to add it to interface via default method, but looked weird so decided to have utility class for this.

@SparkQA
Copy link

SparkQA commented Mar 8, 2019

Test build #103178 has finished for PR 24016 at commit f75b497.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class SpecializedGettersReader

@SparkQA
Copy link

SparkQA commented Mar 8, 2019

Test build #103180 has finished for PR 24016 at commit 437dd08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
throw new UnsupportedOperationException("Unsupported data type " + dataType.simpleString());
}
return reader.read(this, ordinal, dataType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the first glance it looks odd to pass the this pointer with the same type. Maybe it should be a static function only? Not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unavoidable unless we add the method to the interface. I tried it but looked weird.

public Object read(SpecializedGetters obj, int ordinal, DataType dataType) {
if (handleNull && (obj.isNullAt(ordinal) || dataType instanceof NullType)) {
return null;
} else if (dataType instanceof BooleanType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: all the 'else's are redundant. It won't make a difference in the bytecode and isn't a big deal, just feels more readable to me if there are going to be 20 of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I imagine this is brought to reduce lines, but given we return every statement, removing else looks OK to me.


import org.apache.spark.sql.types.*;

public class SpecializedGettersReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

this.handleUserDefinedType = handleUserDefinedType;
}

public Object read(SpecializedGetters obj, int ordinal, DataType dataType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree, a new object isn't needed.
It could plausibly live in the SpecializedGetters interface in Java 8, but that might be weird.

return obj.getInt(ordinal);
} else if (dataType instanceof TimestampType) {
return obj.getLong(ordinal);
} else if (dataType instanceof BinaryType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of evaluating Binary/String/DecimalType varied in the code this replaces. That shouldn't matter right -- none of these types are subtypes of the other.

I wonder, while we're here, is it sensible to put checks for more common types further up? for example, 'short' is pretty rare. String is probably more common than binary or decimal. Those are all maybe more common that date types. Etc. I understand for readability we might not want to move short, but might be worth rearranging the others for some performance.

Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, for performance, yes we can move commonly used types to top of huge if statements (though I don't have any statistic for this - we all know string should be placed to one of top), and for readability, this looks to be thoughtfully placed - primitive types first, dates, binary, string, complex types (CalendarInterval could be moved to dates group).

Either is fine for me, and if we want to rearrange to gain performance, we may need to pick some commonly used things according to actual usages. Do you have something to refer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I have nothing but intuition here. I wouldn't make major changes. I think we can probably make the obvious ones, like, String should probably follow primitives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That's also intuitive to me. Will address. Please let me know if there're other obvious things.

@HeartSaVioR HeartSaVioR force-pushed the MINOR-deduplicate-get-from-specialized-getters branch from d049c0f to b7535df Compare March 11, 2019 22:15
@SparkQA
Copy link

SparkQA commented Mar 12, 2019

Test build #103348 has finished for PR 24016 at commit d1f00ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class SpecializedGettersReader

@SparkQA
Copy link

SparkQA commented Mar 12, 2019

Test build #103352 has finished for PR 24016 at commit b7535df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 13, 2019

Merged to master

@srowen srowen closed this in 733f2c0 Mar 13, 2019
@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the MINOR-deduplicate-get-from-specialized-getters branch March 13, 2019 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants