[WIP] [HUDI-625] Fixing performance issues around DiskBasedMap & kryo #1351
Conversation
- This is a very rough cut of a few things I tried, just for sharing purposes.
- Kryo needs serializers; once we add them, ser/deser is fast and writing finishes 10-20x faster.
- DiskBasedMap is tracking too many things redundantly and paying the cost of doing so as well.
- TODO: Need to break the Kryo and map fixes into different PRs.
- TODO: For map entry thinning, need to handle compaction, fix code structure, tests.
- TODO: For Kryo, one more pass with a good understanding of the APIs, tests, null handling, cleanup.
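To illustrate the "Kryo needs serializers" point above, here is a minimal sketch of what an explicit, field-by-field serializer does, written against plain `java.io` streams rather than Kryo's `Serializer` API. `DemoKey` and `DemoKeySerializer` are hypothetical names, and the assumption that the key is just two strings (a record key and a partition path) is made for illustration only; this is not the code in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a two-field record key; not Hudi's HoodieKey.
final class DemoKey {
  final String recordKey;
  final String partitionPath;

  DemoKey(String recordKey, String partitionPath) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
  }
}

// Writes exactly the two fields, in a fixed order, with no class metadata.
final class DemoKeySerializer {
  static byte[] serialize(DemoKey key) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(key.recordKey);      // 2-byte length prefix + modified UTF-8
      out.writeUTF(key.partitionPath);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static DemoKey deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      // Read the fields back in the same order they were written.
      String recordKey = in.readUTF();
      String partitionPath = in.readUTF();
      return new DemoKey(recordKey, partitionPath);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the serializer knows the layout up front, nothing has to be discovered per object at write or read time. A Kryo `Serializer<T>` would play the same role using Kryo's `Output` and `Input` classes.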
This is a great start. 👍 IMO, because we already set the InstantiatorStrategy, we needn't register the class again. From the Kryo guide [1], we also modify this: So, making the changes below will be OK
[1] https://github.com/EsotericSoftware/kryo#object-creation
@vinothchandar this is interesting, why does Kryo need serializer implementations? Oh, just saw @lamber-ken's response: the StdInstantiatorStrategy will allow Kryo to fall back to Java Serde, is that what we want?
@lamber-ken has a smaller fix that probably does not involve explicit serializers, but the way we implemented Kryo serialization is definitely problematic.
@lamber-ken Please assign HUDI-625 to yourself when the Apache issues site comes back online. I will take the @n3nash wdyt? I think the per-record overhead of HoodieRecord need not be persisted to the DiskBasedMap, i.e. just have
Thanks @vinothchandar, I have opened a new PR, #1352 :)
Hi @n3nash. By default, an instantiator is returned that uses reflection if the class has a zero-argument constructor; otherwise an exception is thrown. If a More, the key point is the previous KryoBase in The "super.getInstantiatorStrategy().newInstantiatorOf(type).newInstance();" call: this causes it to be invoked each time
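The cost described above (looking up an instantiator again on every object creation) can be avoided by resolving the instantiator once per class and reusing it. A minimal sketch of that idea using only the JDK follows. `CachedInstantiators` is a hypothetical helper, not part of Kryo or Hudi, and the `LOOKUPS` counter exists only to make the caching visible.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: resolves a zero-argument constructor once per class.
final class CachedInstantiators {
  // Counts constructor lookups; for demonstration only.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  private static final Map<Class<?>, Supplier<?>> CACHE = new ConcurrentHashMap<>();

  // The expensive step: reflective lookup of the public zero-argument constructor.
  private static <T> Supplier<T> buildInstantiator(Class<T> type) {
    LOOKUPS.incrementAndGet();
    try {
      Constructor<T> ctor = type.getConstructor();
      return () -> {
        try {
          return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException(e);
        }
      };
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No public zero-argument constructor: " + type, e);
    }
  }

  // The cheap step: reuse the cached instantiator on every later call.
  static <T> T newInstance(Class<T> type) {
    Supplier<?> factory = CACHE.computeIfAbsent(type, t -> buildInstantiator(t));
    return type.cast(factory.get());
  }
}
```

Calling `CachedInstantiators.newInstance(SomeClass.class)` repeatedly performs the constructor lookup only on the first call for that class. Each call still returns a fresh object.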
vinothchandar
left a comment
Will open a new PR with just the diskmap changes. HUDI-635 will track it.
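For context on the diskmap side, one way to read "map entry thinning" is that the in-memory index keeps only what is needed to find a value on disk. The sketch below assumes that reading and stores just an offset and a length per key. `ThinDiskMap` and `Slot` are hypothetical names; this is not Hudi's DiskBasedMap, and the actual fields it tracks are not shown in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an append-only, file-backed map with a thin index.
final class ThinDiskMap implements AutoCloseable {
  // The only per-entry state held in memory: where the value is and how long it is.
  private static final class Slot {
    final long offset;
    final int length;

    Slot(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  private final Map<String, Slot> index = new HashMap<>();
  private final RandomAccessFile file;

  private ThinDiskMap(RandomAccessFile file) {
    this.file = file;
  }

  static ThinDiskMap createTemp() {
    try {
      File f = File.createTempFile("thin-disk-map", ".bin");
      f.deleteOnExit();
      return new ThinDiskMap(new RandomAccessFile(f, "rw"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void put(String key, byte[] value) {
    try {
      long offset = file.length(); // append-only: new values go at the end
      file.seek(offset);
      file.write(value);
      index.put(key, new Slot(offset, value.length)); // an overwrite just repoints the key
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  byte[] get(String key) {
    Slot slot = index.get(key);
    if (slot == null) {
      return null;
    }
    try {
      byte[] value = new byte[slot.length];
      file.seek(slot.offset);
      file.readFully(value);
      return value;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int size() {
    return index.size();
  }

  @Override
  public void close() {
    try {
      file.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Overwriting a key leaves the old bytes in the file as garbage, which is why an append-only layout like this eventually needs a compaction pass.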
    return (T) SERIALIZER_REF.get().deserialize(objectData);
  }

  public static class HoodieKeySerializer extends Serializer<HoodieKey> implements Serializable {
These are no longer needed