[SPARK-41759][CORE] Use weakIntern on string values in create new objects during deserialization#39275
panbingkun wants to merge 6 commits into apache:master
Primarily use weakIntern where there are a large number of duplicated strings with the same value (so app start, for example, won't qualify), i.e. for the most common values.
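The rule above depends on the difference between strings that are equal and strings that are the same instance. The following is a minimal, hypothetical stand-in for `weakIntern`, not Spark's implementation: `WeakStringInterner` and `InternDemo` are illustrative names, and we assume Spark's helper behaves like a weak interner (as far as we know it wraps Guava's `Interners.newWeakInterner`). It collapses equal strings to one shared instance while still letting that instance be garbage-collected once nothing references it.

```scala
import java.lang.ref.WeakReference
import java.util.WeakHashMap

// Hypothetical weak string interner, illustrating the idea behind weakIntern.
// Keys are held weakly, so an interned string can be garbage-collected once no
// deserialized object refers to it any more.
object WeakStringInterner {
  private val pool = new WeakHashMap[String, WeakReference[String]]()

  def intern(s: String): String = pool.synchronized {
    val ref = pool.get(s)
    val cached = if (ref == null) null else ref.get()
    if (cached != null) {
      cached // an equal string was interned earlier: reuse that instance
    } else {
      // The value is also a weak reference; a strong value would keep its
      // own key reachable and the entry would never be collected.
      pool.put(s, new WeakReference(s))
      s
    }
  }
}

object InternDemo {
  def main(args: Array[String]): Unit = {
    // Equal but distinct String instances, as a deserializer would produce.
    val a = new String("executor-host-1")
    val b = new String("executor-host-1")
    assert(a == b && !(a eq b))

    val ia = WeakStringInterner.intern(a)
    val ib = WeakStringInterner.intern(b)
    println(ia eq ib) // true: both now share a single instance
  }
}
```

In this sketch every call costs a synchronized map lookup and, for a value not seen before, a new map entry. That only pays off when many objects share the same value; for mostly unique strings such as descriptions it is pure overhead, which is the concern raised later in review.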
Can one of the admins verify this patch?
@panbingkun We may need to investigate whether this change affects GC behavior.
@gengliangwang @mridulm @panbingkun Should we continue or close this one? I think we should resolve this ticket, one way or the other, before cutting branch-3.4.
I think we can use weakIntern in the places where it was already used before this new project.
Ok, let me do it. |
Done.
private def deserializeJobData(info: StoreTypes.JobData): JobData = {
  val description = getOptional(info.hasDescription, info.getDescription)
  val description = getOptional(info.hasDescription, () => weakIntern(info.getDescription))
I wonder whether it's worthwhile to intern strings that aren't likely to be repeated. Descriptions don't seem to be such a case, and IDs and names also seem fairly unique. We might want to be more targeted? There is overhead to interning.
+1, I don't think we need weakIntern here.
+1, we should consider whether a value is likely to be repeatedly accessed and used.
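In the `deserializeJobData` diff above, the second argument to `getOptional` is a function (`() => weakIntern(info.getDescription)`), so the interning work only runs when the field is actually present. A minimal sketch, assuming `getOptional` has a signature like the hypothetical one here; `GetOptionalDemo` and `countingIntern` are illustrative names, not Spark APIs:

```scala
// Hypothetical mirror of the getOptional helper: the value is passed as a
// function, so it is only computed (and interned) when the condition holds.
object GetOptionalDemo {
  def getOptional[T](condition: Boolean, result: () => T): Option[T] =
    if (condition) Some(result()) else None

  def main(args: Array[String]): Unit = {
    var interned = 0
    // Stand-in for weakIntern that only counts how often it runs.
    def countingIntern(s: String): String = { interned += 1; s }

    val absent = getOptional(false, () => countingIntern("unused"))
    val present = getOptional(true, () => countingIntern("job description"))

    println(absent)   // None
    println(present)  // Some(job description)
    println(interned) // 1: the absent field was never interned
  }
}
```

Because the function is only invoked for present fields, wrapping a getter in `weakIntern` adds no cost for absent optional fields; the reviewers' question is whether it is worth the cost for present but mostly unique values.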
@panbingkun there are 7 usages in the live entities, while there are 13 usages in the protobuf serializer.
Based on the rule above (a large number of duplicated strings with the same value):
val poolData = StoreTypes.PoolData.parseFrom(bytes)
new PoolData(
  name = weakIntern(poolData.getName),
  name = poolData.getName,
It violates the rule ("a large number of" duplicated values), so I removed it.
  getOptional(binary.hasFailureReason, () => weakIntern(binary.getFailureReason))
val description =
  getOptional(binary.hasDescription, () => weakIntern(binary.getDescription))
val failureReason = getOptional(binary.hasFailureReason, binary.getFailureReason)
Thanks, merging to master.
Thanks @gengliangwang


What changes were proposed in this pull request?
This PR aims to apply weakIntern to string values when creating new objects during deserialization in the protobuf serializers.
Why are the changes needed?
Follows the guidance in #39270.

Does this PR introduce any user-facing change?
No.
How was this patch tested?
Passed GitHub Actions (GA).