Skip to content

Comments

[SPARK-41759][CORE] Use weakIntern on string values in create new objects during deserialization#39275

Closed
panbingkun wants to merge 6 commits intoapache:masterfrom
panbingkun:SPARK-41759
Closed

[SPARK-41759][CORE] Use weakIntern on string values in create new objects during deserialization#39275
panbingkun wants to merge 6 commits intoapache:masterfrom
panbingkun:SPARK-41759

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Dec 29, 2022

What changes were proposed in this pull request?

The pr aims to use weakIntern on string values in create new objects during deserialization.

Why are the changes needed?

Following guid: #39270.
image

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA.

@mridulm
Copy link
Contributor

mridulm commented Dec 29, 2022

Primarily use weakIntern for cases where there are a large number of duplicated strings with same value (so app start won't qualify for ex), for the most common values.
Not for others

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@LuciferYang
Copy link
Contributor

@panbingkun We may need to investigate whether this change will change GC behavior

@LuciferYang
Copy link
Contributor

@gengliangwang @mridulm @panbingkun

Should we continue or close this one? I think we should complete this ticket before cut branch-3.4, , no matter how.

@gengliangwang
Copy link
Member

I think we can use the weakIntern where the method is used before this new project.
@panbingkun can you resolve conflicts and finish this one?

@panbingkun
Copy link
Contributor Author

I think we can use the weakIntern where the method is used before this new project.

Ok, let me do it.

@panbingkun
Copy link
Contributor Author

I think we can use the weakIntern where the method is used before this new project.

@panbingkun can you resolve conflicts and finish this one?

Done


private def deserializeJobData(info: StoreTypes.JobData): JobData = {
val description = getOptional(info.hasDescription, info.getDescription)
val description = getOptional(info.hasDescription, () => weakIntern(info.getDescription))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worthwhile to intern strings that aren't likely to be repeated. Descriptions don't seem to be that case. IDs and names also seem more unique-ish. We might want to be more targeted? there is overhead to interning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I don't think we need weakIntern here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, we should consider can be repeatedly accessed and used

@gengliangwang
Copy link
Member

image

@panbingkun there are 7 usages in from live entities, while there are 13 usages in protobuf serializer..

@panbingkun
Copy link
Contributor Author

image

@panbingkun there are 7 usages in from live entities, while there are 13 usages in protobuf serializer..

Base on rule: a large number of duplicated strings with same value; after refactor it, there are 18 usages in protobuf serializer, as follow:
image

val poolData = StoreTypes.PoolData.parseFrom(bytes)
new PoolData(
name = weakIntern(poolData.getName),
name = poolData.getName,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it violates the rule: a large number of, so remove it.

@github-actions github-actions bot removed the SQL label Jan 19, 2023
getOptional(binary.hasFailureReason, () => weakIntern(binary.getFailureReason))
val description =
getOptional(binary.hasDescription, () => weakIntern(binary.getDescription))
val failureReason = getOptional(binary.hasFailureReason, binary.getFailureReason)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@gengliangwang
Copy link
Member

Thanks, merging to master

@panbingkun
Copy link
Contributor Author

Thanks @gengliangwang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants