Intern hostnames in caching directory lister #17270
sopel39 commented Apr 27, 2023
fyi @romanvainb
I think we generally avoid using weak-valued caches because they make memory management less predictable. So this small single line is a big change to me.
However, what about scoping the interner in the caller, e.g. in InternalHiveSplitFactory?
> I think we generally avoid using weak-valued caches because they make memory management less predictable. So this small single line is a big change to me.

The assumption is there won't be many hosts in the cluster, so it should be negligible in practice.

> however, what about scoping the interner in the caller, e.g. in InternalHiveSplitFactory?

I'm not sure what that would change. For example, CachingDirectoryLister has global scope and it stores BlockLocations indirectly.
I wanted to avoid extra complexity here since the number of hosts should be small anyway.
> The assumption is there won't be many hosts in the cluster so it should be negligible in practice.

good point. can you add this as a code comment?
stupid question: what about deployments which scale up and down over time? The number of interned hosts will keep increasing. What causes it to release hostnames which are no longer in use? GC?
not a stupid question, since I was about to ask the same one :P
> stupid question: what about deployments which scale up and down over time? The number of interned hosts will keep increasing.

Yes. Even in extreme cases I would assume something like 1000 hosts. Hosts cannot be added or removed very frequently, so the size of this structure will remain relatively stable over time (and small too).
GC will collect weakly referenced entries eagerly (as soon as no one else references them).
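To make the GC point above concrete, here is a minimal JDK-only sketch of weak interning. It is an illustration under stated assumptions, not the code from this PR: `WeakHostInterner`, `intern` and `size` are hypothetical names, and the PR's HOST_INTERNER presumably relies on a library interner (for example Guava's `Interners.newWeakInterner()`) rather than a hand-rolled map like this one.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical JDK-only stand-in for a weak interner; NOT the PR's implementation.
final class WeakHostInterner
{
    // WeakHashMap references its keys weakly. The value is wrapped in a
    // WeakReference so that an entry never keeps its own key strongly reachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for an equal hostname, registering
    // the argument as canonical if no live instance exists yet.
    public synchronized String intern(String host)
    {
        WeakReference<String> ref = pool.get(host);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            canonical = host;
            pool.put(canonical, new WeakReference<>(canonical));
        }
        return canonical;
    }

    // Approximate size: entries for hostnames that are no longer referenced
    // elsewhere drop out only after a garbage collection has processed them.
    public synchronized int size()
    {
        return pool.size();
    }
}
```

With this sketch, `intern(new String("worker-1.example.com"))` called twice returns the same instance both times, so many BlockLocation-style objects can share one String per host. Once no caller holds that instance any longer, the entry becomes eligible for removal at a later GC, which is why the pool tracks the set of hosts currently in use rather than every host ever seen. Note that string literals live in the JVM's own constant pool and are not collected this way; the hostname and the `new String(...)` call here are only for demonstration.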
findepi left a comment
% add a comment that we assume the HOST_INTERNER will intern only a very small, limited number of names over time.