Description
The current implementation builds an internal cache of DataNode IPs and labels. If a request comes from a non-DataNode IP (which is usually the case), that IP is "impersonated" by a DataNode: we first identify the DataNode running on the same Kubernetes node as the calling IP, and then save the result to the cache. In other words, "impersonation" happens because the topology lookup for any IP on a Kubernetes node is satisfied using the DataNode pod scheduled on that same node, even if the caller is not a DataNode.
Important
This means that a client/worker running on node N gets assigned the same topology ID as the DataNode on node N.
However, this is somewhat inefficient, because the resolution step requires at least one call-out to list all Pods in the current namespace. This can be improved by building a cache of all pod IPs in the namespace (with their labels), so that this call is needed less often. This is shown below, with the top diagram to be replaced by the bottom one:
Also investigate whether using shared informers (https://github.com/fabric8io/kubernetes-client/blob/main/doc/CHEATSHEET.md?rgh-link-date=2024-02-20T14%3A54%3A24Z#sharedinformers) would improve performance.
The code is hard to read and understand - we can improve things by better naming, separating concerns more clearly and adding more comments.
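To make the IP-keyed cache proposal more concrete, here is a minimal sketch in plain Java. It is not the existing implementation: `PodIpTopologyCache`, `PodInfo` and the `podLister` supplier are hypothetical names, and the supplier only stands in for the real "list all Pods in the namespace" call. One list call populates entries for every pod IP, using the label of the DataNode on the same Kubernetes node, so later lookups for other IPs need no further call-out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch: maps every pod IP to the topology label of the DataNode on its node. */
final class PodIpTopologyCache {

    /** Minimal stand-in for a pod; topologyLabel is null unless the pod is a DataNode. */
    static final class PodInfo {
        final String ip;
        final String nodeName;
        final String topologyLabel;

        PodInfo(String ip, String nodeName, String topologyLabel) {
            this.ip = ip;
            this.nodeName = nodeName;
            this.topologyLabel = topologyLabel;
        }
    }

    // Assumption: stands in for the Kubernetes call that lists all pods in the namespace.
    private final Supplier<List<PodInfo>> podLister;
    private Map<String, String> labelByIp = new HashMap<>();
    private int listCalls = 0;

    PodIpTopologyCache(Supplier<List<PodInfo>> podLister) {
        this.podLister = podLister;
    }

    /** Returns the cached label for an IP, re-listing pods only on a cache miss. */
    synchronized Optional<String> resolve(String ip) {
        String label = labelByIp.get(ip);
        if (label == null) {
            refresh();
            label = labelByIp.get(ip);
        }
        return Optional.ofNullable(label);
    }

    /** Rebuilds the whole IP-keyed map from a single pod listing. */
    private void refresh() {
        listCalls++;
        List<PodInfo> pods = podLister.get();

        // First pass: find the DataNode label for each Kubernetes node.
        Map<String, String> labelByNode = new HashMap<>();
        for (PodInfo pod : pods) {
            if (pod.topologyLabel != null) {
                labelByNode.put(pod.nodeName, pod.topologyLabel);
            }
        }

        // Second pass: every pod IP gets the label of the DataNode on its node.
        Map<String, String> fresh = new HashMap<>();
        for (PodInfo pod : pods) {
            String label = labelByNode.get(pod.nodeName);
            if (label != null) {
                fresh.put(pod.ip, label);
            }
        }
        labelByIp = fresh;
    }

    /** Number of pod listings performed so far (useful when comparing approaches). */
    synchronized int listCalls() {
        return listCalls;
    }
}
```

Note that this sketch has no expiry and no negative caching: an entry can go stale if a pod is rescheduled, and every lookup for an unknown IP triggers a new listing. Both would need addressing given that accuracy is the primary goal.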
Acceptance criteria
- implement an IP-keyed cache in place of the current datanode-based one
- improve readability by adding helper classes for e.g. pod/node resolution
- add/improve code comments
- consider/improve naming
- consider using a shared informer, specifically looking at whether this trades accuracy (primary goal) for performance (secondary goal)
