
Registry k8s cannot correctly handle unregister when it's scaled #456

Closed
denis-tingaikin opened this issue Sep 20, 2023 · 10 comments

@denis-tingaikin
Member

This problem has been reported on WG call.

Scenario

Steps:
1. Deploy NSM
2. Scale registry-k8s to 4 replicas
3. Deploy an NSE
4. Try to unregister the NSE

Actual: NSE can not be unregistered
Expected: NSE can be unregistered

@NikitaSkrynnik
Contributor

@ljkiraly @zolug Hi, it looks like we won't be able to fix this bug in the current release. The problem is more complicated than we thought at the beginning, and there is no quick fix. Could you please tell us why you want to scale the registry, so we can better understand how to rework it?

@zolug

zolug commented Sep 25, 2023

Hi, we need to run multiple registry instances mostly to avoid API throttling, which seemed to pop up rather early as the number of NSEs increased. It also provides redundancy (e.g. in case of node failure).

@NikitaSkrynnik
Contributor

@zolug Hi, we can tune the Kubernetes go-client to handle more requests. Would it solve your problem if we tuned the client's parameters, or added some ENVs so you can tune them yourself?

About redundancy: registry-k8s is a deployment, so if a node fails the pod will be rescheduled on another node. All data is stored in etcd, so you won't lose it if the node hosting a registry fails. I'm not sure whether that solves your problem. Could you tell us if you have experienced any node failures with data loss?

@zolug

zolug commented Oct 3, 2023

> @zolug Hi, we can tune the Kubernetes go-client to handle more requests. Would it solve your problem if we tuned the client's parameters, or added some ENVs so you can tune them yourself?
>
> About redundancy: registry-k8s is a deployment, so if a node fails the pod will be rescheduled on another node. All data is stored in etcd, so you won't lose it if the node hosting a registry fails. I'm not sure whether that solves your problem. Could you tell us if you have experienced any node failures with data loss?

Hi @NikitaSkrynnik,
Unfortunately, I can only recall problems due to API throttling.
IIRC, having registry replicas was also about availability (besides the API throttling):
I guess setting the registry deployment's tolerationSeconds for node.kubernetes.io/unreachable and node.kubernetes.io/not-ready to 0 would greatly improve how quickly a new registry instance is scheduled after a node failure.

However, there is still a delay introduced by node-monitor-grace-period before a node is marked "unresponsive". I don't know whether we could have requirements related to this setting, but with a single registry instance, losing the node hosting the only registry would make the registry service unavailable for at least node-monitor-grace-period. On the other hand, with multiple replicas the service would hopefully serve part of the requests successfully on the first attempt, while new NSE dials from the others might pick a working endpoint after a couple of retries.

So a short node-monitor-grace-period could probably remove the need for replicas. But correct me if I'm wrong.
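The toleration tweak mentioned above could look roughly like this; the values here are illustrative, not a tested recommendation. With tolerationSeconds set to 0, the controller evicts the registry pod as soon as its node gets the corresponding NoExecute taint, instead of after the 300-second default; node-monitor-grace-period itself is a kube-controller-manager flag and still bounds how quickly the node is marked unresponsive in the first place.

```yaml
# Pod spec fragment for the registry-k8s deployment (illustrative sketch):
# evict and reschedule the pod immediately once its node is tainted
# unreachable or not-ready, instead of waiting the 300s default.
tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 0
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 0
```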

@NikitaSkrynnik
Contributor

NikitaSkrynnik commented Oct 10, 2023

Problem

There is one main problem with scaling the registry: if we have several registries, an NSE can Register and Unregister in any of them.

Corner Cases

An NSE has registered in registry A and tries to unregister itself in registry B. It won't be able to do so because of this problem: networkservicemesh/sdk#1515.

If we rework the begin chain element so that it allows Unregister without Register, we will have another problem with the etcd chain element. etcd doesn't allow Unregistration of an NSE in a particular registry if the latest Registration of this NSE wasn't in the same registry; etcd keeps a system of versions for NSEs.

This version system is necessary for cases when a Registry gets an Unregister event from the expire chain element, to prevent early NSE unregistration while the NSE is still registered in several Registries.

Options

Option 1 - Master-Registry

Create a new component, master-registry, which acts as a load balancer. It uses sharding for NSEs and always redirects requests from different NSEs to the registries linked with those NSEs.

Changes

  • New repo
    We need to create a new repository (cmd-registry-load-balancer, for example)

  • Chain elements which do sharding
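The sharding in Option 1 could be sketched minimally as follows; shardFor and the registry names are invented for illustration, and a real master-registry would likely need a stabler scheme than plain hash-modulo (see the discussion below about replica counts changing):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Backend registries the hypothetical master-registry balances across.
var registries = []string{"registry-0", "registry-1", "registry-2", "registry-3"}

// shardFor deterministically maps an NSE name to one backend registry,
// so every request from the same NSE lands on the same registry.
func shardFor(nseName string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(nseName))
	return int(h.Sum32()) % n
}

func main() {
	// The same NSE name always resolves to the same registry.
	for _, nse := range []string{"nse-a", "nse-b", "nse-a"} {
		fmt.Println(nse, "->", registries[shardFor(nse, len(registries))])
	}
}
```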

Option 2 - Store connections on registry-clients

We can store the connection to a registry on the registry clients and use it so that all requests always go to the same registry.

Changes

  • Heal
    Heal needs to know when the registry linked with a particular NSE dies, and query the registry service to get a connection to a new, alive registry.

  • Begin
    Begin should not block Unregister events from NSEs (this problem is described in networkservicemesh/sdk#1515: begin can't do Unregister requests without Register requests). This behavior is needed in situations when the initial registry (where the NSE was registered for the first time) dies and the NSE needs to Unregister itself in a different registry.
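A minimal sketch of the sticky-connection idea in Option 2; stickyClient, its pick/alive callbacks, and the addresses are all hypothetical stand-ins, not the real sdk chain elements. The client caches the registry it first registered with and reuses it for every later request, re-selecting only when the cached registry is gone (the Heal concern above):

```go
package main

import (
	"fmt"
	"sync"
)

// stickyClient remembers which registry it is linked with and keeps
// sending all requests there, as Option 2 proposes.
type stickyClient struct {
	mu     sync.Mutex
	target string              // cached registry address, empty until first use
	pick   func() string       // stands in for service discovery / balancing
	alive  func(string) bool   // stands in for a liveness check on the registry
}

// registryFor returns the cached registry, or (re)selects one if the
// cached registry has died, e.g. because registry-k8s was scaled down.
func (c *stickyClient) registryFor() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.target == "" || !c.alive(c.target) {
		c.target = c.pick()
	}
	return c.target
}

func main() {
	live := map[string]bool{"registry-1": true}
	c := &stickyClient{
		pick:  func() string { return "registry-1" },
		alive: func(addr string) bool { return live[addr] },
	}
	fmt.Println(c.registryFor()) // first call picks a registry
	fmt.Println(c.registryFor()) // later calls reuse the same one
}
```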

@zolug

zolug commented Oct 10, 2023

Hi @NikitaSkrynnik,
In case of hashing, the hash table (the number of registries) might change, and so can the registry associated with a hash. This could be improved by linking the chosen registry to a connection, which is basically the second option. But what happens if the component storing the relation between connection and registry is lost for whatever reason? Consider the case where the NSE issues an Unregister before a Register could fill in the missing info.

I think in general there's a problem whenever the registry holding the most recent resource ID is lost (e.g. due to scaling) and the NSE unregisters without sending a "new" register first.

I had an idea to distinguish, in k8s-registry, between a local timeout-based Unregister() and an Unregister() received from another NSM component, and only block the first. (There should be a single NSMgr handling an NSE, or the messages would come directly from an NSE, e.g. from a remote vlan NSE.)
Not sure if it would work, and it's prone to the same problem mentioned above (i.e. the registry with the most recent resource ID might not be around when the NSE wants to unregister). And forcing/crafting a Register before an Unregister sounds weird.

@NikitaSkrynnik
Contributor

@zolug Hi,
Thanks for the comment. I'm not sure about the implementation details of the first option yet. The main idea is to use sharding for NSEs on the master-registry so that they always go to the same registry.

About the second option:
I added a few details. If we lose the relation between connection and registry, the NSE will choose another registry for Unregister. We want to change the begin element to allow an NSE to Unregister itself without a prior Register event.

We are not sure about distinguishing between an Unregister sent from the NSE and one from the expire chain element. We discussed this option internally and decided not to use it because there is no simple way to do that.

@zolug

zolug commented Oct 10, 2023

> @zolug Hi, Thanks for the comment. I'm not sure about the implementation details of the first option yet. The main idea is to use sharding for NSEs on the master-registry so that they always go to the same registry.
>
> About the second option: I added a few details. If we lose the relation between connection and registry, the NSE will choose another registry for Unregister. We want to change the begin element to allow an NSE to Unregister itself without a prior Register event.
>
> We are not sure about distinguishing between an Unregister sent from the NSE and one from the expire chain element. We discussed this option internally and decided not to use it because there is no simple way to do that.

Do you also intend to get rid of the resource version check in the registry? Or how would a registry remove the custom resource if it had no, or outdated, version info? It might also require removing the 'expire' element on the registry to avoid premature deletion in case some event leads to a registry re-select.

@NikitaSkrynnik
Contributor

@zolug Hello,

> Do you also intend to get rid of the resource version check in the registry? Or how would a registry remove the custom resource if it had no or outdated version info?

No, we don't need to get rid of the resource version checking. We've found a simple way to distinguish between Unregisters, as you suggested above. We will let you know when we have an image for testing.

@denis-tingaikin
Member Author

Seems like this is done.

@zolug , @NikitaSkrynnik Thanks!

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

3 participants