reserve ports so node retains same address:port across restarts #72
Comments
@seanjensengrey I'm not sure I understand the issue completely. Can you list the order of events as you'd like them to happen in this scenario? Also, would a service discovery mechanism be a more flexible solution for the client issue you mentioned?
This would be advantageous for a couple of reasons.
Currently when we do a cluster restart, the IPs stay the same and the persistent volumes are retained, but the port numbers are different. That change has to be communicated to every downstream client that connects directly. In a high-throughput environment (bulk loads, Spark, etc.), going through a proxy like the RMF Director or HAProxy can be a performance hit. If we reserved those port numbers, they would stay fixed for the life of the cluster. We are also currently seeing a handoff issue when adding nodes to a live cluster, which requires restarting the existing nodes. By adding reservations, we allow hot node additions via a rolling restart.
Withholding ports across task restarts like this is something we've typically assumed is not possible in Mesos, or at least not guaranteed. My instinctive answer to this was "it's simply not how Mesos works", but I'm now struggling to find the documentation where it says anything about not assuming specific ports are available. I had thought it was a criterion in DC/OS Certification... There might be actions we can have the scheduler take to try to do this, but I don't know if we can guarantee it succeeding every time. I will need to experiment with this in the coming days to get you a more definitive answer. WRT IPs remaining the same: this is an implementation detail of the current persistent volume machinery in Mesos. A persistent volume, by default, lives on the filesystem of the host it is initially created on. We re-use that persistent volume when restarting the node, so the node ends up on the same agent after restart simply because that's where the volume is. If we change to a different persistent volume setup, or the default behaviour of this one changes, this will no longer hold true. Meanwhile, could you please share some more details on the handoff issue? What are the symptoms? How can I recreate it?
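One scheduler-side action of the kind mentioned above could be: remember the ports a node was launched with and, on restart, only accept an offer from the same agent whose port ranges still contain them. The Python below is a minimal sketch of that check only. The function name and the `(begin, end)` tuple representation are hypothetical and are not taken from the Riak Mesos Framework. Without a reservation, another task may already hold the ports, so this check can fail, which is the "not guaranteed" part.

```python
def ports_still_offered(offered_ranges, wanted_ports):
    """Return True if every wanted port lies inside one of the offered ranges.

    offered_ranges: iterable of inclusive (begin, end) tuples, as a scheduler
                    might extract them from an offer's "ports" resource
                    (hypothetical representation).
    wanted_ports:   the ports the node used before the restart.
    """
    ranges = list(offered_ranges)
    return all(
        any(begin <= port <= end for begin, end in ranges)
        for port in wanted_ports
    )


# Example values are made up for illustration.
offered = [(31000, 31099), (31500, 31600)]
print(ports_still_offered(offered, [31005, 31550]))  # both ports are inside a range
print(ports_still_offered(offered, [31005, 31200]))  # 31200 is not offered
```

If the check fails, the scheduler would have to either wait for a later offer or fall back to new ports and republish the connection details.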
I have created RTS-1275 to track the handoff issue. If you add nodes to a cluster while it is under load (RMF + Riak TS 1.3.1), the new nodes will not take any ring ownership until the cluster is restarted. From what Drew said, it sounded like if we use reserved ports, they can stay fixed across restarts. If this is in any way a kludge or is not DC/OS certifiable, let's not implement this. https://gist.github.com/travisbhartwell/4ab563b62b3a48e128fb356806a5df33
I found the DC/OS certification spec: https://docs.google.com/document/d/1rtuddOSyZwg7gC3Uye3TqdqT4fm1wWuJiMYQZUwLAM8/edit#heading=h.q5rzjg7ij60y and it looks like we can make dynamic reservations for IP ports and IP port ranges.
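For illustration, a dynamically reserved port range might be described to Mesos roughly as below. This is a sketch under assumptions: the field layout follows the Mesos `Resource` message as I understand it for the format in use around DC/OS 1.7 (`role` plus `reservation.principal`), and it should be checked against the Mesos version actually deployed. The role, principal, and port numbers are placeholders, and the helper name is hypothetical.

```python
import json


def reserved_ports_resource(role, principal, port_ranges):
    """Build a dict describing a dynamically reserved "ports" resource.

    port_ranges is a list of inclusive (begin, end) tuples.
    The structure mirrors the assumed Mesos Resource JSON layout.
    """
    return {
        "name": "ports",
        "type": "RANGES",
        "ranges": {
            "range": [{"begin": begin, "end": end} for begin, end in port_ranges]
        },
        "role": role,                            # placeholder role name
        "reservation": {"principal": principal}, # placeholder principal
    }


# Placeholder values; a scheduler would take these from the node's first launch.
resource = reserved_ports_resource("riak", "riak", [(31000, 31002)])
print(json.dumps(resource, indent=2))
```

A scheduler would attach a resource like this to a reserve operation when accepting an offer, so that the same ports are offered back to the framework's role after a restart.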
@sanmiguel I think this is the most recent published version of the DC/OS certification spec: https://docs.mesosphere.com/1.7/usage/managing-services/developing-services/certification-requirements/ and, confusingly enough, there is also this doc: I do see some differences between the Google Doc version and the docs.mesosphere.com versions, but at first glance I can't find anything about dynamic reservations for IP ports and IP port ranges.
You're both right - I think I conflated not being able to reserve a specific port (a la @seanjensengrey it should be possible for us to get this working in the way you wanted. Sorry for the confusion.
Reserve Riak ports so that they are retained across a cluster restart.
This would allow us to do a hot cluster resize, where nodes are added (and currently get stuck waiting for handoff), by also doing a rolling restart of the stuck nodes. With the ports retained, connection information also wouldn't have to be updated in the clients (basho bench).