reserve ports so node retains same address:port across restarts #72

seanjensengrey · 2016-08-01T20:47:04Z

Reserve Riak ports so that they are retained across a cluster restart.

This would allow us to do a hot cluster resize, where nodes are added (and currently get stuck waiting for handoff), by also doing a rolling restart of the stuck nodes. With the ports retained, connection information would not have to be updated in the clients (basho bench).

mdigan · 2016-08-01T21:40:31Z

@seanjensengrey I'm not sure I understand the issue completely. Can you list the order of events as you'd like them to happen in this scenario? Also, would a service discovery mechanism be a more flexible solution for the client issue you mentioned?

seanjensengrey · 2016-08-01T22:03:46Z

This would be advantageous for a couple of reasons:

the cluster could be restarted without clients needing to be updated
it would allow us to run basho bench against the cluster during a node addition / rolling restart scenario

Currently, when we do a cluster restart, the IPs stay the same and the persistent volumes are retained, but the port numbers change. The new ports then need to be communicated to every downstream client that connects directly. In a high-throughput environment (bulk loads, Spark, etc.), going through a proxy like the RMF Director or HAProxy can be a performance hit.

If we did a reservation for those port numbers, they would stay fixed for the life of the cluster.

We are currently seeing a handoff issue when adding nodes to a live cluster that requires restarting the existing nodes. By adding reservations, we allow hot node additions via a rolling restart.

sanmiguel · 2016-08-01T23:15:34Z

Retaining ports across task restarts like this is something we've typically assumed is not possible in Mesos, or at least not guaranteed. My instinctive answer to this was "it's simply not how Mesos works", but I'm now struggling to find the documentation that says not to assume specific ports are available. I had thought it was a criterion in DC/OS Certification...

There might be actions we can have the scheduler take to try to do this, but I don't know if we can guarantee it succeeding every time. I will need to experiment with this in the coming days to get you a more definitive answer.

WRT IPs remaining the same: this is an implementation detail of the current persistent volume machinery in Mesos. A persistent volume, by default, lives on the filesystem of the host it is created upon initially. We re-use that persistent volume when restarting the node, so the node ends up on the same agent after restart simply because that's where the volume is. If we change to using a different persistent volume setup, or the way this one works by default changes, this will no longer hold true.

Meanwhile, could you please share some more details on the handoff issue? What are the symptoms? How can I recreate it?
I feel like the handoff issue ought to be a separate issue...

seanjensengrey · 2016-08-02T00:41:50Z

I have created RTS-1275 to track the handoff issue. If you add nodes to a cluster while it is under load (RMF + Riak TS 1.3.1), the new nodes will not take any ring ownership until the cluster is restarted.

From what Drew said, it sounded like if we use reserved ports, they can stay fixed across restarts. If this is in any way a kludge or is not DC/OS certifiable, let's not implement it.

https://gist.github.com/travisbhartwell/4ab563b62b3a48e128fb356806a5df33

seanjensengrey · 2016-08-02T13:39:56Z

I found the DC/OS certification spec: https://docs.google.com/document/d/1rtuddOSyZwg7gC3Uye3TqdqT4fm1wWuJiMYQZUwLAM8/edit#heading=h.q5rzjg7ij60y

It looks like we can make dynamic reservations for IP ports and IP port ranges.
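
For illustration only, here is a rough sketch of what a dynamically reserved `ports` resource might look like when built by a scheduler. The dict shape follows the Mesos `Resource` JSON form as I understand it for this era of Mesos (a `RANGES` resource with a `role` and a `reservation` principal); the helper name, role, principal, and port numbers are all hypothetical and not taken from the framework's code.

```python
def port_reservation(ports, role, principal):
    """Build a Mesos-style RANGES resource dict for a set of reserved ports.

    Assumption: the field layout mirrors the Mesos Resource JSON format;
    check it against the Mesos version actually in use.
    """
    ranges = []
    for port in sorted(set(ports)):
        if ranges and ranges[-1]["end"] + 1 == port:
            # Extend the current range when the port is contiguous.
            ranges[-1]["end"] = port
        else:
            ranges.append({"begin": port, "end": port})
    return {
        "name": "ports",
        "type": "RANGES",
        "ranges": {"range": ranges},
        "role": role,                             # hypothetical role name
        "reservation": {"principal": principal},  # hypothetical principal
    }


# Hypothetical ports for one Riak node.
resource = port_reservation([31002, 31000, 31001, 31005], "riak", "riak-principal")
```

A resource like this would be passed in a RESERVE operation when accepting an offer, after which the ports should come back in later offers as reserved for that role.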

mdigan · 2016-08-02T16:23:48Z

@sanmiguel I think this is the most recent published version of the DC/OS certification spec: https://docs.mesosphere.com/1.7/usage/managing-services/developing-services/certification-requirements/

and confusingly enough, there is also this doc:
https://docs.mesosphere.com/1.7/usage/managing-services/developing-services/service-requirements-spec/

I do see some differences between the google doc version and the docs.mesosphere.com versions, but I can't find anything about dynamic reservations for ip ports and ip port ranges at first glance.

sanmiguel · 2016-08-02T18:12:55Z

You're both right - I think I conflated not being able to reserve a specific port (a la epmd requiring 4369) and not being able to re-use a specific port from an offer.

@seanjensengrey it should be possible for us to get this working in the way you wanted.
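
As a minimal sketch of the "re-use a specific port from an offer" idea, the scheduler could check whether the ports a node used before the restart are present in a new offer and prefer them, falling back to fresh ports otherwise. This is not the framework's actual code; the function names and the `(begin, end)` tuple representation of offered port ranges are assumptions made for the example.

```python
def offer_covers_ports(offered_ranges, wanted_ports):
    """Return True if every wanted port falls inside some offered (begin, end) range."""
    return all(
        any(begin <= port <= end for begin, end in offered_ranges)
        for port in wanted_ports
    )


def choose_ports(offered_ranges, previous_ports, count):
    """Prefer the node's previous ports; otherwise take the first free ones.

    Returns a list of ports, or None if the offer cannot satisfy the request.
    """
    if previous_ports and offer_covers_ports(offered_ranges, previous_ports):
        return list(previous_ports)
    fresh = []
    for begin, end in offered_ranges:
        for port in range(begin, end + 1):
            fresh.append(port)
            if len(fresh) == count:
                return fresh
    return None  # not enough ports in this offer


# Hypothetical offer containing the node's previous ports.
ports = choose_ports([(31000, 31010)], [31005, 31006], 2)
```

Without a reservation, the fallback branch is where the port numbers would change across a restart, which is the situation the issue is trying to avoid.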

Sorry for the confusion.

seanjensengrey added the enhancement label Aug 2, 2016

sanmiguel added the in progress label Aug 3, 2016

sanmiguel self-assigned this Aug 3, 2016
