[feature] High Availability #144

Open
gregkeys opened this issue Aug 8, 2018 · 9 comments

Comments

@gregkeys

gregkeys commented Aug 8, 2018

We are currently using Crossbar in production but would like to switch to something that offers high availability. Crossbar has HA, but it is prohibitively expensive at around $14,000 per node.

We run multi-tenanted clusters where each namespace represents a customer website, and each namespace contains a deployment of Crossbar.

All of our services are capable of scaling up and down. However, at 65,000 active connections I'm expecting that we will begin to see problems with services and connected users as they begin to lose connectivity.

If the router crashes and restarts, there is no failover strategy; everything just goes into a retry cycle until Crossbar comes back online.

Nexus looks like a good option with its ability to drop sessions from the queue. This could prevent the restarts, but it still leaves the question of how we scale up to millions of active connections. We'd like to scale up our WAMP router so that it can spread the load across multiple instances of the deployment.

We'd also like to have instances of Nexus running on multiple nodes, so that if we lose a node we don't lose connectivity.

@gammazero
Owner

I have been considering what it would take to provide HA for a cluster of nodes running WAMP routers. Here is a very rough design concept that tries to address what you are asking for. I would be interested in hearing feedback.
https://github.com/gammazero/nexus/wiki/High-Availability

@martin31821
Contributor

martin31821 commented Aug 11, 2018

We also thought about HA within WAMP, but came up with a slightly different approach.

Our approach was to minimize the inter-router communication required to maintain good performance and reliability.

We split our services into smaller "application features" and wanted to use one "backend router" per "application feature". The backend router is where the backend services connect.
We can then freely scale for more users by adding load-balancing routers in front of the different backend routers.
This approach has the following advantages:

  • We minimize the communication between routers
  • We can perform nearly-zero-downtime updates
  • When a router restarts, the application does not fail completely for all users. Instead, either:
    • the application fails for some users (in case of a frontend router restart)
    • a part of the application fails for all users (in case of a backend router restart)
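As a rough illustration of the "one backend router per application feature" idea, a frontend router could pick a backend by matching the URI of a call or publication against a table of feature prefixes. This is only a minimal sketch under stated assumptions: the `backendFor` helper, the `com.example.*` prefixes, and the backend addresses are all hypothetical, and none of this is part of nexus or autobahnkreuz.

```go
package main

import (
	"fmt"
	"strings"
)

// backendFor returns the address of the backend router responsible for a
// WAMP URI, using the longest matching feature prefix.
// The feature table is hypothetical: it maps a URI prefix to a backend address.
func backendFor(features map[string]string, uri string) (string, bool) {
	best := ""
	for prefix := range features {
		// Match on whole URI components only, so that "com.example.chat"
		// does not match "com.example.chatty".
		matches := uri == prefix || strings.HasPrefix(uri, prefix+".")
		if matches && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return "", false
	}
	return features[best], true
}

func main() {
	// Assumed example features and addresses, for illustration only.
	features := map[string]string{
		"com.example.chat":    "ws://chat-backend:8080/ws",
		"com.example.billing": "ws://billing-backend:8080/ws",
	}
	uris := []string{
		"com.example.chat.send",
		"com.example.billing.invoice.get",
		"com.example.unknown",
	}
	for _, uri := range uris {
		if addr, ok := backendFor(features, uri); ok {
			fmt.Printf("%s -> %s\n", uri, addr)
		} else {
			fmt.Printf("%s -> no backend\n", uri)
		}
	}
}
```

With a table like this, restarting one backend router affects only the URIs under its prefix, which matches the failure modes listed above.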

However, I agree with @gammazero that some kind of federation service would certainly be required, regardless of the chosen implementation strategy.

Additionally, I think nexus will need to open up some internal APIs, since the federation service would require deep interaction with the router itself.

[Image: untitled diagram]

/cc @johannwagner @fin-ger

@haizaar

haizaar commented Aug 15, 2018

@gregkeys can you please provide some info/links about Crossbar HA mode?

@haizaar

haizaar commented Aug 16, 2018

BTW, for whoever is in search of a multi-node router for scale/HA purposes, Wiola seems to fit the bill:

@gregkeys
Author

@gammazero I'm still considering your proposal. At first glance it looks like a solid plan of action.

@martin31821 are you building custom routers with nexus to accomplish this? If so, where is the strong interconnect taking place? Is that built into the frontend routers?

@haizaar here is the link where crossbar mentions access to HA mode https://crossbario.com/products/enterprise-support/

@martin31821
Contributor

@gregkeys

are you building custom routers with nexus to accomplish this? If so, where is the strong interconnect taking place? Is that built into the frontend routers?

At the moment we have autobahnkreuz, but it does not include high availability yet.
My idea was either to include multiple local clients that connect to the backend routers (take a look at the autobahnkreuz source code) and use them to register/publish/subscribe, or to provide this functionality as separate binaries and implement the connection there.

@mjentsch

@gammazero how would you prevent subscription/registration loops when federation agents connect as normal clients to other routers? The problem seems to be this: clientA, connected to routerA, subscribes to a topic. Federation agentA (on routerA) detects that and subscribes to the topic on routerB, to which it is connected. Federation agentB, running on routerB, detects this subscription (if we do not differentiate between clients and agents) and subscribes to the topic on routerA, to which it is connected. clientA can then unsubscribe from the topic, but since agentB is still subscribed to it, agentA will not unsubscribe from it on routerB.
Having HA would be awesome... but all my drawings end up with unsatisfying results.
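
One possible way around the loop described above, offered purely as an assumption and not as the design from the wiki proposal, is to make the router differentiate agents from clients: an agent mirrors a subscription to the peer router only when it was made by an ordinary client, and withdraws it when the last ordinary client leaves. The sketch below is a toy in-memory model; `Router`, `Subscribe`, `Unsubscribe`, and the `isAgent` flag are hypothetical names and not the nexus API. In a real deployment the flag might be derived from something like the session's auth role, which is also an assumption.

```go
package main

import "fmt"

// Router is a toy stand-in for a WAMP router; it is NOT the nexus API.
type Router struct {
	name string
	// subs maps topic -> subscriber ID -> true if the subscriber is an agent.
	subs map[string]map[string]bool
	// peer is the router that this router's federation agent mirrors to.
	peer *Router
}

func NewRouter(name string) *Router {
	return &Router{name: name, subs: map[string]map[string]bool{}}
}

// clientCount returns how many non-agent subscribers a topic has.
func (r *Router) clientCount(topic string) int {
	n := 0
	for _, isAgent := range r.subs[topic] {
		if !isAgent {
			n++
		}
	}
	return n
}

// Subscribe records a subscription. It is mirrored to the peer only when it
// comes from an ordinary client and is the first client subscription for the
// topic; subscriptions made by agents are never mirrored, which breaks the loop.
func (r *Router) Subscribe(id, topic string, isAgent bool) {
	if r.subs[topic] == nil {
		r.subs[topic] = map[string]bool{}
	}
	r.subs[topic][id] = isAgent
	if !isAgent && r.peer != nil && r.clientCount(topic) == 1 {
		r.peer.Subscribe("agent-"+r.name, topic, true)
	}
}

// Unsubscribe removes a subscription and withdraws the mirrored one from the
// peer once no ordinary client on this router is subscribed any more.
func (r *Router) Unsubscribe(id, topic string) {
	isAgent, ok := r.subs[topic][id]
	if !ok {
		return
	}
	delete(r.subs[topic], id)
	if !isAgent && r.peer != nil && r.clientCount(topic) == 0 {
		r.peer.Unsubscribe("agent-"+r.name, topic)
	}
}

// Subscribers returns the total number of subscribers (clients and agents).
func (r *Router) Subscribers(topic string) int { return len(r.subs[topic]) }

func main() {
	a, b := NewRouter("A"), NewRouter("B")
	a.peer, b.peer = b, a

	a.Subscribe("clientA", "news", false)
	fmt.Println(a.Subscribers("news"), b.Subscribers("news")) // prints "1 1"

	a.Unsubscribe("clientA", "news")
	fmt.Println(a.Subscribers("news"), b.Subscribers("news")) // prints "0 0"
}
```

Because agent subscriptions are never mirrored back, routerA never gains a subscription from agentB in this model, so clientA's unsubscribe cleans up both routers.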

@uttaravadina

@gammazero has there been any work on HA, load balancing, and horizontal scaling?
I would like to contribute.

@martin31821
Contributor

@u774r4v4d1n there has been some research, primarily regarding state synchronization in a distributed scenario (for availability, not for scaling).

cc'ing @fin-ger, his project https://github.com/fin-ger/building-a-distributed-wamp-router gives a nice overview of the results.


6 participants