Rocket.Chat on mongo replica set and read/write preference consequences #1867

Closed
georgiosd opened this issue Jan 12, 2016 · 30 comments

Comments

@georgiosd

Hey all,

We have recently deployed Rocket.Chat to a 2-node + arbiter mongo setup. Both nodes run a single meteor instance and sit behind a load balancer; the arbiter is a smaller VM that is there for voting purposes only.

I have currently set the replica set read preference to nearest and the write concern to majority, thinking that this is optimal for guaranteeing writes and that it helps balance the database read load, presumably with only a minor latency effect.

Any thoughts on this setup for or against?

Thanks
Georgios

@geekgonecrazy
Contributor

I think this is the first time I've heard of someone setting up like this.

Are you seeing any performance hits / gains? What kind of load?

@RocketChat/core

@georgiosd
Author

It depends on what you compare it to.

Our old set up was on a standalone VM/mongo instance on an Azure VM with 2x AMD cores/7GB RAM.

The new setup has the above mongo setup on 2x load balanced Azure VMs with 1x Xeon core+3.5GB RAM each. We've also set up RAID-0 with 2x cloud disks for the data (previously they were on the OS disk) and done several other OS tweaks such as swap on SSD drive etc.

Between the two, we have seen a 3-4x improvement in the number of active concurrent users.

What is the typical kind of setup you are implicitly referring to?

@geekgonecrazy
Contributor

@georgiosd I was referring specifically to the replica set settings. Most people don't go so far as to detail them, so this is the first time I've seen those settings specifically. I personally have not done much with mongo replica sets, so others would be more qualified to answer this. :)

But sounds like a solid setup.

@engelgabriel
Member

Thanks for the tip! We are planning some more performance enhancements that should increase the number of active concurrent users another tenfold.

@georgiosd
Author

Oh ok :)

It seems to me like the minimum configuration for having any reliability if a VM fails for any reason.

What we have noticed is that the "user is typing" feature is limited to the users connected to the particular node. Meaning, if I am connected to node B, I will only see "x is typing" for users also connected to node B.

I presume that this is because of the read preference being set to nearest but perhaps you know something different?

@engelgabriel
Member

That is strange; that should not be the case. All nodes broadcast the "user is typing" event to the other running nodes via DDP. If that is not working, we have a bug.

@georgiosd
Author

If it does not depend on the database, then yes, I would agree :)

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

Are you saying that you've git-cloned the RC source code and are running meteor on each of the nodes, with each one also running a local MongoDB instance?

@georgiosd
Author

Yes, and all mongodb instances are connected to the same replica set (2x nodes + arbiter)

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

The round-robin load balancer is in front of the RC/meteor instances (port 3000)?

@engelgabriel
Member

@georgiosd are you running meteor from a git clone of the RC source code, rather than downloading the latest version from https://rocket.chat/releases and running it using pm2?

The meteor command is only meant for development; it will run much slower.

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

Also - we might not have tested the typing... message when running multiple independent meteor instances operating in development mode on separate machines.

@georgiosd
Author

Might have rushed my responses a little.

The production instances are executed on node 0.10.40 like so:

node $APP_DIR/bundle/main.js

Package is built with meteor build

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

Ah. OK 😃 👍 Is the load balancer in front of the 2 x RC instances ?

@georgiosd
Author

Yes. On both nodes, the app listens for requests on port 8080, and an nginx reverse proxy redirects HTTP requests to HTTPS and proxies HTTPS requests to localhost:8080. The load balancer distributes the load on port 443 across both nodes based on client IP (sticky).

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

And the RC instances loop back again, via IP, round-robin to the two mongo instances co-located on the same 2 x VMs?

@georgiosd
Author

Not if I understood the question correctly.

This is the mongo config that I've found for replica set meteor environments:

MONGO_URL="mongodb://primary:27017,secondary:27017,arb:27017/$DB?replicaSet=rs0&readPreference=nearest&w=majority"
MONGO_OPLOG_URL="mongodb://primary:27017,secondary:27017,arb:27017/local"

Basically, mongo is accessed directly by virtual network IP; it doesn't go through a load balancer.
Each node should pick the local mongo instance to read from, given readPreference=nearest.

Makes sense?
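
For readers following along, here is a minimal shell sketch of how the two variables above could be assembled piece by piece. The host names and replica set name are taken from the post; the database name `rocketchat` is only a placeholder for `$DB`, and the comments summarize general MongoDB connection-string semantics rather than anything verified on this deployment.

```shell
# Placeholder database name (an assumption; the post only says $DB).
DB="rocketchat"

# Members exactly as listed in the post: two data nodes and the arbiter.
HOSTS="primary:27017,secondary:27017,arb:27017"

# replicaSet=rs0          name of the replica set
# readPreference=nearest  read from a low-latency member, primary or secondary
#                         (reads from a secondary may lag the primary)
# w=majority              a write is acknowledged once a majority of the
#                         voting members has it
OPTS="replicaSet=rs0&readPreference=nearest&w=majority"

# Double quotes keep the shell from interpreting '&' and '?'.
export MONGO_URL="mongodb://${HOSTS}/${DB}?${OPTS}"
export MONGO_OPLOG_URL="mongodb://${HOSTS}/local"

echo "$MONGO_URL"
```

One consequence worth keeping in mind with two data nodes plus an arbiter: the arbiter stores no data and cannot acknowledge writes, so `w=majority` needs both data nodes. While either of them is down, majority writes wait or time out.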

@Sing-Li
Member

Sing-Li commented Jan 12, 2016

Yep 😄 Very interesting. Thanks for sharing!

We find that the number of active users handled scales with the number of mid-tier RC instances. On demo.rocket.chat, we're running 4 x RC instances.

Certainly the repl set + arbiter config provides additional up-time availability benefits.

@georgiosd
Author

That makes sense. Is that one instance per core? It's the recommendation I've found in my research for node/meteor.

Given that at least on Azure, 1-core VM is half the price of a 2-core VM and 1/4 of the price of a 4-core VM, I thought it's better to increase uptime and keep cost the same.

If we need more juice in the near-term, I will turn the arbiter VM into a full node so we should gain another 1-2x.

The other problem with the kind of setup that you're describing, at least on Azure, is that the load balancer has a fixed destination port. If you have multiple VMs with multiple instances each, you'd need one load balancer at the virtual network level and another at the VM level to distribute the load between instances, and that's when things start getting iffy. Too bad node can't just spawn workers.

@Sing-Li
Member

Sing-Li commented Jan 13, 2016

Yes. One instance per core. Server-side node.js is single-threaded 😄

Thanks for sharing. Yes, a mix of horizontally scaled + vertically scaled nodes typically results in super brittle configs.
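
As an illustration of the "one instance per core" rule, the sketch below computes one port per core so that each instance can listen on its own port. The helper name `instance_ports` and the base port 3000 are assumptions made for this example; `PORT` is the environment variable a Meteor bundle reads for its listening port.

```shell
# Print one port per core, starting at a base port.
# Arguments: $1 = base port, $2 = number of cores.
instance_ports() {
  base="$1"
  cores="$2"
  i=0
  while [ "$i" -lt "$cores" ]; do
    echo $((base + i))
    i=$((i + 1))
  done
}

# Usage (not run here): start one instance per core in the background.
#   for port in $(instance_ports 3000 "$(nproc)"); do
#     PORT="$port" node "$APP_DIR/bundle/main.js" &
#   done
```

Something in front of these ports, such as a reverse proxy, still has to spread requests across them, which is the extra balancing layer described earlier in the thread.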

@georgiosd
Author

I am happy to "donate" my Azure resource scripts for this setup and some basic bash scripts I've made to set things up if they'd help you guys at all.

Are you planning on having automated testing that will include environment testing (beyond unit/integrations tests)?

@Sing-Li
Member

Sing-Li commented Jan 13, 2016

@georgiosd -- That will simply be A-W-E-S-O-M-E ! Thanks in advance.

Please create a page here on our wiki https://github.com/RocketChat/Rocket.Chat/wiki

Any format is fine, and our documentation specialist team member will tidy it up and integrate it into our soon-to-be-available documentation website.

Testing - yes, including distributed load testing - but only in the long term plans. Is that what you mean by environment testing?

Thanks again.

@georgiosd
Author

Ok, sure. However because it's multiple files, I'd recommend something in the form of a pull request?

Environment testing: yes, distributed load testing would be a part of it. With software like RC it's often useful to deploy different configurations and test features against them.

Is unit/integration testing (local) part of your plans? If so, what kind of timeframe are we looking at?

@Sing-Li
Member

Sing-Li commented Jan 14, 2016

Great idea. Please submit a PR here, with an Azure subdirectory.

https://github.com/RocketChat/Deploy.to.Cloud

re: environment testing. If you mean automated testing across different networking/clustering topologies, I think we might be a generation away yet in terms of capabilities and resources available. k8s and docker swarm hold some promise, albeit still semi-simulation. It is only in distributed load testing that Rocket.Chat can possibly deliver some breakthroughs, as the command and payload switching fabric.

Unit and integration testing are not only in our plans, but (some) already in our existing source code. It is no secret that testing is the Achilles heel of Meteor based reactive systems in general; we're ready for MDG's new testing integration in 2016 when it becomes ready.

@georgiosd
Author

Ok, will do! Give me a few days, currently swamped.

Re unit testing: are you referring to spotify tests? They're the only tests I can see on the repo - am I missing anything?

What's the "MDG testing integration" you're referring to? Not sure if I've come across it.

We are actually having some problems with the current config too. I can't be sure what exactly is going on but I'd guess it's something to do with the Azure load balancer and the health probe.

The load balancer points to an nginx process on each node (necessary to offload SSL), which reverse proxies to the node instance. The health probe, however, points to the node instances directly, because node could be down and nginx would still respond, with a 502.

So in this setup, when something goes wrong and I have to, say, restart nginx, all hell breaks loose. The site becomes unresponsive, but without errors. Go figure.

I wanted to avoid using a paid load balancer, which would offload SSL itself so that you can point it directly to the node instances, but it seems like the only way to go.

Any ideas welcome.
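
To narrow down which layer is failing in a setup like this, it can help to check the node port and the nginx port separately. The sketch below is a manual check, not the Azure probe itself. The helper names are hypothetical, port 8080 is the app port mentioned earlier in the thread, and treating only HTTP 200 as healthy is an assumption about how strict the probe is.

```shell
# Treat only HTTP 200 as healthy (assumption; adjust to your probe's rules).
is_healthy() {
  [ "$1" = "200" ]
}

# Print only the HTTP status code for a URL.
# curl reports 000 when no HTTP response was received at all.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Usage (not run here): compare the app port with the nginx front end.
#   is_healthy "$(http_status http://localhost:8080/)" && echo "node up" || echo "node down"
#   is_healthy "$(http_status https://localhost/)" && echo "nginx path up" || echo "nginx path down"
```

If the first check passes and the second fails, the health probe would still see a healthy node even though clients going through nginx cannot reach it, which would match the "unresponsive but without errors" symptom.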

@engelgabriel
Member

@marceloschmidt marceloschmidt modified the milestone: 0.37.0 Sep 5, 2016
@vikas0121

Hi, I'm setting the MONGO_URL in the forever service:
export MONGO_URL=mongodb://10.xx.xx.xxx:27017,10.xx.xx.xxx:27017/Chat?replicaSet=rs1&readPreference=nearest
It's not working for me. Help needed.

@richardwlu

@vikas0121 It may help if you can explain what error you are receiving?

Are you pointing to the correct database name and replicaSet name?

This is taken from our environment:
"MONGO_URL": "mongodb://mongochat01:27017,mongochat02:27017,mongochat03:27017/rocketchat?replicaSet=001-rs&readPreference=primaryPreferred&w=majority"

@vikas0121

@richardwlu thanks for the reply.
Actually, I'm using the forever service to run the application and setting the variable with the export command.
I'm not using double quotes around the URL as shown above. The error is an improper MONGO_URL.
Maybe it is not accepting characters like "&" and everything after the "&" sign.
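
A likely explanation, offered as a guess from the symptoms rather than a confirmed diagnosis: in a POSIX shell an unquoted `&` ends the command and runs it in the background, so everything after the first `&` never becomes part of the value. Quoting the URL avoids that. The sketch below reuses the masked addresses from the post unchanged.

```shell
# Unquoted, the shell splits the line at '&':
#   export MONGO_URL=mongodb://.../Chat?replicaSet=rs1   (run in background)
#   readPreference=nearest                              (a separate assignment)

# Single quotes keep the whole URL as one literal string.
export MONGO_URL='mongodb://10.xx.xx.xxx:27017,10.xx.xx.xxx:27017/Chat?replicaSet=rs1&readPreference=nearest'

# Print the value to confirm the options after '&' survived.
echo "$MONGO_URL"
```

Whether the forever service passes the exported variable on to the app is a separate question that this sketch does not address.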

@geekgonecrazy
Contributor

Just saw this again. Might be worth following: #8064

Basically some issues in a few cases when reading from a secondary.
