Rocket.Chat on mongo replica set and read/write preference consequences #1867
Comments
I think this is the first time I've heard of someone setting up like this. Are you seeing any performance hits / gains? What kind of load? @RocketChat/core
It depends on what you compare it to. Our old setup was a standalone mongo instance on an Azure VM with 2x AMD cores / 7GB RAM. The new setup has the above mongo configuration on 2x load-balanced Azure VMs with 1x Xeon core + 3.5GB RAM each. We've also set up RAID-0 with 2x cloud disks for the data (previously it was on the OS disk) and done several other OS tweaks such as swap on an SSD drive. Between the two we have seen a 3-4x improvement in the number of active concurrent users. What is the typical kind of setup you are implicitly referring to?
@georgiosd I was referring specifically to the ReplicaSet settings. Most don't go so far as to detail those settings, so this is the first time I've seen those settings specifically. I personally have not done much with mongo replica sets, so others would be more qualified to answer this. :) But it sounds like a solid setup.
Thanks for the tip! We are planning some more performance enhancements that should improve the number of active concurrent users another 10-fold.
Oh ok :) It seems to me like the minimum configuration for having any reliability if a VM fails for any reason. What we have noticed is that the "user is typing" feature is limited to the users connected to the particular node. Meaning, if I am connected to node B, I will only see "x is typing" for users also connected to node B. I presume that this is because of the read preference being set to `nearest`.
That is strange.. that should not be the case. All nodes broadcast the "user is typing" to the other running nodes via DDP... if that is not working, we have a bug.
If it does not depend on the database, then yes, I would agree :)
Are you saying that you've got a mongodb instance running on each node?
Yes, and all mongodb instances are connected to the same replica set (2x nodes + arbiter).
The round-robin load balancer is in front of the RC/meteor instances (port 3000)?
@georgiosd are you running `meteor` in production? The `meteor run` command is meant for development only.
Also - we might not have tested the `nearest` read preference setting.
Might have rushed my responses a little. The production instances are executed with `node` directly.
The package is built with `meteor build`.
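For anyone following along, a minimal sketch of that build-and-run flow (the paths, host IPs, and `rs0` name are illustrative, not taken from this thread):

```bash
# Build a production bundle from the app checkout (output dir is illustrative)
meteor build --directory /tmp/rc-build

# Install the server's node dependencies inside the bundle
(cd /tmp/rc-build/bundle/programs/server && npm install)

# Run the bundle with plain node, not `meteor run` (env values are placeholders)
cd /tmp/rc-build/bundle
PORT=8080 \
ROOT_URL=https://chat.example.com \
MONGO_URL='mongodb://10.0.0.4:27017,10.0.0.5:27017/rocketchat?replicaSet=rs0' \
node main.js
```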
Ah. OK 😃 👍 Is the load balancer in front of the 2 x RC instances?
Yes, so both nodes listen on 8080 for requests; there's an nginx reverse proxy on each (it redirects HTTP requests to HTTPS and proxies HTTPS requests to localhost:8080). The load balancer distributes the load on port 443 across both nodes based on ClientIP (sticky).
And the RC instances loop back again, via IP, round-robin to the two mongo instances co-located on the same 2 x VMs?
Not quite, if I understood the question correctly. This is the mongo config that I've found for replica-set meteor environments:
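(The snippet itself didn't survive in this copy of the thread; the usual shape for Meteor, with placeholder hosts and matching the `nearest`/`majority` preferences discussed below, is:)

```bash
# Replica set members listed explicitly, by virtual network IP (placeholders)
export MONGO_URL='mongodb://10.0.0.4:27017,10.0.0.5:27017/rocketchat?replicaSet=rs0&readPreference=nearest&w=majority'
# Meteor's oplog tailing reads the `local` database on the same replica set
export MONGO_OPLOG_URL='mongodb://10.0.0.4:27017,10.0.0.5:27017/local?replicaSet=rs0'
```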
Basically, mongo is accessed directly by virtual network IP; it doesn't go through a load balancer. Makes sense?
Yep 😄 Very interesting. Thanks for sharing! We find that the number of active users handled scales with the number of mid-tier RC instances. On demo.rocket.chat we're running 4 x RC instances. Certainly the repl set + arbiter config provides additional uptime/availability benefits.
That makes sense. Is that one instance per core? That's the recommendation I've found in my research for node/meteor. Given that, at least on Azure, a 1-core VM is half the price of a 2-core VM and a quarter of the price of a 4-core VM, I thought it better to increase uptime and keep cost the same. If we need more juice in the near term, I will turn the arbiter VM into a full node, so we should gain another 1-2x. The other problem with the kind of setup you're describing, at least on Azure, is that the load balancer has a fixed destination port, so if you have multiple VMs with multiple instances each, you'd have to have a load balancer at the virtual network level and another at the VM level to distribute the load between instances, and that's when things start getting iffy. Too bad node can't just spawn workers.
Yes. One instance per core. Server-side node.js is single-threaded 😄 Thanks for sharing. Yes, a mix of horizontally scaled + vertically scaled nodes typically results in super brittle configs.
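A sketch of what "one instance per core" means in practice, assuming a 2-core VM and a bundle built as above (ports and paths are placeholders):

```bash
# One Rocket.Chat process per core, each on its own port; the reverse
# proxy / load balancer spreads client connections across them.
# (MONGO_URL / MONGO_OPLOG_URL exported beforehand, as earlier in the thread.)
cd /opt/rocketchat/bundle
PORT=8080 ROOT_URL=https://chat.example.com node main.js &
PORT=8081 ROOT_URL=https://chat.example.com node main.js &
wait
```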
I am happy to "donate" my Azure resource scripts for this setup and some basic bash scripts I've made to set things up, if they'd help you guys at all. Are you planning on having automated testing that will include environment testing (beyond unit/integration tests)?
@georgiosd -- That will simply be A-W-E-S-O-M-E! Thanks in advance. Please create a page here on our wiki: https://github.com/RocketChat/Rocket.Chat/wiki Any format is fine, and our documentation specialist team member will tidy it up and integrate it into our soon-to-be-available documentation website. Testing - yes, including distributed load testing - but only in the long-term plans. Is that what you mean by environment testing? Thanks again.
Ok, sure. However, because it's multiple files, I'd recommend something in the form of a pull request? Environment testing: yes, distributed load testing would be a part of it. With software like RC it's often useful to deploy different configurations and test features against them. Is unit/integration testing (local) part of your plans? If so, what kind of timeframe are we looking at?
Great idea. Please submit a PR here, with an Azure subdirectory: https://github.com/RocketChat/Deploy.to.Cloud Re: environment testing. If you mean different, automated networking/clustering topologies, I think we might be a generation away yet in terms of capabilities and resources available. k8s and docker swarm hold some promise, albeit still semi-simulation. It is in distributed load testing where Rocket.Chat can possibly deliver some breakthroughs, as the command and payload switching fabric. Unit and integration testing are not only in our plans, but some are already in our existing source code. It is no secret that testing is the Achilles heel of Meteor-based reactive systems in general; we're ready for MDG's new testing integration when it becomes available in 2016.
Ok, will do! Give me a few days, currently swamped. Re unit testing: are you referring to the spotify tests? They're the only tests I can see in the repo - am I missing anything? What's the "MDG testing integration" you're referring to? Not sure if I've come across it.

We are actually having some problems with the current config too. I can't be sure what exactly is going on, but I'd guess it's something to do with the Azure load balancer and the health probe. The load balancer points to an nginx process on each node (necessary to offload SSL), which reverse proxies to the node instance. The health probe points to the node instances, however, because node could be down while nginx will still respond (with a 502). So in this setup, when something goes wrong and I have to, say, restart nginx, all hell breaks loose. The site becomes unresponsive, but without errors. Go figure. I wanted to avoid using a paid load balancer that offloads SSL, at which point you can point it directly to the node instances, but it seems like the only way to go. Any ideas welcome.
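One way to see the probe mismatch from the shell (this assumes Rocket.Chat's unauthenticated `/api/info` endpoint and the ports above):

```bash
# Probing the node instance directly fails cleanly when the app is down
curl -sf http://localhost:8080/api/info >/dev/null && echo "app up" || echo "app down"

# Probing through nginx still gets an answer (a 502) when the app is down,
# so a probe that only checks nginx can report a dead backend as healthy
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost/api/info
```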
Hi, I'm setting the MONGO_URL in the forever service
@vikas0121 It may help if you can explain what error you are receiving. Are you pointing to the correct database name and replicaSet name? This is taken from our environment:
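(Their actual values weren't preserved in this copy of the thread; as a sketch, with placeholder hosts and names, a forever-based start would look like:)

```bash
# Database name and replicaSet parameter must match the deployment exactly
MONGO_URL='mongodb://mongo-1:27017,mongo-2:27017/rocketchat?replicaSet=rs01' \
MONGO_OPLOG_URL='mongodb://mongo-1:27017,mongo-2:27017/local?replicaSet=rs01' \
ROOT_URL=https://chat.example.com PORT=3000 \
forever start bundle/main.js
```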
@richardwlu thanks for the reply.
Just saw this again. Might be worth following: #8064. Basically, there are some issues in a few cases when reading from a secondary.
Hey all,
We have recently deployed Rocket.Chat on a 2-node + arbiter mongo setup. Both nodes run a single meteor instance and sit behind a load balancer; the arbiter is a smaller VM that is there for voting purposes only.
I have currently set the RS read preference to `nearest` and write preference to `majority`, thinking that it's optimal in terms of guaranteeing writes, and that it helps balance the database load on reads with some presumably minor latency effect.

Any thoughts on this setup, for or against?
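For context, a minimal sketch of initiating that 2-node + arbiter topology from the mongo shell (host names and the set name are placeholders):

```bash
# Run once against the first data node; the arbiter votes but holds no data
mongo --host mongo-1 --eval '
  rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo-1:27017" },
      { _id: 1, host: "mongo-2:27017" },
      { _id: 2, host: "arbiter-1:27017", arbiterOnly: true }
    ]
  })
'
```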
Thanks
Georgios