High Availability deployment #2037
RocketChat/Rocket.Chat#3540 (comment): we need to ensure we mention keeping time in sync between instances as well.
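For example, keeping clocks aligned can be as simple as enabling NTP on every VM; a minimal sketch, assuming systemd-based hosts (chrony or ntpd would work just as well):

```bash
# On each Rocket.Chat and MongoDB VM: enable NTP-based clock synchronization
sudo timedatectl set-ntp true

# Verify the clock is being synchronized
timedatectl status | grep -i synchronized
```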
In order to run two instances of Rocket.Chat on two VMs (web1 and web2) with a load balancer in front of them (haproxy1), do I need to set up shared storage for the session info? I have a separate MongoDB replica set on another three VMs, so if the session info is stored in Mongo I don't need any shared storage, I suppose. Please suggest how I should move forward from this point.
if you have mongo set up in replica set mode, you do not need anything else for shared session storage
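To illustrate (the hostnames, database name, and URLs below are placeholders, not values from this thread): with a three-member replica set, web1 and web2 simply point MONGO_URL at the set, the session data lives in MongoDB, and no shared filesystem is needed:

```bash
# Identical on web1 and web2; haproxy1 balances traffic across them
export ROOT_URL="https://chat.example.com"   # public URL served by haproxy1
export PORT=3000
export MONGO_URL="mongodb://mongo1:27017,mongo2:27017,mongo3:27017/rocketchat?replicaSet=rs01"
node main.js
```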
@geekgonecrazy thanks for your response.
@geekgonecrazy And to add on to @srihas619's question: does the MONGO_OPLOG_URL also need to include every replica set member? I currently have 4 instances of RC running on one server and 3 Mongo replicas, with my MONGO_URL and MONGO_OPLOG_URL pointing at those. However, I will have 8 instances of RC running between 2 servers while adding 2 additional replica set members on the new server(s) (to be mongochat04 and mongochat05), and I am trying to determine if I need to change those URLs to include the new members.
@richardwlu I configured MONGO_OPLOG_URL to point at my replica set and it has been working for me.
@srihas619 Thanks for the tip. What is your value of MONGO_OPLOG_URL? As for the oplog itself, the MongoDB docs explain how it works and how it is sized.
@richardwlu thanks for the link to the docs, it explains things clearly. My setup is in line with that.
@srihas619 I think we will need to bring in @geekgonecrazy to help verify the questions above for us :)
Just like with the MONGO_URL, you will want to add all nodes into the MONGO_OPLOG_URL in case a primary election happens. But it will only actually be tailing the oplog on the primary node.
Also, as far as the naming used in the connection string: you will want to make sure to use the same name that the nodes identify themselves as in the replica set. localhost, for example, is something I would avoid using when you have a multi-node Mongo replica set. If you attempt to connect and the primary is listed as localhost, the client will attempt to connect to localhost and address it as such; if internally it references itself as something else, you will have issues. Also, the other Mongo nodes will try to look up localhost when trying to talk to this peer, which will of course always resolve to themselves, so it will cause all kinds of issues. tl;dr: good practice is to always use a hostname that is reachable by all other nodes in the replica set.
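As an illustration of that naming advice, initiating the replica set with hostnames that every peer (and every Rocket.Chat instance) can resolve, rather than localhost, might look like this; the hostnames and set name are placeholders:

```bash
# Run once against the node that should become primary; note there is no "localhost" anywhere
mongo --host mongochat01 --eval '
  rs.initiate({
    _id: "rs01",
    members: [
      { _id: 0, host: "mongochat01:27017" },
      { _id: 1, host: "mongochat02:27017" },
      { _id: 2, host: "mongochat03:27017" }
    ]
  })
'
```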
@geekgonecrazy Thank you for clearing this up!
@geekgonecrazy Just to clarify, is it necessary to specify the replica set name by adding ?replicaSet=<name> to the connection strings? What I intend to have is something along the lines of the configuration below.
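Roughly along these lines, for illustration (the hostnames mongochat01-05, the database name, and the replica set name rs01 are placeholders):

```bash
# Every replica set member is listed, and ?replicaSet= names the set explicitly
export MONGO_URL="mongodb://mongochat01:27017,mongochat02:27017,mongochat03:27017,mongochat04:27017,mongochat05:27017/rocketchat?replicaSet=rs01"
export MONGO_OPLOG_URL="mongodb://mongochat01:27017,mongochat02:27017,mongochat03:27017,mongochat04:27017,mongochat05:27017/local?replicaSet=rs01"
```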
@richardwlu I don't know that it's absolutely necessary, but I typically always do this. Looks like what I would use 👍
@geekgonecrazy After editing the MONGO_URL and MONGO_OPLOG_URL values for each instance (I have 8 total instances running on 2 servers with 4 on each), we have noticed an issue where users are not receiving desktop notifications and alerts consistently (more failed than not). Would you happen to know why this would occur and if it is related to the MongoDB config? We are on version 0.55.0 (older, I know), but everything was running fine prior to my adding 4 instances and editing the MongoDB setup. The version has stayed the same.
@richardwlu sorry, got a bit behind on GitHub notifications 😁 If you are getting issues like that, it is typically because the instances cannot talk to each other, usually something with the instance IP or firewall conditions. They need to be able to talk; otherwise, when an event fires, only users on the same instance as the one that fired the event will receive it.
@geekgonecrazy No worries, found out it was the INSTANCE_IP.
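For anyone hitting the same thing, a sketch of the idea (the addresses and ports here are placeholders): each Rocket.Chat process must advertise an address and port that the processes on the other server can actually reach.

```bash
# Server 1 (reachable from server 2 as 10.0.0.11); one line per Rocket.Chat process
# (MONGO_URL, MONGO_OPLOG_URL and ROOT_URL set as discussed above)
INSTANCE_IP=10.0.0.11 PORT=3001 node main.js &
INSTANCE_IP=10.0.0.11 PORT=3002 node main.js &
INSTANCE_IP=10.0.0.11 PORT=3003 node main.js &
INSTANCE_IP=10.0.0.11 PORT=3004 node main.js &
# Server 2 uses its own reachable address, e.g. INSTANCE_IP=10.0.0.12
```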
@geekgonecrazy @Sing-Li @georgios Are you aware of any RC use cases with very high concurrency (say, 100k+ concurrent users)? Most things I have read were much smaller in scale than this.
With ulimit increased, node heap increased, and good hardware, you can do some amazing stuff :). At that size of deployment I'd say reach out and talk to us.
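As a rough sketch of those two knobs (the numbers are illustrative, not tuned recommendations):

```bash
# Raise the open-file-descriptor limit in the shell that launches Rocket.Chat
# (each connected client holds sockets open)
ulimit -n 65535

# Give the Node.js process a larger heap (value in MB) when starting the bundled server
node --max-old-space-size=4096 main.js
```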
@geekgonecrazy I assume you mean to contact you via support? I think I shot an email out to sales and support a day or so ago. If there is a more fluid way of contacting RC that would be great -- maybe meet on your chat server for a brief discussion?

Load Testing: We started doing baseline load testing with no subscriptions and just sending messages, and we hit 100% usage around 60 rocketchat_messages/second. Writes/CPU usage seem to be the largest bottleneck -- the number of users doesn't seem to have much effect. We tested the same throughput with varying numbers of users and it had a negligible effect on the system. Our setup involves oplog tailing, and the hardware as of now is greater than what RC documented as "minimal specs". If we can have the brief discussion mentioned above, we can get more specific if you have other questions.

Oplog Tailing vs RedisOplog: From what I have read, oplog tailing is a very expensive task when there are a lot of writes, and the redisOplog solution relieves a lot of this CPU stress. Has Rocket.Chat looked into this solution, and if so, do you have any data on the results? I am currently trying to set up redisOplog with oplog tailing disabled on another machine but running into snags. Hopefully I get that running soon with Theodor's help.

Lastly, when you mentioned "node heap increased", did you mean the process memory limit, or giving more RAM to Mongo?

Update: The tests described above were just for a single instance. We are testing our server (12 cores) with 10 instances. Very early testing showed 150 messages (writes)/second taking 65% CPU, peaking at over 75% from bursts. Is it normal for the application to make 6-7x more disk reads/writes than Mongo does? Or is this a possible misconfig on my end?

Thanks,
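For context on the redisOplog attempt mentioned above: with the community cultofcoders:redis-oplog package, the usual approach (a sketch, assuming that package's documented settings format and a reachable Redis host) is to pass the Redis connection via METEOR_SETTINGS and leave MONGO_OPLOG_URL unset so classic oplog tailing stays off:

```bash
# MONGO_OPLOG_URL intentionally not exported: no oplog tailing
export MONGO_URL="mongodb://mongochat01:27017,mongochat02:27017,mongochat03:27017/rocketchat?replicaSet=rs01"
export METEOR_SETTINGS='{"redisOplog": {"redis": {"host": "redis.example.com", "port": 6379}}}'
node main.js
```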
Have you had any conclusive results on how RC scales at your level of users, @AFrangopoulos? I'm evaluating RC for a use case with more than 1 million registered users and hundreds of thousands of concurrent users. Is there any documentation or experience on scaling RC to this level?
Without getting redisOplog integrated, it can't scale to large numbers of concurrent users (we did not have the time to spend on adding this solution to RC ourselves, so we abandoned it). The more you scale horizontally, the more your gains diminish due to oplog tailing (and I imagine some other things). Also, the number of packages seems to be starving the app for resources from what I could tell. I suspect that if RC were to integrate the redis-oplog feature successfully, it could scale to a very large number of concurrent users (unless there are other bottlenecks that appear after oplog tailing is removed from the equation). Lastly, if you do choose to use RC and try to scale it, it will not be cost-effective, and eventually you will hit a point where scaling stops and you will likely need to find a different solution. This is all based on our load/performance testing. It is not fact per se, but I haven't seen anything out there that contradicts our conclusions. Hope this helps! I urge the RC devs to try to integrate the redis-oplog package to see this application's full potential. GL
I know we talked a bit in the support channel, but I'm curious what specifically is pointing to the oplog as the limiting factor here? We are definitely experimenting with redis-oplog and a few other things to try to increase overall performance.
@AFrangopoulos Thanks for your answer, even if it is not what I'd hoped for. Are you willing / allowed to share the load tests?
@AFrangopoulos Can you provide some more detail on the load testing you've done (what techniques you've used, etc.)? My institution is really interested in setting up its own load testing.
@sr258 I used both meteor-down and meteor-load-testing by allaning. I found the latter to be more consistent in our tests. Our main concern was writes and having 20k+ concurrent users. Scaling horizontally just wasn't cutting it: diminishing returns the more you scaled, and it was not cost-effective. Also, RC has a LOT of packages and there is a continuous fight for CPU. Our goal was to reach 1k writes/second through scaling and we couldn't get there within reason (cost / number of boxes needed). I think somewhere on these forums pertaining to RC/Meteor I have a detailed writeup of the several setups we tried to get the best performance. In the end, we concluded this product couldn't scale to what we needed. It seems RedisOplog would be a great candidate to fix the scaling issues. GL to you
See
RocketChat/Rocket.Chat#520
RocketChat/Rocket.Chat#847
RocketChat/Rocket.Chat#1867
RocketChat/Rocket.Chat#2964