
Multi-tenant uwazi core #2393

Closed · 9 of 13 tasks
konzz opened this issue Jun 20, 2019 · 16 comments · Fixed by #2917

konzz (Member) commented Jun 20, 2019

This issue is for discussing and tracking all the info regarding running multiple Uwazi instances in one node process.
Issues detected so far:

  • Sockets
    • multi node process
    • multi tenant
  • Isolate requests per project
  • Cron jobs (sync, evidence vault)
  • System keys on startup? Deprecate in favor of a migration if we need to add any system key.
  • Separate databases? Possible issues with scalability.
  • Separate folders for files?
  • Super lengthy development 👌🤔😬
  • A separate db to register tenants, with a node process watching for changes.
  • Migrations should take into account multiple possible databases.
    This is not being addressed; migrations will work like they do now, with env variables for the db index etc. and a script launch.
  • Do semantic search and topic classification need to be aware of this?
  • Tests using the filesystem are a bit flaky, I am assuming due to changes on the branch to use the same testing folder (cannot replicate, maybe not as serious as I thought).
txau (Collaborator) commented Jul 1, 2019

Apparently this can also be solved via async hooks: https://nodejs.org/docs/latest-v10.x/api/async_hooks.html
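For reference, this is roughly what continuation-local storage built directly on async_hooks looks like. A minimal sketch, not uwazi code; all the names (contexts, setContext, getContext, resolving the tenant from the Host header) are illustrative assumptions:

```js
// Minimal continuation-local storage on top of async_hooks (Node 10+, experimental).
const asyncHooks = require('async_hooks');

const contexts = new Map();

asyncHooks.createHook({
  // Propagate the parent's context to every async resource created under it.
  init(asyncId, type, triggerAsyncId) {
    if (contexts.has(triggerAsyncId)) {
      contexts.set(asyncId, contexts.get(triggerAsyncId));
    }
  },
  // Clean up so the map does not leak entries for finished resources.
  destroy(asyncId) {
    contexts.delete(asyncId);
  },
}).enable();

const setContext = value => contexts.set(asyncHooks.executionAsyncId(), value);
const getContext = () => contexts.get(asyncHooks.executionAsyncId());

// Hypothetical express middleware: tag each request with its tenant, then read
// it anywhere down the async call chain without passing it around explicitly.
const tenantMiddleware = (req, res, next) => {
  setContext({ tenant: req.headers.host });
  next();
};
```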

daneryl (Collaborator) commented Jul 1, 2019

@txau, the solution with continuation-local-storage looks really nice; we should do a quick spike. It should be "easy" to integrate into our app, but maybe not (http://asim-malik.com/the-perils-of-node-continuation-local-storage/).

With virtual hosts it looks like it instantiates as many express apps as needed? Not so good for memory consumption? Not sure though.

I did not read up on async hooks, but the API is apparently experimental, so maybe not yet.

habbes (Contributor) commented Jul 5, 2019

Regarding sockets: one approach is to separate the server operations that respond to user requests from the long-running async tasks. We could use a background worker for async tasks like processing PDFs and semantic search. The server and worker would be decoupled, and a message queue would be put in place to handle communication between the two.

Here's a simple workflow:

  • the user makes a request, creating a document with a PDF
  • an uwazi server picks up the request, stores the necessary info in the db, adds a task to the job queue in the message broker, and sends a response to the client
  • the worker node is notified by the message broker of this new task, performs it (i.e. processes the PDF), and when it's done sends a broadcast message to the message broker
  • the message broker notifies all the API server nodes that the PDF has been processed
  • each server node broadcasts this notification to its clients via sockets

With this approach, it does not matter whether the server a user is connected to is the same one that handled their request, because every server node will receive the notification and send it to its clients.

Here are some pros and cons that I can think of:

Pros

  • You can scale the server nodes and workers independently. You don't have to run as many workers as servers.
  • You can have workers retry tasks when they fail
  • All server nodes get notified when tasks are complete
  • Because nodes are decoupled from each other, you can add nodes and remove nodes on the fly (on paper)

Cons

  • You have to run both the server and the worker instances separately. These can be just two instances of the same codebase with different entry points, but from an operations perspective it's more work and more resources.

  • You have more infrastructure to manage; in this case the message broker will require deployment and maintenance in addition to MongoDB and Elasticsearch.

  • I don't know how, or whether, this affects the current cron jobs because I've not looked into how those work.

In terms of implementation, the worker and the server could just be the same uwazi codebase, but with different entrypoints, e.g. node app/server.js to start the server and node app/worker.js to start a worker.
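To make the proposal concrete, here is a minimal sketch of that split using RabbitMQ through the amqplib package. The queue/exchange names, the payload shape, and the processPdf function are assumptions for illustration, not existing uwazi code:

```js
const amqp = require('amqplib');

// server.js: enqueue a task after persisting the document.
async function enqueuePdfTask(documentId) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('pdf-tasks', { durable: true });
  ch.sendToQueue('pdf-tasks', Buffer.from(JSON.stringify({ documentId })), {
    persistent: true, // survive a broker restart
  });
}

// worker.js: consume tasks and broadcast completion to every server node.
async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('pdf-tasks', { durable: true });
  // fanout exchange: every subscribed server node gets the notification
  await ch.assertExchange('task-events', 'fanout', { durable: false });

  ch.consume('pdf-tasks', async msg => {
    const { documentId } = JSON.parse(msg.content.toString());
    await processPdf(documentId); // hypothetical long-running work
    ch.publish(
      'task-events',
      '',
      Buffer.from(JSON.stringify({ documentId, status: 'done' }))
    );
    ch.ack(msg);
  });
}
```

Each server node would bind its own queue to the task-events exchange and relay the messages to its connected sockets.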

Some popular message brokers

RabbitMQ is a popular message broker that supports multiple messaging approaches like broadcasting, publish/subscribe, task queues that distribute incoming tasks across different workers, durable messages (e.g. messages stored on disk in case of crash), etc.

Redis is an in-memory key-value store that's usually used for caching. It also supports pub/sub messaging. I've never deployed it, but it's quite popular. If we're also considering a shared in-memory store for things like user sessions shared across all nodes, it could be worth looking into.

Kafka is also widely popular, I think, but I've never used it and don't have much to say about it.

I could read more about the topic to see what the current best practices are and important considerations I may not be aware of.

habbes (Contributor) commented Jul 11, 2019

I assume that the user account model will work like Slack's: if one user has two accounts in two separate instances, those two accounts are separate and isolated even if they have the same email address. Each instance is an isolated namespace. This is different from services that have a single user namespace, where the same user account can be "added" to different organizations/tenants (e.g. GitHub).

Is this assumption correct?

txau (Collaborator) commented Jul 11, 2019

@habbes that is correct. No centralization of user auth; it's per-instance auth.

@txau txau changed the title One Node to rule them all Multi-tenant uwazi core Jul 11, 2019
@kjantin kjantin removed the Question label Mar 6, 2020
@kjantin kjantin added this to the Roadmap for Q2 2020 (Apr, May, Jun) milestone Mar 6, 2020
kjantin (Contributor) commented May 4, 2020

This issue is now a priority for the team so that we can free up resources to improve performance and responsiveness for the Uwazi instances -- to address scalability. This is expected to take at least one month.

daneryl (Collaborator) commented May 5, 2020

Isolate requests per project

To solve this problem I have tested AsyncLocalStorage (this feature ships in Node 14 and is being backported to 12.17.0) on some routes in uwazi to check the performance impact.

This is an overview of how it looks on some of our most used endpoints:

[benchmarks screenshot: response times on some of our most used endpoints]

some key points on this:

  • The impact is minimal and more or less consistent with other people's benchmarking of AsyncLocalStorage on http endpoints.
  • We probably have much worse performance problems in our app that we should solve (I have nothing to compare against right now, but I think 30 requests/second on create entity is bad performance; it's true that it is not that impactful unless you want to batch create).
  • It is an Experimental feature, but AsyncLocalStorage provides a simple, high-level API which hides the complexity and low-level details of async hooks; thus, it is more stable (in terms of compatibility) than the APIs it builds upon. More useful info here.
  • After achieving multitenancy we can scale by spawning parallel node processes as a way to mitigate certain performance problems.
  • It's probably the easiest and simplest solution for this, allowing us to have the tenant context anywhere in the app without passing it down all the way (see the sketch after this list).
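This is the shape of the tested pattern: an express middleware opens an AsyncLocalStorage context per request, and any code down the async chain can read it without parameter threading. A minimal sketch; the host-based tenant resolution and the route are assumptions:

```js
// Per-request tenant isolation with AsyncLocalStorage (Node >= 12.17 / 14).
const { AsyncLocalStorage } = require('async_hooks');
const express = require('express');

const storage = new AsyncLocalStorage();
const app = express();

// Every handler, and everything it awaits, runs inside this store.
app.use((req, res, next) => {
  const tenant = req.get('host'); // e.g. resolve the tenant from the hostname
  storage.run({ tenant }, next);
});

// Deep inside the app, no parameter threading needed:
function currentTenant() {
  const store = storage.getStore();
  if (!store) throw new Error('No tenant in the current async context');
  return store.tenant;
}

app.get('/api/whoami', (req, res) => {
  res.json({ tenant: currentTenant() });
});
```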

RafaPolit (Member) commented May 7, 2020

Some things to keep in mind

  • Completely figure out the migration and upgrading process
  • Discuss static file serving
  • Adding instances on the fly
  • Option to run cron-jobs on specific nodes
  • Ensure that, for simple / single-user setups, this works seamlessly, without needing to configure several instances, a NAS or file system, and a load balancer just to work normally

daneryl mentioned this issue May 13, 2020
daneryl (Collaborator) commented May 13, 2020

Mongoose schema middleware hooks break the async context; I had to reimplement the updatelogs hook inside the model methods themselves for this to work (see the sketch below). This is a known issue that can be affecting other areas, more info here.

My approach is going to be as follows:

  • By default uwazi is single tenant and works like before.
  • We have to explicitly activate multi-tenancy with an env variable.
  • If at some point there is no tenant on the context when asked for, and multi-tenancy is activated, we throw an error to detect possible context losses.
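A minimal sketch of the hook workaround, under these assumptions: the storage instance from the AsyncLocalStorage sketch above, and a hypothetical upsertUpdateLog helper. This is not the actual uwazi updatelogs code:

```js
const mongoose = require('mongoose');

// Before: context is lost by the time the middleware hook runs.
// schema.post('save', doc => upsertUpdateLog(doc)); // breaks the async context

// After: the hook logic is invoked inline, inside the request's async context.
async function saveEntity(EntityModel, data) {
  const doc = await EntityModel.create(data);
  await upsertUpdateLog(doc); // still sees storage.getStore()
  return doc;
}

// Hypothetical updatelogs writer, reading the tenant from the async context.
async function upsertUpdateLog(doc) {
  const { tenant } = storage.getStore();
  await mongoose.connection.useDb(tenant).collection('updatelogs').updateOne(
    { mongoId: doc._id },
    { $set: { timestamp: Date.now(), namespace: doc.collection.name } },
    { upsert: true }
  );
}
```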

daneryl (Collaborator) commented May 19, 2020

We are changing the implementation so that there is no special "no tenant" case: a single-tenant instance is going to be a normal instance with one tenant, instead of a multitenant flag to activate/deactivate it, #2393 (comment)

This means every process (tests, migrations, reindex, etc.) should be run inside an async context and with the default tenant.

  • The default tenant is going to be added by the uwazi process with the name default and, to keep the most backwards compatibility, it is going to be built with the current db, elastic and files configurations based on environment variables.

daneryl (Collaborator) commented May 25, 2020

We are changing the implementation so that there is no special "no tenant" case: a single-tenant instance is going to be a normal instance with one tenant, instead of a multitenant flag to activate/deactivate it, #2393 (comment)

This means every process (tests, migrations, reindex, etc.) should be run inside an async context and with the default tenant.

The default tenant is going to be added by the uwazi process with the name default and, to keep the most backwards compatibility, it is going to be built with the current db, elastic and files configurations based on environment variables.

This is not true anymore; the current approach is (a sketch follows the list):

  • uwazi is always multitenant.
  • The process adds a default tenant, with the db configured from the same env variables we already had, to maintain backwards compatibility.
  • Running a tenant async context with undefined as the tenant will assume that you are running with the default one.
  • If the current tenant is undefined when asking for it, this will always throw an exception.
  • Scripts like migrate, reindex, etc. are single tenant; they always use the default tenant for backwards compatibility.
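A sketch of what this contract could look like as a small module. The env variable names and the tenant fields are assumptions, not the actual uwazi configuration:

```js
const { AsyncLocalStorage } = require('async_hooks');

const storage = new AsyncLocalStorage();
const tenants = new Map();

// Default tenant built from the existing env configuration, for backwards compatibility.
// The variable names here are hypothetical.
tenants.set('default', {
  name: 'default',
  dbName: process.env.DATABASE_NAME || 'uwazi_development',
  indexName: process.env.INDEX_NAME || 'uwazi_development',
  filesPath: process.env.FILES_ROOT_PATH || './uploaded_documents',
});

// undefined tenant name means "run with the default tenant"
const run = (cb, tenantName) => storage.run(tenants.get(tenantName || 'default'), cb);

const current = () => {
  const tenant = storage.getStore();
  if (!tenant) {
    // reading the tenant outside a context is always a bug
    throw new Error('There is no tenant on the current async context');
  }
  return tenant;
};
```

Under this contract, migrate/reindex scripts simply wrap their work in run(work) with no tenant name, which resolves to default.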

RafaPolit (Member) commented

  • Research if Node Cluster Mode is something we want to explore right now

RafaPolit (Member) commented

@daneryl @konzz With multitenant: how would we know if an "instance" is not responding or not working? The Node process is one thing; we can monitor that. But if the node process is "healthy" while an Uwazi instance is failing... how would we know?

daneryl (Collaborator) commented Jun 15, 2020

@RafaPolit, when we have a multitenant/multiprocess architecture, what does "not responding or not working" mean? I think if one is not working, that will mean all instances are not working.
What can happen are errors specific to one instance, like:

  • an error based on the data for that instance
  • the tenant does not exist or is not ready yet

These kinds of errors will behave normally, so the monitoring will depend on that and on how it monitors: is it just expecting a 200 and that's it?
The idea is that there is no "offline" anymore, unless everything is. (A sketch of a possible per-tenant health check follows.)
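To illustrate the monitoring question, a hypothetical per-tenant health check; nothing like this exists in uwazi yet, and the route, the current() helper from the sketch above, and the mongoose ping are all assumptions:

```js
// A monitor hitting https://<tenant-host>/api/health would see a non-200
// for a failing tenant even while the node process itself stays healthy.
app.get('/api/health', async (req, res) => {
  try {
    const tenant = current(); // resolved from the request's async context
    // touch this tenant's db so data/config errors surface as a non-200
    await mongoose.connection.useDb(tenant.dbName).db.command({ ping: 1 });
    res.json({ tenant: tenant.name, status: 'ok' });
  } catch (e) {
    res.status(503).json({ status: 'failing', error: e.message });
  }
});
```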

daneryl (Collaborator) commented Jun 16, 2020

There will be 2 new flags; one is, I think, temporary:

  1. multitenant indicates this is a multitenant node process. It lets us deactivate some tasks that do not work or do not make sense with multitenancy.
  2. multiprocess means we are running a cluster. This flag is going to be used for behaviors that do or do not make sense in that scenario, like using redis for socket communication.

System keys: this is completely removed. These are the alternatives now:

  • The majority of cases do not need anything special; the user translating a key will set everything up in the db.
  • For the special cases where this is not possible (e.g. translating a key that is only shown as the title of an anchor, so the user cannot click to edit it), we can write a quick migration.

The following list of "cron jobs" is only going to work with multitenant and multiprocess set to false (see the sketch after the list):

  • automatic migration on startup
  • sync
  • evidences vault
  • semantic searchManager
  • topic classification sync
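A sketch of how startup could be gated by the two flags. The env variable names, the io instance, and the task start functions are assumptions; socket.io-redis is the adapter commonly used to share socket.io broadcasts across node processes:

```js
// Hypothetical flag wiring; the real flag names are multitenant/multiprocess,
// how they are sourced is illustrative.
const multitenant = process.env.MULTI_TENANT === 'true';
const multiprocess = process.env.CLUSTER_MODE === 'true';

if (multiprocess) {
  // In cluster mode, socket.io needs a shared adapter so every node process
  // can emit to every connected client, not just its own.
  const redisAdapter = require('socket.io-redis');
  io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));
}

if (!multitenant && !multiprocess) {
  // Single-tenant, single-process: the old in-process "cron jobs" still run.
  runMigrationsOnStartup();
  startSync();
  startEvidencesVault();
  startSemanticSearchManager();
  startTopicClassificationSync();
}
```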

@nwyu nwyu modified the milestones: Roadmap for Q2 2020 (Apr, May, Jun), Roadmap for Q3 2020 (Jul, Aug, Sep) Jul 6, 2020