docs: guide on horizontal scaling & auto-scaling #1183

tirumaraiselvan · 2018-12-11T11:50:45Z

Guide on horizontally scaling Hasura, setting up auto-scale and benchmarking how fast auto-scale works on a substrate like GKE.

coco98 · 2018-12-11T11:52:23Z

@tirumaraiselvan Can you suggest a full list of topics / a skeleton that we need to cover? Would be easier for someone to pick this up then.

tirumaraiselvan · 2018-12-11T12:07:29Z

Here is the list (leaving it to @rikinsk to structure it):

- Can we run more than one Hasura instance?
- Attaching a reverse proxy / load balancer to run multiple Hasura instances
- Scalability of queries/APIs
- Scalability of Event Triggers
- Managing schema across multiple instances

amesas · 2018-12-15T19:56:06Z

- How to add/remove Hasura nodes without downtime
- How to add/remove/upgrade Postgres replicas without downtime
- Strategies for HA / horizontal scaling: backups, monitoring (maybe Prometheus?), etc.
- For complex setups, the hard part is not deploying it but maintaining it: logs, upgrades, incidents, monitoring, etc.
- Deploying it is as easy as "copying and pasting" from a blog post.

rikinsk-zz · 2019-01-18T13:46:20Z

Tracked here #940

dsandip · 2019-03-20T09:17:50Z

Now that #1182 is live, this seems to be the better-documented issue for tracking a guide on scaling.

litchfield · 2019-03-28T06:14:21Z

Hasura seems to scale well, and it is easy enough to load balance and auto-deploy.

It's the load on Postgres caused by subscriptions that concerns me. Each unique, open subscription query generates an SQL query every second.
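
As a rough, hypothetical illustration of this concern (the symbols and the example figure are assumptions, not measurements): if $U$ unique subscription queries are open and each one is polled every $T$ seconds, Postgres sees roughly

```latex
R = \frac{U}{T}, \qquad \text{e.g. } U = 5000,\ T = 1\,\text{s} \;\Rightarrow\; R = 5000\ \text{queries/s}
```

Under this simple model the polling load grows linearly with the number of distinct open subscriptions.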

My initial thoughts --

- Read/write splitting would be helpful for horizontal scaling (easy and saves using a proxy)
- Could subscriptions use temporary(?) triggers instead of polling? If not, maybe we could tune the polling frequency on a per-table basis?
- Subscriptions should be optional on a per-table basis
- Subscription permissions per-role, i.e. "Allow role 'X' to use subscriptions"

0x777 · 2019-03-28T06:51:36Z

> Read/write splitting would be helpful for horizontal scaling (easy and saves using a proxy)

#1847

> Subscriptions should be optional on a per-table basis
> Subscription permissions per-role, i.e. "Allow role 'X' to use subscriptions"

I would prefer the per (role, table) at the server as it sits well with the existing system. The console can have UX for 'allow subscriptions for role 'x' on all tables'. #1892

> It's the load on Postgres caused by subscriptions that concerns me.

This is changing very soon (end of this week, work in progress). The new architecture will cut down load on postgres significantly. We'll document the architecture.

massimiliano-mantione · 2019-05-12T22:08:04Z

> This is changing very soon (end of this week, work in progress). The new architecture will cut down load on postgres significantly. We'll document the architecture.

Is this #1934 or something else?

I am evaluating Hasura and I have concerns about subscription scalability, so I'd like to understand how subscriptions work. My use case could be subscription-heavy.

dsandip · 2019-05-13T10:55:58Z

@massimiliano-mantione Yes, the optimisations to subscriptions did go in as part of #1934 and a couple of other small PRs. These changes make subscriptions highly scalable. We are a few days away from publishing numbers from performance benchmarks.

What kind of scale are you expecting? (You can DM @coco98 / @tanmaig#8316 or me / @sandip#8048 on the community Discord server if that's preferable.)

pjoe · 2019-05-16T06:54:51Z

I'm very interested in this too. I also have a potential workload that would be very subscription-heavy and require high scalability.

Are subscriptions still polling the DB after the latest optimizations?

riccardogiorato · 2019-08-14T08:53:37Z

Any news on whether subscriptions still poll the DB, as @pjoe already asked?

marionschleifer · 2019-08-14T10:07:23Z

@pjoe @giorat sorry for the delay in answering your question.

I would like to refer you to this article: https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md. This talks about the current architecture and about how we can handle 1M concurrent subscriptions in our benchmarks.

Let us know if you have any additional questions 🙂

hrgui · 2020-05-02T18:16:25Z

Is there a way to scale horizontally / vertically in kubernetes w/ the cli-migrations container?

Or should the process that does the migrations be separated from the scaling? I imagine that if we run 3 of the cli-migrations containers, they will all attempt to lock the DB due to migrations.

shahidhk · 2020-05-11T08:32:10Z

@hrgui You can scale the cli-migrations image to multiple replicas. When a new rollout happens, the default strategy on kubernetes is that it will replace pods one by one. Hence, only one pod of the new rollout will be running at any point and once that finishes applying migrations, other pods will just skip migrations.

This will need health checks to be configured too.
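
A minimal sketch of the setup described above, assuming a Kubernetes Deployment. The resource name, labels, replica count, image tag placeholder, port and probe timings are illustrative assumptions, not values confirmed in this thread. The `/healthz` path is the one used in the App Engine example elsewhere in this issue. The sketch sets `maxSurge: 1` and `maxUnavailable: 0` explicitly so that the one-pod-at-a-time rollout does not depend on cluster defaults. It also assumes the container only answers `/healthz` on its serving port once migrations have been applied.

```yaml
# Hypothetical Deployment for the cli-migrations image (names are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hasura
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # start at most one new pod at a time
      maxUnavailable: 0  # keep old pods serving until the new pod is ready
  selector:
    matchLabels:
      app: hasura
  template:
    metadata:
      labels:
        app: hasura
    spec:
      containers:
        - name: graphql-engine
          # Placeholder tag: substitute the cli-migrations image version you deploy.
          image: "hasura/graphql-engine:<VERSION>.cli-migrations"
          ports:
            - containerPort: 8080  # assumed default serving port
          env:
            - name: HASURA_GRAPHQL_DATABASE_URL
              value: postgresql://USER:PASS@IP_ADDR/DATABASE
          # The rollout only proceeds to the next pod once this probe passes.
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
          # Restart the container if it stops answering health checks.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
```

With a readiness probe in place, the rollout waits for the first new pod to report healthy before replacing the next one, which is what lets later pods find the migrations already applied.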

beepsoft · 2020-10-10T09:42:40Z

Hi,

is this guide still in the making? I am specifically interested in whether running Hasura on Google App Engine Flexible Environment (#1550) or Google Cloud Run is possible now. The other issues #1078 and #940 point here, but I could not find any reference to a scaling guide.

So, is this/will this be available somewhere?

Thanks!

kenptr · 2020-10-29T17:42:35Z

For anyone wondering how to run graphql-engine on GAE (Google App Engine):

Dockerfile

FROM hasura/graphql-engine:v1.3.2

app.yaml

runtime: custom
env: flex
service: hasura

network:
  session_affinity: true

liveness_check:
  path: "/healthz"

readiness_check:
  path: "/healthz"

env_variables:
  HASURA_GRAPHQL_DATABASE_URL: postgresql://USER:PASS@IP_ADDR/DATABASE

mdemierre · 2020-11-05T11:42:40Z

> @hrgui You can scale the cli-migrations image to multiple replicas. When a new rollout happens, the default strategy on kubernetes is that it will replace pods one by one. Hence, only one pod of the new rollout will be running at any point and once that finishes applying migrations, other pods will just skip migrations.
>
> This will need health checks to be configured too.

@shahidhk Is it still valid for the v2 migrations? Now that metadata is separate, I can see the following scenario:

1. We have 3 instances of the cli-migrations image running (pods, VMs, whatever).
2. We deploy a new version with modified metadata and migrations
3. The migrations start to auto-apply from the first updated instance, it updates the migrations and metadata
4. A random instance crashes and is restarted with the old config
   - It tries to reapply the old migrations: OK, no conflict because of transactions + will not apply already applied stuff
   - It tries to reapply its metadata: BOOM, we have old metadata
5. The next upgraded instance will fix it

Between steps 4 and 5, I think we risk unavailability due to incorrect metadata. Is this a possible scenario?

I think a horizontal scaling guide covering migrations and so on would be useful to have.

jync · 2020-11-12T03:59:57Z

Hi... would absolutely love to know exactly what Hasura is thinking of in terms of horizontally scaling. A guide will go a long way to help the community understand.

I'm also looking at this - #1182 and #1574

It appears that a metadata update, once it hits the DB, triggers all instances connected to the DB to update. Is this correct?

How does this work with a rolling update across a cluster? Will it signal ALL instances to update then? Won't that break the rolling deployment?

mdemierre · 2020-11-12T07:50:06Z

@jync Thanks for the links to other issues. It indeed seems that #1574 made it so the metadata will be automatically reloaded by all running instances.

Does that make the scenario described in my other comment (#1183 (comment)) even worse (a faulty instance restarting and overriding the metadata with an old version)?

jync · 2020-11-12T09:00:01Z

I've seen Hasura not start if the hdb_catalog is from an older version of hasura.

If metadata versioning is incorporated, I'd imagine you'd do something similar? (i.e. no-op if the incoming schema version is higher than the version found in the metadata files)

But I'm not sure whether the metadata is being versioned. There is a metadata/version.yaml file, but I think this captures the metadata schema version (i.e. the version that Hasura understands).

serefarikan · 2021-08-27T17:20:50Z

After spending the last 30 to 45 minutes trying to figure out what is meant by horizontal scaling and auto scaling, I gave up and decided to ask under this issue :)

According to https://hasura.io/learn/graphql/hasura-advanced/performance/2-horizontal-scaling/

"Hasura Cloud lets you scale your applications automatically without having to think about the number of instances, cores, memory, thresholds etc. You can keep increasing your number of concurrent users and the number of API calls and Hasura Cloud will figure out the optimizations automagically. But you could have a bottleneck at the database level which is when you might want to scale the database."

Based on the rest of the text on that page, horizontal scaling is Hasura figuring out that read requests should be routed to read-only replicas, while write requests end up routed to the master.

What is auto scaling then? It is listed as a feature of cloud version here: https://hasura.io/pricing/ but this issue clearly separates horizontal scaling from auto scaling as per its title, so where is the definition of auto scaling?

By the way, the discussion jumps to #940 then to #1078 and then comes back to this issue, with no references to some documents I can read, at least as far as I can see.

I know it's Friday and I've run out of brain cells, so if someone can point me at some documentation that explains the above, it would be much appreciated :)

tirumaraiselvan added the c/docs (Related to docs) label Dec 11, 2018

ecthiender changed the title ~~[Docs] Guide on horizontal scaling~~ Guide on horizontal scaling Dec 12, 2018

rikinsk-zz mentioned this issue Jan 14, 2019

add horizontal scaling guide #940

Closed

rikinsk-zz closed this as completed Jan 18, 2019

dsandip reopened this Mar 20, 2019

marionschleifer self-assigned this Jul 10, 2019

marionschleifer added the p/medium (non-urgent issues/features that are candidates for being included in one of the upcoming sprints) and e/intermediate (can be wrapped up in a week) labels Jul 10, 2019

coco98 changed the title ~~Guide on horizontal scaling~~ Guide on horizontal scaling & auto-scaling Sep 5, 2019

rrjanbiah mentioned this issue Sep 18, 2019

Hasura vs Dgraph comparison hypermodeinc/dgraph#3997

Closed

marionschleifer changed the title ~~Guide on horizontal scaling & auto-scaling~~ docs: guide on horizontal scaling & auto-scaling Dec 8, 2020

marionschleifer removed their assignment Dec 8, 2020

marionschleifer removed the e/intermediate (can be wrapped up in a week) and p/medium (non-urgent issues/features that are candidates for being included in one of the upcoming sprints) labels Dec 8, 2020

vaishnavigvs added the a/monitoring label Jun 25, 2021

vaishnavigvs assigned tirumaraiselvan Jun 25, 2021

ajohnson1200 added the t/infra-tools label Dec 6, 2022
