Improve expensive tasks in gatsby (a.k.a. Jobs API) #19831
Comments
This was referenced Nov 27, 2019
wardpeet added a commit that referenced this issue Jan 21, 2020
Added a new action called createJobV2 to support the new API. createJob is still available for backward compatibility. I'll add a deprecation message when createJobV2 is fully operational. createJobV2 needs a job object with 4 properties: name, inputPaths, outputDir and args. These are used to create a unique content digest to make sure a job is deterministic. inputPaths are converted into relative paths before being sent to the worker, as they need to be filesystem agnostic. More info on why can be found in #19831
Closing because it has been done.
Jobs API
Why/what?
Gatsby runs some CPU- and IO-intensive tasks while compiling your website to make it blazing fast. Examples of the tasks I'm talking about are image processing, HTML generation and query running. Currently, it's up to the plugin author to cache and coordinate these tasks, which can be burdensome.
We want to make this simpler and more robust. Heavy tasks are expensive, so each one should only be handled once. Jobs should also be deterministic so we can save them to disk and re-run them when the gatsby process gets interrupted. We're converting the old createJob API to a new one that handles most of the issues described above.
Implementation details
What would this API look like?
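A minimal sketch of a job object, assuming the four fields named in the createJobV2 commit message (name, inputPaths, outputDir, args). The job name and all values are made up for illustration:

```javascript
// Sketch of a job object. Only the four field names come from this issue;
// the job name, paths and args below are hypothetical.
const job = {
  name: "IMAGE_PROCESSING",
  inputPaths: ["src/images/hero.jpg"],
  outputDir: "public/static/hero",
  args: { width: 800, quality: 80 },
};
```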
Some notes about these properties:
All arguments need to be serializable, which rules out functions, class instances and the like. inputPaths and outputDir need to live inside the gatsby root. We'll have some validation checks for this.
Now that we have a job, how does a job know what action to execute? A plugin needs a worker.js that exports a function named after the event. The function receives inputPaths, outputDir and args as its argument. When the worker's promise resolves, we mark the job as complete.
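The two rules above could be checked roughly as follows. This is only a sketch: validateJob, isSerializable and isInsideRoot are hypothetical helper names, not part of Gatsby, and the actual validation may differ.

```javascript
const path = require("path");

// Hypothetical helper: true for JSON-friendly values only
// (no functions, class instances, symbols, undefined or non-finite numbers).
function isSerializable(value) {
  if (value === null) return true;
  const type = typeof value;
  if (type === "string" || type === "boolean") return true;
  if (type === "number") return Number.isFinite(value);
  if (Array.isArray(value)) return value.every(isSerializable);
  if (type === "object") {
    // Accept plain objects only; anything with a custom prototype is rejected.
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return false;
    return Object.values(value).every(isSerializable);
  }
  return false;
}

// Hypothetical helper: true when targetPath resolves to rootDir or below it.
function isInsideRoot(rootDir, targetPath) {
  const relative = path.relative(rootDir, path.resolve(rootDir, targetPath));
  const escapesRoot = relative === ".." || relative.startsWith(".." + path.sep);
  return !escapesRoot && !path.isAbsolute(relative);
}

// Returns a list of problems; an empty list means the job passed both checks.
function validateJob(job, rootDir) {
  const errors = [];
  if (!isSerializable(job.args)) errors.push("args must be serializable");
  for (const p of [...job.inputPaths, job.outputDir]) {
    if (!isInsideRoot(rootDir, p)) errors.push(`${p} is outside the gatsby root`);
  }
  return errors;
}
```

For example, `validateJob({ inputPaths: ["../secret.jpg"], outputDir: "public", args: {} }, "/site")` reports one problem because the input path escapes the root.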
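The lookup described above can be sketched like this. runJob and exampleWorker are hypothetical names, and passing inputPaths, outputDir and args as one object is an assumption based on the wording "as an argument":

```javascript
// Stand-in for the object you would get from require("./worker.js").
const processed = [];
const exampleWorker = {
  // The exported function name matches the job name (hypothetical name).
  async IMAGE_PROCESSING({ inputPaths, outputDir, args }) {
    processed.push({ inputPaths, outputDir, args });
  },
};

// Hypothetical runner: finds the worker function by job name, awaits it,
// and reports the job as complete once the promise resolves.
async function runJob(workerModule, job) {
  const handler = workerModule[job.name];
  if (typeof handler !== "function") {
    throw new Error(`No worker function exported for job "${job.name}"`);
  }
  await handler({
    inputPaths: job.inputPaths,
    outputDir: job.outputDir,
    args: job.args,
  });
  return "complete";
}
```

A job whose name has no matching export fails fast instead of being silently dropped.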
What benefits does this Job API provide?
We'll be able to create a deterministic hash (content hash) based on the arguments, outputDir and inputPaths. Having a deterministic hash per job makes it easy to cache results and avoid doing the same work twice. Coupling a job with its process lets us control the full flow of a job: applying backpressure, making sure the job is done, and more. Saving jobs to disk is also a big win, as we can re-run jobs when the process was interrupted.
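To illustrate the hashing and de-duplication idea, here is a sketch with hypothetical helpers (stableStringify, createJobDigest, enqueueJob). It hashes only the job description; a real content hash would presumably also cover the contents of the input files, which this sketch leaves out.

```javascript
const crypto = require("crypto");

// Serializes with sorted object keys so that key order does not change the hash.
// Assumes the value is already JSON-serializable.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical digest over name, inputPaths, outputDir and args.
function createJobDigest({ name, inputPaths, outputDir, args }) {
  const payload = stableStringify({ name, inputPaths, outputDir, args });
  return crypto.createHash("sha256").update(payload).digest("hex");
}

// Jobs with the same digest share one promise, so the work runs only once.
const jobsByDigest = new Map();
function enqueueJob(job, run) {
  const digest = createJobDigest(job);
  if (!jobsByDigest.has(digest)) {
    jobsByDigest.set(digest, run(job));
  }
  return jobsByDigest.get(digest);
}
```

Because the digest is derived only from the job's data, the same map could in principle be persisted to disk and used to resume unfinished jobs after an interruption.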
Simple flowchart:

Enough jibber-jabber, here's an example.
gatsby-plugin-sharp will have a worker that looks like this:
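A sketch of what such a worker file might contain. The job name IMAGE_PROCESSING is hypothetical, and a plain file copy stands in for the actual image transform so the example runs without dependencies:

```javascript
// worker.js (sketch)
const fs = require("fs");
const path = require("path");

// Exported under the job name. Copies every input file into outputDir;
// a real worker would resize or convert the images here instead.
// `args` would carry the transform options and is unused in this stand-in.
async function IMAGE_PROCESSING({ inputPaths, outputDir, args }) {
  await fs.promises.mkdir(outputDir, { recursive: true });
  await Promise.all(
    inputPaths.map((inputPath) =>
      fs.promises.copyFile(
        inputPath,
        path.join(outputDir, path.basename(inputPath))
      )
    )
  );
}

module.exports = { IMAGE_PROCESSING };
```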
Inside our plugin we can do:
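A sketch of the calling side, assuming createJobV2 is exposed on the actions object and takes the job object directly. queueResize, the job name, and the output location are hypothetical:

```javascript
const path = require("path");

// Hypothetical plugin helper: describes the work as data and hands it to
// createJobV2 instead of doing the image processing inline.
function queueResize(actions, imagePath, args) {
  return actions.createJobV2({
    name: "IMAGE_PROCESSING",
    inputPaths: [imagePath],
    outputDir: path.join("public", "static"),
    args,
  });
}
```

The plugin only builds the job description; which process runs it, and whether it runs at all when an identical job already exists, is left to Gatsby.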
TODO: