-
-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: directed acyclic graph (DAG) #1563
Merged
+400
−191
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
andreynering
approved these changes
Apr 9, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it look a lot of time. Thanks for your perseverance! This is going to be a great improvement. 👏 👏 👏
@pd93 kudos, great work! |
Is #852 resolved with this? |
@ReillyBrogan Unfortunately not. I've responded in #852 to keep everything together. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR overhauls how Task reads, parses and merges Taskfiles.
An example project
The following document will get quite technical, so I'm going to use an example to help explain. Imagine a project with the following Taskfile structure:
We have three "services", each with their own Taskfile, and a root Taskfile that includes all of them. This allows us to call
task
from the project root and get access to all the tasks in each of the services. In addition to this, we also have ataskfiles
directory with autils.yml
file that contains some common tasks that are shared between the services and the tasks in the root file.This is a fairly typical, if not simple, project structure for a Taskfile-based project and on the surface, Task handles it pretty well. However, as we are about to see, under the hood, Task does not process this very efficiently, and when you add more services and more shared utilities, or run on hardware with limited processing capability, the inefficiencies can become problematic.
The current process
For the sake of brevity, I'm not going to write up the contents of each file. Instead, I am going to illustrate the
includes
by drawing a graph:You can see that our root Taskfile (
Taskfile.yml
) is including each of the service Taskfiles and the utilities Taskfile. Each of the service Taskfiles is then also including the utilities Taskfile. The total setup consist of 5 files.When Task is called, the first thing it does it look for the root Taskfile (aka. the "entrypoint"). The contents of this files are then read into memory and parsed using the
taskfile/ast
package. So far, so good.Once the entrypoint has been parsed, we can now start to evaluate any
includes
that may be specified in the file. In this example, we have 4 includes:services/serviceA.yml
services/serviceB.yml
services/serviceC.yml
taskfiles/utils.yml
Now, for each of these taskfiles, we will recursively repeat the process of reading the files into memory and parsing them. Once all the includes have been followed, we then unwind the stack and merge the ASTs together. This results in one giant Taskfile the contains all the tasks from all Taskfiles in the project.
If you haven't spotted it yet, there is a major problem with this approach. Let's jump into the code and add a log line at the beginning of each call to the function that reads a Taskfile that print the name of the file being read. Here are the results:
As you can see, despite only having 5 files in the project, Task has read a file 8 times. The
taskfiles/utils.yml
file has been read 4 times! The number of times a file is read will correspond directly to the total number ofincludes
in all your Taskfiles combined (or the number of arrow heads in the graph diagram) +1 for the entrypoint.Analysing the problem
There are many simple solutions to the problem described above. For example, we could just add a simple file cache so that a file is not read multiple times. However, I think that the problems with the current approach go deeper than just the inefficiency of reading files multiple times.
I briefly mentioned that the ASTs are merged together before any tasks are run. There are a few disadvantages to this approach:
It doesn't matter how good our file caching is if we end up calling a merge method for each and every include. Finding a way to store the entire Taskfile tree in memory without merging them together would be a much better solution.
A new approach
Taskfiles are no different to other programming languages or config-based tools that have
includes
orimports
. We usually refer to the structures that store dependencies as graphs (as-per the example diagram). Specifically, these are dependency graphs, which are a type of directed acyclic graph (or DAG for short).DAGs have a number of properties that make them very useful:
This PR
An important point to make - This PR does not remove AST merging. The DAG implementation has taken a lot of work to complete. The first DAG changes were made by me nearly a year ago and I have been slowly cherry-picking refactors and improvements from the
dag
branch intomain
. This PR represents the final piece of work to bring the DAG implementation to release.However, this is still just a stepping stone. Currently, the DAG is still resolved into a single AST before tasks are run. The next step would be to refactor the code that fetches/compiles tasks to run off the DAG instead of the merged AST. From here, there are a number of things that we can do...
Future work
This work enables or makes easier a number of future improvements: