Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terraform: new apply graph builder based on the "diff" #9388

Merged
merged 82 commits into from
Oct 20, 2016

Conversation

mitchellh
Copy link
Contributor

@mitchellh mitchellh commented Oct 15, 2016

This introduces a new graph builder for the Apply operation that builds a graph based on the diff primarily, adding on to that thereafter.

All prior tests pass, with many new tests added. The prior graph builder is kept and a toggle is available to switch between the old and new so that we can continue to use the old by default. A new command line flag has been added to terraform apply called -Xapply-new that users can use to use this new graph. The reason why you may want to do that is because this new graph results in a much cleaner graph that is much less likely to have erroneous cycles. This graph will become the default for Terraform 0.8.

NOTE: Don't be concerned/worried about the vast unbalance between additions/removals (+4,000/-150) on this PR for a refactor. All the old behavior was retained and can be flagged on/off as explained below. Once we remove the old behavior we'll have a huge removal commit.

Why? ❓ ❓ ❓

Historically (and currently) Terraform builds one giant graph for any operation: refresh, plan, validate, apply, etc. The original design thesis was: Terraform should be a single worst-case all-encompassing graph, and depending on the operation we'll prune it down to a more efficient and correct shape.

This has actually worked quite well up to this point. But it has a couple major problems:

Difficult to understand logic that may be irrelevant: Because the graph is generated for all operations, when you're doing a terraform apply, the graph goes through a series of transforms and has nodes that may make no sense for your operation. This makes it difficult to both understand as a user and developer when debugging.

Issues can cascade across operations: When fixing a bug for terraform apply, it is quite easy to introduce an unexpected bug (an unknown unknown, hence untested) in other operations, such as terraform plan. We've done this many times.

Erroneous cycles: Cycles are created that shouldn't be there since the nodes themselves shouldn't be there (are no-ops). We have a complex no-op pruning transform currently, but the logic is unclear and the pruning is very fragile. We'd be better off not having these nodes in the graph in the first place!

Solution 📈

As a solution, we decided to write a new graph builder! terraform apply would therefore create a graph using a unique builder that only builds graphs for the apply operation, and isn't used for refresh, plan, etc. For now, we'll continue to use the original graph builder for other operations.

The new graph builder has one important difference: it starts building the graph based on the diff. Only resources that are present in the diff are originally added to the graph. This has a major safety improvement: it becomes much more difficult for apply to modify resources that aren't in the plan. This is another layer of security with the plan.

In addition to the safety improvement, the resulting graph is much easier to understand.

Note that this isn't the first "unique" graph builder we've built: the terraform import feature is built entirely on an import-only graph builder. This small-ish example is what gave me the confidence that this was doable, and talking with the TF team at HashiCorp pushed me to believe it was the right thing to do for the other operations.

Stability and Testing ⚠️

This PR didn't touch any of the existing apply tests, of which there are more than 100, covering various use cases such as create before destroy, nested modules, provider inheritance, and more. This PR did add a significant number of new unit tests to cover its various components.

The old graph is retained, and all the same tests pass for both the old and new graph.

The new graph is disabled by default, but can be enabled with a new flag to terraform apply: -Xnew-apply. Users can optionally choose to run the new graph using this undocumented flag. Fair warning that this is an experimental feature. Users however may want to try running it in certain cases since it does avoid some cycles that can occur with the old graph, and any new apply graph fixes will be done on the new graph and not the old.

When you run terraform apply -Xnew-apply, the following warning also shows up:

2016-10-15 at 3 20 pm

After the shadow graph (#9334) feature is merged, the new apply graph will be set as the experimental graph that is run as the shadow against the old graph. This will help safely ensure that the end states always remain the same. We'll enable this sometime in 0.7.x.

After a period of shadow graph and -Xnew-apply guarding in 0.7.x, this graph will become the default in 0.8.0.

Guide for Review 😎

  • New graph builder is in terraform/graph_builder_apply.go
  • New transforms are all prefixed with transform_ (filename)
    • New nodes are all prefixed with node_ (filename)
    • We now use custom dag.Edge implementations! Those exist in edge_*.go files.

I recommend starting with the graph builder and following the transformations. I was able to reuse many transformations, but had to write many new ones (primarily due to removing graph flattening). In future new graphs for other operations, I believe we'll be able to reuse quite a lot of the transforms created here, and over time remove the older transforms.

For go test, it will currently only run against the old graph. You have to set -Xnew-apply on go test as well to get the test to run against that. I've tested that it passes locally, and I've updated .travis.yml to run against both.

🙏 ⛪ 📈

@kwilczynski
Copy link
Contributor

@mitchellh this is AMAZING! I love it.

I am not sure if you have the time, but would you be able to capture shadow graph and the new graph builder concepts in a blog post? We need to get early adopters/testers now :)

@mitchellh
Copy link
Contributor Author

@kwilczynski I'm hoping to do just that!

This doesn't explicitly set `rs.Provider` on destroy nodes.

To be honest, I'm not sure why this was done in the first place (git
blame points to 6fda7bb). Tests always
passed without it, and by adding it it causes other tests to fail. I
should've never changed those other tests.

Removing it now to get tests passing, this also reverts the test changes
made in 8213824.
A public API TestNewRawConfig was added to easily create a raw config
for testing, but this conflicted with the test. Just rename it.
This adds a proper unique extra field so that the shadow graph can
properly compare values.
This was happening if the shadow initializes a provider that is never
used by the real side. We need to make sure it starts closed.
This introduces failing tests. How many is unknown since shadow graph
errors cause a panic.
This adds the proper logic for "disabling" providers to the new apply
graph: interolating and storing the config for inheritance but not
actually initializing and configuring the provider.

This is important since parent modules will often contain incomplete
provider configurations for the purpose of inheritance that would error
if they were actually attempted to be configured (since they're
incomplete). If the provider is not used, it should be "disabled".
Copy link
Member

@jbardin jbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait to see what we can uncover with this!

@mitchellh
Copy link
Contributor Author

:O Merging. After the merge I'm going to:

  • Disable the shadow graph when running WITHOUT -Xnew-apply. I think for 0.7.next we'll just roll out the shadow graph in isolation with explicit -Xnew-apply. After that, we'll enable the new apply graph to shadow the original at all times.

@ghost
Copy link

ghost commented Apr 21, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants