Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do we protect against breaking changes in construct libraries? #145

Closed
eladb opened this issue Jun 20, 2018 · 10 comments · May be fixed by MechanicalRock/account-reaper#6
Closed

How do we protect against breaking changes in construct libraries? #145

eladb opened this issue Jun 20, 2018 · 10 comments · May be fixed by MechanicalRock/account-reaper#6
Assignees
Labels
ops-excellence Operational Excellence p0 package/awscl Cross-cutting issues related to the AWS Construct Library package/tools Related to AWS CDK Tools or CLI

Comments

@eladb
Copy link
Contributor

eladb commented Jun 20, 2018

@ccfife wrote:

Users expressed a concern that CDK constructs, that are routinely updated to fix issues and add functionality, will not be deterministic. i.e. A developer updates the construct libraries on their machine via a package manager. These constructs contain changes to IAM policy that is not obvious from just inspecting the repo. They then push changes to a stack that results in unexpected security permission changes.

The cdk diff command can help identify unexpected modifications to infrastructure, but that seems like half the story. Is tight versioning of CDK constructs the other part of the story; so that CDK stacks are more deterministic?

@eladb
Copy link
Contributor Author

eladb commented Jun 20, 2018

The short answer is, "yes". The long answer is "it's complicated".

Versioning is the mechanism to ensure that we don't break our customers unexpectedly. Post-1.0, we are committed to semver, which means that breaking changes in any of our modules will be communicated by bumping the "major" component of the version. This is the industry-standard and package managers usually protect consumers from accidentally taking unsolicited major versions.

So the question is: how do we decide what is a breaking change? That's where things get complicated.

Academically, a component is considered backwards-compatible if for the same inputs, it produces the same outputs as a previous compatible version. A common practice to protect against breaking changes is regression tests: when you release a new version you execute the tests of previous compatible versions against it and verify that they all pass. For examples the cracks plugin for semantic-release can do that automatically.

How do we translate these practices to construct libraries?

Theoretically, CDK constructs should be extremely easy to test because they are purely functional: there are no side-effects by design (output is always a synthesized artifact). And at the moment, indeed many of our tests today follow this pattern:

  1. Define a stack, with the constructs you wish to test, configured in a certain way (i.e. define a Bucket with KMS encryption enabled).
  2. Synthesize a CloudFormation template from the stack
  3. Compare the result 1:1 with an expected CloudFormation template

This allows us to very quickly write tests that validate the behavior of the construct under certain conditions and protect against unexpected regressions.

The problem is that these tests are too rigid. It practically means that if we change anything in how the construct behaves (i.e. add a resource to the construct), the test will fail and we will have to update the expected result. In some of these cases, these failures will indicate a breaking change and in other cases, they won't. This means that we can't use these naive testing approach to automatically determine if a change was breaking.

Moreover, the heuristics for determining if a change was breaking are quite elaborate. Here are some examples:

  • An added resource might be fine, but may also open up unwanted security surface.
  • Changes to logical IDs will result in tearing down deployed resources, but this might be fine resources that don't have any persistent state behind them (i.e. event rules) and incredibly bad for resources with state (i.e. a database).
  • More specific IAM permissions could be breaking, but might not (maybe they were overly permissive).
  • etc.

So what should we do today?

This is not a unique challenge reserved for construct libraries. It's a generally complex problem in software engineering. The holly grail is to have good tools to write robust tests that can be used to automatically discover breaking changes in future versions, and that's exactly why we started the aws-cdk-assert library.

There's a long way to go until we can build enough confidence in our understanding of the various use cases and semantics of breaking changes in synthesis, such that we can codify them into aws-cdk-assert and utilize them across our codebase.

In the meantime, no unlike many other software projects, it is our responsibility to review every change to construct libraries critically, and make sure to call-out breaking changes, both in inputs (API shapes) and outputs (synthesized templates), but reason about the actual nature of the change. If a change is breaking, the commit message must include the text BREAKING CHANGE (per conventionalcommits) so that when the software is released, a major-version bump will be included.

How do we reach nirvana?

In parallel, we should try to get to the point where aws-cdk-assert lets us author robust tests and have built-in heuristics to identify suspects for breaking changes. Perhaps we can setup some pipelines to automatically execute old tests against new should-be-compatible versions and open issues to that will help us improve the robustness of the test and add to the knowledge base of aws-cdk-assert.

@eladb eladb added the ops-excellence Operational Excellence label Oct 9, 2018
@fulghum fulghum added p0 package/awscl Cross-cutting issues related to the AWS Construct Library package/tools Related to AWS CDK Tools or CLI labels Mar 6, 2019
@rix0rrr rix0rrr self-assigned this Mar 25, 2019
@rix0rrr
Copy link
Contributor

rix0rrr commented Mar 27, 2019

Plan of attack:

  • Parse stability annotations from docs
  • Expose stability annotations through jsii-reflect
  • Write jsii tool to compare two assemblies for compatibility (jsii-compat?)
  • Update awslint to require people to write docs and add stability annotations to all constructs, no experimental types used as arguments to stable methods.
  • Reflect stability in docs
  • Use jsii-compat in integ tests

@eladb
Copy link
Contributor Author

eladb commented Mar 27, 2019

My thought for a name for this tool was jsii-diff, that can be used to generally compare two APIs, but can also work in a "superset/subset" mode.

I am thinking that perhaps we should run this as an integration test in the pipeline and not during bump. It would cause the pipeline to stop and then people would need to fix the error and commit a fix to revert any breaking changes.

@rix0rrr
Copy link
Contributor

rix0rrr commented Mar 28, 2019

You're right, when we're in post-1.0 we should be finding incompatibilities as early on as possible.

The big question will be: where are we going to get the previous release's .jsii file?

Check it in? Download from GitHub?

@eladb
Copy link
Contributor Author

eladb commented Mar 28, 2019

I've been contemplating on this. I think download from npm because that's that's the authoritative last published version by definition, no?

@rix0rrr
Copy link
Contributor

rix0rrr commented Mar 28, 2019

Yes. It's just not so nice to do a download from NPM for every build.

But I guess we can make it an option.

@eladb
Copy link
Contributor Author

eladb commented Mar 28, 2019

That's why I felt it should be an integration test. Also, it should be possible to just download the tar.gz file from npm without actually installing all the dependencies.

@rix0rrr
Copy link
Contributor

rix0rrr commented Mar 28, 2019

That's not working because jsii-reflect borks if it can't find a package's dependencies.

rix0rrr added a commit that referenced this issue Apr 23, 2019
Add a check to `cdk-test` to confirm that no breaking API changes
have been introduced with respect to the most recently published
version.

Because this will incur an NPM fetch, add a flag (`--quick`/`-q`) to
disable it.

Addresses #145.
rix0rrr added a commit that referenced this issue Apr 26, 2019
Add a check to `cdk-test` to confirm that no breaking API changes
have been introduced with respect to the most recently published
version.

Because this will incur an NPM fetch, add a flag (`--quick`/`-q`) to
disable it.

Addresses #145.
SanderKnape pushed a commit to SanderKnape/aws-cdk that referenced this issue May 14, 2019
Add a check to `cdk-test` to confirm that no breaking API changes
have been introduced with respect to the most recently published
version.

Because this will incur an NPM fetch, add a flag (`--quick`/`-q`) to
disable it.

Addresses aws#145.
@skinny85
Copy link
Contributor

Is this done? :)

@eladb
Copy link
Contributor Author

eladb commented Jul 28, 2019

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ops-excellence Operational Excellence p0 package/awscl Cross-cutting issues related to the AWS Construct Library package/tools Related to AWS CDK Tools or CLI
Projects
None yet
4 participants