Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lambda): updating addToRolePolicy to avoid circular dependency potential #33291

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

IkeNefcy
Copy link
Contributor

@IkeNefcy IkeNefcy commented Feb 4, 2025

Issue

Closes #7016
I did NOT mean to drop the branch which closed #33268 🤦
My bad

Reason for this change

When using the function Function.addToRolePolicy() from lambda, behind the scenes it's calling Role.addToPrincipalPolicy(). What this does is it adds to the existing Lambda Role that is defined for the function, whether that is a predefined Role, or the default Role created from including no Role in the props.
The issue with using specifically this method however, is if the policy statement that you are adding is related to a resource that this same Lambda function is added to as a prop, it causes a circular dependency.

The simplest example of this is from the original issue, but it's not limited to this use case. (I tested with API Gateway also and got the same results)

In the original issues, a user is creating a cognito UserPool, and adding a lambda trigger, which would run the lambda function after a new user has authorized. So in order to allow this lambda to perform operations related to the cognito user pool itself, it needs to add to it's execution role's policy, some kind of cognito permissions to that UserPool.

So;
Function -> Cognito Trigger -> IAM Policy Statement with UserPool reference -> Function.addToRolePolicy()

This snippet from the integ I added shows what this would look like. See also the original issue.

const fn = new lambda.Function(stack, 'MyLambda', {
  code: new lambda.InlineCode('foo'),
  handler: 'index.handler',
  runtime: STANDARD_NODEJS_RUNTIME,
});

const userPool = new UserPool(stack, 'myUserPoolTest', {
  lambdaTriggers: {
    fn,
  },
});

const cognitoPolicy = new iam.PolicyStatement({
  actions: ['cognito:*'],
  resources: [userPool.userPoolArn],
});

fn.addToRolePolicy(cognitoPolicy);

The reason why this causes an issue is because when using Role.addToPrincipalPolicy(), it adds a dependency check to ensure that the "PrincipalPolicy" actually exists first. This causes a circular reference.

Now;

  • Lambda depends on the Policy
  • The policy has a reference call to GetAtt something from the UserPool
  • The UserPool has the lambda in the trigger props thus a dependency on lambda
  • repeat

This logically makes no sense, because why would lambda depend on the policy? It really should just depend on the IAM Role and the policy should also depend on the Role. In fact if you build a template using this error, you can just delete the policy dependency in the Function, and upload by hand to CFN and it works just fine.

So the question is, how can we avoid creating this dependency without some insane fundamental change to aws-iam?

Description of changes

Use Role.attachInlinePolicy() instead behind the scenes of Function.addToRolePolicy().
Role.attachInlinePolicy() will define a 2nd new policy. This means that we will no longer depend on the original policy existing in the first place. Instead we can just use these outside references in their own inline policy.

Although this seems like it's changing a lot about this feature, functionality wise the permissions granted to the lambda function will not change because of this.

  /**
   * The number of permissions added to this function
   * @internal
   */
  private _policyCounter: number = 0;

A counter was added to help dedup the policies that are added, since you should be able to call this more than once without it exploding.

  public addToRolePolicy(statement: iam.PolicyStatement) {
    if (!this.role) {
      return;
    }

    const policyToAdd = new iam.Policy(this, `inlinePolicyAddedToExecutionRole-${this._policyCounter++}`, {
      statements: [statement],
    });
    this.role.attachInlinePolicy(policyToAdd);
  }

Of course the input from the user should remain the same, so a policy statement is passed in, and since Role.attachInlinePolicy() requires a Policy (not just a statement), we can rebuild from the statement input to allow for Role.attachInlinePolicy() to function properly.

Description of how you validated changes

Unit tests needed a few edits to match this, mainly just removing the policy dependency from the lambda function, then changing the reference name of the policy.

For Integ similar changes were made to some snapshots. All integ updates related to this, are "destructive" updates. This is by design and should be reviewed but not changed. The reason is that none of the policies are actually being destroyed, rather, their logical IDs are just being renamed / new policies are being added. So if the primary policy had 3 statements before, now it has 3 policies with one statement each.

I also added integ.lambda-circular-test.ts to specifically check for this circular dependency. I left a comment that this test's snapshot cannot be updated by hand since only CFN throws the error during validation for the circular dependency, so locally building you won't be able to tell if it works or not without using --update-on-failure to update it in the future.

Edit: It's hard to determine which integs are okay to fail locally and not. During my first push the build failed on the PR, and it was showing an integ for an alpha construct. Since just using yarn integ on it's own makes it unrealistic to find the tests I need. I'm just solving each integ fail one at a time using the PR builder instead.

Edit 2: After updating a dump truck of integs, I'm thinking that we might need a few people to review this first.

Edit 3: Turns out everything is made from lambda

Edit 4: I did some tests to ensure that if you update a template with this new structure of policy that it wouldn't break, and they worked just fine. However I only did this on areas of aws I was familiar with. Due to how many integs I had to update, there are clearly a lot more things than I was aware of that use Lambda, and replacement tests like the one I did are probably needed from anyone who is willing.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team February 4, 2025 19:41
@github-actions github-actions bot added beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/large Large work item – several weeks of effort p1 labels Feb 4, 2025
Copy link

codecov bot commented Feb 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.83%. Comparing base (3490d2e) to head (8aeb2b4).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #33291   +/-   ##
=======================================
  Coverage   80.83%   80.83%           
=======================================
  Files         236      236           
  Lines       14251    14251           
  Branches     2490     2490           
=======================================
  Hits        11520    11520           
  Misses       2446     2446           
  Partials      285      285           
Flag Coverage Δ
suite.unit 80.83% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
packages/aws-cdk 79.56% <ø> (ø)
packages/aws-cdk-lib/core 82.20% <ø> (ø)

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Feb 4, 2025
@QuantumNeuralCoder QuantumNeuralCoder self-assigned this Feb 4, 2025
@IkeNefcy
Copy link
Contributor Author

IkeNefcy commented Feb 4, 2025

@QuantumNeuralCoder I wasn't dodging you I swear haha

@QuantumNeuralCoder
Copy link
Contributor

Heads up: We are working on merging another large PR in this area. I will start this review once thats out of the way.

@IkeNefcy
Copy link
Contributor Author

IkeNefcy commented Feb 5, 2025

no worries and no rush, if the branch update notifies me that it's conflicting I will solve

@IkeNefcy
Copy link
Contributor Author

IkeNefcy commented Feb 6, 2025

pfft I'm guessing we commited the other PR lmao. Give me some time to verify everything needed.
TIL there is a message "These conflicts are too complex to resolve in the web editor" in gh haha.

@IkeNefcy
Copy link
Contributor Author

IkeNefcy commented Feb 7, 2025

Back in business

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 8aeb2b4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@QuantumNeuralCoder QuantumNeuralCoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of top level comments while I dig deeper.

  1. Note the limitation on the number of inline policies per entity iam-quota
  2. Depending on the stack, this might be a breaking change for customers. To play it safe, lets introduce this feature under a feature flag. The feature flag would allow the use-case of circular dependency to be resolved and keep current stacks safe from this impactful change. Let me know if any questions.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Feb 10, 2025
@IkeNefcy
Copy link
Contributor Author

IkeNefcy commented Feb 10, 2025

  1. The limitation is not on the number of policies, it's the number of characters;
    "You can add as many inline policies as you want to an IAM user, role, or group. But the total aggregate policy size (the sum size of all inline policies) per entity can't exceed the following limits:". Limit for role is 10,240 characters. Unsure if this counts the parts that are always the same, or only the statements.

The number limit is on managed policies looks like. Technically by creating more inline policies it does add characters just from structure, so like the version number etc which is <50 additional characters per. Probably a non issue since it's not going to be adding a significant % of 10,240. And characters can be avoided by using the function less / bundling more if truly needed, but if a user is approaching the cap, they have a different problem, this change would not be the cause of that.

  1. Good idea, I will read over the feature flag instructions and get back to you this week.

@QuantumNeuralCoder
Copy link
Contributor

Sample unit test ref if it helps. Refer the implementation for ref.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/large Large work item – several weeks of effort p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cognito circular reference when setting lambda trigger permissions
3 participants