Description
Hi,
CloudFormation's rollback process has an ordering bug that makes the DynamoDB replica rollback fail due to a missing IAM permission. During the rollback of a DynamoDB global table creation, CloudFormation tries to delete the global table using the already-reverted IAM role, so the resource ends up in DELETE_FAILED status with the status reason below.
(masked confidential data)
Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
at invokeUserFunction (/var/task/framework.js:85:19)
at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/tmp/nod
This was discovered while creating an additional DynamoDB replica (global table).
Reproduction Steps
To reproduce this issue, follow the same reproduction scenario as in issue #10249.
What did you expect to happen?
If creating a DynamoDB replica (global table) fails for any reason, the stack should roll back to its previous state without any failures.
What actually happened?
The CloudFormation events are listed below:
- CREATE_IN_PROGRESS - Resource creation Initiated
  DynamoDB table replication starts.
- CREATE_FAILED - Failed to create resource. Operation timed out
  It fails due to the 30-minute timeout limit.
- UPDATE_ROLLBACK_IN_PROGRESS - The following resource(s) failed to create: [***]
  The rollback process starts.
- UPDATE_IN_PROGRESS
  The awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource*** nested stack resource and ***DataLambda***, which contain the IAM policy for creating the global table, are reverted.
- UPDATE_COMPLETE
  The resources that are needed to delete the global table are successfully reverted.
- UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS
- DELETE_IN_PROGRESS
  It tries to remove the global table, whose creation was interrupted by the 30-minute default totalTimeout of the CustomResource.
- DELETE_FAILED
  Deleting the global table fails.
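The CREATE_FAILED step comes from the custom resource giving up after its total timeout while replication is still running. The sketch below shows that poll-until-complete-or-timeout behavior. It is a simplified, hypothetical model and not the CDK provider framework's actual code. The function names, the 5-second query interval and the replication durations are assumptions chosen for illustration. It uses a simulated clock so it runs instantly.

```typescript
// Hypothetical, simplified model of a custom resource that polls an
// "is complete?" check until it succeeds or a total timeout elapses.
// This is NOT the CDK provider framework's code; names are illustrative.

type PollResult = "COMPLETE" | "TIMED_OUT";

function pollUntilComplete(
  isComplete: (elapsedMs: number) => boolean,
  queryIntervalMs: number,
  totalTimeoutMs: number,
): PollResult {
  // Simulated clock: advance by the query interval instead of sleeping.
  for (let elapsed = 0; elapsed <= totalTimeoutMs; elapsed += queryIntervalMs) {
    if (isComplete(elapsed)) {
      return "COMPLETE";
    }
  }
  // Budget exhausted while the operation is still running.
  return "TIMED_OUT";
}

const MINUTE = 60 * 1000;

// Assumed replication durations, for illustration only.
const replicationFinishesAfter = (durationMs: number) => (elapsedMs: number) =>
  elapsedMs >= durationMs;

console.log(pollUntilComplete(replicationFinishesAfter(20 * MINUTE), 5 * 1000, 30 * MINUTE)); // COMPLETE
console.log(pollUntilComplete(replicationFinishesAfter(45 * MINUTE), 5 * 1000, 30 * MINUTE)); // TIMED_OUT
```

With a 30-minute budget, a replica that takes longer than that to build is reported as timed out. That leaves a partially created global table for the rollback to delete.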
Below is part of the CloudFormation log captured from the console (due to confidentiality, I could not copy the whole log):
***DataResourcesStack-LOCAL: creating CloudFormation changeset...
0/4 | 1:28:44 AM | UPDATE_IN_PROGRESS | AWS::CloudFormation::Stack | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***)
1/4 | 1:29:19 AM | UPDATE_COMPLETE | AWS::CloudFormation::Stack | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***)
1/4 | 1:29:23 AM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack | ***DataResourcesStack-LOCAL
1/4 | 1:29:24 AM | DELETE_IN_PROGRESS | AWS::CloudFormation::CustomResource | ***DataTableReplica******
1/4 Currently in progress: ***DataResourcesStack-LOCAL, ***DataTableReplica******
2/4 | 1:30:19 AM | DELETE_FAILED | AWS::CloudFormation::CustomResource | ***DataTableReplica****** Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
at invokeUserFunction (/var/task/framework.js:85:19)
at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/tmp/n
Environment
- CLI Version : 1.32
- Framework Version: 1.32
- Node.js Version: NodeJS 12.x
- OS : Amazon Linux 2012
- Language (Version): TypeScript 3.9.2
Other
- Related issues: We found this issue when we hit an operation timeout while creating an additional global table. That timeout is reported in a separate issue: [aws-dynamodb] Fail to create a global table due to replication time-out #10249
- Suggestions on how to fix: This happens because the IAM policy used by the replica provider's onEventHandler is reverted before the global table is deleted. To fix this, the DynamoDB replica provider nested stack should be reverted only after the deletion has succeeded.
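The ordering problem and the suggested fix can be illustrated with a small, hypothetical model. In it, the handler role's grants are a set, and reverting the nested stack removes the grant that the failed update had added. The replica delete succeeds only if that grant is still present. The operation names, grant label and status strings are illustrative. This does not show how CloudFormation actually schedules rollback steps.

```typescript
// Hypothetical model of the rollback ordering described in this issue.
// The operation names, grant label, and status strings are illustrative.

type RollbackOp = "revertProviderPolicy" | "deleteReplica";

const DELETE_REPLICA_GRANT = "dynamodb:DeleteTableReplica";

function replayRollback(order: RollbackOp[]): string[] {
  // The failed update had granted the onEvent handler's role permission
  // to manage the new replica.
  const handlerGrants = new Set<string>([DELETE_REPLICA_GRANT]);
  const statuses: string[] = [];

  for (const op of order) {
    if (op === "revertProviderPolicy") {
      // Reverting the replica provider nested stack restores the previous
      // policy, which does not include the grant for the new replica.
      handlerGrants.delete(DELETE_REPLICA_GRANT);
      statuses.push("UPDATE_COMPLETE");
    } else {
      // The handler calls DeleteTableReplica with whatever grants remain.
      statuses.push(handlerGrants.has(DELETE_REPLICA_GRANT) ? "DELETE_COMPLETE" : "DELETE_FAILED");
    }
  }
  return statuses;
}

// Observed order: policy reverted during the rollback, replica deleted in cleanup.
console.log(replayRollback(["revertProviderPolicy", "deleteReplica"]).join(" -> "));
// UPDATE_COMPLETE -> DELETE_FAILED

// Suggested order: delete the replica first, then revert the nested stack.
console.log(replayRollback(["deleteReplica", "revertProviderPolicy"]).join(" -> "));
// DELETE_COMPLETE -> UPDATE_COMPLETE
```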
This is a 🐛 Bug Report.