[aws-dynamodb] Fail to rollback if global table creation is failed #10256

@sungbokang

Hi,

CloudFormation's rollback sequence has a bug that makes the DynamoDB replication rollback fail because of a missing IAM permission. When rolling back the creation of a DynamoDB global table, CloudFormation tries to delete the global table with the already-reverted IAM role, so the delete ends in DELETE_FAILED with the status reason below.

(masked confidential data)

Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
    at invokeUserFunction (/var/task/framework.js:85:19)
    at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
    at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/tmp/nod

This was discovered while creating an additional DynamoDB replica (global table).

Reproduction Steps

To reproduce this issue, follow the reproduction scenario described in issue #10249.

What did you expect to happen?

If creating a DynamoDB replica (global table) fails for any reason, the stack should roll back to its previous state without any failures.

What actually happened?

The CloudFormation events are listed below:

  1. CREATE_IN_PROGRESS - Resource creation Initiated
    DynamoDB Table replication starts.

  2. CREATE_FAILED - Failed to create resource. Operation timed out
    It fails because of the 30-minute timeout limit.

  3. UPDATE_ROLLBACK_IN_PROGRESS - The following resource(s) failed to create: [***]
    The rollback process initiates.

  4. UPDATE_IN_PROGRESS
    The nested stack resource awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource*** and ***DataLambda***, which contain the IAM policy for creating the global table, are reverted.

  5. UPDATE_COMPLETE
    The resources that are needed to delete the global table have been reverted.

  6. UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS

  7. DELETE_IN_PROGRESS
    CloudFormation tries to remove the global table whose creation was interrupted by the default 30-minute totalTimeout limit of the CustomResource.

  8. DELETE_FAILED
    Deleting the global table fails.

Please have a look at the CloudFormation log that I partially captured from the console (for confidentiality reasons, I cannot copy the whole log):

***DataResourcesStack-LOCAL: creating CloudFormation changeset...
 0/4 | 1:28:44 AM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack      | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***) 
 1/4 | 1:29:19 AM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack      | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***) 
 1/4 | 1:29:23 AM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack      | ***DataResourcesStack-LOCAL 
 1/4 | 1:29:24 AM | DELETE_IN_PROGRESS   | AWS::CloudFormation::CustomResource | ***DataTableReplica****** 
1/4 Currently in progress: ***DataResourcesStack-LOCAL, ***DataTableReplica******
 2/4 | 1:30:19 AM | DELETE_FAILED        | AWS::CloudFormation::CustomResource | ***DataTableReplica****** Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
    at invokeUserFunction (/var/task/framework.js:85:19)
    at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
    at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/tmp/n
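
To pick the failing event out of a long deploy log like the one above, a small parser can help. The following is a minimal sketch, assuming each progress line uses the `|`-separated column layout shown in the log. `DeployEvent` and `parseProgressLine` are hypothetical names, not part of the CDK CLI, and the sample lines are shortened copies of the masked log lines.

```typescript
// Hypothetical helper, not part of the CDK CLI: parses one progress line of the
// form " n/m | time | STATUS | resource type | logical id and message".
interface DeployEvent {
  time: string;
  status: string;
  resourceType: string;
  detail: string;
}

function parseProgressLine(line: string): DeployEvent | undefined {
  const parts = line.split("|").map((p) => p.trim());
  // Continuation lines (stack traces, "Currently in progress") have no columns.
  if (parts.length < 5) return undefined;
  const [, time, status, resourceType, ...rest] = parts;
  return { time, status, resourceType, detail: rest.join(" | ") };
}

// Shortened copies of the masked log lines above.
const sample: string[] = [
  " 0/4 | 1:28:44 AM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack      | ReplicaProvider.NestedStackResource",
  " 1/4 | 1:29:19 AM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack      | ReplicaProvider.NestedStackResource",
  " 1/4 | 1:29:24 AM | DELETE_IN_PROGRESS   | AWS::CloudFormation::CustomResource | ***DataTableReplica******",
  "1/4 Currently in progress: ***DataResourcesStack-LOCAL, ***DataTableReplica******",
  " 2/4 | 1:30:19 AM | DELETE_FAILED        | AWS::CloudFormation::CustomResource | ***DataTableReplica****** Failed to delete resource.",
];

const events: DeployEvent[] = sample
  .map(parseProgressLine)
  .filter((e): e is DeployEvent => e !== undefined);

// Keep only the events whose status ends in _FAILED.
const failed: DeployEvent[] = events.filter((e) => e.status.endsWith("_FAILED"));

// Position of the provider revert relative to the start of the replica delete.
const revertIndex = events.findIndex((e) => e.status === "UPDATE_COMPLETE");
const deleteIndex = events.findIndex((e) => e.status === "DELETE_IN_PROGRESS");

console.log(failed.map((e) => `${e.time} ${e.status}`));
console.log(`provider reverted before delete started: ${revertIndex < deleteIndex}`);
```

In this sample, the nested provider stack reaches UPDATE_COMPLETE before the replica's delete begins, which matches the ordering described in the event list.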

Environment

  • CLI Version: 1.32
  • Framework Version: 1.32
  • Node.js Version: NodeJS 12.x
  • OS: Amazon Linux 2012
  • Language (Version): TypeScript 3.9.2

Other

  • Related issues: We found this issue when we hit an operation timeout while creating an additional global table. The timeout is reported in a separate issue: [aws-dynamodb] Fail to create a global table due to replication time-out #10249
  • Suggestions on how to fix: This happens because the IAM policy used by the replica provider's onEventHandler is reverted before the global table is deleted. To fix this, the DynamoDB replica provider nested stack should be reverted only after the deletion has succeeded.
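
The ordering problem can be illustrated with a toy model. This is only a sketch of the reasoning above, not CloudFormation's or the CDK's actual implementation. `HandlerRole`, `Step`, `rollbackReplica`, and the replica identifier are hypothetical names introduced for illustration.

```typescript
// Toy model of the rollback ordering. All names are hypothetical; this does not
// call AWS and does not reflect CloudFormation internals.
class HandlerRole {
  private allowed = new Set<string>();

  grant(resource: string): void {
    this.allowed.add(resource);
  }

  revoke(resource: string): void {
    this.allowed.delete(resource);
  }

  // Stands in for the dynamodb:DeleteTableReplica permission check.
  canDeleteReplica(resource: string): boolean {
    return this.allowed.has(resource);
  }
}

type Step = "revertProviderPolicy" | "deleteReplica";

// Returns the final status of the replica custom resource for a given step order.
function rollbackReplica(order: Step[]): "DELETE_COMPLETE" | "DELETE_FAILED" | "NOT_DELETED" {
  const replica = "example-table@us-west-2"; // hypothetical identifier
  const role = new HandlerRole();
  // State after the failed update: the policy for the new replica is in place.
  role.grant(replica);

  let status: "DELETE_COMPLETE" | "DELETE_FAILED" | "NOT_DELETED" = "NOT_DELETED";
  for (const step of order) {
    if (step === "revertProviderPolicy") {
      role.revoke(replica);
    } else {
      status = role.canDeleteReplica(replica) ? "DELETE_COMPLETE" : "DELETE_FAILED";
    }
  }
  return status;
}

// Order observed in this issue: policy reverted first, then the delete.
console.log(rollbackReplica(["revertProviderPolicy", "deleteReplica"])); // DELETE_FAILED
// Suggested order: delete first, then revert the policy.
console.log(rollbackReplica(["deleteReplica", "revertProviderPolicy"])); // DELETE_COMPLETE
```

The model only captures the dependency between the two steps: the delete needs the permission that the policy revert removes, so the delete has to run first.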

This is 🐛 Bug Report

Labels

  • @aws-cdk/aws-dynamodb: Related to Amazon DynamoDB
  • bug: This issue is a bug.
  • closed-for-staleness: This issue was automatically closed because it hadn't received any attention in a while.
  • needs-triage: This issue or PR still needs to be triaged.
  • response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
