Possible corrupted table cache info on application start #3520

Open
Sussumu opened this issue Oct 19, 2024 · 4 comments
Labels: bug, dynamodb, p2 (standard priority)

Sussumu commented Oct 19, 2024

Describe the bug

We recently hit a bug in production where the application could not load any document, failing with an error stating that the table must have exactly one hash key defined. This application had been running for a few months without any changes, so we suspected an unintended infrastructure change. After a restart, everything went back to normal.

I didn't spend much time investigating the AWS SDK code, but from what I could see, the code checks that exactly one hash key is defined for the table. It reads this from previously cached metadata, which comes either from a DescribeTable call or from the model code itself, depending on the value of DisableFetchingTableMetadata. Our code doesn't set this property explicitly, so the metadata most likely came from a DescribeTable call. Please correct me if I'm wrong.

Is it possible that this call returned corrupted data?
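One way to gather evidence the next time this happens is a small diagnostic: call DescribeTable directly (for example, from the handler that catches the InvalidOperationException) and log the key schema the service returns. This is a minimal sketch, not part of the SDK; the names `KeySchemaDiagnostics`, `CountHashKeys` and `LogKeySchemaAsync` are hypothetical, and `Console.WriteLine` stands in for the application's own logger. Note that a fresh call only shows what the service returns now, not what the context cached at startup, so it narrows the question rather than answering it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper for logging the key schema that DescribeTable returns.
public static class KeySchemaDiagnostics
{
    // Counts the HASH entries in a key schema; a valid table has exactly one.
    public static int CountHashKeys(IEnumerable<KeySchemaElement> keySchema) =>
        keySchema.Count(element => element.KeyType == KeyType.HASH);

    // Calls DescribeTable directly and logs each key attribute, so the result
    // can be compared with the error the context reported.
    public static async Task<int> LogKeySchemaAsync(
        IAmazonDynamoDB client, string tableName, CancellationToken cancellationToken = default)
    {
        DescribeTableResponse response = await client.DescribeTableAsync(tableName, cancellationToken);
        List<KeySchemaElement> keySchema = response.Table.KeySchema;

        foreach (KeySchemaElement element in keySchema)
        {
            Console.WriteLine($"{tableName}: {element.AttributeName} ({element.KeyType.Value})");
        }

        int hashKeyCount = CountHashKeys(keySchema);
        if (hashKeyCount != 1)
        {
            Console.WriteLine($"{tableName}: expected 1 hash key, DescribeTable returned {hashKeyCount}");
        }
        return hashKeyCount;
    }
}
```

If this logs exactly one HASH key while the context keeps throwing, the stale or incorrect data is more likely in the context's cached metadata than in the service response.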

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The application was supposed to load a document by its partition key and sort key, as it had been doing for months.

Current Behavior

System.InvalidOperationException: Must have one hash key defined for the table <<TABLE_NAME>>
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.MakeKey(Object hashKey, Object rangeKey, ItemStorageConfig storageConfig, DynamoDBFlatConfig flatConfig)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadHelperAsync[T](Object hashKey, Object rangeKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadAsync[T](Object hashKey, Object rangeKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)

We inject an IDynamoDBContext and load the document like this:

await _context.LoadAsync<TModel>(partitionKey, sortKey, configuration);

Since the restart, we haven't seen this error again.

Reproduction Steps

I've copied only the most relevant parts. There's nothing special about this configuration; we copy it into other projects without problems. I can't reproduce the error now. Perhaps it could be reproduced if a background call, such as the DescribeTable call mentioned above, returned altered data.

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;
using Amazon.Runtime;

const string serviceUrl = "http://localhost:8000/";
const string authenticationRegion = "us-west-1";

var localstackCredentials = new BasicAWSCredentials("local", "local");
var dynamoDbConfig = new AmazonDynamoDBConfig
{
    ServiceURL = serviceUrl,
    AuthenticationRegion = authenticationRegion
};

var dynamoDbClient = new AmazonDynamoDBClient(localstackCredentials, dynamoDbConfig);
var dynamoDbContext = new DynamoDBContext(dynamoDbClient);

var configuration = new DynamoDBOperationConfig
{
    Conversion = DynamoDBEntryConversion.V2,
    ConsistentRead = true,
    RetrieveDateTimeInUtc = true
};

// The exception is thrown here
// The document exists in DynamoDB
// The query targets the base table, not a GSI or LSI
var document = await dynamoDbContext.LoadAsync<Model>("partitionKey", "sortKey", configuration);

// Table name is correct
// We don't configure any other attribute like [DynamoDBHashKey]
[DynamoDBTable(TableNames.SOME_CONSTANT)]
public class Model
{
    public string PartitionKey { get; set; }
    public string SortKey { get; set; }
}

Possible Solution

As I said, I think it's related to the underlying DescribeTable call. I assume that disabling metadata fetching (setting DisableFetchingTableMetadata to true) and specifying the keys manually in the model may avoid this, since it removes one moving part.
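A sketch of that workaround, assuming an AWSSDK.DynamoDBv2 version that exposes `DynamoDBContextConfig.DisableFetchingTableMetadata` and a table whose key attributes are named `PartitionKey` and `SortKey`. With metadata fetching disabled, the context no longer calls DescribeTable, so the hash and range keys have to be declared on the model with `[DynamoDBHashKey]` and `[DynamoDBRangeKey]`. `DynamoDbContextFactory` and the table name `my-table` are hypothetical placeholders (the original code uses `TableNames.SOME_CONSTANT`).

```csharp
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

// Hypothetical table name standing in for TableNames.SOME_CONSTANT.
public static class TableNames
{
    public const string SomeTable = "my-table";
}

// With DisableFetchingTableMetadata = true, the key schema comes from
// these attributes instead of a DescribeTable call.
[DynamoDBTable(TableNames.SomeTable)]
public class Model
{
    [DynamoDBHashKey]
    public string PartitionKey { get; set; }

    [DynamoDBRangeKey]
    public string SortKey { get; set; }
}

// Hypothetical factory for the context that gets injected as IDynamoDBContext.
public static class DynamoDbContextFactory
{
    public static IDynamoDBContext Create(IAmazonDynamoDB client)
    {
        var contextConfig = new DynamoDBContextConfig
        {
            // Skip DescribeTable and rely only on the model attributes.
            DisableFetchingTableMetadata = true
        };
        return new DynamoDBContext(client, contextConfig);
    }
}
```

The existing call `await _context.LoadAsync<TModel>(partitionKey, sortKey, configuration);` would stay the same. The trade-off is that key schema changes on the table are no longer picked up automatically, so the attributes must be kept in sync with the table by hand, and any secondary indexes used in queries would also need to be declared through attributes.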

Additional Information/Context

The bug started after a Kubernetes pod restarted following a node change. All other pods, including others that query DynamoDB in the same account, restarted as well, but only this one hit the bug.

AWS .NET SDK and/or Package version used

AWSSDK.DynamoDBv2 Version="3.7.300.12"
AWSSDK.Extensions.NETCore.Setup Version="3.7.300"
AWSSDK.SecretsManager Version="3.7.301.11"
AWSSDK.SecurityToken Version="3.7.300.22"

Targeted .NET Platform

.NET 7.0

Operating System and version

Custom Alpine x64 image

Sussumu added the bug and needs-triage labels Oct 19, 2024
bhoradc added the needs-reproduction, dynamodb and p2 labels and removed needs-triage Oct 21, 2024
bhoradc self-assigned this Oct 21, 2024
Sussumu (Author) commented Oct 23, 2024

Unfortunately, we got the same error on another table. This application was migrated from ECS to EKS last week and runs in a different K8s namespace.

Versions:
AWSSDK.Core 3.7.303.27
AWSSDK.Core.SecretsManager 3.7.301
AWSSDK.SecurityToken 3.7.300.27
AWSSDK.DynamoDBv2 3.7.300.26
AWSSDK.Extensions.NETCore.Setup 3.7.1

bhoradc commented Oct 23, 2024

Hello @Sussumu,

Thank you for reporting the issue. I tried to reproduce it at my end using the code snippet you provided, but unfortunately, I was unable to do so.

However, as you rightly pointed out, the issue does not appear to be reproducible on demand. I will discuss and review this further with the team to understand the root cause of the problem.

Thanks again for bringing this to our attention.

Regards,
Chaitanya

bhoradc added the needs-review label and removed needs-reproduction Oct 23, 2024
bhoradc removed their assignment Oct 23, 2024
normj (Member) commented Nov 8, 2024

@Sussumu Your first example looks like it is using LocalStack; was your second incident also using LocalStack? You are correct: by default the DescribeTable call is used, and because the table is not expected to change, the metadata is cached. I have never heard of the DynamoDB service returning inaccurate information from the DescribeTable call. I'm wondering if there was some issue on the LocalStack side.

dscpinheiro added the response-requested label and removed needs-review Nov 8, 2024
Sussumu (Author) commented Nov 18, 2024

@normj Oh, that was just an example; in production we use a regular AWS account in both cases. We're still experiencing this issue, though it's not common, maybe less than once a week.

github-actions bot removed the response-requested label Nov 19, 2024