Nesting Bug When DynamoDB is Mid Provisioning Capacity #1406
I'll take a look to see if I can try to reproduce. Are you seeing this on the SDK's automatic retries, or on retries within your code? If the latter, you should look for anything where you might be mutating your items hash, as that seems the most likely culprit. I'll investigate just in case the SDK is doing something odd on retries, but this is unlikely.
It's the second time the unprocessed items array hits the
Actually, I think I see it now. It looks like you're taking the raw unprocessed items hash and plumbing it right back in, and the simple attributes helper is making it a map one level deeper. I'll work on an example of how you can avoid this later, but you can also avoid this by turning off simple attributes at the client level.
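To see why the extra level appears, here's a toy stand-in for simple-attributes marshaling (hypothetical code, not the SDK's actual implementation): applying it a second time to already-marshaled data pushes every value one map deeper, which matches the symptom described above.

```ruby
# Toy stand-in for the SDK's simple-attributes marshaling (hypothetical,
# NOT the real Aws::DynamoDB code): converts plain Ruby values into
# DynamoDB-style typed attribute values.
def marshal(value)
  case value
  when Hash        then { m: value.transform_values { |v| marshal(v) } }
  when String      then { s: value }
  when Numeric     then { n: value.to_s }
  when true, false then { bool: value }
  else raise ArgumentError, "unsupported type: #{value.class}"
  end
end

# Marshal every attribute of an item, as the client does per item.
def marshal_item(item)
  item.transform_values { |v| marshal(v) }
end

item  = { "id" => "value1" }
once  = marshal_item(item)  # { "id" => { s: "value1" } } -- correct wire shape
twice = marshal_item(once)  # { "id" => { m: { s: { s: "value1" } } } } -- one map deeper
```

Feeding already-typed `unprocessed_items` back through a `simple_attributes: true` client has the same effect as the second call above.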
Hmm, interesting. I actually started off with this example by trevorrowe:

```ruby
items = (1..25).map do |n|
  {
    put_request: {
      item: {
        "id" => "value#{n}"
      }
    }
  }
end

unprocessed_items = nil

100.times do
  # the target table was created with 1 write capacity unit to ensure
  # batch writes get throttled
  r = dynamodb.batch_write_item(request_items: { "aws-sdk-slow": items })
  if r.unprocessed_items.count > 0
    unprocessed_items = r.unprocessed_items
    break
  end
end

dynamodb.batch_write_item(request_items: unprocessed_items)
```

So, I was under the impression I could just throw it back in without disabling simple attributes. Here's my client and item builder:

```ruby
@dynamodb ||= Aws::DynamoDB::Client.new(simple_attributes: false)

def item(batch)
  { item: {
      user: { s: batch['user'] },
      pic: { s: batch['pic'] },
      bio: { s: batch['bio'] },
      fav_movie: { s: batch['movie'] },
      fav_tv_show: { s: batch['fav_tv_shows'] },
      age: { n: batch['age'] },
      premium_user: { bool: true },
      referral_source: { s: batch['source'] }
  } }
end
```
I'm taking a look at that now, because you should be right about that.
Okay, I ran code substantially similar to your example above, and had no issues - the items were put in the format you would expect (with simple attributes ON). Looking now for differences in behavior.
Question: Are you able to reproduce this on demand? If so, it would be interesting to see the response shape you're getting back from unprocessed items. For example, if you captured it in byebug to see if it matches what you would expect.
Ok, so I reproduced it again and added some more logging. I was also a bit suspicious of the retry logic. Here's the updated code with all kinds of logging:

```ruby
def write_to_dynamo(items)
  begin
    tries ||= 5
    resp = @dynamodb.batch_write_item(request_items: items)
    @logger.info("Dynamo Response: #{resp}")
  rescue StandardError => e
    @logger.error("#{e}\nError on: #{{ request_items: items }}")
    @logger.error("Dynamo Response: #{resp}")
    sleep(2)
    retry unless (tries -= 1).zero?
    @logger.fatal('Fatal Dynamo Errors, Program Aborting')
    abort
  end
  handle_unprocessed_items(resp.unprocessed_items)
end

def handle_unprocessed_items(unprocessed_items)
  if unprocessed_items.count > 0
    @tries += 1
    log_failure(unprocessed_items[ENV['DYNAMO_TABLE']])
    @logger.info('Printing unprocessed_items')
    p unprocessed_items
    sleep(@tries * 0.75)
    write_to_dynamo(unprocessed_items)
  else
    reset_tries
    print_sent_message
  end
end
```

Logs:
I hope that helps you, as it still seems odd to me 😄
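Incidentally, the linear `sleep(@tries * 0.75)` used above could be swapped for capped exponential backoff. A minimal sketch with hypothetical helper names and no real DynamoDB calls (the block you pass in would do the actual `batch_write_item` and return its `unprocessed_items`):

```ruby
# Capped exponential backoff: 0.5s, 1s, 2s, 4s, then capped at 8s.
def backoff_delay(attempt, base: 0.5, cap: 8.0)
  [base * (2**attempt), cap].min
end

# Keep resubmitting unprocessed items until none remain or we give up.
# `submit` returns the next unprocessed-items hash (empty when done);
# `sleeper` is injectable so tests don't actually sleep.
def drain_unprocessed(items, max_attempts: 5, sleeper: method(:sleep), &submit)
  attempt = 0
  until items.empty? || attempt >= max_attempts
    sleeper.call(backoff_delay(attempt))
    items = submit.call(items)
    attempt += 1
  end
  items
end
```

Note the SDK already retries throttled calls internally; this only governs re-driving `unprocessed_items`, which the SDK leaves to the caller.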
By the way, I can reproduce this by just setting the writes to one and having two Ruby workers running against the same Dynamo table.
Ok, so it looks like when it hits the retry logic, it retries the transmuted array rather than the original array.
That's interesting, I can try to recreate the error by using the
So, I made a simple example to try and recreate this, and I'm not getting the same issue - retries are as expected:

```ruby
require 'aws-sdk'
require 'securerandom'
require 'byebug'

def writer
  client = Aws::DynamoDB::Client.new
  1000.times do
    batch = []
    25.times do
      batch << {
        put_request: {
          item: {
            "uuid" => SecureRandom.uuid,
            "int" => rand(1000),
            "map" => {
              "m1" => { "a" => rand(25) },
              "m2" => { "b" => rand(25) },
              "m3" => { "c" => rand(25) }
            }
          }
        }
      }
    end
    @retry_flag = false
    begin
      byebug if @retry_flag
      resp = client.batch_write_item(
        request_items: {
          "BatchModel" => batch
        }
      )
    rescue StandardError => e
      @retry_flag = true
      retry
    end
  end
end

one = Thread.new { writer }
two = Thread.new { writer }
thr = Thread.new { writer }
one.join
two.join
thr.join
```

When this eventually fails and I hit the byebug component, I can try to dive a bit deeper into your code, but so far I'm still unable to recreate the issue.
We also have a bit of a hunch as to a possible source. Checking that now. |
Okay, I do see a possible bug source here. The code works under normal circumstances, but it's possible you're getting an unusual error, like perhaps a network error, under production load, and our plugin stack is consuming the mutated hash for a built-in retry. Still investigating.
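The aliasing hazard being described can be shown in plain Ruby, no SDK involved (a hypothetical `transform!` stands in for whatever step marshals the request): if a step mutates the request hash in place, a retry that reuses the same object re-processes already-processed data.

```ruby
# Destructive version: wraps each value in place, mutating the caller's hash.
def transform!(request)
  request.transform_values! { |v| { m: v } }
end

# Non-destructive version: returns a new hash, leaving the input untouched.
def transform(request)
  request.transform_values { |v| { m: v } }
end

req = { "id" => { s: "1" } }
transform!(req)  # first attempt: req is now wrapped
transform!(req)  # retry reuses the same object: wrapped a second time
# req == { "id" => { m: { m: { s: "1" } } } }
```

Building a fresh request object per attempt (or using the non-destructive form) avoids this class of bug entirely.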
Quick note - this is the top priority bug we have remaining, but it is proving rather hard to repro for failing test purposes. Still working on it.
No worries 😃.
If you have a small reproducing example, possibly with wire trace examples, that would help. Still investigating even if that is hard to get.
Hey @awood45 sorry for the delay. Going out of town, will work to reproduce over the weekend. |
Okay, finally made a separate reproducible example (this is all the code needed to reproduce again):

```ruby
require 'aws-sdk'
require 'securerandom'
require 'byebug'
require 'logger'

@dynamo_table = 'test-throttle'
@dynamodb ||= Aws::DynamoDB::Client.new(profile: 'dev')
@logger = Logger.new(STDOUT).tap { |log| log.progname = 'DynamoDB' }
@logger.level = Logger::INFO
@tries = 0
@https_tries = 0
@spot = 0
@users = *('0'..'50')

def build_put_request(data)
  while data.any?
    batch = data.slice!(0, 25)
    items = put_request_items(batch)
    write_to_dynamo(@dynamo_table => items)
  end
end

def put_request_items(batch)
  Array.new(batch.length) { |_i| { put_request: item } }
end

def item
  @spot < 50 ? @spot += 1 : @spot = 0
  { item: {
      user: @users[@spot],
      pic: 'great pic',
      bio: 'such words',
      fav_movies: ['balto', 'the lion king', 'swingers'],
      fav_tv_shows: ['spongebob', 'dexter', 'stranger things'],
      age: SecureRandom.uuid,
      premium_user: true,
      referral_source: 'Johnson & Jonson'
  } }
end

def write_to_dynamo(items)
  byebug if @https_tries > 1
  resp = @dynamodb.batch_write_item(request_items: items)
  handle_unprocessed_items(resp.unprocessed_items)
rescue StandardError => e
  @logger.error("#{e}\nError on: #{{ request_items: items }}")
  @https_tries += 1
  retry
end

def handle_unprocessed_items(unprocessed_items)
  if unprocessed_items.count > 0
    log_failure(unprocessed_items[@dynamo_table]) if (@tries + 1) > 1
    sleep(@tries * 0.75)
    write_to_dynamo(unprocessed_items)
  else
    @tries = 0
    @logger.info('Sent 25 items')
  end
end

def make_dummies
  build_put_request([*(1..10_000)])
end

make_dummies
```

Logs:
I did this on a newly created table. I did not hit the error I hit before, but instead a
Sorry about the delay here - I couldn't repro with that script, but I'm actively picking this up again.
no worries 😃 |
Under a particular set of circumstances, a return value struct can have SimpleAttributes transformations applied multiple times, which can cause incorrect types and values to be written (or more likely, fail to be written). This doesn't happen for hashes, but we hadn't applied the same approach to structs. This change applies the same approach to type structs, including a previously failing test case. Resolves GitHub issue #1406
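The idea of the fix can be sketched as making the marshaling idempotent: detect values that are already typed attribute values and pass them through untouched, so a second application is a no-op. This is only an illustration of the approach with hypothetical helpers, not the actual SDK patch:

```ruby
# DynamoDB attribute-value type keys, as symbols (matching the toy
# marshaled form used here; the real SDK works on its own structs).
ATTRIBUTE_TYPE_KEYS = %i[s n b ss ns bs m l null bool].freeze

# True when the value already looks like a typed attribute value.
def typed_attribute?(value)
  value.is_a?(Hash) && value.size == 1 &&
    ATTRIBUTE_TYPE_KEYS.include?(value.keys.first)
end

# Idempotent marshaling: already-typed input comes back unchanged.
def marshal_once(value)
  return value if typed_attribute?(value)
  case value
  when String      then { s: value }
  when Numeric     then { n: value.to_s }
  when true, false then { bool: value }
  end
end
```

With this guard, plumbing `unprocessed_items` straight back into the client can no longer nest the payload an extra level deep.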
Okay, I think we got it. Going to try and get this out with the next release.
Please feel free to reopen if you see any recurrence of this issue after pulling down version
Very cool, thanks so much 💯!
I'm using this code to do bulk writes to DynamoDB:
I get this error when I run out of provisioned capacity:
```
ERROR -- DynamoDB: The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded. Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API
```
So now, on the retry (Another app updates the provisioned writes dynamically), I get this error:
```
ERROR -- DynamoDB: The provided key element does not match the schema
```
When I look at the data, there's suddenly this odd nesting appearing:
The request should look like this as far as I understand:
This only seems to happen when I'm mid-provisioning more capacity. Or did I do something wrong?
AWS Gem Versions: