[common] Fix integer overflow error in JitteredBackOffStrategy found by fuzzer. #10417
Signed-off-by: Antonio Vicente <avd@google.com>
…ng the exponential backoff. The multipliers used to be 2**N-1, while now the multipliers are 2**(N-1). Signed-off-by: Antonio Vicente <avd@google.com>
mattklein123
left a comment
Thanks for the fix. 1 question.
/wait-any
retry_timer_ = new Event::MockTimer(&dispatcher_);
EXPECT_CALL(*retry_timer_, enableTimer(std::chrono::milliseconds(24), _));
EXPECT_CALL(random_, random()).WillOnce(Return(190));
retry_timer_ = new testing::StrictMock<Event::MockTimer>(&dispatcher_);
FYI we have StrictMock as the default in our test runs, so this shouldn't be needed AFAIK.
Sorry, this was left over from earlier attempts to debug the test failures.
EXPECT_CALL(random_, random()).WillOnce(Return(49));
retry_timer_ = new Event::MockTimer(&dispatcher_);
EXPECT_CALL(*retry_timer_, enableTimer(std::chrono::milliseconds(24), _));
EXPECT_CALL(random_, random()).WillOnce(Return(190));
qq: why does this test need to change as well as the HDS test? It seems like this fix shouldn't have any behavior change in the normal case?
The issue is related to this part of the original implementation, which traces back to the first revision of Envoy on GitHub:
uint32_t multiplier = (1 << current_retry_) - 1;
This is not increasing backoff by a factor of 2 each time. I think that what was really intended was:
uint32_t multiplier = 1 << (current_retry_ - 1);
Current backoff sequence: 1, 3, 7, 15...
New backoff sequence: 1, 2, 4, 8, 16...
The biggest difference is at the first step, where the current sequence jumps from 1x to 3x instead of from 1x to 2x. Replicating the current behavior is possible, but the code would be slightly more complicated.
Some discussion about it here:
#3791 (comment)
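For reference, the two sequences quoted above can be reproduced with a small standalone sketch. The helper names `oldMultiplier` and `newMultiplier` are hypothetical and are not part of Envoy; the arithmetic is done in 64 bits only so that the illustration itself cannot overflow for the retry counts shown.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the Envoy implementation.

// Original formula quoted in the review: 2^N - 1.
// Valid here for retry in [1, 63] because the shift is done on a 64-bit value.
uint64_t oldMultiplier(uint32_t retry) {
  return (uint64_t{1} << retry) - 1;
}

// Proposed formula quoted in the review: 2^(N-1).
// Valid here for retry in [1, 64].
uint64_t newMultiplier(uint32_t retry) {
  return uint64_t{1} << (retry - 1);
}
```

Calling both helpers with retry counts 1 through 4 yields 1, 3, 7, 15 for the old formula and 1, 2, 4, 8 for the new one, which matches the sequences listed in the comment.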
Ah OK got it, makes sense, thanks.
Of course, thanks for digging deeper on unexplained code changes.
Signed-off-by: Antonio Vicente <avd@google.com>
Description: Fix integer overflow in JitteredBackOffStrategy::nextBackOffMs found by fuzzing. After 32 retries, multiplier becomes negative. Of course, this is unlikely to ever happen in production, since it would take about 100 days before we'd hit 32 retries with a base_interval of 1ms and a max_interval of uint64_t max.
Risk Level: n/a
Testing: unit
Docs Changes: n/a
Release Notes: n/a
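One way to arrive at a figure on the order of 100 days is to sum the waits under the old 2^N - 1 multipliers. The sketch below is an assumption-laden estimate, not code from this PR: `totalWaitMsOld` is a hypothetical helper, and it assumes every wait lasts the full base_interval * multiplier, so it gives an upper bound (jitter would shorten the actual waits).

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of Envoy.
// Sums base_interval_ms * (2^k - 1) for k = 1..retries, i.e. the total time
// spent waiting if every backoff used its full, un-jittered interval.
// Assumes retries <= 62 and a small base interval so the sum fits in 64 bits.
uint64_t totalWaitMsOld(uint32_t retries, uint64_t base_interval_ms) {
  uint64_t total = 0;
  for (uint32_t k = 1; k <= retries; ++k) {
    total += base_interval_ms * ((uint64_t{1} << k) - 1);
  }
  return total;
}

// Converts milliseconds to whole days (truncating).
uint64_t msToWholeDays(uint64_t ms) {
  return ms / (1000ull * 60 * 60 * 24);
}
```

With 32 retries and a 1ms base interval, the sum is 8589934558 ms, which is 99 whole days.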