Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion source/common/grpc/async_client_manager_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,12 @@ void AsyncClientManagerImpl::RawAsyncClientCache::evictEntriesAndResetEvictionTi
// Evict all the entries that have expired.
while (!lru_list_.empty()) {
MonotonicTime next_expire = lru_list_.back().accessed_time_ + EntryTimeoutInterval;
if (now >= next_expire) {
if ((now >= next_expire) ||
// since 'now' and 'next_expire' are in nanoseconds, the following condition is to check if
// the difference between them is less than 1 second. If we don't do this, the timer will
// be enabled with 0 seconds, which will cause the timer to fire immediately. This will
// cause cpu spike.
(std ::chrono::duration_cast<std::chrono::seconds>(next_expire - now).count() == 0)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good way to test this? How did you chose 1 second as a good threshold?

@vikaschoudhary16 vikaschoudhary16 Oct 12, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Lizan for taking a look.

How did you chose 1 second as a good threshold?

While enabling timer at L218, duration is round-off to seconds. Spike happens when this round-off result is 0. Hence I used same rounding-off in the if condition to avoid timer initialiazation with 0s. From the logs, that I shared in PR description as well, when I figured out that if difference between next_expire and now is less than a sec it gets rounded off to 0, hence mentioned 1 sec in code comment. May be it gets round off to 0 only when diff is less than .5sec, I am not exactly sure about this round-off.

Do we have a good way to test this?

I will go through existing tests and see if we have a way to test this out

// Erase the expired entry.
lru_map_.erase(lru_list_.back().config_with_hash_key_);
lru_list_.pop_back();
Expand Down