
Await for a chain worker to be available for eviction #3037

Open · jvff wants to merge 9 commits into main from the remove-worker-cache-eviction-delays branch
Conversation

@jvff (Contributor) commented Dec 16, 2024

Motivation

The WorkerState contains an LRU cache of chain workers (more specifically, it keeps open channels to the ChainWorkerActors so that they keep running). Once the cache becomes full, requests for new chain workers randomly attempt to evict an idle chain worker.

Eviction was done by looking for an idle chain worker, iterating over the endpoints in least-recently-used order: starting with the endpoint that was used longest ago, the code checked whether it had a strong count of one or less. If it did, every clone of that endpoint held by a requester had already been dropped, which only happens after the response is received, so the worker was idle.

If no idle worker was found, there was no eviction candidate, because every worker appeared to be handling a request. Evicting a chain worker before it finishes handling a request and later starting another chain worker for the same chain while that request is still executing creates a race condition that could corrupt the chain state data. Therefore, if there was no eviction candidate, the code waited 250 milliseconds and retried, repeatedly, for up to three seconds before giving up with an error.

This whole eviction process could run concurrently among many requesters, which led to possible unfairness, since there was no guarantee which one would succeed. It also introduced undesirable delays between retries.
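For context, the old flow looked roughly like the sketch below. This is a minimal illustration only, not the actual linera-core code: the function name and error message are hypothetical, and try_evict_idle_worker stands in for the scan over the LRU cache described above.

    use std::time::{Duration, Instant};

    /// Hypothetical sketch of the old polling behaviour: poll for an idle
    /// chain worker to evict, sleeping 250 ms between attempts, for up to
    /// three seconds before giving up with an error.
    async fn acquire_slot_with_polling(
        mut try_evict_idle_worker: impl FnMut() -> bool,
    ) -> Result<(), &'static str> {
        const RETRY_DELAY: Duration = Duration::from_millis(250);
        const TIMEOUT: Duration = Duration::from_secs(3);

        let start = Instant::now();
        loop {
            // Walk the LRU cache from least- to most-recently used and evict
            // the first endpoint whose strong count shows it is idle.
            if try_evict_idle_worker() {
                return Ok(());
            }
            if start.elapsed() >= TIMEOUT {
                return Err("no idle chain worker could be evicted");
            }
            tokio::time::sleep(RETRY_DELAY).await;
        }
    }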

Proposal

Use a Semaphore to establish a queue among the chain worker requests. This way there's certainty that, once a permit is acquired, there will be at least one idle chain worker ready to be evicted.

To implement this change, a new ChainWorkers type was created to manage the cache, and ChainActorEndpoint became a new type so that it can hold the semaphore permit.
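A minimal sketch of how these pieces could fit together is shown below. Apart from the ChainWorkers, ChainActorEndpoint and ChainRequestSender names, everything is a simplified assumption (a u64 chain ID, senders wrapped in an Arc, a placeholder message type), so this illustrates the idea rather than the actual linera-core implementation.

    use std::{
        num::NonZeroUsize,
        sync::{Arc, Mutex},
    };

    use lru::LruCache;
    use tokio::sync::{mpsc, OwnedSemaphorePermit, Semaphore};

    type ChainId = u64; // placeholder for the real chain ID type
    type ChainRequestSender = Arc<mpsc::UnboundedSender<String>>; // placeholder message type

    // Holding the permit for as long as the endpoint is alive ties cache
    // occupancy to semaphore capacity.
    struct ChainActorEndpoint {
        sender: ChainRequestSender,
        _permit: OwnedSemaphorePermit,
    }

    struct ChainWorkers {
        cache: Arc<Mutex<LruCache<ChainId, ChainRequestSender>>>,
        active_endpoints: Arc<Semaphore>,
    }

    impl ChainWorkers {
        fn new(limit: NonZeroUsize) -> Self {
            Self {
                cache: Arc::new(Mutex::new(LruCache::new(limit))),
                active_endpoints: Arc::new(Semaphore::new(limit.get())),
            }
        }

        async fn get_endpoint(&self, chain_id: ChainId) -> ChainActorEndpoint {
            // Waiting on the semaphore forms a queue among requesters; once a
            // permit is held, at least one cached worker must be idle.
            let permit = self
                .active_endpoints
                .clone()
                .acquire_owned()
                .await
                .expect("the semaphore is never closed");
            let mut cache = self.cache.lock().unwrap();
            let sender = if let Some(sender) = cache.get(&chain_id) {
                sender.clone()
            } else {
                if cache.len() >= cache.cap().get() {
                    // Evict the least-recently used worker with no request in
                    // flight: only the cache itself still holds its sender.
                    let (&victim, _) = cache
                        .iter()
                        .rev()
                        .find(|(_, sender)| Arc::strong_count(sender) <= 1)
                        .expect("a permit is held, so an idle worker must exist");
                    cache.pop(&victim).expect("victim was just found in the cache");
                }
                // In the real code a ChainWorkerActor task would be spawned
                // and given the receiving end of this channel.
                let (sender, _receiver) = mpsc::unbounded_channel();
                let sender = Arc::new(sender);
                cache.push(chain_id, sender.clone());
                sender
            };
            ChainActorEndpoint { sender, _permit: permit }
        }
    }

Dropping a ChainActorEndpoint releases its permit, so the next queued requester wakes up knowing that at least one cache slot is idle again.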

Test Plan

CI should catch any regressions to the happy path, and #3052 should add a new test for the extreme load case.

Release Plan

  • Nothing to do, because this is just a refactor that should follow the normal release cycle.

Links

@jvff jvff added the enhancement New feature or request label Dec 16, 2024
@jvff jvff added this to the Testnet #2 milestone Dec 16, 2024
@jvff jvff self-assigned this Dec 16, 2024
@jvff jvff force-pushed the remove-worker-cache-eviction-delays branch 2 times, most recently from 598e667 to 43fb237, on December 17, 2024 at 14:23
jvff added 9 commits December 17, 2024 18:31
  • Prepare to move the chain worker cache handling out of the `WorkerState` type.
  • Simplify `WorkerState` by using the new type to handle how the chain worker cache is evicted.
  • Create a separate `handle_request` method with the code to handle a single request.
  • Prepare to change the eviction procedure.
  • Split the previous method to make it more readable.
  • Ensure that the synchronous `Mutex` is locked and used in a separate synchronous method.
  • Replace the old alias of the same name with `ChainRequestSender`; the goal is for the new type to also hold a permit to indicate it is using a chain worker.
  • Ensure that attempts to use chain workers when the cache is full are fair.
  • Remove the retry delays, which are no longer needed: eviction is now only performed once a permit is held, guaranteeing that there's at least one chain worker that's idle.
@jvff jvff force-pushed the remove-worker-cache-eviction-delays branch from 43fb237 to e5ffc04 on December 17, 2024 at 19:08
@jvff jvff changed the title from "[WIP] Await for a chain worker to be available for eviction" to "Await for a chain worker to be available for eviction" on Dec 17, 2024
@jvff jvff marked this pull request as ready for review December 17, 2024 19:59
@jvff jvff requested review from afck, Twey, ma2bd and deuszx December 17, 2024 19:59
linera-core/src/chain_worker/actor.rs
Comment on lines +59 to +60
cache: Arc::new(Mutex::new(LruCache::new(limit))),
active_endpoints: Arc::new(Semaphore::new(limit.get())),
A reviewer (Contributor) commented:
Does using the same limit here mean that if limit requests are in flight to a single chain worker, and there's another incoming request for a different worker, we will now block that new request even though a slot would be free?

Another reviewer (Contributor) replied:

Good catch! It does seem like we're first trying to acquire a self.active_endpoints Semaphore even if there already exists a chain worker for the same chain_id.

@jvff (Contributor, Author) replied on Dec 18, 2024:

Yes, that is true. I thought about having a map of permits, so that requests that get queued anyway would reuse the existing permit. However, that may lead to starvation because a chain worker that is heavily used will never release its permit.

The solution I then considered was to have another semaphore for acquiring each permit, but I thought the resulting "permit-ception" would be over-engineering. I'm not sure how to proceed here, and would love some feedback and ideas.

I still think it's okay to improve (fix?) this in a follow-up PR because:

  • this impacts a (hopefully) edge case (lots of requests for a single chain);
  • the worst outcome is degraded performance;
  • and it comes with the benefit of being fairer in extreme scenarios (all chains getting many requests).

A reviewer (Contributor) replied:

I don't think it's an edge case to have several requests for the same chain worker, and every time that happens we underutilize at least one other chain worker.

But I don't have a better idea at the moment either.

let mut cache = self.cache.lock().unwrap();

if let Some(sender) = cache.get(&chain_id) {
Ok(ChainActorEndpoint::new(sender.clone(), permit))
A reviewer (Contributor) commented:

(See my previous comment.) Would it be better to clone the existing permit here?

.iter()
.rev()
.find(|(_, candidate_endpoint)| candidate_endpoint.strong_count() <= 1)
.expect("`stop_one` should only be called while holding a permit for an endpoint");
A reviewer (Contributor) commented:

Can there be a race condition where the permit is already freed but the reference count hasn't been set to 0 yet?

@jvff (Contributor, Author) replied:

Good point! I was lucky that I declared the fields in ChainActorEndpoint in the right order, so that the channel sender is always dropped before the permit 😅
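For reference, Rust drops struct fields in declaration order, so a layout like the following guarantees that the sender, and with it the strong count, goes away before the permit is released back to the semaphore. This is a sketch with illustrative field and message types, not necessarily the actual definition.

    use tokio::sync::{mpsc, OwnedSemaphorePermit};

    // Fields are dropped top to bottom: the sender is dropped first, lowering
    // the strong count, and only then is the permit returned to the semaphore.
    pub struct ChainActorEndpoint {
        sender: mpsc::UnboundedSender<String>, // dropped first
        permit: OwnedSemaphorePermit,          // dropped second, freeing the slot
    }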

@deuszx (Contributor) left a comment:

How does this PR affect the semantics of getting the chain worker endpoint? Previously we'd try to find a chain worker (for eviction) and, if none was available, we'd wait 250 ms and repeat that for 3 seconds (approximately 12 times). Can you put that into the PR description?

@ma2bd (Contributor) commented Dec 18, 2024:

Ok so since #3052 is pending, we don't know how much this helps, do we?

@jvff (Contributor, Author) commented Dec 18, 2024:

> Ok so since #3052 is pending, we don't know how much this helps, do we?

I don't think we need to measure that now. The goal is to remove the delays and to make it fairer.
