runtime: making runtime accessible from non-worker threads#7695
runtime: making runtime accessible from non-worker threads#7695alyssawilk merged 5 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this, some initial thoughts.
/wait
| absl::MutexLock lock(&snapshot_mutex_); | ||
| snapshots_.erase(std::this_thread::get_id()); | ||
| auto snapshot_it = | ||
| snapshots_.emplace(std::this_thread::get_id(), new SnapshotState{createNewSnapshot(), true}) |
There was a problem hiding this comment.
I don't think this is thread safe as we expect createNewSnapshot() to run on the main thread. Here is how I would recommend implementing this:
- On the main thread, create/store a shared pointer to a snapshot that is to be used for non-registered threads. This can be done during the standard load call above, when you set valid to false in the map.
- In the code in this function, instead of creating a new snapshot, just grab the existing shared pointer and store it in the map under lock.
- Then I think the rest of the logic will remain the same.
WDYT? Edit: See above comment for a potentially simpler implementation.
There was a problem hiding this comment.
I'm not sure I understand the problem of createNewSnapshot being called on the main thread. There's one lock, I don't think we deadlock, and as long as non-worker thread calls snapshot(), and latches it beyond the lifetime of the next call to snapshot() I think it's fine but maybe you can clue me in on slack
There was a problem hiding this comment.
The issue is createNewSnapshot() potentially reads from the disk, synchronizes with RTDS, etc. There can be concurrent access to all of these things, which is why today we always create the snapshot on the main thread and then fan it out using TLS. ]
IMO we can simply store the shared snapshot as something that is updated under write lock and read under read lock, and then offer a version of snapshot() that returns the shared pointer. This should be simple, will not change the current worker performance in any way, and will work fine for non-worker threads. If perf is an issue later for non-worker threads I think we can offer a better solution in which non-worker threads register for updates by providing some type of Postable interface?
| // be accessing them. Instead mark them invalid and clean up during the next in-thread call to | ||
| // snapshot() | ||
| { | ||
| absl::ReaderMutexLock lock(&snapshot_mutex_); |
There was a problem hiding this comment.
I think this needs to be the write lock?
Also, the way this works as written is that if there are a lot of threads that are coming and going, we will potentially leak the map. This might be OK, but can you comment this for now?
As an aside, it would be better if snapshot() just returned the shared_ptr and then I think we could avoid all of this valid stuff and even the map of thread IDs. Thinking about it more, how about just creating a snapshot() variant that returns the shared pointer and using that in the static runtime lookup functions? If we have that, I believe all of this can be simplified a huge amount by just grabbing that shared pointer under read lock, and then writing the shared pointer under write lock in this function?
There was a problem hiding this comment.
yeah, should be a write lock.
Happy to comment regarding lifetime. I had an iteration which on invalidation would simply do a 5s delay delete, but was uncomfortable about lifetime not being fixed.
I started in on snapshot as shared pointer, and the thing I don't like about that is that the TLS snapshot isn't a shared pointer, so runtime feature lookup would always return the shared pointer snapshot not the TLS snapshot, which could arguably have different values during update. I don't think there's anything unsafe about it, but it does mean we're doing more shared pointer access and locks, and runtime thrashing makes me twitchy. Any thoughts on how to avoid that ugliness or you think it's the way to go?
There was a problem hiding this comment.
the TLS snapshot isn't a shared pointer
The TLS snapshot is actually a shared pointer so I don't think anything needs to change in that path (just use the shared_ptr version of the TLS lookup function if it's registered). Let's sync up on slack?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
mattklein123
left a comment
There was a problem hiding this comment.
Awesome, looks great. Just a few small comments.
/wait
include/envoy/runtime/runtime.h
Outdated
| * @return const shared_ptr<Snapshot> the current snapshot. This function may safely be called | ||
| * from non-worker theads. | ||
| */ | ||
| virtual const std::shared_ptr<Snapshot> threadsafeSnapshot() PURE; |
There was a problem hiding this comment.
I think you want std::shared_ptr<const Snapshot> ?
| @@ -552,12 +553,29 @@ void RtdsSubscription::validateUpdateSize(uint32_t num_resources) { | |||
|
|
|||
| void LoaderImpl::loadNewSnapshot() { | |||
| ThreadLocal::ThreadLocalObjectSharedPtr ptr = createNewSnapshot(); | |||
There was a problem hiding this comment.
nit: just use a local var of snapshot shared pointer type, and you can void the dynamic cast below.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Making runtime accessible for non-worker threads, and using the new accessor for runtime features. This allows the work done in #7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well. Risk Level: High (affects runtime access for all runtime features) Testing: new unit tests, integration tests use in #7601 Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
…y#7695) (#27) Making runtime accessible for non-worker threads, and using the new accessor for runtime features. This allows the work done in envoyproxy#7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well. Risk Level: High (affects runtime access for all runtime features) Testing: new unit tests, integration tests use in envoyproxy#7601 Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org> Signed-off-by: Piotr Sikora <piotrsikora@google.com>
…y#7695) (envoyproxy#27) Making runtime accessible for non-worker threads, and using the new accessor for runtime features. This allows the work done in envoyproxy#7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well. Risk Level: High (affects runtime access for all runtime features) Testing: new unit tests, integration tests use in envoyproxy#7601 Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org> Signed-off-by: Piotr Sikora <piotrsikora@google.com> Signed-off-by: LSChyi <alan81920@gmail.com>
Making runtime accessible for non-worker threads, and using the new accessor for runtime features.
This allows the work done in #7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well.
Risk Level: High (affects runtime access for all runtime features)
Testing: new unit tests, integration tests use in #7601
Docs Changes: n/a
Release Notes: n/a