Skip to content

runtime: making runtime accessible from non-worker threads#7695

Merged
alyssawilk merged 5 commits intoenvoyproxy:masterfrom
alyssawilk:runtime_other_threads
Jul 25, 2019
Merged

runtime: making runtime accessible from non-worker threads#7695
alyssawilk merged 5 commits intoenvoyproxy:masterfrom
alyssawilk:runtime_other_threads

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk commented Jul 23, 2019

Making runtime accessible for non-worker threads, and using the new accessor for runtime features.

This allows the work done in #7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well.

Risk Level: High (affects runtime access for all runtime features)
Testing: new unit tests, integration tests use in #7601
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, some initial thoughts.

/wait

absl::MutexLock lock(&snapshot_mutex_);
snapshots_.erase(std::this_thread::get_id());
auto snapshot_it =
snapshots_.emplace(std::this_thread::get_id(), new SnapshotState{createNewSnapshot(), true})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is thread safe as we expect createNewSnapshot() to run on the main thread. Here is how I would recommend implementing this:

  • On the main thread, create/store a shared pointer to a snapshot that is to be used for non-registered threads. This can be done during the standard load call above, when you set valid to false in the map.
  • In the code in this function, instead of creating a new snapshot, just grab the existing shared pointer and store it in the map under lock.
  • Then I think the rest of the logic will remain the same.

WDYT? Edit: See above comment for a potentially simpler implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the problem of createNewSnapshot being called on the main thread. There's one lock, I don't think we deadlock, and as long as non-worker thread calls snapshot(), and latches it beyond the lifetime of the next call to snapshot() I think it's fine but maybe you can clue me in on slack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is createNewSnapshot() potentially reads from the disk, synchronizes with RTDS, etc. There can be concurrent access to all of these things, which is why today we always create the snapshot on the main thread and then fan it out using TLS. ]

IMO we can simply store the shared snapshot as something that is updated under write lock and read under read lock, and then offer a version of snapshot() that returns the shared pointer. This should be simple, will not change the current worker performance in any way, and will work fine for non-worker threads. If perf is an issue later for non-worker threads I think we can offer a better solution in which non-worker threads register for updates by providing some type of Postable interface?

// be accessing them. Instead mark them invalid and clean up during the next in-thread call to
// snapshot()
{
absl::ReaderMutexLock lock(&snapshot_mutex_);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be the write lock?

Also, the way this works as written is that if there are a lot of threads that are coming and going, we will potentially leak the map. This might be OK, but can you comment this for now?

As an aside, it would be better if snapshot() just returned the shared_ptr and then I think we could avoid all of this valid stuff and even the map of thread IDs. Thinking about it more, how about just creating a snapshot() variant that returns the shared pointer and using that in the static runtime lookup functions? If we have that, I believe all of this can be simplified a huge amount by just grabbing that shared pointer under read lock, and then writing the shared pointer under write lock in this function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, should be a write lock.
Happy to comment regarding lifetime. I had an iteration which on invalidation would simply do a 5s delay delete, but was uncomfortable about lifetime not being fixed.

I started in on snapshot as shared pointer, and the thing I don't like about that is that the TLS snapshot isn't a shared pointer, so runtime feature lookup would always return the shared pointer snapshot not the TLS snapshot, which could arguably have different values during update. I don't think there's anything unsafe about it, but it does mean we're doing more shared pointer access and locks, and runtime thrashing makes me twitchy. Any thoughts on how to avoid that ugliness or you think it's the way to go?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the TLS snapshot isn't a shared pointer

The TLS snapshot is actually a shared pointer so I don't think anything needs to change in that path (just use the shared_ptr version of the TLS lookup function if it's registered). Let's sync up on slack?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looks great. Just a few small comments.

/wait

* @return const shared_ptr<Snapshot> the current snapshot. This function may safely be called
* from non-worker theads.
*/
virtual const std::shared_ptr<Snapshot> threadsafeSnapshot() PURE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want std::shared_ptr<const Snapshot> ?

@@ -552,12 +553,29 @@ void RtdsSubscription::validateUpdateSize(uint32_t num_resources) {

void LoaderImpl::loadNewSnapshot() {
ThreadLocal::ThreadLocalObjectSharedPtr ptr = createNewSnapshot();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just use a local var of snapshot shared pointer type, and you can void the dynamic cast below.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@alyssawilk alyssawilk merged commit e61681d into envoyproxy:master Jul 25, 2019
mattklein123 pushed a commit that referenced this pull request Aug 13, 2019
Making runtime accessible for non-worker threads, and using the new accessor for runtime features.

This allows the work done in #7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well.

Risk Level: High (affects runtime access for all runtime features)
Testing: new unit tests, integration tests use in #7601
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
LukeShu pushed a commit to gravitee-io/envoy that referenced this pull request Aug 26, 2019
…y#7695) (#27)

Making runtime accessible for non-worker threads, and using the new accessor for runtime features.

This allows the work done in envoyproxy#7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well.

Risk Level: High (affects runtime access for all runtime features)
Testing: new unit tests, integration tests use in envoyproxy#7601
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
LSChyi pushed a commit to LSChyi/envoy that referenced this pull request Sep 24, 2019
…y#7695) (envoyproxy#27)

Making runtime accessible for non-worker threads, and using the new accessor for runtime features.

This allows the work done in envoyproxy#7601, moving the strict HTTP checks out of the HCM and into the codec, where the integration tests use them from client/server threads, and other downstream Envoys might use them from non-worker threads as well.

Risk Level: High (affects runtime access for all runtime features)
Testing: new unit tests, integration tests use in envoyproxy#7601
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: LSChyi <alan81920@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants