Skip to content

test: remove the manual declarations of TestThread and figure it out from the platform.#18557

Merged
snowp merged 25 commits intoenvoyproxy:mainfrom
jmarantz:auto-test-thread
Nov 1, 2021
Merged

test: remove the manual declarations of TestThread and figure it out from the platform.#18557
snowp merged 25 commits intoenvoyproxy:mainfrom
jmarantz:auto-test-thread

Conversation

@jmarantz
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz commented Oct 11, 2021

Commit Message: Earlier attempts to integrate a requirement to manually declare TestThread made it too hard to integrate with other repos that depend on Envoy, but not on all its test infrastructure.

Instead, this PR uses platform-specific mechanisms to determine whether the current thread is the 'test' thread (ie the thread in main()). This makes it unnecessary to declare TestThread anywhere. However, the TestThread empty struct is left in the codebase for now to avoid adding more conflicts with other repos. It can be removed in a month or so.

A new RAII construct SkipAsserts is added as an escape for scenarios where we are integrating Envoy code in a context where the thread-checking is not relevant.

Additional Description:
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as draft October 11, 2021 15:55
@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Oct 11, 2021

Elisha -- FYI this should do the job. I am waiting for CI to tell me that Windows and Mac tests still work.

eziskind
eziskind previously approved these changes Oct 11, 2021
Copy link
Copy Markdown
Contributor

@eziskind eziskind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmarantz
Copy link
Copy Markdown
Contributor Author

This iteration fails Windows CI because return true doesn't quite work:

2021-10-11T17:15:28.8042309Z [ RUN      ] ThreadLocalInstanceImplDispatcherTest.Dispatcher
2021-10-11T17:15:28.8042995Z test/common/thread_local/thread_local_impl_test.cc(304): error: Value of: Thread::MainThread::isMainOrTestThread()
2021-10-11T17:15:28.8046776Z   Actual: true
2021-10-11T17:15:28.8047260Z Expected: false
2021-10-11T17:15:28.8047891Z Stack trace:
2021-10-11T17:15:28.8048370Z   00007FF680D0068C: (unknown)
2021-10-11T17:15:28.8048858Z   00007FFCA0C77974: BaseThreadInitThunk
2021-10-11T17:15:28.8049389Z   00007FFCA8E2A2F1: RtlUserThreadStart
2021-10-11T17:15:28.8049726Z 

Looking at mitigations.

…ting in some contexts.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…c went down.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as ready for review October 13, 2021 01:54
@jmarantz
Copy link
Copy Markdown
Contributor Author

/assign-from @envoyproxy/first-pass-reviewers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/first-pass-reviewers assignee is @RyanTheOptimist

🐱

Caused by: a #18557 (comment) was created by @jmarantz.

see: more, trace.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
#elif defined(__APPLE__)
return pthread_main_np() != 0;
#else
return true;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davinci26 @wrowe is there an approach that would work in Windows?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know on top of my head, I am looking into it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding this change correctly, it seems like on Windows this change would mean that isMainOrTestThread() would always return true. Presumably that is ok? Would that not cause problems?

Are linux and APPLE the only two non-Windows platforms we build on? If so, those would fall into the "return true" block. I wonder if it might be wise to something like:

#elif defined(WIN32)
    return true;
#else
    static_assert(false, "unknown platform");
#endif

In case we end up with such a platform down the road. (Fuscia or FreeBSD or something)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it would cause problems if we did an ASSERT(!isMainTestOrThread()), so I've gone with a strategy more like what you suggested.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows might be tricky to support due to the way the threading model works. Can you open an issue and I will pick it up at some point

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #18669 -- this is not a blocker for this PR though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, Windows benefit implicitly from this change just by having better test coverage on Linux. I want to provide parity as external filter writers might need this feature when they test their filters on windows

@jmarantz jmarantz marked this pull request as draft October 15, 2021 12:56
@jmarantz
Copy link
Copy Markdown
Contributor Author

I'm reconsidering this approach -- I think we should have the checks all be macro-based so they can be a no-op on unsupported platforms.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… touched by this PR

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…s where we cannot determine whether we are in the test thread. That way we get compile errors rather than assertions -- faster to fix.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@snowp
Copy link
Copy Markdown
Contributor

snowp commented Oct 21, 2021

I tested this out with some internal code we have that has been fighting against the thread checks: it calls into code that reuses Envoy code from an arbitrary thread, and at times this will hit code paths that try to verify that code is done on main/test thread (though this check is not very meaningful in this custom context).

We've been working around it for now by marking the context as a TestThread (very hacky, but works), and with this PR it seems like there is no way around the thread check. Would it be a lot of work to keep TestThread working the way it does, or provide some alternative? One alternative I could think of would be an RAII handle that would allow disabling thread checks for some time.

@mattklein123

@mattklein123
Copy link
Copy Markdown
Member

I'm going to assign this over to @snowp to make sure this ends up working for our internal use case as well.

@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Oct 21, 2021

Ah glad I got some feedback on this before pushing forward.

The problem with the prior state where TestThread was declared by the test infrastructure is that we had a ton of tests that included Envoy code, but not its test infrastructure. So the code that declared the TestThread from test_runner.cc did not get called.

But we could OR that all together and let you still declare a manual TestThread if you want to. Do you need to declare multiple distinct TestThread objects at the same time? Or do you just instantiate TestThread around a call that might do some checking?

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Oct 22, 2021

The way it works right now is that we just wrap offending code with TestThread, which seems to do the trick. We haven't had an issue with having to declare it multiple times, but I can imagine it being a thing down the line.

I think ideally we'd have something like a IgnoreThreadChecks handle that lets us do what we do with TestThread today, but without issues of there being multiple threads needing to bypass thread checks. I think with that we could have the real test thread be inferred via the logic in this PR, while also giving us an easy way for us to bypass these thread checks - kinda like a "i know what I'm doing, let me skip these checks".

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Oct 22, 2021

IgnoreThreadChecks as an RAII context seems feasible to me. I'm curious what scenarios though you are calling something that is asserting to be on a thread where you want to disable that.

I think to implement this, only the ASSERT_IS_TEST_THREAD type assertions defined in thread.h should be affected, right? E.g. you might have ASSERT(MainThread::isTestThread()) and you want that to be ignored, but you might also have ASSERT(!MainThread::IsTestThread()) that you'd also want to skip, and it'd be hard to do that from inside the impl if isMainThread.

So is it OK to have the IgnoreThreadChecks only on the new ASSERT_ macros defined here?

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Oct 22, 2021

The first cases of this we've hit is trying to convert YAML to protobuf: ValueUtil::loadFromYaml is called within that call stack, which uses TRY_ASSERT_MAIN_THREAD. This may have been broken prior to the change to introduce the TestThread RAII handle, it's a somewhat recent feature internally.

Our use case is related to envoy-mobile: we're having the platform call into native code to initialize a subsystem, which attempts to parse YAML configuration that is provided by the function. This happens independently of EM startup, so the code may run on any thread. A bunch of the thread assertions in Envoy rely on the delineation between the main thread and worker threads, but this falls apart in envoy-mobile contexts where there are a lot of other threads in play, so we're hoping to just be able to bypass these checks when running outside of the EM event loops.

@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Oct 23, 2021

Is it possible for us to engineer ValueUtil::loadFromYaml that doesn't throw exceptions on failure? Or is that embedded in an impl we can't control? If that's possible, we could in the interim switch from TRY_ASSERT_MAIN_THREAD to TRY_NEEDS_AUDIT.

So, if we are committed to the goal of not having exceptions caught on the worker thread, a better way to do that is to leave a trail of harder-to-resolve issues to tackle later (TRY_NEEDS_AUDIT) than to just disable the check.

If that strategy works for you I can leave this PR as is.

If we'd rather not go in that direction then an IgnoreThreadChecks RAII object could work, as we would check for the 'disabled' bit in the ASSERT_IS_TEST_THREAD family of macros, which are what's being called from TRY_ASSERT_MAIN_THREAD.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

For the sake of argument I added the hook to override the threading macros, though I'm not totally convinced they are the right thing.

But in the spirit of not getting bogged down I've added it; happy to remove it by reverting 4f42287.

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Oct 25, 2021

It wouldn't be hard to change the serialization functions to not assert a main thread, but it gives raise to what appears to be a conflict of interest: within the Envoy code base we want to ensure that exceptions are only used on the main thread, but code outside of the repo may want to call it without thread checks.

One option would be to reimplement this function (and any other function that this might apply to) outside of Envoy main. From our perspective it's a bit unfortunate to not be able to reuse code that already does exactly what we want, but I guess it depends on how intrusive it would be to support overriding the checks?

@mattklein123 Do you have any thoughts here?

@lizan lizan removed their assignment Oct 25, 2021
@mattklein123
Copy link
Copy Markdown
Member

@mattklein123 Do you have any thoughts here?

It's seems reasonable to me to allow the default implementations to be overriden in a custom main?

@jmarantz
Copy link
Copy Markdown
Contributor Author

OK, makes sense to me. In that case does this patch look like it works for you? If so I can add some more comments and address other issues you might have.

snowp
snowp previously approved these changes Oct 27, 2021
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks for iterating!

@snowp
Copy link
Copy Markdown
Contributor

snowp commented Oct 28, 2021

Can you merge main and we'll get this in?

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

merged. Also filled in some comments dismissing your approval :)

@jmarantz
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18557 (comment) was created by @jmarantz.

see: more, trace.

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@snowp snowp merged commit aac113e into envoyproxy:main Nov 1, 2021
@jmarantz jmarantz deleted the auto-test-thread branch November 1, 2021 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants