Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Projected winrt::consume_ methods will nullptr crash if the underlying QueryInterface call fails #1442

Merged
merged 9 commits into from
Oct 30, 2024

Conversation

dmachaj
Copy link
Contributor

@dmachaj dmachaj commented Oct 28, 2024

Why is this change being made?

The generated projection code for interfaces that the metadata declares as required for a runtimeclass assume that QueryInterface never fails. Assuming the metadata is correct in the first place, this is a valid assumption for inproc calls.

However, for cross-process calls the QI can fail even if the metadata is correct and the class really does implement all of the required interfaces. It can fail with E_ACCESSDENIED and a variety of other RPC error codes.

When this happens there is a nullptr crash in the generated consume method. This can be very painful to debug because the HRESULT is lost by the time it crashes.

Briefly summarize what changed

This set of changes fixes the crash by detecting the QueryInterface error and throwing an exception when that occurs during one of these required casts. The try_as method was changed to capture COM error context when the QI fails. The code gen surrounding WINRT_IMPL_STUB was changed to save the result into a temporary variable and then pass it to a new check_cast_result method. check_cast_result is marked noinline so that the binary size impact of throwing exceptions is limited to a single function instead of inlining into high-volume generated code.

If the cast succeeded then nothing happens. If the cast failed, returning null, then the stored COM exception is retrieved. Assuming it is available the HRESULT is pulled out of it and it is thrown using check_result. This then propagates like any other exception. Callers are free to try and catch it or let it go uncaught and crash. Now they have the choice.

I also added a new file of test code that exercises this code path. The test_component IDL declares a runtimeclass that implements IStringable. And then the implementation fails to implement IStringable. When ToString is called on this object it hits the failure path. The cppwinrt code gen will not allow this to happen so I had to directly use winrt::implements.
 

How was this change tested?

Besides the new test cases I also wrote a little console app that crashes this way. I built and ran it using the latest stable cppwinrt as well as my private new one. As expected the debugger blame is far more useful with these changes.

IDL:

namespace CrashRepro
{
    // This class declares that it implements another interface but under the covers it actually does
    // not.  This allows us to test the behavior when QI's that should not fail, do fail.
    runtimeclass LiesAboutInheritance : Windows.Foundation.IStringable
    {
        void StubMethod();
    }
}

main.cpp:

#include <winrt/CrashRepro.h>

#include <wil/result_originate.h>

namespace
{
    struct LiesAboutInheritance : public winrt::implements<LiesAboutInheritance, winrt::CrashRepro::ILiesAboutInheritance>
    {
        LiesAboutInheritance() = default;
        void StubMethod() {}
    };
}

int main()
{
    winrt::init_apartment();

    auto lies = winrt::make_self<LiesAboutInheritance>().as<winrt::CrashRepro::LiesAboutInheritance>();
    FAIL_FAST_IF_NULL(lies);
    lies.StubMethod();

    try
    {
        lies.ToString();
    }
    CATCH_FAIL_FAST();
}

Before:
FAILURE_BUCKET_ID: ACCESS_VIOLATION_c0000005_CppwinrtQIFailureCrashRepro.exe!winrt::impl::consume_Windows_Foundation_IStringable_winrt::CrashRepro::LiesAboutInheritance_::ToString

After:
FAILURE_BUCKET_ID: STOWED_EXCEPTION_80004002_CppwinrtQIFailureCrashRepro.exe!winrt::check_cast_result_winrt::impl::abi_winrt::Windows::Foundation::IStringable,void_::type_*_

Note that the HRESULT ends up in the signature and the blame says that the cast result to IStringable is what failed.

Closes #1441

@dmachaj dmachaj changed the title Projected winrt::consume_ methods will nullptr crash if Bug: Projected winrt::consume_ methods will nullptr crash if the underlying QueryInterface call fails Oct 28, 2024
strings/base_meta.h Outdated Show resolved Hide resolved
@dmachaj
Copy link
Contributor Author

dmachaj commented Oct 28, 2024

As a binary size test case I built a private copy of WinUI3. The binary with cppwinrt usage is Microsoft.UI.Xaml.Controls.dll. The binary size for that actually went down (not up). 5,775,872 to 5,636,096 (-139,776 bytes aka -2.4%).

@kennykerr
Copy link
Collaborator

As a binary size test case I built a private copy of WinUI3. The binary with cppwinrt usage is Microsoft.UI.Xaml.Controls.dll. The binary size for that actually went down (not up). 5,775,872 to 5,636,096 (-139,776 bytes aka -2.4%).

Compared to what? The previous commit in the cppwinrt repo or some previous release?

@dmachaj
Copy link
Contributor Author

dmachaj commented Oct 28, 2024

As a binary size test case I built a private copy of WinUI3. The binary with cppwinrt usage is Microsoft.UI.Xaml.Controls.dll. The binary size for that actually went down (not up). 5,775,872 to 5,636,096 (-139,776 bytes aka -2.4%).

Compared to what? The previous commit in the cppwinrt repo or some previous release?

Compared to master. (Actually, not literally master. I had to deactivate the breaking changes from #1319 before I could compile XAML with a newer cppwinrt.). The only difference between the two binaries under comparison is the changes in this PR.

@kennykerr
Copy link
Collaborator

kennykerr commented Oct 29, 2024

Technically the FH4 work that Raymond alluded to is about folding identical catch handlers rather than call sites. A reduction here for Microsoft.UI.Xaml.Controls.dll would seem to imply that code that was previously inlined is no longer.

… the temporary alive until the semicolon. Remove that macro from where I need to call the check method to fix this crash
@dmachaj
Copy link
Contributor Author

dmachaj commented Oct 29, 2024

With the latest iteration (with the fix to keep the lifetime valid on the casted-to object) the binary size difference is close to zero. With the changes in this PR the XAML controls DLL is 10KB smaller than before (well below 1% diff).

@jonwis
Copy link
Member

jonwis commented Oct 29, 2024

Technically the FH4 work that Raymond alluded to is about folding identical catch handlers rather than call sites. A reduction here for Microsoft.UI.Xaml.Controls.dll would seem to imply that code that was previous inlined is no longer.

In prior investigations, I found there are types that have pathological inline behavior. Types like ValueSet, for instance, don't have any methods of their own - it's a flavored IMap<>. Someone accidentally doing the obvious thing:

ValueSet v{};
v.Insert(L"kittens", winrt::box_value(7u));
v.Insert(L"puppies", winrt::box_value(2.0f));
v.Insert(L"muffins", winrt::box_value(L"blueberry"));

... ended up with three inlined QueryInterface calls from IValueSet -> IMap<String, Object> in order to call IMap<>::Insert. It's not clear why the optimizer/inliner chose to splat the sequence of "load vtable, load queryinterface, push address of IMap<>, call" instead of leaving that as "invoke consume_ValueSet_IMapInsert" method calls, but it was very noticeable in the sizebench builds (building the same way as our release branch build builds.)

Any way to either merge those three QI sequences (tear off interface caching?) or at least not inline them (so the QI + call Insert is an invocable thing) would produce really great size enhancements. The code's already doing a bunch of "call" anyhow, we aren't saving much by inlining the QI+Call into the parent frame.

(edited to add)

I guess we're saving something by not faulting in more code to run, as the QI+Call are inlined directly. And we're adding some amount of exception handling overhead on a per-frame basis that might not have been there. But we're taking away some amount of temporary management (that QI comes paired with a Release of the temporary...) from the parent frame.

Overall, reducing the inlining seems like a win.

@jonwis
Copy link
Member

jonwis commented Oct 29, 2024

As a binary size test case I built a private copy of WinUI3. The binary with cppwinrt usage is Microsoft.UI.Xaml.Controls.dll. The binary size for that actually went down (not up). 5,775,872 to 5,636,096 (-139,776 bytes aka -2.4%).

Make sure to also run the validation with the "real build line" - things like PGO can be super duper smart and decide to either inline or fold based on "other data." I tried fixing winrt::hstring to be less inlined only to find that the PGO optimizer was going to already de-inline all the calls on its own.

@kennykerr
Copy link
Collaborator

Before I worked on FH4, I worked with MSVC on an optimization to treat certain calls to QueryInterface as pure functions. This worked well for solving such problems but we could never get the optimizer to hoist these calls reliably. Eventually I started working on the fast abi, which achieved the same effect but sadly was never adopted.

Ultimately, types like ValueSet are very poorly designed and should not be used. In the early days of cppwinrt I had diagnostics that would print out debug warnings at runtime for such pathological cases but it was deemed too judgy. 😉

cppwinrt/code_writers.h Outdated Show resolved Hide resolved
@dmachaj dmachaj force-pushed the user/dmachaj/failfast-bad-qi2 branch from 903190b to 46f64de Compare October 29, 2024 21:36
@kennykerr
Copy link
Collaborator

image

🤣🤣 vcxproj files are so awful.

@dmachaj
Copy link
Contributor Author

dmachaj commented Oct 30, 2024

With the latest iteration of code Microsoft.UI.Xaml.Controls.dll is 6,634,560 bytes (-141,312bytes, -2.4%, compared to baseline). That is back to where it was with the failfast approach.

@dmachaj dmachaj merged commit b82340f into master Oct 30, 2024
75 checks passed
@dmachaj dmachaj deleted the user/dmachaj/failfast-bad-qi2 branch October 30, 2024 16:54
@@ -546,6 +546,28 @@ namespace winrt::impl
}
return result;
}

template <typename T>
WINRT_IMPL_NOINLINE void check_cast_result(T* from WINRT_IMPL_SOURCE_LOCATION_ARGS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be a template. Can just take a void*. This allows the different instantiations to be folded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. If I spin another PR I'll make that change.

@@ -546,6 +546,28 @@ namespace winrt::impl
}
return result;
}

template <typename T>
WINRT_IMPL_NOINLINE void check_cast_result(T* from WINRT_IMPL_SOURCE_LOCATION_ARGS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this have to be WINRT_IMPL_NOINLINE inline to avoid ODR problems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of ODR problems? The only variance based on compiler options is the SOURCE_LOCATION macro and that already has global ODR guards.

The noinline is important because it is how the code bloat is avoided by having lots of exception throwing locations spewed out into containing code.

}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if code gen would be better as

WINRT_IMPL_NOINLINE [[noreturn]] void throw_failed_cast(hresult code WINRT_IMPL_SOURCE_LOCATION_ARGS)
{
    throw hresult_error(code, take_ownership_from_abi WINRT_IMPL_SOURCE_LOCATION_FORWARD);
}

inline void check_cast_result(hresult code WINRT_IMPL_SOURCE_LOCATION_ARGS)
{
    if (code < 0) throw_failed_cast(code WINRT_IMPL_SOURCE_LOCATION_FORWARD);
}

where we have a new try_as_reason that returns both a pointer and an hresult.

void* result{};
hresult code = ptr->QueryInterface(guid_of<To>(), &result);
return { wrap_as_result<To>(result), code };

dmachaj added a commit that referenced this pull request Nov 12, 2024
…hen the type is already a match (#1448)

Why is this change being made?
I have been doing some local builds of an internal component against the latest cppwinrt.exe, which includes the fixes in #1442. I have noticed some modest (1%) binary size grown when comparing the April 2024 release and the latest. Using SizeBench to analyze binary size differences I determined that the inlined code gen seems to explain the difference. This change eliminates that increase, at least for the binary I'm testing against, and even shrinks it a bit further.

Briefly summarize what changed
The various winrt::impl::consume_THING methods get heavily inlined in release builds. The current code gen has some casts for when the types don't match (and is an AddRef/Release when they do match). The small amount of new cast result checking ends up at many call sites and slowly adds up a bit. I think we can do better in the cases where the types already match.

This seems to have proven true in practice. Using if constexpr to determine if the type is a match allows us to skip any casts when this is true. In fact that is a net improvement over the original baseline because we don't need to AddRef/Release either. We can directly call the appropriate method. When a cast is necessary the code gen is identical to the previous baseline. The QueryInterface call is unavoidable and so is checking the result.

The code writer format string is definitely starting to creak under the weight of many arguments. I don't want to refactor that as part of this PR but it would be good to simplify in the future.

Also included is late feedback from the previous PR from @oldnewthing. The check_cast_result method can take void* instead of a template argument because it only null checks it. In my local measurements the optimizer already folded them so it makes no binary size difference but it is still a nice change to take.

How was this change tested?
Ran the full suite of cppwinrt tests locally. I also debugged into Windows.Foundation.IStringable.ToString for cases where it both is and is not the correct type. They run the expected code paths. I also checked the binary size of a large example to confirm the expected reduction. The disassembly for a Release build similarly matches expectations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Projected winrt::consume_ methods will nullptr crash if the underlying QueryInterface call fails
6 participants