-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the code size of generated consume methods by skipping casts when the type is already a match #1448
Conversation
…hen the type is already a match
Neat! Consider folding/reducing the callsites even further like the static method functionality does, so you get this (caution: PR-editbox-code, may not compile...): template <typename I, typename D, typename S, typename Q>
void call_member(S self, Q&& q)
{
if constexpr (!std::is_same_v<D, I>)
{
auto const& castedResult = static_cast<winrt::Windows::Foundation::IStringable const&>(static_cast<D const&>(*self));
auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)&castedResult;
check_cast_result(abiType);
check_hresult(q(abiType));
}
else
{
auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)self;
check_hresult(q(abiType));
}
}
template <typename D>
auto consume_Windows_Foundation_IStringable<D>::ToString() const
{
void* value{};
call_member<winrt::Windows::Foundation::IStringable, typename D>(this, [&](auto a) { return a->ToString(&value); });
return hstring{value, take_ownership_from_abi};
} The resulting codegen should basically be equivalent, but it might reduce time in the compiler itself... |
Bonus side effect it makes the code writer a lot less heavy. |
Do you mean the current PR makes it less heavy? Or Jon's suggestion does? |
The suggestion, since all that baggage is moved to a method that's outside the code writer. |
I think I'm OK with just taking the change as is for now. @sylveon is right and we should take a close look generally at the compiler throughput. I saw some serious drag related to the |
Ultimately what I want (see the #1123 forever-issue) is some "auto fast abi" that generates a fastabi-like jacket around C++/WinRT types that aren't fastabi, to further reduce the "QI from IFoo -> IFoo3" noise with hugely-deep type hierarchies (looking at you, PropertySet and WinUI3 Button...) |
I'd like to save that feedback for a future commit. I don't understand the compiler throughput tradeoffs well enough to do a good job validating that. What I have here is already validated and comes with a nice ~1% binary size reduction. I think that any concerted effort on lower compiler overhead should focus on the end-to-end runtime so that the focus can go to wherever the costs are highest. The consume_ methods may or may not be major contributors to that. I haven't done any measurements to know. |
Code comment says // The `noexcept` versions will crash if check_cast_result throws but that is no different than previous... but there is no Oh wait, there it is. Nevermind. |
Why is this change being made?
I have been doing some local builds of an internal component against the latest cppwinrt.exe, which includes the fixes in #1442. I have noticed some modest (1%) binary size grown when comparing the April 2024 release and the latest. Using SizeBench to analyze binary size differences I determined that the inlined code gen seems to explain the difference. This change eliminates that increase, at least for the binary I'm testing against, and even shrinks it a bit further.
Briefly summarize what changed
The various
winrt::impl::consume_THING
methods get heavily inlined in release builds. The current code gen has some casts for when the types don't match (and is an AddRef/Release when they do match). The small amount of new cast result checking ends up at many call sites and slowly adds up a bit. I think we can do better in the cases where the types already match.This seems to have proven true in practice. Using
if constexpr
to determine if the type is a match allows us to skip any casts when this is true. In fact that is a net improvement over the original baseline because we don't need to AddRef/Release either. We can directly call the appropriate method. When a cast is necessary the code gen is identical to the previous baseline. The QueryInterface call is unavoidable and so is checking the result.The code writer format string is definitely starting to creak under the weight of many arguments. I don't want to refactor that as part of this PR but it would be good to simplify in the future.
Also included is late feedback from the previous PR from @oldnewthing. The
check_cast_result
method can takevoid*
instead of a template argument because it only null checks it. In my local measurements the optimizer already folded them so it makes no binary size difference but it is still a nice change to take.How was this change tested?
Ran the full suite of cppwinrt tests locally. I also debugged into Windows.Foundation.IStringable.ToString for cases where it both is and is not the correct type. They run the expected code paths. I also checked the binary size of a large example to confirm the expected reduction. The disassembly for a Release build similarly matches expectations.
Here is what the new IStringable consume method looks like: