Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the code size of generated consume methods by skipping casts when the type is already a match #1448

Merged
merged 1 commit into from
Nov 12, 2024

Conversation

dmachaj
Copy link
Contributor

@dmachaj dmachaj commented Nov 11, 2024

Why is this change being made?

I have been doing some local builds of an internal component against the latest cppwinrt.exe, which includes the fixes in #1442. I have noticed some modest (1%) binary size grown when comparing the April 2024 release and the latest. Using SizeBench to analyze binary size differences I determined that the inlined code gen seems to explain the difference. This change eliminates that increase, at least for the binary I'm testing against, and even shrinks it a bit further.

Briefly summarize what changed

The various winrt::impl::consume_THING methods get heavily inlined in release builds. The current code gen has some casts for when the types don't match (and is an AddRef/Release when they do match). The small amount of new cast result checking ends up at many call sites and slowly adds up a bit. I think we can do better in the cases where the types already match.

This seems to have proven true in practice. Using if constexpr to determine if the type is a match allows us to skip any casts when this is true. In fact that is a net improvement over the original baseline because we don't need to AddRef/Release either. We can directly call the appropriate method. When a cast is necessary the code gen is identical to the previous baseline. The QueryInterface call is unavoidable and so is checking the result.

The code writer format string is definitely starting to creak under the weight of many arguments. I don't want to refactor that as part of this PR but it would be good to simplify in the future.

Also included is late feedback from the previous PR from @oldnewthing. The check_cast_result method can take void* instead of a template argument because it only null checks it. In my local measurements the optimizer already folded them so it makes no binary size difference but it is still a nice change to take.

How was this change tested?

Ran the full suite of cppwinrt tests locally. I also debugged into Windows.Foundation.IStringable.ToString for cases where it both is and is not the correct type. They run the expected code paths. I also checked the binary size of a large example to confirm the expected reduction. The disassembly for a Release build similarly matches expectations.

Here is what the new IStringable consume method looks like:

    template <typename D> auto consume_Windows_Foundation_IStringable<D>::ToString() const
    {
        void* value{};
        if constexpr (!std::is_same_v<D, winrt::Windows::Foundation::IStringable>)
        {
            auto const& castedResult = static_cast<winrt::Windows::Foundation::IStringable const&>(static_cast<D const&>(*this));
            auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)&castedResult;
            check_cast_result(abiType);
            check_hresult(abiType->ToString(&value));
        }
        else
        {
            auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)this;
            check_hresult(abiType->ToString(&value));
        }
        return hstring{ value, take_ownership_from_abi };
    }

@jonwis
Copy link
Member

jonwis commented Nov 11, 2024

Neat! Consider folding/reducing the callsites even further like the static method functionality does, so you get this (caution: PR-editbox-code, may not compile...):

template <typename I, typename D, typename S, typename Q>
void call_member(S self, Q&& q)
{
    if constexpr (!std::is_same_v<D, I>)
    {
        auto const& castedResult = static_cast<winrt::Windows::Foundation::IStringable const&>(static_cast<D const&>(*self));
        auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)&castedResult;
        check_cast_result(abiType);
        check_hresult(q(abiType));
    }
    else
    {
        auto const abiType = *(abi_t<winrt::Windows::Foundation::IStringable>**)self;
        check_hresult(q(abiType));
    }
}

template <typename D>
auto consume_Windows_Foundation_IStringable<D>::ToString() const
{
    void* value{};
    call_member<winrt::Windows::Foundation::IStringable, typename D>(this, [&](auto a) { return a->ToString(&value); });
    return hstring{value, take_ownership_from_abi};
}

The resulting codegen should basically be equivalent, but it might reduce time in the compiler itself...

@sylveon
Copy link
Contributor

sylveon commented Nov 11, 2024

Bonus side effect it makes the code writer a lot less heavy.

@dmachaj
Copy link
Contributor Author

dmachaj commented Nov 11, 2024

Bonus side effect it makes the code writer a lot less heavy.

Do you mean the current PR makes it less heavy? Or Jon's suggestion does?

@sylveon
Copy link
Contributor

sylveon commented Nov 11, 2024

The suggestion, since all that baggage is moved to a method that's outside the code writer.

@jonwis
Copy link
Member

jonwis commented Nov 11, 2024

I think I'm OK with just taking the change as is for now. @sylveon is right and we should take a close look generally at the compiler throughput. I saw some serious drag related to the winrt::impl::combine method for things like guids that seemed to be just "splat this constant"-able instead of combining with some cleverness.

@jonwis
Copy link
Member

jonwis commented Nov 11, 2024

Ultimately what I want (see the #1123 forever-issue) is some "auto fast abi" that generates a fastabi-like jacket around C++/WinRT types that aren't fastabi, to further reduce the "QI from IFoo -> IFoo3" noise with hugely-deep type hierarchies (looking at you, PropertySet and WinUI3 Button...)

@dmachaj
Copy link
Contributor Author

dmachaj commented Nov 11, 2024

I'd like to save that feedback for a future commit. I don't understand the compiler throughput tradeoffs well enough to do a good job validating that. What I have here is already validated and comes with a nice ~1% binary size reduction.

I think that any concerted effort on lower compiler overhead should focus on the end-to-end runtime so that the focus can go to wherever the costs are highest. The consume_ methods may or may not be major contributors to that. I haven't done any measurements to know.

@dmachaj dmachaj merged commit d296af6 into master Nov 12, 2024
75 checks passed
@dmachaj dmachaj deleted the user/dmachaj/consume_size_reduction branch November 12, 2024 00:09
@oldnewthing
Copy link
Member

oldnewthing commented Nov 12, 2024

Code comment says

// The `noexcept` versions will crash if check_cast_result throws but that is no different than previous...

but there is no check_cast_result any more.

Oh wait, there it is. Nevermind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants