strict_not_null performance issues with reference types #930
Comments
So just to be clear, the goal would be to make this work:
But because it's
Correct? Here's a question: given CTAD, what exactly does |
@NicolBolas
|
Thanks!
Do you have a test case that this null check has measurable overhead? I'm not trying to be pedantic here... The reason I ask is because this summer I spent several weeks doing an extensive survey and measurement of 20 years' worth of processors, and could not measure any difference for a similar check for I didn't put the paper in a mailing yet because I'm working on a related companion paper that I want to publish at the same time, but for this thread I put a copy in my public misc-papers repo here: https://github.com/hsutter/misc-papers/blob/main/2229%20self%20move%20noop%20data.pdf The first Conclusion at the end of the paper was this: "Branches with the three characteristics above — with one alternative empty, virtually always taking the nonempty path, and testing a condition that uses only memory locations that will be used anyway by the nonempty path — should be considered cost-free on all known hardware." [ETA the other conclusion] The second Conclusion, pertinent here, was this, emphasis added: "Language [or in this case library] design decisions should be decided on their other merits, but not considering concern about potential performance cost of a branch for an identity test (e.g., as currently required for copy assignment) or not-null test (e.g., as currently required for delete and free). Those tests should be considered free." But although I was not able to measure a cost to this kind of test on any current hardware, I would be very interested if someone does have a repro, that would be very helpful. As I mention in the paper, I set out to measure the overhead, expecting it to be measurable, and ended up testing so many processors (with the help of many kind people acknowledged in the paper) because I kept failing to measure it. |
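To make the kind of branch in the quoted conclusions concrete, here is a minimal sketch of an identity test in a copy assignment operator. `Buffer` is a hypothetical type invented for illustration; it is not taken from the paper or from GSL. As I read the three characteristics, the `if` has one empty alternative, almost always takes the nonempty path, and only looks at addresses the nonempty path needs anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example type (an assumption for illustration only).
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]()) {}

    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }

    Buffer& operator=(const Buffer& other) {
        // Identity test: the "else" side is empty, self-assignment is rare,
        // and both addresses are needed by the copy below anyway.
        if (this != &other) {
            char* fresh = new char[other.size_];
            std::memcpy(fresh, other.data_, other.size_);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }

    char* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    char* data_;
};
```

Whether such a branch is free on a given machine is exactly what the paper measures; the sketch only shows the shape of the code being discussed.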
If the check doesn't have measurable overhead, then we also don't need to answer any of the other design/usability questions raised by trying to optimize for it. |
I made a very quick test using an online compiler: https://rextester.com/l/cpp_online_compiler_gcc Open not_null_test.txt and paste the contents. Click the "Run it" button and it will show the measured times. Run it a few times to get an idea of the average difference. What I did was simply create a new class called not_null_custom. It's an exact copy of gsl::not_null, except that I modified the constructors. Edit: For this test I didn't actually add the constructor that takes a reference, but it shows the speedup without the null check. Edit2: I just noticed that I made a copy-paste error. The first test uses not_null_custom in the first if-case. You may replace it with gsl::not_null, although it doesn't really matter. The second test, using only not_null_custom, is faster anyway. |
I've now fixed the mistake: You can run it directly with Godbolt: https://godbolt.org/z/nqre4E |
Quick ack: Thanks, @Amaroker, this is very helpful! |
[various edits to fix typos/grammar/clarity and add a note] Thanks for waiting, I've now had cycles to look at this. This is an interim update so far, I'll look at it again with the GSL maintainers, probably after the U.S. holiday week. First, I distilled it down to a minimal repro (by removing code that wasn't exercised in this test and moving the duplicated test loop into a template) and ensured it still preserved the reported performance difference. That got it down from ~560 lines to ~60 lines, and made it easier to see the delta between The first thing I discovered is that there are actually three differences in the exercised parts of the current and revised
(Also: The benchmark comparison was being done against Then I experimented with different parts of the code, and I found three micro changes each of which at least erased the performance difference and often/always made the current GSL version faster(!). Here is the updated repro: https://godbolt.org/z/nd81q1 In there, you'll find three places where I use The three things I found are:
I've verified that all three local changes have the same performance effect in the original 560-line repro So this is what I've found so far, and the benchmark does not appear to support the proposed change in this Issue. But it has led to potentially removing other inefficiencies in the GSL |
Thanks @hsutter for sharing your observations! The performance improvements that you suggest would be satisfying. I agree that there's no need to assert on each read (hence the removal) because the invariant has already been validated when the pointer was set in the constructor. But I'm still curious. You previously suggested that a not-null test should be considered free. But in this example, that's only true in the constructor. So, what really surprises me is that the assertion (a not-null test without __builtin_expect) has a different performance impact in a constructor than in a member function. Any thoughts on that? |
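For readers following along, here is a rough sketch of the two places the check can live: once in the constructor, or again on every read. This is not the GSL source. `not_null_sketch`, `get_rechecked`, `get_trusting`, and `SKETCH_UNLIKELY` are names made up for this illustration, and `__builtin_expect` is a GCC/Clang builtin, so the macro falls back to a plain test on other compilers.

```cpp
#include <cassert>
#include <exception>

// __builtin_expect is compiler-specific; use a plain test elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define SKETCH_UNLIKELY(cond) (!!(cond))
#endif

// Hypothetical wrapper (an assumption for illustration, not gsl::not_null).
template <class T>
class not_null_sketch {
public:
    // The invariant is established once, when the pointer is stored.
    explicit not_null_sketch(T* p) : ptr_(p) {
        if (SKETCH_UNLIKELY(ptr_ == nullptr)) std::terminate();
    }

    // Variant A: re-validate the invariant on every read.
    T* get_rechecked() const {
        if (SKETCH_UNLIKELY(ptr_ == nullptr)) std::terminate();
        return ptr_;
    }

    // Variant B: trust the constructor's check and just return the pointer.
    T* get_trusting() const noexcept { return ptr_; }

private:
    T* ptr_;
};

// Reads the same value through both accessors.
int read_both_ways(int& value) {
    not_null_sketch<int> p(&value);
    return *p.get_rechecked() + *p.get_trusting();
}
```

Both accessors return the same pointer; the open question in the thread is only whether the extra branch in variant A costs anything measurable.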
@Amaroker That's true, and I'm still not certain that there's a need to remove that check inside I'm still investigating, but on some further experiments I ran today it does appear the extra check is free even in Thanks again for the repro! |
I rewrote Without std::terminate() the best option
Sorry for resurrecting an issue that was last commented on 2 years ago :) EDIT: |
Hi @KindDragon thanks for resurfacing this. |
Yes it does |
After checking the generated assembly, I found that the compiler is too smart: if we just take the address of a vector element, it knows that the pointer is not null and removes the call to std::terminate. Corrected test:
So based on this new benchmark and assembler code from Godbolt https://godbolt.org/z/6oPjrMcdE I think we should use |
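The optimizer effect described above (the null check disappearing when the pointer comes straight from a vector element) can be illustrated with a small sketch. This is not the benchmark from the Godbolt link. The function names are made up, and routing the pointer through a `volatile` local is just one assumed way to hide where it came from so that the check stays in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <vector>

// A read guarded by a not-null test, similar in spirit to a checked getter.
inline int read_checked(const int* p) {
    if (p == nullptr) std::terminate();
    return *p;
}

// The optimizer can see that p is the address of a vector element, so it is
// allowed to conclude p != nullptr and drop the check entirely.
long sum_visible(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += read_checked(&v[i]);
    }
    return total;
}

// Reading the pointer back from a volatile object hides its origin, so the
// compiler has to keep the comparison against nullptr.
long sum_opaque(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const int* volatile hidden = &v[i];
        total += read_checked(hidden);
    }
    return total;
}
```

Both functions compute the same sum; comparing their assembly at a high optimization level is one way to see whether the check survived. Note that the volatile store and load add their own cost, so this is not a clean timing comparison.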
Thanks everyone! Based on feedback from the Guidelines editors in Guidelines issue 2006 and performance measurements, we'll remove the null check from However, we don't want to use the Microsoft's guidance is to not use |
Thank you for the detailed answer
Hmm, but if the programmer wrote [[assume]], shouldn't we trust them? This could only happen in new code, and they probably wrote it to optimize the code |
In C++ by default indeed we should trust them. And as a vendor we absolutely want to give them, as The Customer, the power tools they want. And we do trust the programmer even for some "assumptions" such as assuming that loop unrolling is okay or that vector code generation is okay. However, our experience with the general Why not just make the nonportable But a standard |
I appreciate your answer. If in the future it becomes possible to implement |
This is a follow-up to #396, where @hsutter expressed concern about the changes needed at call sites and asked for examples.
To understand my situation better, here's some background information:
It's our company's policy to only use not_null for legacy code / existing APIs that we can't break. However, the first choice for new modern code is always strict_not_null, because we much prefer compile-time errors to run-time errors.
So, the only times we change call sites are during internal refactoring, when we strengthen the code with strict_not_null. With this approach, I believe Herb's concern doesn't apply.
Consider the following function declaration:
void foo( const strict_not_null<X*>& p );
(1) Sometimes we already have a not_null pointer ready to use directly...
foo( px );
...and (2) sometimes we have a stack-based value and we need to construct a temporary strict_not_null object.
foo( strict_not_null<X*>(&x) );
However, (2) is not as efficient as it could be. Performance could be improved by introducing a new explicit and noexcept constructor that avoids the null check. It wouldn't break any existing code.
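A minimal sketch of what such a constructor could look like, assuming it takes a reference (which can never be null, so nothing needs checking). `strict_not_null_sketch` is a hypothetical stand-in written for this example; it is not the GSL class and it omits most of the real interface.

```cpp
#include <cassert>
#include <exception>

// Hypothetical stand-in for gsl::strict_not_null<T*> (illustration only).
template <class T>
class strict_not_null_sketch {
public:
    // Existing style: explicit, validates the pointer at run time.
    explicit strict_not_null_sketch(T* p) : ptr_(p) {
        if (ptr_ == nullptr) std::terminate();
    }

    // Proposed addition (assumption): a reference always refers to an
    // object, so the null check can be skipped and the call is noexcept.
    explicit strict_not_null_sketch(T& r) noexcept : ptr_(&r) {}

    T* get() const noexcept { return ptr_; }
    T& operator*() const noexcept { return *ptr_; }
    T* operator->() const noexcept { return ptr_; }

private:
    T* ptr_;
};

struct X {
    int value = 0;
};

int foo(const strict_not_null_sketch<X>& p) { return p->value; }

// Case (2) today: take the address, pay for the run-time check.
int call_with_pointer() {
    X x;
    x.value = 7;
    return foo(strict_not_null_sketch<X>(&x));
}

// Case (2) with the proposed constructor: pass the object itself.
int call_with_reference() {
    X x;
    x.value = 42;
    return foo(strict_not_null_sketch<X>(x));
}
```

Because both constructors are explicit and take different parameter types, existing call sites that pass `&x` keep selecting the checked pointer overload, which is why adding the reference overload would not be expected to break them.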