Make interned's last_interned_at equal Revision::MAX if they are interned outside a query #804
Conversation
CodSpeed Performance Report: Merging #804 will degrade performance by 5.79%.
Force-pushed fea2e4b to fafa4ca.
Nonsense benchmark.
I think we discussed this before, but couldn't come up with a test case that would make it fail due to the db lifetime. The idea was to set …
Well, rust-analyzer needs …
I think yes. I thought about that, but removing the assert seemed easier than starting to mess with the query stack in the struct interning. Is there a specific reason you want to keep this assert?
I'd prefer to set the revision to `Revision::MAX`.
Force-pushed 6629b7c to d97ed2c.
@ibraheemdev I edited per your suggestion.
Force-pushed d97ed2c to 88c7b9d.
src/interned.rs (Outdated)
```rust
if value.last_interned_at.load() < current_revision {
    value.last_interned_at.store(current_revision);
}
```
You could use `fetch_max` here to store the maximum of the current revision and `last_interned_at`, avoiding two separate atomic operations.
This is `AtomicRevision`, not `AtomicUsize`. I would need to define `fetch_max()`, and it's not worth it. It's not like a race condition is problematic here.
`AtomicRevision` is just a small wrapper around `AtomicUsize`; see `OptionalAtomicRevision` for how we exposed other atomic methods.
This isn't just about races; it's also about avoiding unnecessary atomic operations in a very hot method.
`fetch_max()` won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a `cmpxchg` loop, whereas a load+store compiles to ordinary instructions.
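For illustration (this is not salsa's code), here is the difference in plain Rust; the manual loop approximates what `fetch_max` lowers to on targets without a native atomic-max instruction, and the function names are made up:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One atomic RMW. On x86-64 this lowers to a `lock cmpxchg` retry loop,
// much like the manual loop below.
fn bump_rmw(last: &AtomicUsize, current: usize) {
    last.fetch_max(current, Ordering::Relaxed);
}

// The equivalent loop spelled out with compare_exchange_weak.
fn bump_cas_loop(last: &AtomicUsize, current: usize) {
    let mut prev = last.load(Ordering::Relaxed);
    while prev < current {
        match last.compare_exchange_weak(prev, current, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => prev = actual,
        }
    }
}

// Plain load + store: two non-RMW atomics, each of which compiles to an
// ordinary mov on x86-64. Racy under contention, but benign here.
fn bump_load_store(last: &AtomicUsize, current: usize) {
    if last.load(Ordering::Relaxed) < current {
        last.store(current, Ordering::Relaxed);
    }
}
```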
We can be faster by going branchless, though; I will change to that.
Force-pushed 88c7b9d to 22f0dc2.
@MichaReiser Addressed comments.
…interned outside a query

There is an assert that `last_interned_at >= last_changed_revision`, and it can fail without this; see the added test.
Force-pushed 22f0dc2 to 97a04e2.
```rust
value.last_interned_at.store(std::cmp::max(
    current_revision,
    value.last_interned_at.load(),
));
```
Hmm, that was not the idea. The idea was to use `AtomicUsize::fetch_max` to combine the load and store instructions. Something like

```rust
value.last_interned_at.fetch_max(current_revision, Ordering::XXX)
```

where `AtomicRevision::fetch_max` internally calls `AtomicUsize::fetch_max`.
Would you mind making this change in a follow-up PR?
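A minimal sketch of the suggested wrapper, assuming `AtomicRevision` wraps an `AtomicUsize` (the field layout and the `Revision` representation below are assumptions, not salsa's actual definitions):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed stand-ins for salsa's Revision and AtomicRevision.
pub struct Revision(usize);

pub struct AtomicRevision(AtomicUsize);

impl AtomicRevision {
    /// Atomically stores the maximum of the current value and `rev`,
    /// returning the previous revision, by delegating to
    /// `AtomicUsize::fetch_max`.
    pub fn fetch_max(&self, rev: Revision, order: Ordering) -> Revision {
        Revision(self.0.fetch_max(rev.0, order))
    }
}
```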
Interestingly enough, it seems the `fetch_max` version is worse? https://godbolt.org/z/9efcq7cnh
Interesting. It sort of makes sense, because both operations are now atomic. It'd be interesting to see whether arm64 produces more efficient instructions.
> When these operations affect more than one bit, they cannot be represented by a single x86-64 instruction. Similarly, the fetch_max and fetch_min operations also have no corresponding x86-64 instruction. For these operations, we need a different strategy than a simple lock prefix.
>
> A later version of ARM64, part of ARMv8.1, also includes new CISC style instructions for common atomic operations. For example, the new ldadd (load and add) instruction is equivalent to an atomic fetch_add operation, without the need for an LL/SC loop. It even includes instructions for operations like fetch_max, which don't exist on x86-64.
>
> It also includes a cas (compare and swap) instruction corresponding to compare_exchange. When this instruction is used, there's no difference between compare_exchange and compare_exchange_weak, just like on x86-64.
>
> While the LL/SC pattern is quite flexible and nicely fits the general RISC pattern, these new instructions can be more performant, as they can be easier to optimize for with specialized hardware.

https://marabos.nl/atomics/hardware.html
fetch_max should be more efficient on ARM64.
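For example, a relaxed `fetch_max` like the sketch below can lower to a single `ldumax` instruction on AArch64 when the ARMv8.1 LSE atomics are enabled (e.g. building with `-C target-feature=+lse`); without LSE it falls back to an LL/SC retry loop:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[no_mangle]
pub fn bump_revision(last: &AtomicUsize, current: usize) -> usize {
    // With LSE: a single atomic unsigned-max instruction (ldumax).
    // Without LSE: an ldxr/stxr retry loop.
    last.fetch_max(current, Ordering::Relaxed)
}
```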
That's exactly what I said:

> `fetch_max()` won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a `cmpxchg` loop, whereas a load+store compiles to ordinary instructions.

And ARM is the same in this regard.
Generally, RMW operations are expensive compared to regular (non-seqcst) loads/stores. On x86 these compile to regular (same as non-atomic) load/store instructions, while RMWs entail a strong barrier (a pipeline stall). If the branch can avoid performing a store, the load may be worth it (a contended store is much more expensive than a branch/load), but I would stay away from the RMW.
There is an assert that `last_interned_at >= last_changed_revision`, and it can fail without this; see the added test.

CC @ibraheemdev, you introduced this assert in #602.
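A sketch of the idea behind the fix, with hypothetical names rather than salsa's actual interning internals: when a value is interned outside any query, pin `last_interned_at` to the maximum revision so the assert can never observe `last_interned_at < last_changed_revision`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative stand-in for Revision::MAX.
const REVISION_MAX: usize = usize::MAX;

struct Value {
    last_interned_at: AtomicUsize,
    last_changed_revision: usize,
}

// `in_active_query` stands in for "is there a query on the stack?".
fn record_intern(value: &Value, in_active_query: bool, current_revision: usize) {
    let rev = if in_active_query {
        current_revision
    } else {
        // Interned outside a query: use the maximum revision so the
        // invariant checked below always holds.
        REVISION_MAX
    };
    // Branchless update, mirroring the final diff above.
    value.last_interned_at.store(
        rev.max(value.last_interned_at.load(Ordering::Relaxed)),
        Ordering::Relaxed,
    );
    debug_assert!(
        value.last_interned_at.load(Ordering::Relaxed) >= value.last_changed_revision
    );
}
```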