Rework initialization of constants & class variables #15333

straight-shoota merged 12 commits into crystal-lang:master
Conversation
This should have a […]
Force-pushed from d1de2eb to ce6e9a2
Rebased to add the […]
I think it's not registering in the interface. Maybe it has to be on the last line 🤔
FWIW we could also add the co-author attribution in the squash commit when we merge this PR (though we'd need to remember that, so it's best to already add it to the original commits 😅)
I'm pretty sure the easiest way would be to cherry-pick their commits into a branch, then push up your own. If there are commits from more than 1 author on a branch, GH should automatically add the […]
I added a commit to only enable the feature from crystal […]

EDIT: why is CI getting completely broken because I changed the version comparison from […]
Trying to work on #14905, I notice that this is still concurrency unsafe because we only enable […]
Moving back to draft while I'm working on #14905. I might change the new […]
Ready again. I won't need changes to the […]
Based on the PR by @BlobCodes: crystal-lang#15216

The performance improvement is two-fold:

1. the usage of an i8 instead of an i1 boolean to have 3 states instead of 2, which permits quickly detecting recursive calls without an array;
2. inline tricks to optimize the fast and slow paths.

Unlike the PR:

1. Doesn't use atomics: it already uses a mutex that guarantees acquire-release memory ordering semantics, and `__crystal_once_init` is only ever called in the main thread before any other thread is started.
2. Removes the need for a state maintained by the compiler, yet keeps forward and backward compatibility (both signatures are supported).

Co-authored-by: David Keller <davidkeller@tuta.io>

---

@BlobCodes: I noticed that adding this code prevents LLVM from re-running the once mechanism multiple times for the same variable.

Modified to avoid undefined behavior when the assumption doesn't hold, which doubles as a safety net (print error + exit).

Co-authored-by: David Keller <davidkeller@tuta.io>

---

@BlobCodes: I think it would be better to print the bug message in `Crystal.once` instead of `__crystal_once` to reduce complexity at the callsite. The previous unreachable method can then be used in the inlined `__crystal_once` so LLVM also knows it doesn't have to re-run the method.

It's now even safe because `Crystal.once` would panic if it failed; it should already be impossible, but let's err on the safe side.

Co-authored-by: David Keller <davidkeller@tuta.io>
Force-pushed from f7e50b7 to 2296069
CI is finally all green.
src/crystal/once.cr
Outdated
```crystal
# tell LLVM that it can optimize away repeated `__crystal_once` calls for
# this global (e.g. repeated access to a constant in a single function);
# this is truly unreachable, otherwise `Crystal.once` would have panicked
Crystal.once_unreachable unless flag.value.initialized?
```
praise: This is pretty darn clever 👏 Kudos to @BlobCodes
I second that: thanks a lot @BlobCodes 🙇
```crystal
enum OnceState : Int8
  Processing    = -1
  Uninitialized =  0
  Initialized   =  1
end
```
This is very nit-picky, but in my v2 branch, I replaced this ternary enum with an Int8 so all comparisons are done against 0 (`== Initialized` becomes `> 0`).
Comparing with 0 results in fewer (or smaller) assembly instructions on most CPUs.
The only arch I'm really familiar with is RISC-V, so here's an example in RISC-V:

```
# enum comparison `return if state == Initialized`
li  t0, 1         # load 1 into t0
bne t0, a0, init  # initialize if needed
ret               # early return (or normal program flow if inlined)
init:
  # stuff
```

```
# comparison `return if state <= 0`
blez a0, init  # initialize if needed
ret            # early return (or normal program flow if inlined)
init:
  # stuff
```

Of course this is micro-optimization, but if this is inlined into every const access, it could be noticeable.
EDIT: I fixed the examples below... I stupidly used a UInt8 🤦
On x86_64 the only difference is in the jump instruction:

```
; flag.value == 1
cmpb $0x1,(%rdi)
jne  39 <foo+0x9>

; flag.value > 0
cmpb $0x0,(%rdi)
jle  39 <foo+0x9>
```

Same for ARM32:

```
; flag.value == 1
ldrb  r0, [r0]
cmp   r0, #1
moveq pc, lr

; flag.value > 0
ldrb  r0, [r0]
cmp   r0, #1
movge pc, lr
```

And same for AArch64:

```
; flag.value == 1
ldrb w8, [x0]
cmp  w8, #0x1
b.ne 40 <foo+0x10>

; flag.value > 0
ldrb w8, [x0]
cmp  w8, #0x1
b.lt 40 <foo+0x10>
```

NOTE: I'm not fluent in the assembly of each arch. I compiled a tiny program with --cross-compile --target=... then used objdump --disassemble from a cross-compiled build of binutils to compare the LLVM-generated assembly.
```
; flag.value > 0
cmpb $0x0,(%rdi)
je   39 <foo+0x9>
```

To me that looks like an unsigned comparison. So it only checks `x != 0`.
I just wrote a commit, but didn't push it (yet): I'm wondering if we'd really benefit from the change in practice 🤔
I tested again, it's actually a jle!
@BlobCodes I ran my tests again and updated my comment above.
We can probably do better in manually written assembly, but LLVM actually generates the same assembly for all 3 architectures. The only difference is in the jump instruction 🤷
Let's see in a follow up if we can squeeze even more performance with this idea.
Do not forget!! Co-authored-by to be moved to the bottom
@oprypin: GitHub seems to recognize it. Correctly populates the […]
@straight-shoota That's not right. The suggested commit message that I get is huge, with all messages concatenated. Maybe you edited it yourself. Try to clear browser caches.
I'm using https://github.com/refined-github/refined-github which is probably adjusting the commit message from the default experience (or it's GitHub's New Merge Experience feature). Anyway, as a result I get the correct attribution pre-filled. So it must be discoverable somehow. I […]
Hmm well in that case it's sad that it completely loses the concatenated commit descriptions. That seems undesirable in general, not only here. Or maybe you consider it a positive, I don't know
Follow-up to and closes #15216 by @BlobCodes. Also incorporates the LLVM optimization from master...BlobCodes:crystal:perf/crystal-once-v2.

Similar to the original PR, this changes the `flag` to have 3 states instead of 2. This led to changing the signature of the `__crystal_once[_init]` functions, which now take an Int8 (`i8`) instead of a Bool (`i1`). In addition, it drops the "once state" that isn't needed anymore.

This requires a new compiler build to benefit from the improvement. The legacy versions of the `__crystal_once[_init]` methods are still supported by both the stdlib and the compiler to keep both forward and backward compatibility (1.15- can build 1.16+ and 1.16+ can build 1.15-).

Unlike the original PR, this one doesn't use any atomics: the mutex already protects the flag, and the mutex itself is explicitly initialized by the main thread before any other thread is started (i.e. no parallelism issues).

A follow-up could leverage `ReferenceStorage` and `.unsafe_construct` to inline the `Mutex` instead of allocating it in the GC heap. Along with #15330, `__crystal_once_init` could then become allocation-free, which could prove useful for such a core/low-level feature.