Rework initialization of constants & class variables #15333

straight-shoota merged 12 commits into crystal-lang:master
Conversation
This should have a […]
Force-pushed from d1de2eb to ce6e9a2
Rebased to add the […]
I think it's not registering in the interface. Maybe it has to be on the last line 🤔
FWIW we could also add the co-author attribution in the squash commit when we merge this PR (though we'd need to remember that, so it's best to already add it to the original commits 😅)
I'm pretty sure the easiest way would be to cherry-pick their commits into a branch, then push up your own. If there are commits from more than 1 author on a branch, GH should automatically add the […]
I added a commit to only enable the feature from crystal […]

EDIT: why is CI getting completely broken because I changed the version comparison from […]
Trying to work on #14905, I notice that this is still concurrency unsafe because we only enable […]
Moving back to draft while I'm working on #14905. I might change the new […]
Ready again. I won't need changes to the […]
Based on the PR by @BlobCodes: crystal-lang#15216

The performance improvement is two-fold:

1. the usage of an i8 instead of an i1 boolean to have 3 states instead of 2, which permits quickly detecting recursive calls without an array;
2. inline tricks to optimize the fast and slow paths.

Unlike the PR:

1. Doesn't use atomics: it already uses a mutex that guarantees acquire-release memory ordering semantics, and `__crystal_once_init` is only ever called in the main thread before any other thread is started.
2. Removes the need for a state maintained by the compiler, yet keeps forward and backward compatibility (both signatures are supported).

Co-authored-by: David Keller <davidkeller@tuta.io>

---

@BlobCodes: I noticed that adding this code prevents LLVM from re-running the once mechanism multiple times for the same variable.

Modified to avoid undefined behavior when the assumption doesn't hold, which doubles as a safety net (print error + exit).

Co-authored-by: David Keller <davidkeller@tuta.io>

---

@BlobCodes: I think it would be better to print the bug message in `Crystal.once` instead of `__crystal_once` to reduce complexity at the callsite. The previous unreachable method can then be used in the inlined `__crystal_once` so LLVM also knows it doesn't have to re-run the method.

It's now even safe because `Crystal.once` would panic if it failed; it should already be impossible, but let's err on the safe side.

Co-authored-by: David Keller <davidkeller@tuta.io>
Force-pushed from f7e50b7 to 2296069
CI is finally all green.
src/crystal/once.cr
Outdated
```crystal
# tell LLVM that it can optimize away repeated `__crystal_once` calls for
# this global (e.g. repeated access to a constant in a single function);
# this is truly unreachable, otherwise `Crystal.once` would have panicked
Crystal.once_unreachable unless flag.value.initialized?
```
praise: This is pretty darn clever 👏 Kudos to @BlobCodes
I second that: thanks a lot @BlobCodes 🙇
```crystal
enum OnceState : Int8
  Processing    = -1
  Uninitialized =  0
  Initialized   =  1
end
```
This is very nit-picky, but in my v2 branch, I replaced this ternary enum with an Int8 so all comparisons are done against 0 (`== Initialized` becomes `> 0`).
Comparing with 0 results in fewer (or smaller) assembly instructions on most CPUs.
The only arch I'm really familiar with is RISC-V, so here's an example in RISC-V:

```
# enum comparison `return if state == Initialized`
li  t0, 1         # load 1 into t0
bne t0, a0, init  # initialize if needed
ret               # early return (or normal program flow if inlined)
init:
  # stuff
```

```
# comparison `return if state <= 0`
blez a0, init  # initialize if needed
ret            # early return (or normal program flow if inlined)
init:
  # stuff
```

Of course this is micro-optimization, but if this is inlined into every const access, it could be noticeable.
EDIT: I fixed the examples below... I stupidly used a UInt8 🤦
On x86_64 the only difference is in the jump instruction:

```
; flag.value == 1
cmpb $0x1,(%rdi)
jne  39 <foo+0x9>

; flag.value > 0
cmpb $0x0,(%rdi)
jle  39 <foo+0x9>
```

Same for ARM32:

```
; flag.value == 1
ldrb  r0, [r0]
cmp   r0, #1
moveq pc, lr

; flag.value > 0
ldrb  r0, [r0]
cmp   r0, #1
movge pc, lr
```

And same for AArch64:

```
; flag.value == 1
ldrb w8, [x0]
cmp  w8, #0x1
b.ne 40 <foo+0x10>

; flag.value > 0
ldrb w8, [x0]
cmp  w8, #0x1
b.lt 40 <foo+0x10>
```

NOTE: I'm not fluent in the assembly of each arch. I compiled a tiny program with --cross-compile --target=... then used objdump --disassemble from a cross-compiled build of binutils to compare the LLVM-generated assembly.
```
; flag.value > 0
cmpb $0x0,(%rdi)
je   39 <foo+0x9>
```

To me that looks like an unsigned comparison. So it only checks `x != 0`.
I just wrote a commit, but didn't push it (yet): I'm wondering if we'd really benefit from the change in practice 🤔
I tested again, it's actually a jle!
@BlobCodes I ran my tests again and updated my comment above.
We can probably do better in manually written assembly, but LLVM actually generates the same assembly for all 3 architectures. The only difference is in the jump instruction 🤷
Let's see in a follow up if we can squeeze even more performance with this idea.
Do not forget!! Co-authored-by to be moved to the bottom
@oprypin: GitHub seems to recognize it. Correctly populates the […]
@straight-shoota That's not right. The suggested commit message that I get is huge, with all messages concatenated. Maybe you edited it yourself. Try to clear browser caches.
I'm using https://github.com/refined-github/refined-github which is probably adjusting the commit message from the default experience (or it's GitHub's New Merge Experience feature). Anyway, as a result I get the correct attribution pre-filled. So it must be discoverable somehow. I […]
Hmm well in that case it's sad that it completely loses the concatenated commit descriptions. That seems undesirable in general, not only here. Or maybe you consider it a positive, I don't know
Follow-up to and closes #15216 by @BlobCodes. Also incorporates the LLVM optimization from master...BlobCodes:crystal:perf/crystal-once-v2.

Similar to the original PR, this changes the `flag` to have 3 states instead of 2. This led to changing the signature of the `__crystal_once[_init]` functions, which now take an Int8 (`i8`) instead of a Bool (`i1`). In addition, it drops the "once state" that isn't needed anymore.

This requires a new compiler build to benefit from the improvement. The legacy versions of the `__crystal_once[_init]` methods are still supported by both the stdlib and the compiler to keep both forward and backward compatibility (1.15- can build 1.16+ and 1.16+ can build 1.15-).

Unlike the original PR, this one doesn't use any atomics: the mutex already protects the flag, and the mutex itself is explicitly initialized by the main thread before any other thread is started (i.e. no parallelism issues).

A follow-up could leverage `ReferenceStorage` and `.unsafe_construct` to inline the `Mutex` instead of allocating it in the GC heap. Along with #15330, `__crystal_once_init` could then become allocation-free, which could prove useful for such a core/low-level feature.