Skip to content

PoC: Replace Thread::Local(T) with Thread::LocalStorage#16029

Closed
ysbaddaden wants to merge 7 commits intocrystal-lang:masterfrom
ysbaddaden:feature/custom-thread-local-storage
Closed

PoC: Replace Thread::Local(T) with Thread::LocalStorage#16029
ysbaddaden wants to merge 7 commits intocrystal-lang:masterfrom
ysbaddaden:feature/custom-thread-local-storage

Conversation

@ysbaddaden
Copy link
Collaborator

@ysbaddaden ysbaddaden commented Jul 28, 2025

Alternative to #15616 that doesn't work in some situations on x86_64-darwin as we found out in the macOS CI runs of #15955.

After wasting hours of debug on CI, I can't get PCRE2 + TSS + stack unwind to work correctly on this target. Either PCRE2+TSS segfaults when creating the key, or it works and we can't raise anymore (unwinding the stack doesn't work). I have no idea where the problem lies.

This PoC merely reimplements the system TSS/TLS (thread specific/local storage) in pure Crystal. It's one GC malloc per thread that needs to store something, yet still allows dynamic key creation/deletion at runtime. A compile time static table as proposed by @BlobCodes might be more interesting (but no runtime dynamism).

On targets that don't support the @[ThreadLocal] annotation the interface merely calls into the system API because that would otherwise do the same thing twice: 1. lookup the key in system TSS table and 2. lookup the key in the crystal TSS table).

Performance wise, it should be as efficient as the system TSS; we could probably use the unsafe variant of @[ThreadLocal] proposed by @BlobCodes too in #15889 since there's no fiber yield point within the API (though we might want to annotate some methods with @[NoInline]).

NOTE: to be honest I'm not so sure about this solution, it seems really overkill. I'm not even sure we'll use it outside of PCRE2... but maybe there are other external libraries that could benefit from thread locals with destructors?

NOTE: another use case for thread-local-storage is #16157 (thread local RNG instance).

@ysbaddaden ysbaddaden self-assigned this Jul 28, 2025
end

{% if LocalStorage.has_constant?(:Table) %}
property local_storage = Pointer(LocalStorage::Table).null
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pointer is to keep a live pointer to the per-thread table, so the GC won't collect it; we could spare it if we had a binding for GC_malloc_uncollectable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used GC_malloc_uncollectable in some libraries/experiments already and it's very slow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn.

Is the call slow, or is having uncollectable memory slowing down the GC? If it's the former it might be acceptable since we expect to malloc every now and then.

@ysbaddaden
Copy link
Collaborator Author

On targets that don't support the @[ThreadLocal] annotation the interface merely calls into the system API because that would otherwise do the same thing twice.

I might revise this: with a thread pool (see #15885) and the ability to move an execution context scheduler to another thread (see #15871) we might want to attach data to the current scheduler rather than the current thread 🤔

That would make the custom implementation more legit.

@ysbaddaden
Copy link
Collaborator Author

ysbaddaden commented Sep 11, 2025

Note: scheduler-local-storage might be useful for io_uring where we'll need a local ring for each thread scheduler in the execution context.

@ysbaddaden
Copy link
Collaborator Author

On targets that don't support the @[ThreadLocal] annotation the interface merely calls into the system API because that would otherwise do the same thing twice: 1. lookup the key in system TSS table and 2. lookup the key in the crystal TSS table).

This is actually wrong: storing a Reference into a thread local can lead the GC to collect the reference —the GC can scan thread locals.

Also, the compiler should set llvm::TargetOptions::EmulatedTLS for said targets (namely Android < 29, MinGW and OpenBSD) so that @[ThreadLocal] would work on all supported targets. Both Rust and Zig do that. It might be set by default for some targets, I just tested OpenBSD for example and it worked.

An issue is that the option is only available to the C++ API, and both Zig and Rust use custom wrappers.

Implements the usual TLS/TSS solution on top of the `@[ThreadLocal]`
annotation on targets where it's available and working, otherwise merely
calls into the system API and skips the custom implementation as it
would merely duplciate what the system does.

The interface follows the system APIs that use void pointers so storing
anything in TLS/TSS doesn't create a hard dependency. For example PCRE2
can default to register keys with destructors that call into LibPCRE2,
without requiring to always link libpcre2 when we don't use regular
expressions.
@ysbaddaden ysbaddaden force-pushed the feature/custom-thread-local-storage branch from b7a22f9 to bc9dc05 Compare September 25, 2025 15:34
@ysbaddaden
Copy link
Collaborator Author

Updated.

  1. Rebased from master.
  2. Now always uses the custom table to be GC safe.
  3. Added a Random.default use case (see Random::DEFAULT is thread unsafe #16157).

The original implementation required to keep a reference to any GC
memory stored in Thread::LocalStorage when using the TSS or FLS
fallbacks because the GC can't scan the tables and will consider the
memory to be unreachable and collect it (oops).

This refactor always uses the custom table and only stores the pointer
to the table in TSS/FLS which is safe because a pointer to the table is
always kept in reachable memory.
@ysbaddaden ysbaddaden force-pushed the feature/custom-thread-local-storage branch from bc9dc05 to 92e77ee Compare September 25, 2025 15:41
@ysbaddaden

This comment was marked as outdated.

@ysbaddaden
Copy link
Collaborator Author

It's a nice approach, but we can do something much simpler: #16173.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants