[sysvabi64] Add chapter on Thread Local Storage #311

Open
wants to merge 4 commits into base: main

smithp35
Contributor

The thread local storage chapter contains:

  • A description of Thread Local Storage based on addenda32.
  • The key design decisions of AArch64 TLS, such as TLS variant, TLS dialect, and TCB size.
  • The ABI-required code sequence for TLSDESC, which must be emitted exactly because GNU ld requires it.
  • Sequences for the different code models.
  • Relaxations for GD->IE, GD->LE and IE->LE.
  • Synchronization requirements for lazy TLSDESC, with advice not to support it due to the overhead of synchronization.

and ``PT_TLS`` as the program header with type PT_TLS. ``PAD`` must be
the smallest positive integer that satisfies the following congruence:

``TP + TCB + PAD ≡ PT_TLS.p_vaddr (modulo PT_TLS.p_align)``

Contributor

TP+TCB+PAD on the left could be confusing, as TCB is placed before TP. Perhaps mention the requirement of TP first (= 0 (modulo p_align)), then describe PAD and this formula.

Contributor Author

I'll see if I can word it better. I've found it difficult to try and explain the formula intuitively.

Given that ``TP ≡ 0 (modulo PT_TLS.p_align)``, an expression
for ``PAD`` is ``PAD = (PT_TLS.p_vaddr - TCB) mod PT_TLS.p_align``.

A significant number of dynamic linkers use a different calculation

Contributor

I believe that glibc Variant I handles p_vaddr != 0 (mod p_align) correctly. The bug (https://sourceware.org/bugzilla/show_bug.cgi?id=24606) is for Variant II (x86 etc).

I have fixed FreeBSD rtld's Variant II in https://reviews.freebsd.org/D31538. Its Variant I may or may not have the bug.

musl has been good since 1.1.23.

Therefore, it's probably not "a significant number", but p_vaddr = 0 (mod p_align) is good for maximum compatibility.

Contributor Author

I found it difficult to be confident about the status of the various dynamic linkers. I can remove the "significant" part.

The glibc bug looks good for static TLS, but it does mention in https://sourceware.org/bugzilla/show_bug.cgi?id=24606#c7 that dynamic TLS still needs p_vaddr to be 0 (modulo p_align).
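
As a way to sanity-check the ``PAD`` expression discussed in this thread, here is a minimal Python sketch. It assumes ``TP ≡ 0 (modulo PT_TLS.p_align)`` as stated above, a 16-byte TCB (the usual AArch64 value, taken as an assumption here), and that ``PAD`` may be 0, i.e. the smallest non-negative solution. The function name `tls_pad` and its parameters are hypothetical and chosen only for illustration.

```python
def tls_pad(p_vaddr: int, p_align: int, tcb_size: int = 16) -> int:
    """Return the smallest non-negative PAD such that
    tcb_size + PAD ≡ p_vaddr (mod p_align), assuming TP ≡ 0 (mod p_align).

    tcb_size defaults to 16 bytes, which is an assumption of this sketch.
    """
    if p_align <= 1:
        # An alignment of 0 or 1 means no alignment constraint, so no padding.
        return 0
    # Python's % with a positive modulus always yields a non-negative result.
    return (p_vaddr - tcb_size) % p_align


if __name__ == "__main__":
    # Illustrative (p_vaddr, p_align) pairs, including p_vaddr != 0 (mod p_align).
    for p_vaddr, p_align in [(0x10000, 64), (0x10008, 16), (0x1000, 8)]:
        pad = tls_pad(p_vaddr, p_align)
        # Check the congruence with TP taken as 0 modulo p_align.
        assert (16 + pad) % p_align == p_vaddr % p_align
        print(f"p_vaddr={p_vaddr:#x} p_align={p_align} PAD={pad}")
```

The second pair is the case the thread is concerned with: ``p_vaddr`` is not a multiple of ``p_align``, so the padding has to absorb the residue rather than just round the TCB size up to the alignment.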

add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var
ldr xn, [xn, #:tprel_lo12_nc:var] // R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC var
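
To illustrate how the two relocations in the local-exec sequence above divide up a TP-relative offset, here is a hedged Python sketch. It assumes the usual AArch64 ELF relocation semantics: the ``ADD`` takes bits [23:12] of the offset (shifted left by 12 by the instruction) and the 64-bit ``LDR`` takes bits [11:3] as a scaled immediate, with no overflow check on the ``_NC`` form. The function names and the 8-byte alignment check are assumptions of this sketch, not part of the ABI text.

```python
def tprel_hi12(offset: int) -> int:
    """Immediate for `add xn, tp, #imm, lsl #12`: bits [23:12] of the offset."""
    if not 0 <= offset < (1 << 24):
        raise ValueError("offset does not fit the two-instruction sequence")
    return (offset >> 12) & 0xFFF


def tprel_lo12_ldst64(offset: int) -> int:
    """Scaled immediate for a 64-bit `ldr`: bits [11:3] of the offset."""
    if offset & 0x7:
        # Sanity check added by this sketch: a 64-bit scaled load cannot
        # encode an offset that is not a multiple of 8.
        raise ValueError("offset is not 8-byte aligned")
    return (offset & 0xFFF) >> 3


if __name__ == "__main__":
    offset = 0x1238  # hypothetical TP-relative offset of `var`
    hi = tprel_hi12(offset)
    lo = tprel_lo12_ldst64(offset)
    # The hardware recombines the fields as (hi << 12) + (lo * 8).
    assert (hi << 12) + (lo << 3) == offset
    print(f"hi12={hi:#x} lo12(scaled)={lo:#x}")
```

Together the two fields cover TP-relative offsets below 16 MiB, which is why the range check sits on the high part.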

Static link time TLS Relaxations

Contributor

Perhaps call this Optimization to be consistent with x86/ppc and "Relocation optimization" (ADRP) and leave the term "relocation relaxation" for RISC-V style section shrinking.

Contributor Author

For TLS specifically I'd prefer to keep "relaxation", as that's what it has been called in all the previous literature, such as Drepper's ELF Handling for Thread Local Storage and the TLSDESC paper. It should help people searching in the references.

I take the point that it ought to have been called optimization. I'll add a sentence to say that we're using relaxation as a term from the existing literature.


Undefined Weak Symbols

An undefined weak symbol has the value 0. As the resolver function

Contributor

This might be the glibc behavior, but musl doesn't have the special __dl_tlsdesc_undefweak. I think it's better to allow flexibility and not require a particular behavior for undefined weak TLS.

Contributor Author

This part is just an example of what can be done. I've written at the top of the section:

"The TLS resolver functions are not standardized by this ABI as they are internal to the dynamic linker"

and

"These examples are for illustrative purposes only"

I'll see if there's anything I can do to state that there is no requirement to implement a specific resolver function.

* Edits to split up the bullet points in How to denote TLS
  in source.
* Changed program-own state to process-state as the thread-id
  may not be stored separately from the program's data.
* Removed "typically" from some of the descriptions, as the typical
  case will almost always hold for a sysvabi platform.
* Linked alignment padding to the definition.
* Provided a bit more information about generation counters.
* Rearranged formulas and used TCBsize to make them clearer.
* Removed "significant" from "a significant number of dynamic
  linkers".
* Gave the reason for using relaxation rather than optimization.
* Clarified that there is no requirement to implement any TLSDESC
  resolver given in the sysvabi.

smithp35 (Contributor Author) left a comment

Thanks very much for the review.

I've updated based on this and some comments I received internally.
