Global variables and atomics #314

jfbastien · 2015-08-27T04:36:11Z

Implementing global variables for LLVM, and a subsequent discussion with @ncbray, led me to realize there is an interesting interaction with atomics which I think is worth pointing out and documenting as a "question".

Treat this as a brain dump. In particular, it doesn't require your attention now. Add to it if you wish, and let's make sure we close it out before MVP, but let's stay focused on getting some basics working first.

Once we add threading to WebAssembly, are globals accessible atomically?

The answer to this may mean we diverge slightly from JavaScript Shared Memory that @lars-t-hansen is working on. This means that any theorem prover we use to validate the JavaScript memory model has to take global variables into account for WebAssembly.

If we follow LLVM's lead and simply tag load/store with atomic (and a memory order e.g. acquire/release/seq_cst) then this extends quite easily to load_global and store_global. If we create separate atomic opcodes from load/store, then we must also duplicate load_global/store_global with atomic siblings.
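
As a rough illustration of the two options above (a sketch only: the `atomic_` sibling names and the `Op` type are hypothetical and not part of any WebAssembly design), the difference shows up in how many opcodes are needed:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Order(Enum):
    ACQUIRE = "acquire"
    RELEASE = "release"
    SEQ_CST = "seq_cst"


@dataclass(frozen=True)
class Op:
    name: str                      # e.g. "load", "store", "load_global"
    order: Optional[Order] = None  # None means a plain, non-atomic access

    @property
    def is_atomic(self) -> bool:
        return self.order is not None


BASE_OPS = ["load", "store", "load_global", "store_global"]

# Option A: atomicity is an attribute on the op, so no opcodes are added.
opcodes_with_attribute = list(BASE_OPS)

# Option B: every op gets a separate atomic sibling (hypothetical names).
opcodes_with_siblings = BASE_OPS + ["atomic_" + name for name in BASE_OPS]

plain = Op("load_global")
acquire = Op("load_global", Order.ACQUIRE)
```

With the attribute approach, `load_global` and `store_global` pick up atomicity without any new opcodes, whereas the sibling approach doubles the four base ops to eight.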

An interesting point about global variables is that none of them are address-taken. A code generator could decide to emit no globals and always put things in the heap. That exposes developers' code to heap object overflows, whereas globals are always "safe" (bugs in developers' code can't cause their app to get owned through globals being overwritten).

Globals also potentially lead to better optimization by the .wasm→machine code translator, doubly so when dealing with atomics, because they provide very precise type and pointer escape information.

Global variables can also be exported to shared objects by name.

I'm now wondering: why isn't global an attribute on load/store the same way atomic could be?

While we're there, why isn't HEAP a magical exported global, of byte-array type (assuming we allow array global variables, and force them to always be in bounds) which the main module can share with dynamically loaded libraries? This is going back to what Emscripten does, except it has these handy aliases that type-cast the byte array to other types.

lukewagner · 2015-08-27T14:06:10Z

It'd be more work, but you're right there would be potential performance upsides to having unaliased memory locations with atomic access. Definitely worth considering.

I'm now wondering: why isn't global an attribute on load/store the same way atomic could be?

One difference is that load/store have an expression tree as operand for the address whereas the global ops would have an immediate integer index (into the global array). It'd be nice not to have the global attribute affect the opcode's signature.

While we're there, why isn't HEAP a magical exported global, of byte-array type (assuming we allow array global variables, and force them to always be in bounds) which the main module can share with dynamically loaded libraries?

I'm assuming you're referring to linear memory itself, not the aliased-stack-in-the-heap pointer. In that case, I don't understand the use case: linear memory is fundamentally shared between the main module and all dylibs that get linked in. Are you talking about sharing heaps between different (not dynamically linked) modules? If so, that raises the granularity-of-sharing questions in the middle of #304.

titzer · 2015-08-27T14:30:38Z

I'm now wondering: why isn't global an attribute on load/store the same
way atomic could be?

I think the reasoning is that since globals are all typed, a single
LoadGlobal/SetGlobal bytecode suffices. Also, the global's index is part of
the bytecode, not an operand, as Luke said. We will need to add atomicity
to LoadGlobal/StoreGlobal in the future.

AndrewScheidecker · 2015-08-27T14:48:56Z

Is a C++ compiler going to be smart enough to put a global std::atomic variable in a global instead of a data segment?

lukewagner · 2015-08-27T15:11:36Z

With LTO, yes, it could see the address is never taken in the whole program and the symbol is not exported.

jfbastien · 2015-08-27T16:07:15Z

@lukewagner said:

One difference is that load/store have an expression tree as operand for the address whereas the global ops would have an immediate integer index (into the global array). It'd be nice not to have the global attribute affect the opcode's signature.

"into the global array" what do you mean? Globals are unaliased with anything else, including linear memory. What's the global array you refer to? The process' virtual address space?

What I'm suggesting is that linear memory access could just be access to a global (HEAP or LINEAR_MEMORY). Indexing works because my proposal requires us to support globals which are arrays. Globals can still be scalars, in which case you can't index them.

@lukewagner said:

I'm assuming you're referring to linear memory itself, not the aliased-stack-in-the-heap pointer. In that case, I don't understand the use case: linear memory is fundamentally shared between the main module and all dylibs that get linked in. Are you talking about sharing heaps between different (not dynamically linked) modules? If so, that raises the granularity-of-sharing questions in the middle of #304.

Yes, linear memory. Yes it's shared, but what I'm suggesting is that this sharing isn't fundamental: the main module says (global HEAP i32[1024]) (export HEAP) and a dylib says (import HEAP i32[]) (assuming there's a way to share size, maybe that's a global of its own).

This means that you can have multiple independent heap arrays if you so desire, and you can share whichever ones you want.
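
To make the export/import idea concrete, here is a toy model (a sketch under assumptions: the `Module` class and its methods are invented for illustration and do not correspond to any real WebAssembly API). Array globals are bounds-checked on every access, and only exported arrays can be imported by another module:

```python
class Module:
    """Toy model of a module that owns or imports named array globals."""

    def __init__(self, name):
        self.name = name
        self.arrays = {}      # global name -> backing list
        self.exports = set()  # names other modules may import

    def define_array(self, name, length, export=False):
        self.arrays[name] = [0] * length
        if export:
            self.exports.add(name)

    def import_array(self, owner, name):
        if name not in owner.exports:
            raise LookupError(f"{owner.name} does not export {name}")
        # Share the same backing storage rather than copying it.
        self.arrays[name] = owner.arrays[name]

    def _check(self, name, index):
        array = self.arrays[name]
        # Explicit check: accesses must always be in bounds.
        if not 0 <= index < len(array):
            raise IndexError(f"{name}[{index}] is out of bounds")
        return array

    def load(self, name, index):
        return self._check(name, index)[index]

    def store(self, name, index, value):
        self._check(name, index)[index] = value


main = Module("main")
main.define_array("HEAP", 1024, export=True)  # shared with dylibs
main.define_array("PRIVATE", 16)              # independent, not exported

dylib = Module("dylib")
dylib.import_array(main, "HEAP")
dylib.store("HEAP", 5, 42)
```

Here a store made by `dylib` is visible to `main` through `HEAP`, while `PRIVATE` stays a second, independent array that `dylib` cannot reach.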

@AndrewScheidecker said:

Is a C++ compiler going to be smart enough to put a global std::atomic variable in a global instead of a data segment?

Yes, though we don't even need LTO for this. Note that a C++ compiler can also leave everything in linear memory if it wants! C++ semantics make either choice acceptable, and the compiler chooses whether to use globals or not (but globals have nifty properties around security and performance, so using them is nice).

kg · 2015-08-27T17:07:49Z

LoadGlobal/SetGlobal bytecode suffices. Also, the global's index is part of the bytecode, not an operand, as Luke said.

Hang on, what? A pair of opcodes per unique global? That doesn't make any sense, the identity of the global being loaded/stored has to be an operand, otherwise a typical application would have a hundred unique opcodes filling up the opcode table.

titzer · 2015-08-27T17:22:43Z

There's one LoadGlobal and one StoreGlobal opcode, followed by a byte
(actually a varint in v8-native) indicating which global. It's similar to
GetLocal/StoreLocal.
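
A minimal sketch of that encoding, assuming the varint is unsigned LEB128 and using made-up opcode values (neither detail is confirmed in this thread):

```python
from typing import Tuple

LOAD_GLOBAL = 0x10   # hypothetical opcode value
STORE_GLOBAL = 0x11  # hypothetical opcode value


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("index must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_global_op(opcode: int, index: int) -> bytes:
    """One opcode byte followed by the global's index as an immediate."""
    return bytes([opcode]) + encode_varint(index)
```

The opcode table holds only the two entries regardless of how many globals exist; a module with hundreds of globals just uses larger immediates, which take a second byte from index 128 onward.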

lukewagner · 2015-08-27T23:31:13Z

"into the global array" what do you mean?

What @titzer said.

What I'm suggesting is that linear memory access could just be access to a global...

Ah, I see. I like the idea of trying to unify seemingly-similar things (especially when we start considering adding atomic ops for globals). However, there are a lot of sophisticated things we want to do with linear memory that I don't think would map very well to general array types (e.g., resizing, the mmap toolkit); it'd be sorta like the old ArrayBuffer limitations of asm.js again. Nor am I keen on trying to generalize every one of these so that they work on N independent heaps accessed by M independent modules. 1:1 (with dynamic linking injecting into the 1) avoids a lot of complexity.

Also, I wonder how common of a need this will be once we have dynamic linking and shmem: when do you want a separate linear memory (which means you can't easily mix pointers and so you're talking about really separate address spaces) but still want this load-time-fixed heap sharing? We've definitely seen cases where you want to shuffle memory between two big arrays in asm.js, but usually it's the heap and something dynamic (say, a canvas's array buffer) that wants GC integration.

Global (and local) array types are definitely worth considering independently as a feature (not meant to replace linear memory). @sunfishcode pointed out that they could actually be slower than heap access when signal handlers are used to eliminate bounds checking for linear memory access (assuming each global array isn't given a 4GiB reservation :). So it'd be good to measure once we're farther along to see what the benefits would be in practice.

jfbastien · 2015-08-28T01:30:49Z

@lukewagner agreed, the main point of this issue was to discuss globals + atomics. Asking questions about the rest is more of a self-sanity check: it does sound like we have good reasons to have gone the way we did!

lukewagner · 2015-09-08T18:31:08Z

Extending conversation in #154 here:

While there are definite use cases for unaliased thread-local variables (e.g., sp) as well as aliased thread-local and shared global variables (global data + dynamic linking), I think the use case for unaliased global variables is weak. This whole issue highlights that there is a non-trivial complexity cost to having shared unaliased globals. If we remove unaliased shared global variables, then, following the proposal in #154, we don't need separate atomicity annotations for globals; either we use linear memory ops for the aliased globals or the plain global ops for unaliased thread-locals.

sunfishcode · 2015-10-23T21:38:33Z

As discussed in #344, we no longer have globals in the MVP.

binji · 2017-05-09T21:13:32Z

We have globals again, and now we have the v1 threads proposal. Is this something we want to resurrect there? My assumption is that we'll want to postpone this until we start looking at pure WebAssembly threads.

jfbastien · 2017-05-09T23:59:47Z

I think the main points are:

1. Global atomicity.
2. Multiple memories.

You can address 1. in the threads repo, and we can tackle 2. separately.

Closing.

jfbastien added the question label Aug 27, 2015

jfbastien added this to the MVP milestone Aug 27, 2015

jfbastien mentioned this issue Aug 27, 2015

Atomics across different objects tc39/proposal-ecmascript-sharedmem#16

Closed

This was referenced Sep 2, 2015

Loads, stores, memory types, and conversions #326

Closed

Postpone adding globals until dynamic linking #154

Closed

sunfishcode modified the milestones: Future Features, MVP Oct 23, 2015

lukewagner modified the milestones: Future Features, Essential Post-MVP Features Oct 29, 2015

sunfishcode removed the question label Jul 12, 2016

jfbastien closed this as completed May 9, 2017

jfbastien mentioned this issue May 23, 2017

🛤 threads #1073

Closed

binji mentioned this issue Oct 16, 2018

🛤 threads WebAssembly/proposals#14

Open
