Perform JS static allocations at compile time #7850
Conversation
@@ -527,6 +527,15 @@ def is_int(x):
  return False


def align_memory(addr):
  return (addr + 15) & -16
I find hex literals more readable for bitwise operations but YMMV
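For reference, a minimal sketch of the hex-literal form the comment suggests; it is equivalent to the diff above, since -16 and ~0xF are the same two's-complement mask:

def align_memory(addr):
  # round addr up to the next multiple of 16; 0xF == 15, ~0xF == -16
  return (addr + 0xF) & ~0xF

assert align_memory(0) == 0
assert align_memory(1) == 16
assert align_memory(16) == 16
assert align_memory(17) == 32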
def apply_forwarded_data(forwarded_data):
  forwarded_json = json.loads(forwarded_data)
  # Be aware of JS static allocations
  shared.Settings.STATIC_BUMP = forwarded_json['STATIC_BUMP']
obviously STATIC_BUMP is preexisting, but it does seem odd to be setting global settings (which usually just come via command-line flags or a certain narrow set of ways to configure) based on this forwarded data.
Actually does this mean that STATIC_BUMP can now be demoted from a shared setting to something local to emscripten.py?
It is slightly odd, yeah, we use shared.Settings as a global location for compiler state. We do mark them in settings.js with comments saying some are just for internal use, like this one. But perhaps they could be in shared.Internals or such?
Specifically here, this can't be a local var to the file since we do want to use it elsewhere (e.g. when we write the metadata we write this - although I didn't update all those places yet in this PR, to keep it small).
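A hedged sketch of what the shared.Internals idea mentioned above might look like; the Internals class and its field are hypothetical, not part of this PR:

# tools/shared.py (hypothetical addition, for illustration only)
class Internals(object):
  """Compiler-internal state that is not user-configurable, unlike Settings."""
  STATIC_BUMP = 0

# emscripten.py could then do:
#   shared.Internals.STATIC_BUMP = forwarded_json['STATIC_BUMP']
# instead of writing into shared.Settings.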
emscripten.py
Outdated
  static_bump = shared.Settings.STATIC_BUMP
  # * then the stack (up on fastcomp, down on upstream)
  stack_low = align_memory(global_start + static_bump)
  stack_high = stack_low + shared.Settings.TOTAL_STACK
stack_high should be aligned too, right, at least for wasm backend? are there restrictions on TOTAL_STACK increments?
Good point, I'll align that too.
I don't think we have any checks on the size of the stack, but we could have the compiler error if it's not a multiple of 16, I suppose.
We should probably at least have an assert, a misaligned stack would be an ABI violation.
Makes sense, added that in ASSERTIONS mode at runtime, for the stack (and heap too).
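A minimal sketch of the aligned layout plus an explicit multiple-of-16 check; the compile-time error is illustrative (the PR itself adds runtime asserts in ASSERTIONS mode, as noted above), and the helper name is an assumption:

def align_memory(addr):
  return (addr + 15) & -16

def compute_stack_bounds(global_start, static_bump, total_stack):
  # both ends of the stack must stay 16-byte aligned (ABI requirement)
  if total_stack % 16 != 0:
    raise Exception('TOTAL_STACK must be a multiple of 16')
  stack_low = align_memory(global_start + static_bump)
  stack_high = align_memory(stack_low + total_stack)
  return stack_low, stack_high

print(compute_stack_bounds(1024, 37, 5 * 1024 * 1024))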
  if shared.Settings.SIDE_MODULE or shared.Settings.WASM:
    pre = pre.replace('{{{ STATIC_BUMP }}}', str(staticbump))

  pre = apply_memory(pre, metadata)
is the new logic still correct for side modules and/or !WASM? I don't see any mentions of those.
Identical for wasm and !wasm.
In a side module we wouldn't have STACK_BASE etc., as the location is determined at runtime anyhow (by the dynamic loader). So maybe it's a little odd we compute those even though they are not written out. I'll add a comment.
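For illustration, a sketch of how the side-module case could be made explicit; the helper name and the STACK_MAX placeholder are assumptions, and the PR itself only adds a comment about this:

def fill_memory_placeholders(pre, stack_low, stack_high, side_module):
  if side_module:
    # a side module's memory location is decided at runtime by the
    # dynamic loader, so the statically computed values are not emitted
    return pre
  pre = pre.replace('{{{ STACK_BASE }}}', str(stack_low))
  pre = pre.replace('{{{ STACK_MAX }}}', str(stack_high))
  return pre

print(fill_memory_placeholders('STACKTOP = {{{ STACK_BASE }}};', 1040, 5243920, False))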
otherwise LGTM
This should probably increment the ABI version.
@rianhunter I'm not sure this changes the ABI, yet? If we stop passing DYNAMICTOP_PTR from JS to wasm then that would be a change, but this PR doesn't go that far yet. Or is there something else here?
@kripken I looked briefly but I thought this might change the exports available to the hosted wasm. If all the same memory-layout exports are available then ignore my comment. Otherwise I understand this only changes how the JS runtime allocates static memory.
Ok, yeah, then this should be fine as-is. In fact I verified earlier that the wasm emitted is identical to before this PR - the change is only on the JS side. Thanks for checking - good to have more eyes on this stuff.
src/library_pthread.js
Outdated
@@ -529,7 +528,7 @@ var LibraryPThread = {
  },

  _num_logical_cores__deps: ['emscripten_force_num_logical_cores'],
- _num_logical_cores: '; if (ENVIRONMENT_IS_PTHREAD) __num_logical_cores = PthreadWorkerInit.__num_logical_cores; else { PthreadWorkerInit.__num_logical_cores = __num_logical_cores = allocate(1, "i32*", ALLOC_STATIC); HEAPU32[__num_logical_cores>>2] = navigator["hardwareConcurrency"] || ' + {{{ PTHREAD_HINT_NUM_CORES }}} + '; }',
+ _num_logical_cores: '; if (ENVIRONMENT_IS_PTHREAD) __num_logical_cores = PthreadWorkerInit.__num_logical_cores; else { PthreadWorkerInit.__num_logical_cores = __num_logical_cores = {{{ makeStaticAlloc(1) }}}; HEAPU32[__num_logical_cores>>2] = navigator["hardwareConcurrency"] || ' + {{{ PTHREAD_HINT_NUM_CORES }}} + '; }',
This should be makeStaticAlloc(4)?
Thanks, yeah, fixing these. (Technically the allocations are stack aligned, so 1 means at least 4 anyhow, but better not to rely on that.)
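A rough sketch of why a 1-byte request still yields a usable i32 slot when allocations are stack aligned, and why passing the real size (4) is still better; the allocator below is illustrative, not the PR's actual implementation:

# hypothetical compile-time static allocator, for illustration
STACK_ALIGN = 16
static_bump = 0

def static_alloc(size):
  global static_bump
  ptr = static_bump
  # each allocation is rounded up to the stack alignment, so even a
  # 1-byte request leaves room for a 4-byte i32 - but relying on that
  # hides the real size, so requesting 4 explicitly is clearer
  static_bump += (size + STACK_ALIGN - 1) & -STACK_ALIGN
  return ptr

a = static_alloc(1)   # holds an i32 only because of the alignment padding
b = static_alloc(4)   # states the intended size explicitly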
src/library_pthread.js
Outdated
@@ -1031,7 +1030,7 @@ var LibraryPThread = {
  },

  // Stores the memory address that the main thread is waiting on, if any.
- _main_thread_futex_wait_address: '; if (ENVIRONMENT_IS_PTHREAD) __main_thread_futex_wait_address = PthreadWorkerInit.__main_thread_futex_wait_address; else PthreadWorkerInit.__main_thread_futex_wait_address = __main_thread_futex_wait_address = allocate(1, "i32*", ALLOC_STATIC)',
+ _main_thread_futex_wait_address: '; if (ENVIRONMENT_IS_PTHREAD) __main_thread_futex_wait_address = PthreadWorkerInit.__main_thread_futex_wait_address; else PthreadWorkerInit.__main_thread_futex_wait_address = __main_thread_futex_wait_address = {{{ makeStaticAlloc(1) }}}',
makeStaticAlloc(4) as well?
src/parseTools.js
Outdated
}

function makeStaticString(string) {
  var len = string.length + 1;
This is not the correct size computation for a UTF-8 string - either use lengthBytesUTF8(string), or if you want to conservatively allocate, use string.length*4+1.
Thanks! Fixing with lengthBytesUTF8 at compile time, to not waste any space here.
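The same size computation expressed in Python for illustration (the actual fix uses lengthBytesUTF8 in the JS compiler): the allocation needs the UTF-8 byte length of the string plus the NUL terminator, not its JS character count.

def utf8_alloc_size(s):
  # bytes needed for the UTF-8 encoding of s plus the NUL terminator;
  # this is what lengthBytesUTF8(s) + 1 computes on the JS side
  return len(s.encode('utf-8')) + 1

assert utf8_alloc_size('hi') == 3      # 2 ASCII bytes + NUL
assert utf8_alloc_size('héllo') == 7   # 'é' takes 2 bytes in UTF-8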
tests/test_core.py
Outdated
@@ -3790,7 +3790,7 @@ def test_dylink_jslib(self):
  def test_dylink_global_var_jslib(self):
    create_test_file('lib.js', r'''
      mergeInto(LibraryManager.library, {
-       jslib_x: 'allocate(1, "i32*", ALLOC_STATIC)',
+       jslib_x: '{{{ makeStaticAlloc(1) }}}',
makeStaticAlloc(4)?
@@ -54,7 +54,7 @@ var LibraryPThread = {

#if PTHREADS_PROFILING
  createProfilerBlock: function(pthreadPtr) {
-   var profilerBlock = (pthreadPtr == PThread.mainThreadBlock) ? allocate({{{ C_STRUCTS.thread_profiler_block.__size__ }}}, "i32*", ALLOC_STATIC) : _malloc({{{ C_STRUCTS.thread_profiler_block.__size__ }}});
+   var profilerBlock = (pthreadPtr == PThread.mainThreadBlock) ? {{{ makeStaticAlloc(C_STRUCTS.thread_profiler_block.__size__) }}} : _malloc({{{ C_STRUCTS.thread_profiler_block.__size__ }}});
(it looks like the above three code blocks have historically overallocated 4x the number of bytes needed, but this change fixes that)
It's been more than a year since this was removed in #7850.
First part of #7795: this removes the ability to do static allocations in JS when the app starts. That means no more staticAlloc, ALLOC_STATIC, and the staticSealed logic to handle it. Instead, static allocations are done at compile time, using makeStaticAlloc. The JS compiler informs emscripten.py of those static allocations, and then they are added after the static allocations from LLVM, basically.

Benefits:

- Simpler runtime logic: no more tracking and sealing of static allocation (staticSealed), etc. See apply_memory in emscripten.py for how simple memory layout is after this: we run the C compiler, then the JS compiler, then in that python code we can lay out static memory completely at compile time (a sketch of that layout follows at the end of this description).
- Smaller code: x = staticAlloc(size) becomes x = ptr (hardcoded ptr numbers, computed at compile time). On hello world (with closure/-Os) this saves around 1%, more than I thought actually.

Downsides:

- JS library code that used to allocate at runtime now uses {{{ makeStaticAlloc(size) }}}, which does the allocation and returns a ptr, at compile time. However, when JS did the call at runtime, if the code was not reached the allocation was not done. In this model, if the JS library is parsed and preprocessed, we allocate - which means the library_pthreads.js ones are not done by default, but the library.js ones are. As a result, we are wasting in some cases a little static space that is actually unused (maybe 100 bytes?). This is probably pretty negligible, but we could optimize it out later, at the cost of more complexity in the JS compiler - not sure if that would be worth it.

Possible followups / stuff not done in this PR:

- Compute more of this in wasm-emscripten-finalize etc. This would save passing it at runtime into the wasm.

The PR also contains a few minor test fixes: the stack-overflow test had an incorrect part, but we never noticed until now, when the new memory layout happens to make it actually cause a problem.
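To make the "lay out static memory completely at compile time" point concrete, here is a minimal sketch of the kind of layout apply_memory performs. It is simplified: the function name, parameters, and exact ordering are illustrative, based on the snippets discussed above, not the PR's exact code.

def align_memory(addr):
  # 16-byte alignment, as in the diff above
  return (addr + 15) & -16

def layout_memory(global_start, llvm_static_bump, js_static_bump, total_stack):
  # LLVM's static data comes first, then the JS static allocations,
  # then the (aligned) stack, then the dynamic area after it
  static_base = global_start
  static_end = static_base + llvm_static_bump + js_static_bump
  stack_base = align_memory(static_end)
  stack_max = align_memory(stack_base + total_stack)
  dynamic_base = align_memory(stack_max)
  return {
    'STATIC_BASE': static_base,
    'STACK_BASE': stack_base,
    'STACK_MAX': stack_max,
    'DYNAMIC_BASE': dynamic_base,
  }

# example: 1 MB of LLVM statics, a few hundred bytes of JS statics, 5 MB stack
print(layout_memory(1024, 1024 * 1024, 240, 5 * 1024 * 1024))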