Perform JS static allocations at compile time #7850

Merged
kripken merged 32 commits into incoming from simplify-memory on Jan 14, 2019

Conversation

kripken (Member) commented Jan 10, 2019

First part of #7795: this removes the ability to perform static allocations from JS at application startup. That means no more staticAlloc, ALLOC_STATIC, or the staticSealed logic that supported them. Instead, static allocations are done at compile time using makeStaticAlloc. The JS compiler informs emscripten.py of those static allocations, which are then placed after the static allocations coming from LLVM.
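To make the change concrete, here is a rough before/after sketch of a JS library entry. The entry name `example_ptr`, the size, and the emitted address are all made up for illustration; the real address depends on the final layout computed at compile time.

```js
// Before this PR: the allocation happened at runtime, during startup.
example_ptr: 'allocate(4, "i32*", ALLOC_STATIC)',

// After this PR: the JS compiler performs the allocation while preprocessing
// the library, so the macro expands to a compile-time pointer.
example_ptr: '{{{ makeStaticAlloc(4) }}}',

// The generated output then ends up with something like a hardcoded number
// instead of a runtime call (name mangling and address shown here are illustrative):
var _example_ptr = 5840;
```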

Benefits:

  • Removes a bunch of complexity (no more staticSealed, etc.). See apply_memory in emscripten.py for how simple the memory layout is after this: we run the C compiler, then the JS compiler, and then that Python code can lay out static memory completely at compile time (a sketch of the resulting layout constants follows this list).
  • Saves code size and startup work by replacing x = staticAlloc(size) with x = ptr (hardcoded pointer values computed at compile time). On hello world (with closure/-Os) this saves around 1%, more than I expected.
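As a rough illustration of what apply_memory bakes into the generated JS, the layout constants end up looking something like the sketch below. The names follow the globals the generated code already uses (STATIC_BASE, STACK_BASE, etc.), but every number here is invented for the example.

```js
// Hypothetical values -- the real ones are computed in apply_memory() in
// emscripten.py from the LLVM static size plus the JS static allocations.
var STATIC_BASE = 1024;     // static data (LLVM statics + JS makeStaticAlloc()s) starts here
var STATIC_BUMP = 4900;     // total size of that static region
var STACK_BASE = 5936;      // 16-byte-aligned address just past the static data
var STACK_MAX = 5248816;    // STACK_BASE + TOTAL_STACK (5 MB default)
var DYNAMIC_BASE = 5248816; // the dynamic heap (sbrk/malloc) begins after the stack
```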

Downsides:

  • The current model is very simple: JS library files do {{{ makeStaticAlloc(size) }}}, which performs the allocation and returns a pointer at compile time. Previously, when JS made the call at runtime, the allocation was skipped if that code was never reached. In the new model we allocate whenever a JS library is parsed and preprocessed, which means the library_pthreads.js allocations are not done by default, but the library.js ones always are. As a result, in some cases we waste a little static space that is never actually used (maybe 100 bytes?). That is probably negligible, but we could optimize it away later, at the cost of more complexity in the JS compiler; not sure it would be worth it.

Possible followups / stuff not done in this PR:

  • No changes to dynamic allocation in JS when the app starts. Only static allocation is changed here.
  • We could hardcode DYNAMICTOP_PTR's location in the wasm (but not its value, not without changes to dynamic allocation, see previous point), say as a param to wasm-emscripten-finalize etc. This would save passing it at runtime into the wasm.

The PR also contains a few minor test fixes; for example, the stack-overflow test had an incorrect part that we never noticed until now, when the new memory layout happens to make it actually cause a problem.

kripken requested review from juj and dschuff on January 10, 2019 23:52
@@ -527,6 +527,15 @@ def is_int(x):
    return False


def align_memory(addr):
  return (addr + 15) & -16
Member

I find hex literals more readable for bitwise operations but YMMV

def apply_forwarded_data(forwarded_data):
  forwarded_json = json.loads(forwarded_data)
  # Be aware of JS static allocations
  shared.Settings.STATIC_BUMP = forwarded_json['STATIC_BUMP']
Member

obviously STATIC_BUMP is preexisting, but it does seem odd to be setting global settings (which usually just come via command-line flags or a certain narrow set of ways to configure) based on this forwarded data.

Member

Actually does this mean that STATIC_BUMP can now be demoted from a shared setting to something local to emscripten.py?

Member Author

It is slightly odd, yeah, we use shared.Settings as a global location for compiler state. We do mark them in settings.js with comments saying some are just for internal use, like this one. But perhaps they could be in shared.Internals or such?

Specifically here, this can't be a local var to the file since we do want to use it elsewhere (e.g. when we write the metadata we write this - although I didn't update all those places yet in this PR, to keep it small).

emscripten.py Outdated
static_bump = shared.Settings.STATIC_BUMP
# * then the stack (up on fastcomp, down on upstream)
stack_low = align_memory(global_start + static_bump)
stack_high = stack_low + shared.Settings.TOTAL_STACK
Member

stack_high should be aligned too, right, at least for wasm backend? are there restrictions on TOTAL_STACK increments?

Member Author

Good point, I'll align that too.

I don't think we have any checks on the size of the stack, but we could have the compiler error if it's not a multiple of 16, I suppose.

Member

We should probably at least have an assert, a misaligned stack would be an ABI violation.

Member Author

Makes sense, added that in ASSERTIONS mode at runtime, for the stack (and heap too).
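For reference, a minimal sketch of what such an ASSERTIONS-mode check could look like in the generated runtime JS; the exact assertions added in the PR may differ slightly, and the globals used here are the ones the output already defines.

```js
#if ASSERTIONS
// A misaligned stack or heap base would violate the 16-byte-alignment ABI.
assert(STACK_BASE % 16 === 0, 'stack must start aligned');
assert(DYNAMIC_BASE % 16 === 0, 'heap must start aligned');
#endif
```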

  if shared.Settings.SIDE_MODULE or shared.Settings.WASM:
    pre = pre.replace('{{{ STATIC_BUMP }}}', str(staticbump))

  pre = apply_memory(pre, metadata)
Member

is the new logic still correct for sidemodules and/or !WASM? I don't see any mentions of those.

Member Author

Identical for wasm and !wasm.

In a side module we wouldn't have STACK_BASE etc., as the location is determined at runtime anyhow (by the dynamic loader). So maybe it's a little odd we compute those even though they are not written out. I'll add a comment.

dschuff (Member) left a comment

otherwise LGTM

rianhunter (Contributor)

This should probably increment EMSCRIPTEN_ABI_MINOR (while EMSCRIPTEN_ABI_MAJOR is still 0).

kripken (Member Author) commented Jan 11, 2019

@rianhunter I'm not sure this changes the ABI, yet? If we stop passing DYNAMICTOP_PTR from JS to wasm then that would be a change, but this PR doesn't go that far yet. Or is there something else here?

rianhunter (Contributor)

@kripken I looked briefly, but I thought this might change the exports available to the hosted wasm. If all the same memory-layout exports are still available, then ignore my comment; beyond that, I understand this only changes how the JS runtime allocates static memory.

kripken (Member Author) commented Jan 11, 2019

Ok, yeah, then this should be fine as-is. In fact I verified earlier that the wasm emitted is identical to before this PR - the change is only on the JS side.

Thanks for checking - good to have more eyes on this stuff.

@@ -529,7 +528,7 @@ var LibraryPThread = {
},

_num_logical_cores__deps: ['emscripten_force_num_logical_cores'],
_num_logical_cores: '; if (ENVIRONMENT_IS_PTHREAD) __num_logical_cores = PthreadWorkerInit.__num_logical_cores; else { PthreadWorkerInit.__num_logical_cores = __num_logical_cores = allocate(1, "i32*", ALLOC_STATIC); HEAPU32[__num_logical_cores>>2] = navigator["hardwareConcurrency"] || ' + {{{ PTHREAD_HINT_NUM_CORES }}} + '; }',
_num_logical_cores: '; if (ENVIRONMENT_IS_PTHREAD) __num_logical_cores = PthreadWorkerInit.__num_logical_cores; else { PthreadWorkerInit.__num_logical_cores = __num_logical_cores = {{{ makeStaticAlloc(1) }}}; HEAPU32[__num_logical_cores>>2] = navigator["hardwareConcurrency"] || ' + {{{ PTHREAD_HINT_NUM_CORES }}} + '; }',
Collaborator

This should be makeStaticAlloc(4)?

Member Author

Thanks, yeah, fixing these. (Technically the allocations are stack aligned, so 1 means at least 4 anyhow, but better not to rely on that.)

@@ -1031,7 +1030,7 @@ var LibraryPThread = {
},

// Stores the memory address that the main thread is waiting on, if any.
_main_thread_futex_wait_address: '; if (ENVIRONMENT_IS_PTHREAD) __main_thread_futex_wait_address = PthreadWorkerInit.__main_thread_futex_wait_address; else PthreadWorkerInit.__main_thread_futex_wait_address = __main_thread_futex_wait_address = allocate(1, "i32*", ALLOC_STATIC)',
_main_thread_futex_wait_address: '; if (ENVIRONMENT_IS_PTHREAD) __main_thread_futex_wait_address = PthreadWorkerInit.__main_thread_futex_wait_address; else PthreadWorkerInit.__main_thread_futex_wait_address = __main_thread_futex_wait_address = {{{ makeStaticAlloc(1) }}}',
Collaborator

makeStaticAlloc(4) as well?

}

function makeStaticString(string) {
  var len = string.length + 1;
Collaborator

This is not a correct size computation for a UTF-8 string - you either need to use lengthBytesUTF8(string) or, if you want to allocate conservatively, use string.length*4+1.

Member Author

Thanks! Fixing with lengthBytesUTF8 at compile time, to not waste any space here.
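A sketch of how the corrected helper might look after this fix; only the length computation is the point here, and the rest of the real helper may differ.

```js
function makeStaticString(string) {
  // Previously: var len = string.length + 1;  (undercounts for non-ASCII text)
  var len = lengthBytesUTF8(string) + 1; // exact UTF-8 byte count plus the NUL terminator
  var ptr = makeStaticAlloc(len);        // reserve the static space at compile time
  // ... then emit code that writes `string` into [ptr, ptr + len) and yields ptr ...
}
```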

@@ -3790,7 +3790,7 @@ def test_dylink_jslib(self):
  def test_dylink_global_var_jslib(self):
    create_test_file('lib.js', r'''
      mergeInto(LibraryManager.library, {
        jslib_x: 'allocate(1, "i32*", ALLOC_STATIC)',
        jslib_x: '{{{ makeStaticAlloc(1) }}}',
Collaborator

makeStaticAlloc(4)?

@@ -54,7 +54,7 @@ var LibraryPThread = {

#if PTHREADS_PROFILING
createProfilerBlock: function(pthreadPtr) {
var profilerBlock = (pthreadPtr == PThread.mainThreadBlock) ? allocate({{{ C_STRUCTS.thread_profiler_block.__size__ }}}, "i32*", ALLOC_STATIC) : _malloc({{{ C_STRUCTS.thread_profiler_block.__size__ }}});
var profilerBlock = (pthreadPtr == PThread.mainThreadBlock) ? {{{ makeStaticAlloc(C_STRUCTS.thread_profiler_block.__size__) }}} : _malloc({{{ C_STRUCTS.thread_profiler_block.__size__ }}});
Collaborator

(it looks like the above three code blocks have historically overallocated 4x the number of bytes needed, but this change fixes that)

kripken merged commit d9529ae into incoming on Jan 14, 2019
kripken deleted the simplify-memory branch on January 14, 2019 18:19
sbc100 added a commit that referenced this pull request May 21, 2020
Its been more than a year since this was removed in #7850.
sbc100 added a commit that referenced this pull request May 21, 2020
Its been more than a year since this was removed in #7850.
sbc100 added a commit that referenced this pull request May 22, 2020
Its been more than a year since this was removed in #7850.