Reduce padding in the MPSC channel + introduce a no-count flag #93
Upstream: nim-lang/Nim#13122, the {.align.} pragma is not applied if there is a generic field.
It seems the second part, avoiding the count for pledges, will require duplicating the channel. Using a static bool parameter does not work: a static early resolution bug (nim-lang/Nim#8677), or possibly the static sandwich issue (nim-lang/Nim#11225), prevents the following from compiling:
weave/weave/channels/channels_mpsc_unbounded_batch.nim Lines 14 to 71 in ba6f1f4
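Schematically, the intended parameterization looks like this (a minimal sketch with hypothetical field names, not the actual lines referenced above):

```nim
import std/atomics

type
  # Sketch: make the count conditional on a static bool parameter.
  # This `when` on a static generic parameter is the pattern that
  # trips the early resolution / static sandwich bugs above.
  ChannelMpscUnboundedBatch[T; keepCount: static bool] = object
    back: Atomic[pointer]
    when keepCount:
      count: Atomic[int]
    front: T
```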
This is the error when compiling the depth-first search example, even after adding some mixins and checking that the symbol is declared:
weave/weave/memory/memory_pools.nim Lines 512 to 548 in ba6f1f4
Alternatively, having a dereference operator for pointer types, or having typeof(default(T)[]) work in nested contexts (nim-lang/Nim#13048), would avoid the need for a macro with early symbol resolution issues.
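For context, here is the pattern in question at top level, where it resolves fine (a sketch; Node is a stand-in type):

```nim
type Node = object
  next: pointer

# At top level the element type is recovered without trouble:
type FrontType = typeof(default(ptr Node)[])   # == Node
doAssert FrontType is Node

# Inside a nested generic context, nim-lang/Nim#13048 reports
# typeof(default(T)[]) failing, hence the macro workaround and its
# early symbol resolution issues.
```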
Padding
The MPSC channel padding is very memory-hungry:
weave/weave/channels/channels_mpsc_unbounded_batch.nim Lines 13 to 36 in 5d90172
WV_CacheLinePadding is 2x the cache-line size = 128 bytes, so the padded fields take 384 bytes in total (3 x 128). The value of 128 was chosen because Intel CPUs prefetch cache lines in pairs; Facebook's Folly also ran in-depth experiments and settled on this value.
This was fine when the MPSC channels were used in a fixed manner, for incoming steal requests and for memory freed by remote threads. However, the dataflow parallelism protocol described in #92 (comment) requires allocating ephemeral MPSC channels.
If an application relies exclusively on dataflow graph parallelism, it will incur a huge memory overhead, as the memory pool only allocates 256-byte blocks.
As a compromise (hopefully) between preventing cache invalidation and memory usage, the data could be reorganized as follows:
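A sketch of the proposed layout, using explicit pad arrays to sidestep the {.align.}-with-generic-field bug noted above (nim-lang/Nim#13122); sizes assume Atomic[T] occupies sizeof(T) bytes:

```nim
import std/atomics

const CacheLine = 64   # one cache line of padding instead of 2 x 128

type
  ChannelMpscUnboundedBatch[T] = object
    # Producer field: `back` alone in cache line 0.
    back: Atomic[pointer]
    pad0: array[CacheLine - sizeof(pointer), byte]
    # `count` in cache line 1; it suffers cache invalidation
    # from both producers and the consumer anyway.
    count: Atomic[int]
    pad1: array[CacheLine - sizeof(int), byte]
    # Consumer field: `front` starts at offset 128,
    # still two cache lines away from `back`.
    front: T
```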
Padding is now 64 bytes, for a total of 128 + sizeof(T) bytes taken, well within the memory pool block size of 256 bytes. The channel can even be made intrusive to another data structure, with 256 - 128 - sizeof(T) bytes left for metadata, to save on allocations.
In terms of cache conflicts, front and back are still 2 cache lines apart, and there was cache invalidation on count anyway.
Ordering the producer fields before the consumer field rests on assumptions that may or may not hold in practice.
Count
The count is needed for steal requests, to approximate (give a lower bound on) the number of thieves in adaptive stealing mode.
It is also needed for remotely freed memory, so that the memory pool can lower-bound the number of memory blocks that can be collected back into the memory arena.
However, the dataflow parallelism protocol does not need to track the number of enqueued tasks. Similarly, non-adaptive stealing does not need to track the number of steal requests.
Atomic increments/decrements are very taxing, as they require flushing caches. The MPSC channel should therefore have an optional count.
Note that with an optional count, the padding between back and front should be 2x the cache-line size.
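If the static bool parameterization can be made to work (or the channel is duplicated into counted and uncounted variants), the counter updates could be guarded so that the no-count variant skips the atomic read-modify-write entirely. A hypothetical sketch, reusing the keepCount parameter from the earlier snippet; none of these names exist in the codebase:

```nim
import std/atomics

proc incCount[T; keepCount: static bool](
    chan: var ChannelMpscUnboundedBatch[T, keepCount]) {.inline.} =
  # Only the counted variant pays for the atomic RMW and the
  # associated cache-line traffic.
  when keepCount:
    discard chan.count.fetchAdd(1, moRelaxed)
```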