To avoid false sharing / bank conflicts / cache thrashing when multiple threads read and write data in the same cache line, an intermediate array is used, with the intermediate values padded so that each one occupies a different cache line. (See #139)
Tensor x: 1_000_000 float
|
|
v
intermediate reduction: 4 floats (for a quad-core CPU)
|
|
v
Final result: 1 float
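Given 64-byte cache lines, elements of type T only need to be spaced by n elements to land on different cache lines, where n = max(64 / sizeof(T), 1). Using this n instead of a fixed spacing reduces memory consumption: n = 8 for float64/int64, 16 for float32/int32, 2 for a uint256 (32 bytes), and 1 for any type of 64 bytes or more.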
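The spacing rule can be sketched in C as follows. This is only an illustration, not Arraymancer's Nim code: `CACHE_LINE_BYTES` and `spacing_for` are hypothetical names, and the rule is read as "as many elements as fit in one cache line, but never fewer than one".

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed cache line size, as stated in the issue */

/* Hypothetical helper: distance, in elements, between two padded slots
 * so that consecutive slots start on different cache lines. */
static inline size_t spacing_for(size_t elem_size) {
    assert(elem_size > 0);
    size_t per_line = CACHE_LINE_BYTES / elem_size;
    /* Types of a full cache line or more already sit on separate lines. */
    return per_line > 1 ? per_line : 1;
}
```

For example, `spacing_for(sizeof(float))` gives 16 and `spacing_for(sizeof(double))` gives 8, while a 200-byte element gets a spacing of 1.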
Unfortunately, sizeof(T) works for primitive types and, since nim-lang/Nim#8445, for arrays of primitive types, but support for Tensors of custom objects is still pending nim-lang/Nim#5664.
In the meantime, the intermediate array elements are arbitrarily spaced by 16 (maxItemsPerCacheLine), which wastes a lot of space when T is a tensor (a couple hundred bytes in size).
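A minimal C sketch of the padded intermediate array, for float only, might look like the following. It is an assumption-laden illustration rather than the code in openmp.nim: `padded_sum`, `MAX_BLOCKS` and `FLOAT_SPACING` are hypothetical names, the cap of 8 blocks mirrors the reduction block limit mentioned in this issue, and the OpenMP pragma is simply ignored (giving a serial loop) when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64
#define MAX_BLOCKS 8                                      /* hypothetical cap */
#define FLOAT_SPACING (CACHE_LINE_BYTES / sizeof(float))  /* 16 floats */

/* Two-stage sum: each block accumulates into its own padded slot,
 * then the slots are combined serially. */
float padded_sum(const float *x, size_t len, size_t num_blocks) {
    assert(num_blocks >= 1 && num_blocks <= MAX_BLOCKS);

    /* One slot per block, each slot starting on its own cache line. */
    _Alignas(CACHE_LINE_BYTES) float partial[MAX_BLOCKS * FLOAT_SPACING] = {0};
    size_t chunk = (len + num_blocks - 1) / num_blocks;

    #pragma omp parallel for
    for (long b = 0; b < (long)num_blocks; b++) {
        size_t start = (size_t)b * chunk;
        size_t stop = start + chunk < len ? start + chunk : len;
        /* Repeated writes hit only this block's cache line. */
        for (size_t i = start; i < stop; i++)
            partial[(size_t)b * FLOAT_SPACING] += x[i];
    }

    /* Final reduction over the few intermediate values. */
    float total = 0.0f;
    for (size_t b = 0; b < num_blocks; b++)
        total += partial[b * FLOAT_SPACING];
    return total;
}
```

With a fixed spacing of 16, each slot costs 16 × sizeof(T) bytes, which is 64 bytes for float but several kilobytes when T is a tensor of a couple hundred bytes.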
Implemented in nim-lang/Nim#9356. Note that nim-lang/Nim#9493 offers an alternative: each thread keeps a local variable, and the partial results are combined in a #pragma omp critical section.
This would allow much better scaling on processors with more than 8 cores (the current reduction block limit).
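The thread-local alternative can be sketched in C with OpenMP as below. This is a hedged illustration of the general pattern, not the code from either Nim pull request; `critical_sum` is a hypothetical name, and without OpenMP enabled the pragmas are ignored and the function runs serially.

```c
#include <assert.h>

/* Each thread sums its share into a private variable, then adds that
 * partial result to the shared total inside a critical section. */
float critical_sum(const float *x, long len) {
    assert(len >= 0);
    float total = 0.0f;

    #pragma omp parallel
    {
        float local = 0.0f;  /* private to each thread: no shared cache line */

        #pragma omp for nowait
        for (long i = 0; i < len; i++)
            local += x[i];

        /* One short serialized update per thread. */
        #pragma omp critical
        total += local;
    }
    return total;
}
```

Because each thread's accumulator is a local variable, no padded array is needed and the number of threads is not tied to a fixed number of slots.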
Arraymancer/src/tensor/backend/openmp.nim
Lines 73 to 92 in 2a3c406