add rust support #113
Conversation
Also, I think it would make sense to move

inline static size_t aligned_size(size_t alignment, size_t size)
{
  ...
}

Should work. Then you can remove it from …
I do not quite understand why C++17 features are not available even with …
CMakeLists.txt
Outdated
add_library(${name} ${type} ${ARGN})
target_link_libraries(${name} snmalloc_lib)
if(MSVC)
  target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)")
Perhaps this is not actually needed, but I have added it anyway.
I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?
Pretty sure it's not required for a static lib
Could you drop this bit please, and I'll merge.
@mjp41 I had gone to bed by then. Sorry for the late reply.
So I don't know why, but disabling the line

# Ensure that we do not link against C++ stdlib when compiling shims.
set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)

for MSVC fixes the C++17 compilation. Changing it to

# Ensure that we do not link against C++ stdlib when compiling shims.
if(NOT MSVC)
  set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)
endif()

should fix your build issues at least.
With this as an experiment:

SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
// Client responsible for checking alignment is not zero
assert(alignment != 0);
// Client responsible for checking alignment is not above SUPERSLAB_SIZE
assert(alignment <= SUPERSLAB_SIZE);
// Client responsible for checking alignment is a power of two
assert(bits::next_pow2(alignment) == alignment);
size = bits::max(size, alignment);
snmalloc::sizeclass_t sc = size_to_sizeclass(size);
if (sc >= NUM_SIZECLASSES)
{
// large allocs are 16M aligned, which is maximum we guarantee
return 1;
}
size_t t = 2;
for (; sc < NUM_SIZECLASSES; sc++, t++)
{
size = sizeclass_to_size(sc);
if ((size & (~size + 1)) >= alignment)
{
return t;
}
}
// Give max alignment.
return t;
}

and

#include <iostream>
#include "src/snmalloc.h"
#include <cstdlib>
#define LIMIT 10000000
int main() {
size_t all = 0;
size_t max = 0;
srand(time(NULL));
for(size_t i = 0; i < LIMIT; ++i) {
size_t a = snmalloc::bits::next_pow2(1 + (size_t)rand() % (snmalloc::SUPERSLAB_SIZE));
size_t m = rand();
auto t = snmalloc::aligned_size(a, m);
all += t;
max = std::max(max, t);
}
std::cout << "ave: " << (double)all / (double)LIMIT << ", max: " << max << std::endl;
}

I can get:

➜ cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5

So perhaps it is acceptable, as the average case is quite good?
Not sure I follow. Are you trying to work out how much we round up the size to account for alignment? Should the following line subtract m?

auto t = snmalloc::aligned_size(a, m);

Also, what if it were

auto t = snmalloc::aligned_size(a, m) - std::max(a, m);
I changed the fake size m: in most cases, the memory size should be smaller. Hence, I tested again with

size_t m = rand() % snmalloc::size_to_sizeclass(snmalloc::NUM_SIZECLASSES);

➜ cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.49993, max: 2
ave: 1.4999, max: 2
ave: 1.4999, max: 2
ave: 1.49986, max: 2
ave: 1.49986, max: 2
ave: 1.49981, max: 2
ave: 1.49981, max: 2
ave: 1.5001, max: 2
ave: 1.5001, max: 2
ave: 1.49982, max: 2
It is more the branches on the fast path that worry me. So, looking at the disasm from Clang:

0000000000000000 <rust_alloc>:
0: 48 89 f8 mov %rdi,%rax
3: 64 48 8b 3c 25 00 00 mov %fs:0x0,%rdi
a: 00 00
c: 48 39 c6 cmp %rax,%rsi
f: 48 0f 46 f0 cmovbe %rax,%rsi
13: 48 8d 4e ff lea -0x1(%rsi),%rcx
17: 48 89 ca mov %rcx,%rdx
1a: 48 81 f9 ff ff 00 00 cmp $0xffff,%rcx
21: 77 13 ja 36 <rust_alloc+0x36>
23: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx
27: 48 8b 92 00 00 00 00 mov 0x0(%rdx),%rdx
2e: 48 83 fa 4a cmp $0x4a,%rdx
32: 76 3c jbe 70 <rust_alloc+0x70>
34: eb 69 jmp 9f <rust_alloc+0x9f>
36: 48 83 ca 20 or $0x20,%rdx
3a: f3 4c 0f bd c2 lzcnt %rdx,%r8
3f: 41 b9 3a 00 00 00 mov $0x3a,%r9d
45: 45 31 d2 xor %r10d,%r10d
48: 4d 29 c1 sub %r8,%r9
4b: 41 0f 95 c2 setne %r10b
4f: 44 89 ca mov %r9d,%edx
52: 44 29 d2 sub %r10d,%edx
55: 80 c2 04 add $0x4,%dl
58: c4 e2 eb f7 d1 shrx %rdx,%rcx,%rdx
5d: 83 e2 03 and $0x3,%edx
60: 4a 8d 14 8a lea (%rdx,%r9,4),%rdx
64: 48 83 fa 4a cmp $0x4a,%rdx
68: 77 35 ja 9f <rust_alloc+0x9f>
6a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
70: 89 d1 mov %edx,%ecx
72: 83 e1 7f and $0x7f,%ecx
75: 48 8b 34 cd 00 00 00 mov 0x0(,%rcx,8),%rsi
7c: 00
7d: c4 e2 f0 f3 de blsi %rsi,%rcx
82: 48 39 c1 cmp %rax,%rcx
85: 73 14 jae 9b <rust_alloc+0x9b>
87: 48 83 c2 01 add $0x1,%rdx
8b: 48 83 fa 4a cmp $0x4a,%rdx
8f: 76 df jbe 70 <rust_alloc+0x70>
91: be 00 00 00 01 mov $0x1000000,%esi
96: e9 00 00 00 00 jmpq 9b <rust_alloc+0x9b>
9b: 48 8d 4e ff lea -0x1(%rsi),%rcx
9f: 48 81 f9 ff ff 00 00 cmp $0xffff,%rcx
a6: 77 ee ja 96 <rust_alloc+0x96>
a8: 48 83 e1 f8 and $0xfffffffffffffff8,%rcx
ac: 48 8b b1 00 00 00 00 mov 0x0(%rcx),%rsi
b3: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
b7: 48 85 c0 test %rax,%rax
ba: 74 08 je c4 <rust_alloc+0xc4>
bc: 48 8b 08 mov (%rax),%rcx
bf: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
c3: c3 retq
c4: e9 00 00 00 00 jmpq c9 <rust_alloc+0xc9>
c9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

whereas

0000000000003b60 <malloc>:
3b60: 48 89 fe mov %rdi,%rsi
3b63: 48 8b 05 66 44 22 00 mov 0x224466(%rip),%rax # 227fd0 <.got+0x8>
3b6a: 64 48 8b 38 mov %fs:(%rax),%rdi
3b6e: 48 8d 46 ff lea -0x1(%rsi),%rax
3b72: 48 3d ff ff 00 00 cmp $0xffff,%rax
3b78: 77 20 ja 3b9a <malloc+0x3a>
3b7a: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
3b7e: 48 8d 0d 8b 1e 01 00 lea 0x11e8b(%rip),%rcx # 15a10 <_ZN8snmallocL18sizeclass_metadataE>
3b85: 48 8b 34 08 mov (%rax,%rcx,1),%rsi
3b89: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
3b8d: 48 85 c0 test %rax,%rax
3b90: 74 0d je 3b9f <malloc+0x3f>
3b92: 48 8b 08 mov (%rax),%rcx
3b95: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
3b99: c3 retq
3b9a: e9 21 f2 ff ff jmpq 2dc0 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE15alloc_not_smallILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
3b9f: e9 7c f4 ff ff jmpq 3020 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE16small_alloc_slowILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
3ba4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
3bab: 00 00 00
3bae: 66 90 xchg %ax,%ax

This potentially looks quite a bit slower at the moment.
Okay, I have done a bit of maths. I think that we can simplify the whole piece of code to

SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
// Client responsible for checking alignment is not zero
assert(alignment != 0);
// Client responsible for checking alignment is not above SUPERSLAB_SIZE
assert(alignment <= SUPERSLAB_SIZE);
// Client responsible for checking alignment is a power of two
assert(bits::next_pow2(alignment) == alignment);
return ((alignment - 1) | (size - 1)) + 1;
}

I think this should interact correctly with the sizeclass calculation. It would be great to add a validation test that this does the correct thing. Something similar to …, which should work except for 0.
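To see why the bit trick works, here is a small standalone sketch (illustrative only, not part of the PR; it assumes, as the asserts above do, that alignment is a non-zero power of two and that size is at least 1):

#include <cassert>
#include <cstddef>
#include <iostream>

// Illustrative helper mirroring the formula above: OR-ing (alignment - 1)
// into (size - 1) sets every bit below the alignment, so adding 1 carries
// up to the next multiple of the (power-of-two) alignment.
static size_t aligned_size_sketch(size_t alignment, size_t size)
{
  return ((alignment - 1) | (size - 1)) + 1;
}

int main()
{
  assert(aligned_size_sketch(128, 160) == 256);  // 160 rounds up to 2 * 128
  assert(aligned_size_sketch(64, 128) == 128);   // already aligned: unchanged
  assert(aligned_size_sketch(4096, 24) == 4096); // the alignment dominates small sizes
  std::cout << "all checks passed" << std::endl;
}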
Cool!

#include <iostream>
#include <random>
#include <snmalloc.h>
#include <test/setup.h>
NOINLINE
snmalloc::sizeclass_t size_to_sizeclass(size_t size)
{
return snmalloc::size_to_sizeclass(size);
}
NOINLINE
size_t sizeclass_to_size(snmalloc::sizeclass_t sc)
{
return snmalloc::sizeclass_to_size(sc);
}
NOINLINE
size_t aligned_size(size_t alignment, size_t size)
{
return snmalloc::aligned_size(alignment, size);
}
int main(int, char**)
{
setup();
bool failed = false;
std::random_device dev;
unsigned int seed = dev();
std::cout << "random seed: " << seed << std::endl;
std::uniform_int_distribution<size_t> dist(1);
std::mt19937 rng(seed);
for (snmalloc::sizeclass_t sz = 0; sz < snmalloc::NUM_SIZECLASSES; sz++)
{
// Separate printing for small and medium sizeclasses
if (sz == snmalloc::NUM_SMALL_CLASSES)
std::cout << std::endl;
size_t this_size = snmalloc::sizeclass_to_size(sz),
pre_size = (sz != 0) ? snmalloc::sizeclass_to_size(sz - 1) : 1;
for (size_t alignment = snmalloc::bits::next_pow2(pre_size);
alignment < this_size;
alignment = snmalloc::bits::next_pow2(alignment + 1))
{
std::cout << std::endl;
std::cout << "-- alignment: " << alignment << std::endl;
for (int i = 0; i < 100; ++i)
{
size_t size = dist(rng) % alignment;
std::cout << "---- size: " << size;
bool flag = sizeclass_to_size(size_to_sizeclass(aligned_size(alignment, size))) %
alignment !=
0;
std::cout << ((flag) ? " [FAILED]" : " [SUCCESS]") << std::endl;
failed |= flag;
}
}
}
if (failed)
abort();
}
@SchrodingerZhu, I would prefer something a bit more exhaustive. I think the following covers the core properties of the sizeclass calculation and the alignment rounding:

void test_align_size()
{
bool failed = false;
assert(aligned_size(128, 160) == 256);
for (size_t size = 1; size < sizeclass_to_size(NUM_SIZECLASSES - 1); size++)
{
size_t rsize = sizeclass_to_size(size_to_sizeclass(size));
if (rsize < size)
{
std::cout << "Size class rounding shrunk: " << size << " -> " << rsize
<< std::endl;
failed |= true;
}
auto lsb_rsize = rsize & -rsize;
auto lsb_size = size & -size;
if (lsb_rsize < lsb_size)
{
std::cout << "Original size more aligned than rounded size: " << size
<< " (" << lsb_size << ") -> " << rsize << " (" << lsb_rsize
<< ")" << std::endl;
failed |= true;
}
for (size_t alignment_bits = 0; alignment_bits < SUPERSLAB_BITS;
alignment_bits++)
{
auto alignment = (size_t)1 << alignment_bits;
auto asize = aligned_size(alignment, size);
if (asize < size)
{
std::cout << "Shrunk! Alignment: " << alignment << " Size: " << size
<< " ASize: " << asize << std::endl;
failed |= true;
}
if ((asize & (alignment - 1)) != 0)
{
std::cout << "Not aligned! Alignment: " << alignment
<< " Size: " << size << " ASize: " << asize << std::endl;
failed |= true;
}
}
}
if (failed)
abort();
}

In particular, the sizeclass rounding should never set lower bits, and the alignment rounding should never shrink the size and should always produce a multiple of the requested alignment.
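As an aside for readers unfamiliar with the size & -size idiom in the test above: it isolates the lowest set bit of a value, which is exactly the largest power-of-two alignment the value is a multiple of. A tiny standalone check (illustrative only, not part of the PR):

#include <cassert>
#include <cstddef>

int main()
{
  // x & -x keeps only the lowest set bit, i.e. the natural alignment of x.
  // (Written as 0 - x here to avoid unary minus on an unsigned type.)
  size_t a = 48; // 0b110000: a multiple of 16 but not of 32
  assert((a & (0 - a)) == 16);
  size_t b = 64; // a power of two is its own natural alignment
  assert((b & (0 - b)) == 64);
  return 0;
}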
The code gen for this change is really nice; clang gives:

0000000000000000 <rust_alloc>:
0: 48 89 f8 mov %rdi,%rax
3: 64 48 8b 3c 25 00 00 mov %fs:0x0,%rdi
a: 00 00
c: 48 83 c0 ff add $0xffffffffffffffff,%rax
10: 48 83 c6 ff add $0xffffffffffffffff,%rsi
14: 48 09 c6 or %rax,%rsi
17: 48 81 fe ff ff 00 00 cmp $0xffff,%rsi
1e: 77 1c ja 3c <rust_alloc+0x3c>
20: 48 83 e6 f8 and $0xfffffffffffffff8,%rsi
24: 48 8b b6 00 00 00 00 mov 0x0(%rsi),%rsi
2b: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
2f: 48 85 c0 test %rax,%rax
32: 74 11 je 45 <rust_alloc+0x45>
34: 48 8b 08 mov (%rax),%rcx
37: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
3b: c3 retq
3c: 48 83 c6 01 add $0x1,%rsi
40: e9 00 00 00 00 jmpq 45 <rust_alloc+0x45>
45: e9 00 00 00 00 jmpq 4a <rust_alloc+0x4a>
4a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

I think this will be pretty fast.
Co-authored-by: mjp41 <[email protected]>
I think you need a check for zero alignment in memalign inside malloc.cc, as our libc tests expect align 0 to work.
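A minimal sketch of the kind of guard meant here (illustrative only, not snmalloc's actual shim; it assumes a POSIX system and delegates to posix_memalign just to stay self-contained):

#include <cstddef>
#include <stdlib.h>

// Hypothetical guard: align == 0 falls back to a plain malloc, since the
// aligned path asserts that alignment is a non-zero power of two.
static void* memalign_with_zero_check(size_t alignment, size_t size)
{
  if (alignment == 0)
    return malloc(size);
  void* p = nullptr;
  // A real shim would also validate that alignment is a power of two and a
  // multiple of sizeof(void*), as posix_memalign requires.
  return (posix_memalign(&p, alignment, size) == 0) ? p : nullptr;
}

int main()
{
  void* a = memalign_with_zero_check(0, 32);  // behaves like malloc(32)
  void* b = memalign_with_zero_check(64, 32); // 64-byte aligned block
  free(a);
  free(b);
}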
Great work. Thanks!
CMakeLists.txt
Outdated
add_library(${name} ${type} ${ARGN})
target_link_libraries(${name} snmalloc_lib)
if(MSVC)
  target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)")
I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?
Are these requirements treated correctly in the current approach?
I believe so. Well, as much as they were before. We don't meet the posix_memalign spec when alignment > SUPERSLAB_SIZE. I intend to fix this later.
I noticed the codegen for dealloc where a size is supplied is not as good as it could be. I'll submit a separate PR refactoring that.
Unless you want to add something more, I'm happy to merge this when it passes CI.
I think it is ready then.
This PR tracks the work of adding a static-linking target for Rust support.
My suggestion is that merging should be considered after the discussions in #109 and rust-lang/rust#68381 are finished.
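For context on what a static-linking target like this exposes, here is a rough, self-contained sketch of the shim surface; the names echo the rust_alloc symbol in the disassembly above, but the bodies use plain aligned_alloc/free as stand-ins (assuming a platform with C11 aligned_alloc) rather than snmalloc's real fast path:

#include <cstddef>
#include <cstdlib>

// Round size up to a multiple of the (power-of-two) alignment, as discussed
// in the aligned_size() thread above.
static size_t aligned_size(size_t alignment, size_t size)
{
  return ((alignment - 1) | (size - 1)) + 1;
}

// Illustrative C-ABI surface for Rust's GlobalAlloc, which always supplies
// both a size and an alignment. A real shim would route these into snmalloc.
extern "C" void* rust_alloc(size_t alignment, size_t size)
{
  // aligned_alloc requires size to be a multiple of alignment, which
  // aligned_size() guarantees for power-of-two alignments.
  return std::aligned_alloc(alignment, aligned_size(alignment, size));
}

extern "C" void rust_dealloc(void* ptr, size_t /*alignment*/, size_t /*size*/)
{
  std::free(ptr);
}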