
add rust support #113

Merged: 13 commits, Jan 23, 2020

Conversation

SchrodingerZhu (Collaborator)

This PR tracks the work of adding a static-linking target for Rust support.

My suggestion is that merging should be considered after the discussions in #109 and rust-lang/rust#68381 have finished.

@SchrodingerZhu SchrodingerZhu requested a review from mjp41 January 20, 2020 13:04
@msftclas

msftclas commented Jan 20, 2020

CLA assistant check
All CLA requirements met.

CMakeLists.txt (outdated; review thread resolved)
@mjp41 (Member)

mjp41 commented Jan 20, 2020

Also, I think it would make sense to move aligned_size into src/mem/sizeclass.h, so we don't need the code duplication:

  inline static size_t aligned_size(size_t alignment, size_t size)
  {
   ...
  }

Should work. Then you can remove it from malloc.cc and rust.cc.

@SchrodingerZhu (Collaborator, Author)

I do not quite understand why C++17 features are not available even with /std:c++17 set.

CMakeLists.txt (outdated diff):
add_library(${name} ${type} ${ARGN})
target_link_libraries(${name} snmalloc_lib)
if(MSVC)
target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)")
Collaborator Author:

Perhaps this is not actually needed, but I have added it anyway.

Member:

I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?


Pretty sure it's not required for a static lib

Member:

Could you drop this bit please, and I'll merge.

Collaborator Author:

@mjp41 I had gone to bed by then. Sorry for the late reply.

@mjp41 (Member)

mjp41 commented Jan 21, 2020

So I don't know why, but disabling the following for MSVC fixes the C++17 compilation:

    # Ensure that we do not link against C++ stdlib when compiling shims.
    set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)

Changing it to

    # Ensure that we do not link against C++ stdlib when compiling shims.
    if(NOT MSVC)
      set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)
    endif()

should fix your build issues at least.

src/mem/sizeclass.h (outdated; review thread resolved)
@SchrodingerZhu (Collaborator, Author)

With this as an experiment:

  SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
  {
    // Client responsible for checking alignment is not zero
    assert(alignment != 0);
    // Client responsible for checking alignment is not above SUPERSLAB_SIZE
    assert(alignment <= SUPERSLAB_SIZE);
    // Client responsible for checking alignment is a power of two
    assert(bits::next_pow2(alignment) == alignment);

    size = bits::max(size, alignment);
    snmalloc::sizeclass_t sc = size_to_sizeclass(size);
    if (sc >= NUM_SIZECLASSES)
    {
      // large allocs are 16M aligned, which is the maximum we guarantee
      return 1;
    }
    size_t t = 2;
    for (; sc < NUM_SIZECLASSES; sc++, t++)
    {
      size = sizeclass_to_size(sc);
      if ((size & (~size + 1)) >= alignment)
      {
        return t;
      }
    }
    // Give max alignment.
    return t;
  }

and

#include <iostream>
#include <cstdlib>
#include <ctime>
#include "src/snmalloc.h"
#define LIMIT 10000000
int main() {
    size_t all = 0;
    size_t max = 0;
    srand(time(NULL));
    for(size_t i = 0; i < LIMIT; ++i) {
        size_t a = snmalloc::bits::next_pow2(1 + (size_t)rand() % (snmalloc::SUPERSLAB_SIZE));
        size_t m = rand();
        auto t = snmalloc::aligned_size(a, m);
        all += t;
        max = std::max(max, t);
    }
    std::cout << "ave: " << (double)all / (double)LIMIT << ", max: " << max << std::endl;
}

I can get:

➜  cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5

So perhaps it is acceptable as the average case is quite good?

@mjp41 (Member)

mjp41 commented Jan 21, 2020

Not sure I follow. Are you trying to work out how much we round up the size to account for alignment?

Should the following line subtract m?

auto t = snmalloc::aligned_size(a, m);

Also, if a is large and m is small, then the result cannot both be aligned and be smaller than the alignment, so perhaps

auto t = snmalloc::aligned_size(a, m) - std::max(a,m);

@SchrodingerZhu (Collaborator, Author)

SchrodingerZhu commented Jan 21, 2020

I changed the mock aligned_size function above to return the number of trials needed to find the aligned size; the average trial counts are shown above.

In most cases the requested size will be smaller, so I tested again with:

size_t m = rand() % snmalloc::size_to_sizeclass(snmalloc::NUM_SIZECLASSES);
➜  cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.49993, max: 2
ave: 1.4999, max: 2
ave: 1.4999, max: 2
ave: 1.49986, max: 2
ave: 1.49986, max: 2
ave: 1.49981, max: 2
ave: 1.49981, max: 2
ave: 1.5001, max: 2
ave: 1.5001, max: 2
ave: 1.49982, max: 2

@mjp41 (Member)

mjp41 commented Jan 21, 2020

It is more the branches on the fast path that worry me. So, looking at the disassembly from Clang:

0000000000000000 <rust_alloc>:
   0:	48 89 f8             	mov    %rdi,%rax
   3:	64 48 8b 3c 25 00 00 	mov    %fs:0x0,%rdi
   a:	00 00 
   c:	48 39 c6             	cmp    %rax,%rsi
   f:	48 0f 46 f0          	cmovbe %rax,%rsi
  13:	48 8d 4e ff          	lea    -0x1(%rsi),%rcx
  17:	48 89 ca             	mov    %rcx,%rdx
  1a:	48 81 f9 ff ff 00 00 	cmp    $0xffff,%rcx
  21:	77 13                	ja     36 <rust_alloc+0x36>
  23:	48 83 e2 f8          	and    $0xfffffffffffffff8,%rdx
  27:	48 8b 92 00 00 00 00 	mov    0x0(%rdx),%rdx
  2e:	48 83 fa 4a          	cmp    $0x4a,%rdx
  32:	76 3c                	jbe    70 <rust_alloc+0x70>
  34:	eb 69                	jmp    9f <rust_alloc+0x9f>
  36:	48 83 ca 20          	or     $0x20,%rdx
  3a:	f3 4c 0f bd c2       	lzcnt  %rdx,%r8
  3f:	41 b9 3a 00 00 00    	mov    $0x3a,%r9d
  45:	45 31 d2             	xor    %r10d,%r10d
  48:	4d 29 c1             	sub    %r8,%r9
  4b:	41 0f 95 c2          	setne  %r10b
  4f:	44 89 ca             	mov    %r9d,%edx
  52:	44 29 d2             	sub    %r10d,%edx
  55:	80 c2 04             	add    $0x4,%dl
  58:	c4 e2 eb f7 d1       	shrx   %rdx,%rcx,%rdx
  5d:	83 e2 03             	and    $0x3,%edx
  60:	4a 8d 14 8a          	lea    (%rdx,%r9,4),%rdx
  64:	48 83 fa 4a          	cmp    $0x4a,%rdx
  68:	77 35                	ja     9f <rust_alloc+0x9f>
  6a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  70:	89 d1                	mov    %edx,%ecx
  72:	83 e1 7f             	and    $0x7f,%ecx
  75:	48 8b 34 cd 00 00 00 	mov    0x0(,%rcx,8),%rsi
  7c:	00 
  7d:	c4 e2 f0 f3 de       	blsi   %rsi,%rcx
  82:	48 39 c1             	cmp    %rax,%rcx
  85:	73 14                	jae    9b <rust_alloc+0x9b>
  87:	48 83 c2 01          	add    $0x1,%rdx
  8b:	48 83 fa 4a          	cmp    $0x4a,%rdx
  8f:	76 df                	jbe    70 <rust_alloc+0x70>
  91:	be 00 00 00 01       	mov    $0x1000000,%esi
  96:	e9 00 00 00 00       	jmpq   9b <rust_alloc+0x9b>
  9b:	48 8d 4e ff          	lea    -0x1(%rsi),%rcx
  9f:	48 81 f9 ff ff 00 00 	cmp    $0xffff,%rcx
  a6:	77 ee                	ja     96 <rust_alloc+0x96>
  a8:	48 83 e1 f8          	and    $0xfffffffffffffff8,%rcx
  ac:	48 8b b1 00 00 00 00 	mov    0x0(%rcx),%rsi
  b3:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
  b7:	48 85 c0             	test   %rax,%rax
  ba:	74 08                	je     c4 <rust_alloc+0xc4>
  bc:	48 8b 08             	mov    (%rax),%rcx
  bf:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
  c3:	c3                   	retq   
  c4:	e9 00 00 00 00       	jmpq   c9 <rust_alloc+0xc9>
  c9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)

whereas

0000000000003b60 <malloc>:
    3b60:	48 89 fe             	mov    %rdi,%rsi
    3b63:	48 8b 05 66 44 22 00 	mov    0x224466(%rip),%rax        # 227fd0 <.got+0x8>
    3b6a:	64 48 8b 38          	mov    %fs:(%rax),%rdi
    3b6e:	48 8d 46 ff          	lea    -0x1(%rsi),%rax
    3b72:	48 3d ff ff 00 00    	cmp    $0xffff,%rax
    3b78:	77 20                	ja     3b9a <malloc+0x3a>
    3b7a:	48 83 e0 f8          	and    $0xfffffffffffffff8,%rax
    3b7e:	48 8d 0d 8b 1e 01 00 	lea    0x11e8b(%rip),%rcx        # 15a10 <_ZN8snmallocL18sizeclass_metadataE>
    3b85:	48 8b 34 08          	mov    (%rax,%rcx,1),%rsi
    3b89:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
    3b8d:	48 85 c0             	test   %rax,%rax
    3b90:	74 0d                	je     3b9f <malloc+0x3f>
    3b92:	48 8b 08             	mov    (%rax),%rcx
    3b95:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
    3b99:	c3                   	retq   
    3b9a:	e9 21 f2 ff ff       	jmpq   2dc0 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE15alloc_not_smallILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
    3b9f:	e9 7c f4 ff ff       	jmpq   3020 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE16small_alloc_slowILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
    3ba4:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
    3bab:	00 00 00 
    3bae:	66 90                	xchg   %ax,%ax

This looks potentially quite a bit slower at the moment.

@mjp41 (Member)

mjp41 commented Jan 21, 2020

Okay, I have done a bit of maths, and I think we can simplify the whole piece of code to:

SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
    // Client responsible for checking alignment is not zero
    assert(alignment != 0);
    // Client responsible for checking alignment is not above SUPERSLAB_SIZE
    assert(alignment <= SUPERSLAB_SIZE);
    // Client responsible for checking alignment is a power of two
    assert(bits::next_pow2(alignment) == alignment);

    return ((alignment - 1) | (size - 1)) + 1;
}

I think this should interact correctly with the sizeclass calculation. It would be great to add a validation test that checks this does the correct thing, similar to
https://github.com/microsoft/snmalloc/blob/master/src/test/func/sizeclass/sizeclass.cc
where we scan the ranges and check the maths works on every CI push. Something like

   sizeclass_to_size(size_to_sizeclass(aligned_size(alignment, size))) % alignment == 0

which should hold for everything except 0.

@SchrodingerZhu (Collaborator, Author)

Cool!
Do you mean something like this:

#include <iostream>
#include <random>
#include <snmalloc.h>
#include <test/setup.h>
NOINLINE
snmalloc::sizeclass_t size_to_sizeclass(size_t size)
{
  return snmalloc::size_to_sizeclass(size);
}

NOINLINE
size_t sizeclass_to_size(snmalloc::sizeclass_t sc)
{
  return snmalloc::sizeclass_to_size(sc);
}

NOINLINE
size_t aligned_size(size_t alignment, size_t size)
{
  return snmalloc::aligned_size(alignment, size);
}

int main(int, char**)
{
  setup();

  bool failed = false;

  std::random_device dev;
  unsigned int seed = dev();
  std::cout << "random seed: " << seed << std::endl;

  std::uniform_int_distribution<size_t> dist(1);
  std::mt19937 rng(seed);

  for (snmalloc::sizeclass_t sz = 0; sz < snmalloc::NUM_SIZECLASSES; sz++)
  {
    // Separate printing for small and medium sizeclasses
    if (sz == snmalloc::NUM_SMALL_CLASSES)
      std::cout << std::endl;

    size_t this_size = snmalloc::sizeclass_to_size(sz),
           pre_size = (sz != 0) ? snmalloc::sizeclass_to_size(sz - 1) : 1;

    for (size_t alignment = snmalloc::bits::next_pow2(pre_size);
         alignment < this_size;
         alignment = snmalloc::bits::next_pow2(alignment + 1))
    {
      std::cout << std::endl;
      std::cout << "-- alignment: " << alignment << std::endl;
      for (int i = 0; i < 100; ++i)
      {
        size_t size = dist(rng) % alignment;
        std::cout << "---- size: " << size;
        bool flag = sizeclass_to_size(size_to_sizeclass(
                      aligned_size(alignment, size))) % alignment != 0;
        std::cout << ((flag) ? " [FAILED]" : " [SUCCESS]") << std::endl;
        failed |= flag;
      }
    }
  }

  if (failed)
    abort();
}

@mjp41 (Member)

mjp41 commented Jan 22, 2020

@SchrodingerZhu, I would prefer something a bit more exhaustive. I think the following covers the core properties of the sizeclass calculation and aligned_size that make this correct.

void test_align_size()
{
  bool failed = false;

  assert(aligned_size(128, 160) == 256);

  for (size_t size = 1; size < sizeclass_to_size(NUM_SIZECLASSES - 1); size++)
  {
    size_t rsize = sizeclass_to_size(size_to_sizeclass(size));

    if (rsize < size)
    {
      std::cout << "Size class rounding shrunk: " << size << " -> " << rsize
                << std::endl;
      failed |= true;
    }

    auto lsb_rsize = rsize & -rsize;
    auto lsb_size = size & -size;
    if (lsb_rsize < lsb_size)
    {
      std::cout << "Original size more aligned than rounded size: " << size
                << " (" << lsb_size << ") -> " << rsize << " (" << lsb_rsize
                << ")" << std::endl;
      failed |= true;
    }

    for (size_t alignment_bits = 0; alignment_bits < SUPERSLAB_BITS;
         alignment_bits++)
    {
      auto alignment = (size_t)1 << alignment_bits;
      auto asize = aligned_size(alignment, size);

      if (asize < size)
      {
        std::cout << "Shrunk! Alignment: " << alignment << " Size: " << size
                  << " ASize: " << asize << std::endl;
        failed |= true;
      }

      if ((asize & (alignment - 1)) != 0)
      {
        std::cout << "Not aligned! Alignment: " << alignment
                  << " Size: " << size << " ASize: " << asize << std::endl;
        failed |= true;
      }
    }
  }

  if (failed)
    abort();
}

In particular, the sizeclass rounding should never set lower bits, and the alignment rounding should also never set lower bits. As the layout of the size classes ensures every allocation is aligned up to its LSB, we should be good.

@mjp41 (Member)

mjp41 commented Jan 22, 2020

The codegen for this change is really nice; Clang gives:

0000000000000000 <rust_alloc>:
   0:	48 89 f8             	mov    %rdi,%rax
   3:	64 48 8b 3c 25 00 00 	mov    %fs:0x0,%rdi
   a:	00 00 
   c:	48 83 c0 ff          	add    $0xffffffffffffffff,%rax
  10:	48 83 c6 ff          	add    $0xffffffffffffffff,%rsi
  14:	48 09 c6             	or     %rax,%rsi
  17:	48 81 fe ff ff 00 00 	cmp    $0xffff,%rsi
  1e:	77 1c                	ja     3c <rust_alloc+0x3c>
  20:	48 83 e6 f8          	and    $0xfffffffffffffff8,%rsi
  24:	48 8b b6 00 00 00 00 	mov    0x0(%rsi),%rsi
  2b:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
  2f:	48 85 c0             	test   %rax,%rax
  32:	74 11                	je     45 <rust_alloc+0x45>
  34:	48 8b 08             	mov    (%rax),%rcx
  37:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
  3b:	c3                   	retq   
  3c:	48 83 c6 01          	add    $0x1,%rsi
  40:	e9 00 00 00 00       	jmpq   45 <rust_alloc+0x45>
  45:	e9 00 00 00 00       	jmpq   4a <rust_alloc+0x4a>
  4a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

I think this will be pretty fast.

@mjp41 (Member) left a review comment:

I think you need a check for zero alignment in memalign inside malloc.cc, as our libc tests expect align 0 to work.

Great work. Thanks!

src/override/rust.cc (outdated; review thread resolved)
@SchrodingerZhu (Collaborator, Author)

> The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). If size is 0, then posix_memalign() returns either NULL, or a unique pointer value that can later be successfully passed to free(3).
>
> The obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory. The memory address will be a multiple of alignment, which must be a power of two.

Are these requirements handled correctly in the current method?

@mjp41 (Member)

mjp41 commented Jan 22, 2020

> Are these requirements handled correctly in the current method?

I believe so. Well, as much as they were before. We don't meet the posix_memalign spec when alignment > SUPERSLAB_SIZE. I intend to fix this later.

@mjp41 (Member)

mjp41 commented Jan 22, 2020

I noticed the codegen for dealloc where a size is supplied is not as good as it could be. I'll submit a separate PR refactoring that.

@mjp41 (Member)

mjp41 commented Jan 22, 2020

Unless you want to add something more, I'm happy to merge this when it passes CI.

@SchrodingerZhu (Collaborator, Author)

I think it is ready then.

@mjp41 mjp41 merged commit 8304ded into microsoft:master Jan 23, 2020