You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If you need to allocate dynamic memory in C, you use malloc() and free().
The API is very old, and while you might want to switch to a different implementation,
be it jemalloc, tcmalloc, or mimalloc,
they mostly copy the interface.
It makes sense that they do that – they want to be a mostly drop-in replacement,
but it’s still unfortunate because malloc() and free() are a bad API for memory allocation.
Let’s talk why.
The C allocation functions
malloc() and free() have a very simple interface:
malloc() takes a size and returns a pointer to the allocated memory block of that size,
free() takes a previously allocated pointer and frees it.
void*malloc(size_tsize);voidfree(void*ptr);
Then there is also calloc(), which allocates memory that has been zeroed.
For whatever reason, it has a slightly different interface:
void*calloc(size_tnum,size_tsize);
Logically, it allocates num objects of size each, i.e. num * size bytes.
It also does an overflow check for you, because why not.
Finally, there is realloc():
void*realloc(void*ptr,size_tnew_size);
It attempts to grow or shrink a memory block to the new_size.
This may or may not copy things around in memory, and returns the new starting address, or ptr unchanged if it was left in-place.
Notably, malloc() can be implemented in terms of realloc():
Plain old malloc() does not allow specifying a custom alignment for the aligned memory.
It returns memory that is suitable aligned for any object with fundamental alignment.
Want to allocate a SIMD vector or something aligned at a page boundary? It gets tricky:
constexprautopage_size=4096;void*allocate_page_boundary(std::size_tsize){// Allocate extra space to guarantee alignment.
automemory=std::malloc(page_size+size);// Align the starting address.
autoaddress=reinterpret_cast<std::uintptr_t>(memory);automisaligned=address&(page_size-1);returnstatic_cast<unsignedchar*>(memory)+page_size-misaligned;}
Of course, you can’t free the resulting address with std::free(), since it may point somewhere inside the allocated memory block.
You have to remember the original address as well.
For example, you can store it in the alignment padding directly preceding the aligned address.
At least C11 has added aligned_alloc(), which then became part of C++17:
void*aligned_alloc(size_talignment,size_tsize);
This doesn’t help you with realloc() or calloc(), however.
malloc() doesn’t directly go ahead and ask the OS for memory, that would be too slow.
Instead there are various caches for memory blocks of varying sizes.
For example, a program often allocates 8 byte elements, so it might make sense to keep a list of 8 byte blocks.
When you ask for 8 bytes, it simply returns one from the list:
Of course this requires that the allocator knows the size of a memory block given its pointer.
The only way to do that is to store some metadata about the allocator somewhere.
This could be a global hash table that maps pointers to sizes, or extra metadata store directly in front of the address, as discussed in the over aligned example.
In either case, it means that asking for 8 bytes of memory will not actually allocate 8 bytes of memory, but additional metadata as well.
This is especially wasteful because the user usually knows how big the memory block is it’s currently attempting to free!
template<typenameT>classdynamic_array{T*ptr;std::size_tsize;public:explicitdynamic_array(T*ptr,std::size_tsize):ptr(static_cast<T*>(std::malloc(size*sizeof(T)))){}~dynamic_array(){…// call destructors
// I know that I'm freeing size * sizeof(T) bytes!
std::free(ptr);}};
If free() took the size of the memory block as extra parameter, the implementation wouldn’t need to add extra metadata just for that.
Of course an implementation might choose to allocate extra metadata for performance reasons anyway.
Currently it’s forced to do that.
Consider the implementation of std::vector<T>::push_back().
When there is no capacity to store an additional element it needs to reserve bigger memory and move everything over.
To keep an amortized O(1) complexity, it grows the new memory by some factor:
Suppose the implementation of std::malloc uses a cache of recently freed memory blocks.
When attempting to allocate N blocks, it searches that cache for a block that is at least N bytes big.
If it finds one (either the first one that fits, or the smallest one that fits, or …), returns it.
In that case, the returned memory block might have space for more than N bytes!
This means that we ask for a memory with a capacity for e.g. 14 elements, but get a memory block with a capacity for 16 elements instead.
But we don’t know that!
We treat the block as if it has space for 14 elements only and trigger another unnecessary reallocation for the 15th element.
It would be great if std::malloc() could return how big the allocated memory block actually is, so we can leverage any extra space we might have gotten “for free”.
realloc() attempts to grow a memory block in-place.
If that’s not possible, it allocates a new one and copies the existing contents over.
This is done as-if by std::memcpy().
This automatic copy is problematic.
For starters, it can’t be used with C++ objects that might want to invoke a move constructor.
It also doesn’t work with C objects that have self-referential pointers such as a buffer containing a circular linked list.
This is a shame as realloc()’s ability to grow a memory block in-place is really useful and not achievable in any other way.
Sadly, it can’t be used with e.g. std::vector.
A better interface
Let me propose a new interface that doesn’t have those shortcomings.
It consists of three functions allocate(), deallocate(), and try_expand().
allocate() is the replacement for std::malloc().
It’s goal is to allocate a memory block for a given size and alignment.
Crucially it returns not only a pointer to the allocated memory, but also the total size that is available for the user.
structmemory_block{void*ptr;size_tsize;};/// On success `result.ptr != NULL` and `result.size >= size`.
/// On failure, `result.ptr == NULL` and `result.size == 0`.
memory_blockallocate(size_tsize,size_talignment);
That way, we pass all information the caller has anyway to the allocator.
It might makes sense to relax the requirements for deallocate() a bit, so that the size of the block doesn’t need to be the exact value returned by allocate(), but some other value between the requested size and the actual size.
For example, a std::vector<int> might request 12 bytes (3 ints), but gets 17 bytes (4 ints plus one spare).
It would then call deallocate() with a size of 16 bytes, as it knows the capacity in number of ints.
Finally, try_expand() is a replacement for realloc().
Crucially, it will only attempt to expand the block in-place, and fail if that’s not possible.
/// If the block can be expanded in-place to `new_size`, returns true.
/// Otherwise, returns `false`.
booltry_expand(memory_blockblock,size_tnew_size);
This solves problem #4 by making the caller responsible for copying the allocated memory if necessary.
C++ Solutions
C++’s operator new and operator delete, have inherited the same problems:
void*operatornew(std::size_tsize);voidoperatordelete(void*ptr);// not pictured: dozens of other overloads
To its credit, it keeps making improvements.
C++17: Aligned allocation
C++17 adds an overload that accepts std::align_val_t, which allows specification of a custom alignment.
The alignment is passed as std::align_val_t because a user might have overloaded operator new(std::size_t, std::size_t) already.
C++17: Sized deallocation
A user can actually define its own implementation of operator new/delete to control all memory allocations.
This is then invoked by the compiler to allocate memory.
Since C++17, the compiler will also attempt to invoke the following overloads:
As the compiler knows the size of the objects its deallocating, it can pass that information to the function.
If you’re writing a custom allocator implementation, you don’t need to worry about metadata.
Of course, this doesn’t help the default implementation using std::malloc and std::free.
C++23: Size feedback in std::allocator
C++23 has adopted P0401, which adds a new function to std::allocator:
The function does what it says: it allocates memory for at least n objects and returns the actual size of the memory available.
This behaves like my proposed allocate() function.
The language side with changes for operator new as proposed by P0901 is still in the standardization process, and will hopefully come in C++26.
Conclusion
A good API requests all information it needs (duh) and returns as much information it can provide (law of useful return).
malloc() and free() don’t follow those principles, which make them less useful as they could be.
malloc() and free() are a bad API
https://ift.tt/ok4paTK
malloc() and free() are a bad API
If you need to allocate dynamic memory in C, you use
malloc()
andfree()
. The API is very old, and while you might want to switch to a different implementation, be it jemalloc, tcmalloc, or mimalloc, they mostly copy the interface. It makes sense that they do that – they want to be a mostly drop-in replacement, but it’s still unfortunate becausemalloc()
andfree()
are a bad API for memory allocation.Let’s talk why.
The C allocation functions
malloc()
andfree()
have a very simple interface:malloc()
takes a size and returns a pointer to the allocated memory block of that size,free()
takes a previously allocated pointer and frees it.Then there is also
calloc()
, which allocates memory that has been zeroed. For whatever reason, it has a slightly different interface:Logically, it allocates
num
objects ofsize
each, i.e.num * size
bytes. It also does an overflow check for you, because why not.Finally, there is
realloc()
:It attempts to grow or shrink a memory block to the
new_size
. This may or may not copy things around in memory, and returns the new starting address, orptr
unchanged if it was left in-place. Notably,malloc()
can be implemented in terms ofrealloc()
:Seems straightforward enough, what’s the issue?
Problem #1: Alignment
Plain old
malloc()
does not allow specifying a custom alignment for the aligned memory. It returns memory that is suitable aligned for any object with fundamental alignment.Want to allocate a SIMD vector or something aligned at a page boundary? It gets tricky:
Of course, you can’t free the resulting address with
std::free()
, since it may point somewhere inside the allocated memory block. You have to remember the original address as well.At least C11 has added
aligned_alloc()
, which then became part of C++17:This doesn’t help you with
realloc()
orcalloc()
, however.malloc()
doesn’t directly go ahead and ask the OS for memory, that would be too slow. Instead there are various caches for memory blocks of varying sizes.For example, a program often allocates 8 byte elements, so it might make sense to keep a list of 8 byte blocks. When you ask for 8 bytes, it simply returns one from the list:
Then when we free an 8 byte memory block, it is added to the list instead:
Of course this requires that the allocator knows the size of a memory block given its pointer. The only way to do that is to store some metadata about the allocator somewhere. This could be a global hash table that maps pointers to sizes, or extra metadata store directly in front of the address, as discussed in the over aligned example. In either case, it means that asking for 8 bytes of memory will not actually allocate 8 bytes of memory, but additional metadata as well.
This is especially wasteful because the user usually knows how big the memory block is it’s currently attempting to free!
If
free()
took the size of the memory block as extra parameter, the implementation wouldn’t need to add extra metadata just for that.Problem #3: Wasting space
Consider the implementation of
std::vector<T>::push_back()
. When there is no capacity to store an additional element it needs to reserve bigger memory and move everything over. To keep an amortized O(1) complexity, it grows the new memory by some factor:This works, but can waste memory.
Suppose the implementation of
std::malloc
uses a cache of recently freed memory blocks. When attempting to allocateN
blocks, it searches that cache for a block that is at leastN
bytes big. If it finds one (either the first one that fits, or the smallest one that fits, or …), returns it. In that case, the returned memory block might have space for more thanN
bytes!This means that we ask for a memory with a capacity for e.g. 14 elements, but get a memory block with a capacity for 16 elements instead. But we don’t know that! We treat the block as if it has space for 14 elements only and trigger another unnecessary reallocation for the 15th element.
It would be great if
std::malloc()
could return how big the allocated memory block actually is, so we can leverage any extra space we might have gotten “for free”.Problem #4:
realloc()
realloc()
attempts to grow a memory block in-place. If that’s not possible, it allocates a new one and copies the existing contents over. This is done as-if bystd::memcpy()
.This automatic copy is problematic.
For starters, it can’t be used with C++ objects that might want to invoke a move constructor. It also doesn’t work with C objects that have self-referential pointers such as a buffer containing a circular linked list.
This is a shame as
realloc()
’s ability to grow a memory block in-place is really useful and not achievable in any other way. Sadly, it can’t be used with e.g.std::vector
.A better interface
Let me propose a new interface that doesn’t have those shortcomings. It consists of three functions
allocate()
,deallocate()
, andtry_expand()
.allocate()
is the replacement forstd::malloc()
. It’s goal is to allocate a memory block for a given size and alignment. Crucially it returns not only a pointer to the allocated memory, but also the total size that is available for the user.This takes care of problem #1 and #3.
deallocate()
is a replacement forstd::free()
. It takes amemory_block
as well, in addition to the alignment that was used to request this block:That way, we pass all information the caller has anyway to the allocator.
Finally,
try_expand()
is a replacement forrealloc()
. Crucially, it will only attempt to expand the block in-place, and fail if that’s not possible.This solves problem #4 by making the caller responsible for copying the allocated memory if necessary.
C++ Solutions
C++’s
operator new
andoperator delete
, have inherited the same problems:To its credit, it keeps making improvements.
C++17: Aligned allocation
C++17 adds an overload that accepts
std::align_val_t
, which allows specification of a custom alignment.C++17: Sized deallocation
A user can actually define its own implementation of
operator new
/delete
to control all memory allocations. This is then invoked by the compiler to allocate memory. Since C++17, the compiler will also attempt to invoke the following overloads:As the compiler knows the size of the objects its deallocating, it can pass that information to the function. If you’re writing a custom allocator implementation, you don’t need to worry about metadata.
Of course, this doesn’t help the default implementation using
std::malloc
andstd::free
.C++23: Size feedback in
std::allocator
C++23 has adopted P0401, which adds a new function to
std::allocator
:The function does what it says: it allocates memory for at least n objects and returns the actual size of the memory available. This behaves like my proposed
allocate()
function.The language side with changes for
operator new
as proposed by P0901 is still in the standardization process, and will hopefully come in C++26.Conclusion
A good API requests all information it needs (duh) and returns as much information it can provide (law of useful return).
malloc()
andfree()
don’t follow those principles, which make them less useful as they could be.It’s great to see that C++23 has finally fixed most of those shortcomings, at least on the library side. Of course, modern languages like Rust don’t make any of the mistakes in the first place.
C++ enthusiast. I write libraries.
via foonathan::blog()
July 15, 2024 at 03:45PM
The text was updated successfully, but these errors were encountered: