Skip to content

Latest commit

Β 

History

History
654 lines (486 loc) Β· 24.9 KB

README.md

File metadata and controls

654 lines (486 loc) Β· 24.9 KB

mem++

codecov build benchmarking

C++ memory allocator with smart features, such as garbage collection, and heap compacting.

πŸ”’ Current version

Current library version: 0.3.5

Warning: the API is not stable!

πŸ”¬ Features

  • Garbage Collection
  • Fast memory allocations (using bump allocators techniques)
  • Fast free algorithms
  • Advanced compacting algorithms, which are used to reduce memory fragmentation and improve cache locality: heuristic-layouting

⚠ Supported systems / limitations

  • The library is still in the development. It's definitely not recommended to use in production!
  • All Unix-like systems (where it is possible to use mmap)
  • g++ or clang++ compilers
  • Currently supports only single-threaded applications
  • References/Pointers to GcPtr's are always invalidated after GarbageCollect()

❓ Usage

  1. Install latest build systems: apt install cmake g++ clang
  2. Clone just the library: git clone https://github.com/m4drat/memplusplus/
  3. Clone the library (with tests support): git clone --recurse-submodules=./libraries/googletest https://github.com/m4drat/memplusplus/
  4. If you want to run benchmarks, take a look at this: link

How to use the library as a dependency (external project)

  1. run cmake from the root directory (memplusplus):

    cmake \
        -S . \
        -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DMPP_BUILD_SHARED_LIBS=ON \
  2. compile and install:

    sudo cmake \
        --build build \
        --target install \
        -j 8
  3. After the library is installed, in your CMakeLists.txt use find_package, to actually use the library:

    cmake_minimum_required(VERSION 3.13)
    project(<YOUR_PROJECT_NAME>)
    
    find_package(mpp 0.3.5 REQUIRED)
    
    add_executable(<YOUR_PROJECT_NAME> <YOUR_SOURCES>)
    target_link_libraries(<YOUR_PROJECT_NAME> PRIVATE mpp::mpp)
  4. After that you will be able to include library headers in your sources like that: #include <mpplib/mpp.hpp>

How to use the library internally

  1. just copy the whole directory with libmemplusplus to your project root directory.

  2. In your project's cmake file add this:

    add_subdirectory(libmemplusplus)
    target_link_libraries(<YOUR_PROJECT_NAME> PRIVATE mpp::mpp)
  3. After that you will be able to include library headers in your sources like that: #include "mpplib/mpp.hpp"

πŸ“€ Build options

Global options:

  • MPP_ENABLE_COVERAGE - build with code coverage support
  • MPP_BUILD_FUZZER - build fuzzer project (will build the library with sanitizers)
  • MPP_BUILD_EXAMPLE - build example project
  • MPP_BUILD_TESTS - build tests
  • MPP_BUILD_DOCS - build documentation

Library options:

  • MPP_BUILD_SHARED_LIBS - build shared or static libraries
  • MPP_FULL_DEBUG - build in full debug mode (adds extended security checks in debug build)
  • MPP_SECURE - build in secure mode with additional security features
  • MPP_PROFILE - enable profiling instrumentation
  • MPP_SANITIZERS - add sanitizers to the build
  • MPP_VALGRIND - adds support for valgrind
  • MPP_COLOUR_DEBUG_OUTPUT - Add colors to debug output
  • MPP_STATS - Add statistics instrumentation.
  • MPP_ENABLE_LOGGING - Enable logging (even in release mode)

πŸ”³ Environment variables

  • MPP_DUMP_OBJECTS_GRAPH=1 / MPP_DUMP_OBJECTS_GRAPH=2 - dump objects graph to file objects.dot, while performing CollectGarbage() (only possible in debug mode)

  • MPP_SHOW_STATISTICS=1 - display statistics after program termination (should be built with MPP_STATS set to ON)

  • MPP_LOG_LEVEL - set log level (default: FATAL). Supported values: TRACE, DEBUG, INFO, WARNING, ERROR, FATAL, DISABLED

πŸ“š Examples

  • Automatic memory management

    #include <mpplib/mpp.hpp>
    
    ...
    
    // create smart pointer, that will automatically deallocate object, when needed
    mpp::MakeShared<Object> object = mpp::MakeShared<Object>(<constructor params>);
    
    // create array of Objects, that will be automatically deallocated, when goes out of scope
    mpp::MakeShared<Object[]> objects = mpp::MakeSharedN<Object>(<array size>, <constructor params>);
    
    ...
    
    // collect all garbage + compact memory (should be called manually)
    mpp::CollectGarbage();
  • Manual memory management - deprecated

    #include <mpplib/mpp.hpp>
    
    ...
    
    // will call constructor automatically, like new
    Object* obj = mpp::Allocate<Object>(<constructor params>);
    // create raw pointer (behaves like malloc)
    void* ptr = mpp::Allocate(128);
    
    ...
    
    // will call destructor automatically, like delete
    mpp::Deallocate(obj);
    // only deallocates raw pointer (behaves like free)
    mpp::Deallocate(ptr);

πŸ’» Debugging/profiling library

Memplusplus provides different debug-like features, such as data visualizers, profilers, and statistics collectors.

  • Backtrace functionality for MPP_ASSERT:

    Add these flags to your project's CMakeLists.txt, if you want to see a backtrace when MPP_ASSERT fails:

    # For GCC
    target_compile_options(${PROJECT_NAME} PRIVATE -g -O0)
    target_link_libraries(${PROJECT_NAME} PRIVATE mpp::mpp -export-dynamic)
    
    # For Clang
    target_compile_options(${PROJECT_NAME} PRIVATE -g -O0)
    target_link_libraries(${PROJECT_NAME} PRIVATE mpp::mpp -Wl,--export-dynamic)
  • Address Sanitizer support (ASAN):

    Enable MPP_SANITIZERS before building the library. Then compile your project with -fsanitize=address flag. As a result, you will get asan-compatible build which will help you to debug any memory management issues.

    Example buggy code:

    void* p1 = mpp::Allocate(128);
    mpp::Deallocate(p1);
    *(uint32_t*)p1 = 0x13371337; // Use-after-free write

    Example ASAN report for this code:

    =================================================================
    ==26738==ERROR: AddressSanitizer: use-after-poison on address 0x19b297400010 at pc 0x557f3c1aaea3 bp 0x7ffcb838c440 sp 0x7ffcb838c438
    WRITE of size 4 at 0x19b297400010 thread T0
        #0 0x557f3c1aaea2 in main /home/madrat/memplusplus/build/../example_project/src/main.cpp:137:20
        #1 0x7f16c401a0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
        #2 0x557f3c0e7d2d in _start (/home/madrat/memplusplus/build/example_project/example_project-d+0x45d2d) (BuildId: cca1c1f77a22aeae021502831561465b63cc9f19)
    
    Address 0x19b297400010 is a wild pointer inside of access range of size 0x000000000004.
    SUMMARY: AddressSanitizer: use-after-poison /home/madrat/memplusplus/build/../example_project/src/main.cpp:137:20 in main
    
    ......
  • Valgrind support:

    Enable MPP_VALGRIND before building the library. Then compile your project with -g flag. As a result, you will get valgrind-compatible build which will help you to track-down any memory leaks. You also might want to disable MPP_SANITIZERS/MPP_BUILD_FUZZER and build using g++.

    Example buggy code:

    void leak() {
        void* p1 = mpp::Allocate(128);
        // mpp::Deallocate(p1);
    }

    Example valgrind report for this code:

    ==21304== HEAP SUMMARY:
    ==21304==     in use at exit: 160 bytes in 1 blocks
    ==21304==   total heap usage: 24 allocs, 23 frees, 82,582 bytes allocated
    ==21304==
    ==21304== Searching for pointers to 1 not-freed blocks
    ==21304== Checked 150,048 bytes
    ==21304==
    ==21304== 160 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==21304==    at 0x151812: mpp::MemoryManager::Allocate(unsigned long) (memory_manager.cpp:181)
    ==21304==    by 0x152A33: mpp::Allocate(unsigned long) (memory_manager.cpp:352)
    ==21304==    by 0x150238: leak() (main.cpp:131)
    ==21304==    by 0x15024C: main (main.cpp:144)
    ==21304==
    ==21304== LEAK SUMMARY:
    ==21304==    definitely lost: 160 bytes in 1 blocks
    ==21304==    indirectly lost: 0 bytes in 0 blocks
    ==21304==      possibly lost: 0 bytes in 0 blocks
    ==21304==    still reachable: 0 bytes in 0 blocks
    ==21304==         suppressed: 0 bytes in 0 blocks
    ==21304==
    ==21304== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    
  • Data visualizers:

    In Debug builds, you can dump .dot representation of the ChunkTreap and GcGraph (only using specific environment variable). Later you can render these .dot files using dot from graphviz.

    ChunkTreap visualizer

    How to dump ChunkTreap in code (don't forget to redirect program output to file treap.dot):

    // 1. Get the arena you want to dump the tree for
    std::unique_ptr<Arena>& tmpArena = mpp::GetArenaList()[0];
    
    // 2. Extract ChunkTreap from this arena, and dump it
    // (in this example everything was dumped to std::cout).
    tmpArena->GetFreedChunks().GenerateGraphvizLayout(std::cout);

    How to generate an .svg file with dot:

    dot -Tsvg treap.dot -o treap.svg

    After that you will get .svg file with your freed chunks treap:
    treap

    GcGraph visualizer

    To visualization the GcGraph you have to:

    1. Build the library in debug mode, and set MPP_DUMP_OBJECTS_GRAPH=1 (basic visualization) or MPP_DUMP_OBJECTS_GRAPH=2 (advanced visualization) before running the target app. On each GC cycle, it will dump the objects graph to file "objects_cycle<current cycle number>.dot".

    2. Then generate an .svg file using dot as follows:

      dot -Tsvg objects_cycle<N>.dot -o objects_cycle<N>.svg

    For example, for this code (it creates a linked list and tree):

    // Linked List node
    struct Node {
        uint32_t data;
        SharedGcPtr<Node> prev;
        SharedGcPtr<Node> next;
    
        Node(uint32_t t_data, SharedGcPtr<Node> t_p, SharedGcPtr<Node> t_n)
            : data{t_data}, prev{t_p}, next{t_n}
        {
        }
    };
    
    // Create Linked List
    SharedGcPtr<Node> n1 = MakeShared<Node>(1, nullptr, nullptr);
    SharedGcPtr<Node> n2 = MakeShared<Node>(2, nullptr, nullptr);
    SharedGcPtr<Node> n3 = MakeShared<Node>(3, nullptr, nullptr);
    SharedGcPtr<Node> n4 = MakeShared<Node>(4, nullptr, nullptr);
    
    n1->prev = nullptr;
    n1->next = n2;
    
    n2->prev = n1;
    n2->next = n3;
    
    n3->prev = n2;
    n3->next = n4;
    
    n4->prev = n3;
    n4->next = nullptr;
    
    // Tree node
    struct TreeNode {
        uint32_t data;
        SharedGcPtr<TreeNode> left;
        SharedGcPtr<TreeNode> right;
        SharedGcPtr<TreeNode> up;
    
        TreeNode(uint32_t t_data, SharedGcPtr<TreeNode> t_left, SharedGcPtr<TreeNode> t_right, SharedGcPtr<TreeNode> t_up)
            : data{t_data}, left{t_left}, right{t_right}, up{t_up}
        {
        }
    };
    
    // Create a random tree
    SharedGcPtr<TreeNode> root = MakeShared<TreeNode>(0, nullptr, nullptr, nullptr);
    SharedGcPtr<TreeNode> treeNode1 = MakeShared<TreeNode>(1, nullptr, nullptr, nullptr);
    SharedGcPtr<TreeNode> treeNode2 = MakeShared<TreeNode>(2, nullptr, nullptr, nullptr);
    SharedGcPtr<TreeNode> treeNode3 = MakeShared<TreeNode>(3, nullptr, nullptr, nullptr);
    SharedGcPtr<TreeNode> treeNode4 = MakeShared<TreeNode>(4, nullptr, nullptr, nullptr);
    SharedGcPtr<TreeNode> treeNode5 = MakeShared<TreeNode>(5, nullptr, nullptr, nullptr);
    
    root->up = nullptr;
    root->left = treeNode1;
    root->right = treeNode2;
    
    treeNode1->up = root;
    treeNode1->left = nullptr;
    treeNode1->right = nullptr;
    
    treeNode2->up = root;
    treeNode2->left = treeNode3;
    treeNode2->right = treeNode4;
    
    treeNode3->up = treeNode2;
    treeNode3->left = nullptr;
    treeNode3->right = nullptr;
    
    treeNode4->up = treeNode2;
    treeNode4->left = treeNode5;
    treeNode4->right = nullptr;
    
    treeNode5->up = treeNode4;
    treeNode5->left = nullptr;
    treeNode5->right = nullptr;
    
    CollectGarbage();

    Executing this code with MPP_DUMP_OBJECTS_GRAPH=1 will generate this graph: objects-simple

    Executing this code with MPP_DUMP_OBJECTS_GRAPH=2 will generate this graph: objects-advanced

  • On-demand objects graph visualizer

    To create object graph at the arbitrary point in your program, add this code:

    mpp::g_memoryManager->GetGC().DumpCurrentObjectsGraph();

    You might want to specify dump type (by default it's ADVANCED) and the filename to save .dot graph to (default: "objects-graph.dot").

  • Profiler

    Using the compilation flag MPP_PROFILE, you can build the library with the profiler enabled. To do so:

    1. Set the MPP_PROFILE before building the library
    2. Run the application until it exits
    3. In your current directory, find the file called mpplib-profiling.json
    4. Open this file using the chrome tracing tool. To do so, open chrome and navigate to the chrome://tracing page.

    Example output:

    profiler

  • Statistics collector

    This feature allows you to dump memory allocator statistics. To build the library with this feature, add MPP_STATS while building the project. You can use this feature in 2 ways:

    In runtime using c++ API:

    Example code to dump statistics:

    utils::Statistics::GetInstance().DumpStats(std::cout, true, false, false) << std::endl;

    Example output:

    +============= STATISTICS DUMP START =============+
    MPP: MIN_CHUNK_SIZE     : 32 bytes
    MPP: CHUNK_HEADER_SIZE  : 16 bytes
    MPP: DEFAULT_ARENA_SIZE : 32.000 MB
    MPP: PAGE_SIZE          : 4.000 KB
    ~~~~~~~~~~~~~~~~ All active arenas ~~~~~~~~~~~~~~~~
    -------------- Arena: 0x1ffaf50 --------------
    MPP - Total arena size           : 32.000 MB
    MPP - Currently Allocated Space  : 960.000 Bytes
    MPP - Amount of free memory      : 31.999 MB (99.9971% of total arena size)
    MPP - Amount of allocated memory : 960.000 Bytes (0.0029% of total arena size)
    MPP - Allocated memory == 0      : false
    MPP - Memory used for metadata   : 160.000 Bytes (16.6667% of currently allocated space)
    MPP - TopChunk                   : [0x7f329dd803c0](96, 33553472|InUse:1|PrevInUse:1)
    MPP - Arena begin pointer        : 0x7f329dd80000
    MPP - Arena end pointer          : 0x7f329fd80000
    MPP - Freed chunks nodes (0)
    MPP - Chunks in use (10)
    
    ~~~~~~~~~~~~~~~~~~ General stats ~~~~~~~~~~~~~~~~~~
    -----------------  Arena: 1  -----------------
    MPP - Total amount of allocated memory inside arena : 960.000 Bytes
    MPP - Total amount of freed memory inside arena     : 0 bytes
    MPP - total freed == total allocated                : false
    MPP - Biggest allocation size                       : 96.000 Bytes
    MPP - Smallest allocation size                      : 96.000 Bytes
    MPP - Full size of arena                            : 32.000 MB
    MPP - Arena was allocated for big chunk             : false
    MPP - Arena was allocated by GC                     : false
    
    -----------------  Arena: 2  -----------------
    MPP - Total amount of allocated memory inside arena : 960.000 Bytes
    MPP - Total amount of freed memory inside arena     : 0 bytes
    MPP - total freed == total allocated                : false
    MPP - Biggest allocation size                       : "No allocations have been made yet"
    MPP - Smallest allocation size                      : "No allocations have been made yet"
    MPP - Full size of arena                            : 32.000 MB
    MPP - Arena was allocated for big chunk             : false
    MPP - Arena was allocated by GC                     : true
    
    ~~~~~~~~~~~~~ Garbage Collector stats ~~~~~~~~~~~~~
    -----------------  Cycle: 1  -----------------
    GC - Time wasted inside   GC::Collect() : 2.0000 ms
    GC - Memory cleaned after GC::Collect() : 0 bytes
    GC - Total size of active objects       : 960.000 Bytes
    
    +============== STATISTICS DUMP END ==============+
    

    After program termination, setting environment variable MPP_SHOW_STATISTICS=1. Important note. Runtime stats cannot be dumped using this approach.

    Example output:

    +============= STATISTICS DUMP START =============+
    MPP: MIN_CHUNK_SIZE     : 32 bytes
    MPP: CHUNK_HEADER_SIZE  : 16 bytes
    MPP: DEFAULT_ARENA_SIZE : 32.000 MB
    MPP: PAGE_SIZE          : 4.000 KB
    ~~~~~~~~~~~~~~~~~~ General stats ~~~~~~~~~~~~~~~~~~
    -----------------  Arena: 1  -----------------
    MPP - Total amount of allocated memory inside arena : 960.000 Bytes
    MPP - Total amount of freed memory inside arena     : 0 bytes
    MPP - total freed == total allocated                : false
    MPP - Biggest allocation size                       : 96.000 Bytes
    MPP - Smallest allocation size                      : 96.000 Bytes
    MPP - Full size of arena                            : 32.000 MB
    MPP - Arena was allocated for big chunk             : false
    MPP - Arena was allocated by GC                     : false
    
    -----------------  Arena: 2  -----------------
    MPP - Total amount of allocated memory inside arena : 960.000 Bytes
    MPP - Total amount of freed memory inside arena     : 0 bytes
    MPP - total freed == total allocated                : false
    MPP - Biggest allocation size                       : "No allocations have been made yet"
    MPP - Smallest allocation size                      : "No allocations have been made yet"
    MPP - Full size of arena                            : 32.000 MB
    MPP - Arena was allocated for big chunk             : false
    MPP - Arena was allocated by GC                     : true
    
    ~~~~~~~~~~~~~ Garbage Collector stats ~~~~~~~~~~~~~
    -----------------  Cycle: 1  -----------------
    GC - Time wasted inside   GC::Collect() : 2 ms
    GC - Memory cleaned after GC::Collect() : 0 bytes
    GC - Total size of active objects       : 960.000 Bytes
    
    +============== STATISTICS DUMP END ==============+
    
  • Allocate/Deallocate hooks

    This feature allows you to install your hooks on Allocate and Deallocate. Example:

    // Lambda as an Allocate hook. Important, all hooks should be static!
    static std::function<void*(std::size_t)> allocHook = [&](std::size_t t_AllocSize) {
        mpp::SetAllocateHook(nullptr);
        void* ptr = mpp::Allocate(t_AllocSize);
        mpp::SetAllocateHook(allocHook);
        std::cout << "[mpp] Allocate(" << t_AllocSize << ") -> ";
        mpp::Chunk::DumpChunk(std::cout, mpp::Chunk::GetHeaderPtr(ptr)) << std::endl;
        return ptr;
    };
    // Set actual Allocate hook.
    mpp::SetAllocateHook(allocHook);
    
    // Lambda as an Deallocate hook. Important, all hooks should be static!
    static std::function<bool(void*)> deallocHook = [&](void* t_Addr) {
        mpp::SetDeallocateHook(nullptr);
        bool res = mpp::Deallocate(t_Addr);
        mpp::SetDeallocateHook(deallocHook);
        std::cout << "[mpp] Deallocate(" << t_Addr << ") -> " << res << std::endl;
        return res;
    };
    // Set actual Deallocate hook.
    mpp::SetDeallocateHook(deallocHook);

πŸ”₯ Heuristic layouting

Right now algorithm consists of 2 parts:

1. Find connected components

Divide the graph of all objects into connected components. Each component is a set of objects that are reachable from each other. By dividing the graph into components we can solve the layout problem for each component separately, and what is more important, we can do it in parallel. Also, this step can be called optimal layout in some sense, because it is guaranteed that all objects in the component will be located in the same arena (contiguous memory block).

2. Layout each component heuristically

The next step is to layout objects from components in a way that allows you to access them in a cache-friendly way. This is achieved by using a heuristic layout algorithm. Currently, it's a proof of concept that works only for objects that have a single pointer field (to be precise, it should work for any data structure, that can be represented as a singly linked list). The algorithm is based on the following assumptions

  • If we have a singly linked list, then most likely the next element should be located right after the current one. If it is not (take a look at the image below), then it's a good idea to move it there. By doing so we can reduce cache misses and improve performance.

Benchmark results for the algorithm are presented here: Benchmark results.

We can see that the algorithm is able to improve memory access time by 4-8 times (depending on the size of the list). Of course, it should be noted that the benchmark shows the corner case when the list is fully randomized in memory. In real life, the list might be layouted in a way that is already cache-friendly, so the algorithm will not be able to improve the performance.

Closing words. The algorithm is definitely not perfect, but it's a good start. It can be improved in many ways, for example, by adding more heuristics. Also, it can be used as a base for a more complex layouting algorithm, that will be able to layout any data structure

πŸš€ Performance comparisons

  1. Standalone benchmarks: memplusplus-benchmarks.
  2. Continuous benchmarking results: continuous-benchmarking

🧾 Documentation

  • Install doxygen, sphinx, breathe

  • Configure:

    cmake \
        -S . \
        -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DMPP_BUILD_FUZZER=OFF \
        -DMPP_BUILD_EXAMPLE=OFF \
        -DMPP_BUILD_TESTS=OFF \
        -DBUILD_SHARED_LIBS=OFF \
        -DMPP_BUILD_DOCS=ON
  • Build docs:

    cmake \
        --build build \
        -j 8
  • Now docs can be found in "./build/docs/doxygen/html" or "./build/docs/sphinx"

πŸ§ͺ Tests

  • Configure tests:

    cmake \
        -S . \
        -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=OFF \
        -DMPP_BUILD_TESTS=ON
  • Build tests:

    cmake \
        --build build \
        --target unit_tests \
        -j 8
  • Run tests:

    cd build && ctest

    Some tests are disabled because they require you to turn off ASLR (for example GcGraphTest.GenerateGraphvizLayout). This is considered unreliable, but if you still want to run them, you can do it like this:

    cd build && setarch `uname -m` -R ./build/tests/unit_tests --gtest_also_run_disabled_tests

🚁 clang-format and clang-tidy

  • install clang-tidy and clang-format

  • Configure:

    cmake \
        -S . \
        -B build \
        -DCMAKE_BUILD_TYPE=Release
  • Use clang-format:

    cmake \
        --build build \
        --target clang-format
  • Use clang-tidy:

    cmake \
        --build build \
        --target clang-tidy