Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str() gives incorrect result after scoped object has been destructed #15

Open
andykras opened this issue Mar 25, 2021 · 2 comments
Open

Comments

@andykras
Copy link

Consider following example:

    rabbit::object obj2;
    {
        rabbit::array arr1;
        {
            rabbit::object obj1;
            obj1["answer"] = 42;
            auto obj1_str = obj1.str(); // {"answer":42}

            arr1.push_back(obj1);

            auto json = arr1.str(); // [{"answer":42}]
        }
        auto json = arr1.str(); // [{"ÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝ":"ÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝ"}]

        obj2.insert("array", arr1); // exception: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
    }
    auto json = obj2.str();

Expected behavior: str() is working after obj1 destruction.

This code works as expected:

    rabbit::object obj1;
    rabbit::array arr1;
    rabbit::object obj2;
    {
        {
            obj1["answer"] = 42;
            auto obj1_str = obj1.str(); // {"answer":42}

            arr1.push_back(obj1);

            auto json = arr1.str(); // [{"answer":42}]
        }
        auto json = arr1.str(); // [{"answer":42}]

        obj2.insert("array", arr1);
    }
    auto json = obj2.str(); // {"array":[{"answer":42}]}

Also you can pass allocator for temp objects:

    rabbit::object obj2;
    {
        rabbit::array arr1(obj2.get_allocator());
        {
            rabbit::object obj1(obj2.get_allocator());
            obj1["answer"] = 42;
            auto obj1_str = obj1.str(); // {"answer":42}

            arr1.push_back(obj1);

            auto json = arr1.str(); // [{"answer":42}]
            int check = 1;
        }
        auto json = arr1.str(); // [{"answer":42}]

        obj2.insert("array", arr1);
    }
    auto json = obj2.str(); // {"array":[{"answer":42}]}
@csteifel
Copy link
Collaborator

csteifel commented Mar 25, 2021

I can see what's going on here though I'm not sure if we actually want to change the behaviour to maintain performance.

Just to set things out before explaining: If you create a rabbit value/object/array and it is the root value/object/array we come up with a new allocator for it and keep it alongside the underlying rapidjson value, this way all allocating steps that you take on the value afterwards just work and you don't have to think about allocators. When the root value gets destructed we destroy the allocator and all memory that was allocated with it.

So breaking down the 3 cases you describe:

Case 1: you have 3 separate allocators for the 3 different "root" values, you may not be intending for them to be root values but because they were not created via methods on another rabbit value of some kind (eg operator[]) they are "root" values. When you add all of them up into obj2 as each one of them loses scope they deallocate their memory and because these are not POD json types (eg int/double) their values are essentially cleaned up and you get your problem.

Case 2: You have essentially the same thing happening as Case 1 you just happen to luck out with regards to the lifetime of the rabbit objects sticking around and so the allocators haven't been destructed yet.

Case 3: You are telling arr1 and obj1 to use the allocator of the object that will have the longest lifespan and as such the memory sticks around for as long as obj2 sticks around.

Case 3 is the ideal usage for performance reasons (which if you are interested in rapidjson and by extensioni rabbit, you are probably interested in performance) as far as I'm aware. The only alternative I can see that we could implement into rabbit would be to check if when calling insert/push_back/etc... we have two different allocators and if thats the case do a deep copy, which I'm not sure we would want to do. If we did that automatically for you it may not be obvious where there is a possibly large performance hit. For example if we did that for Case 1, your program would work as you seem to expect it to, but, you would have the following: obj1 allocates spaces for the object as you create it, when its added to arr1 we have to do a deep copy because the allocators are not the same, then when you add arr1 to obj2 we have to do another deep copy because the allocators are once again not the same. As you can imagine if we have lots and lots of data in these values that becomes extremely expensive quickly. In Case 3 however, you allocate for obj1, you just allocate a tiny bit more space for the array itself and then move those values into arr1, and then you allocate a tiny bit more space in obj2 to allow moving the arr1 value into obj2, no copying required.

@andykras
Copy link
Author

Thank you for detailed explanation! I got the idea.

Probably case 3 is the best deal from performance perspective, though I wouldn't mind for a deep copy in some cases. To me it's rather to have an option - performance vs convenience.

I think it's worth to mention in readme or even cover in the tests this scenario.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants