Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wiping data after use #152

Closed
alexeikh-promon opened this issue Feb 2, 2015 · 18 comments
Closed

Wiping data after use #152

alexeikh-promon opened this issue Feb 2, 2015 · 18 comments

Comments

@alexeikh-promon
Copy link

I want to parse sensitive data with JsonCpp and am interested in wiping the parsed data in memory after its use. I am willing to develop a patch for it myself. If I submit such a patch, will it be merged?

If yes, is there any suggestion on how to implement such wiping?

I can think of several ways to implement wiping:

  1. As I am interested, first of all, in wiping strings, I can introduce a type Json::string and use it everywhere in the library (except APIs, of course) instead of std::string. By default, Json::string will be typedefed to just std::string. If some #define is active, Json::string will be a type derived from std::string that will wipe all the string characters in the destructor.

  2. I can introduce some wiping allocator and supply it to every std::string and another std:: container object being created. It seems like a more generic approach, but then it's easy to forget to supply an allocator when writing new code. And the patch will be quite large in this case.

  3. Your way, dear library developers?

@cdunn2001
Copy link
Contributor

Personally, I prefer to use instance-allocators, as opposed to the global allocators supported by old STL. An instance allocator can be completely reclaimed -- and wiped -- after the code using it has completed. It also leads to excellent data-locality. I always have an uphill battle explaining the benefits to people, even though to me the cost is zero if you are a rigorous programmer. It seems like you are someone who understands.

However, it's not a trivial change in jsoncpp. I plan to do it someday for the efficiency of the internals (with security as a free side-effect) but not soon.

I see the value (pardon the pun) of a wipe method on Value, but it's non-trivial. Resetting all the std::string data to "" is easy (since std::string is really a std::vector<char> and thus does not free memory when the size shrinks), but dictionary keys are problematic. If you alter the key, a new hashtable entry is created, so the old key will be reclaimed by the memory manager. So I think you would need a new implementation of the dictionary. That's a lot of work.

What are your thoughts?

@jacobsa
Copy link
Contributor

jacobsa commented Feb 3, 2015

@cdunn2001: Can you point me at some documentation on instance allocators?

@cdunn2001
Copy link
Contributor

@jacobsa: Hmmm. I should write something up in my blog. I guess we're talking about multiple ideas at once:

  • Stateful instances of memory allocators (as opposed to global allocators, or stateless allocators created on-the-fly to refer to global data).
  • Memory "arenas", aka "regions".
    • That link refers to a fairly elementary book on memory management, Memory as a Programming Concept in C and C++, by Franek F, available as pdf. "Chapter 9: Linked Data Structures" has some interesting ideas on using arenas for serialization.
  • Memory deallocation.
    • Here is where things get interesting. I eschew destructors for arena-based objects.

Here is my order of preference for memory models:

  1. Use stack memory whenever possible.
  2. Use arenas for large or unlimited, destructorless data within per-request function stacks.
  3. Use new/delete with const unique_ptr for injected dependencies, typically created by abstract factories.
  4. Use shared_ptr only where absolutely necessary. (The Google C++ Style Guide reiterates this.)

Arena data require a Java-esque style of coding, but with a per-request configuration threaded down the call-stack. Here is how to set things up. Key ideas:

  • Arena allocations exist for the lifetime of a request.
    • Instead of std::string and reference-counting, we simply let slices point into our own arena.
  • All memory is reclaimed after each request.
    • Zero fragmentation.
    • Trivially secure wiping.
    • Excellent data locality.
  • Deserialized objects must not have destructors.
    • They will use placement new: new (arena) MyStruct(...);

Drawbacks:

  • valgrind cannot be used for Arena data.
  • Peak memory during a request is not minimal.
    • But unlike GC, memory does not continue to grow, and there is no GC pause.
  • Streaming is a little more complicated.

I generally avoid STL and operator new. Like I said, I would use stack-based allocation (and a mostly side-effect-free coding style) for almost everything within a request, but Arenas are particularly good for serialization/deserialization.

I would generate protobuf to use Arenas too. But actually, msgpack is smarter than Google's protobuf:

  1. msgpack is more clear about numbers of bytes to read (IMHO).
  2. msgpack does not obfuscate integers. I can generally read them with hexdump, e.g.
  3. protobuf effectively encrypts the payload, if you do not have the .proto file.
  4. msgpack is basically one-to-one with JSON.

So if we had a good system for JSON (which requires a header for number of bytes), we could decouple our Value data-type from the encoding, and we could provide a msgpack back-end.

@jacobsa
Copy link
Contributor

jacobsa commented Feb 3, 2015

Cool, thank you for the brain dump.

@alexeikh-promon
Copy link
Author

Personally, I prefer to use instance-allocators, as opposed to the global allocators supported by old STL.

Yes, I agree that stateful allocators are useful. For specifying which arena to use for which objects. I missed stateful allocators support sometimes.

Resetting all the std::string data to "" is easy (since std::string is really a std::vector and thus does not free memory when the size shrinks)

But it is not wiping, is it? Wiping is more like

memset(&str[0], 0, str.capacity());

or

str.resize(str.capacity());
for (auto& c: str) c = (char)rand();

Not sure why you mentioned setting data to "".

but dictionary keys are problematic.

Thus the dictionary keys must wipe themself. It is what my suggestions above are about:

  1. Using a self-wiping string type:
namespace Json
{
#ifdef SELF_WIPING_STRING
class string : public std::string
{
...
    ~string() { wipe(); std::string::~string(); }
};
#else
typedef std::string string;
#endif
}
  1. Using a wiping allocator:
namespace Json
{
#ifdef SELF_WIPING
class WipingAllocator : public std::allocator <char>
{
    ...
};

typedef std::basic_string <char, char_traits <char>, WipingAllocator> string;
#else
typedef std::string string;
#endif
}

@cdunn2001
Copy link
Contributor

class string : public std::string
{
...
    ~string() { wipe(); std::string::~string(); }
};

std::string is reference-counted in some implementations (e.g. gcc). This is a major blunder.

If you wipe strings before destruction, then you can be sure by program logic that they are no longer needed. You're right: You don't technically need to change the size after over-writing the data. And it's probably ok to modify hash-keys just before the hash-table (or red-black) is destructed; the data-structure would have an illegal state, but the destructor probably does not rely on that. So something simple might work as a hack.

But you're missing other issues with std::string. If you create a string through operator+(), then portions of your sensitive data are copied into longer memory. You are not wiping the inchoate data.

@cdunn2001
Copy link
Contributor

The WipingAllocator is a promising idea. However, it would require us to avoid std::string everywhere carefully. That's not a blocker, just a bit of work.

I see one problem: Json::Value::asString() returns std::string. Does that matter?

@alexeikh-promon
Copy link
Author

And it's probably ok to modify hash-keys

Yes, I agree, "probably ok".

If you create a string through operator+(), then portions of your sensitive data are copied into longer memory.

Agree.

The WipingAllocator is a promising idea.

I made a proof of concept:

#include <iostream>
#include <memory>
#include <string>


template <class T>
class WipingAllocator : public std::allocator <T>
{
public:
        template <class U> struct rebind { typedef WipingAllocator <U> other; };

        typedef typename std::allocator<T>::pointer    pointer;
        typedef typename std::allocator<T>::size_type  size_type;
        typedef typename std::allocator<T>::value_type value_type;

        void deallocate(pointer p, size_type n)
        {
                std::fill_n((volatile char*)p, n * sizeof(value_type), 0);
                std::allocator<value_type>::deallocate(p, n);
        }
};

typedef std::basic_string <char, std::char_traits <char>, WipingAllocator <char> > SelfWipingString;


int main()
{
        const char* p1 = NULL;
        const char* p2 = NULL;

        {
                SelfWipingString s1 = "hello";
                p1 = s1.c_str();

                //std::string s2 = s1; // Compile error :(
                std::string s2(s1.data(), s1.length());
                p2 = s2.c_str();
        }

        std::cout << "After wiping: \"" << p1 << "\"" << std::endl;
        std::cout << "No    wiping: \"" << p2 << "\"" << std::endl;
}

I see one problem: Json::Value::asString() returns std::string. Does that matter?

As you can see from the code above, SelfWipingString can not just be assigned to an std::string (and vice versa), but the conversion is pretty easy. And it will, of course, be API caller's responsibility to wipe the std::string that he will receive. Extra copies of std::strings during returning from the function should not happen, because of C++11 move semantics.

There is another little problem though: some std::string implementations contain a small fixed buffer for storing small strings without dynamic memory allocation. So we may want to wipe (*this) in the string's destructor. I.e. use both custom allocator and custom destructor.

@alexeikh-promon
Copy link
Author

On the other hand, that fixed buffer inside std::string will likely be on stack and will soon be overwritten by other data...

Update: not in std::map scenario though.

@cdunn2001
Copy link
Contributor

What if we used a typedef for our internal strings and provided the string-with-WipingAllocator as a compiler-time option? We could return that from asString() too, since we have to break binary-compatibility anyway.

@alexeikh-promon
Copy link
Author

What if we used a typedef for our internal strings and provided the string-with-WipingAllocator as a compiler-time option?

Yes, it's exactly what I am suggesting.

We could return that from asString() too, since we have to break binary-compatibility anyway.

I think we should still return std::string in public APIs like asString(). It's much more convenient for most users of the library to operate on std::strings, than on some Json::strings, and it's very important IMO. For example, in our project here we chose JsonCpp over other JSON libraries because JsonCpp's API is very nice.

@dawesc
Copy link
Contributor

dawesc commented Jan 20, 2016

Hi all, did you get anywhere with this? We're happy to help with the development if we can reach consensus on the approach. Did anybody already branch this and successfully implement the solution?

Thanks

christopher

@dawesc
Copy link
Contributor

dawesc commented Jan 21, 2016

Hi all, just FYI we're going to try implementing this in https://github.com/EFTlab/jsoncpp/tree/secure_allocator and will contact you upon completion to see if it's worthy of inclusion.

@dawesc
Copy link
Contributor

dawesc commented Feb 4, 2016

Hi all, this is now completed and can be pulled from https://github.com/EFTlab/jsoncpp we've unit tests to prove that this is successfully working in that way however i'm guessing given the level of changes required to template most of the code this will require some more major testing. We'll be pulling this into our larger code base ASAP but would like to kick off a conversation about merging to master.

@cdunn2001
Copy link
Contributor

Submit it as a pull-request so we can see the diffs. Then it's easier to discuss.

I think this is a good differentiator for jsoncpp, so I lean toward this. But if we are going to templatize (breaking binary-compatibility, forcing a major version bump), maybe we should include a number type? People currently do not enjoy having to cast uint64_t before assigning to a Json::Value. Definitely needs more discussion.

@janbilek
Copy link

janbilek commented Feb 5, 2016

Hi Christophers, please see #412.

@cdunn2001
Copy link
Contributor

I'm closing this, unless anyone wants to discuss this further. But first, please see my comments at #412.

@cdunn2001
Copy link
Contributor

We are going to provide a kind of solution for this. See #442 and linked discussions.

cdunn2001 added a commit that referenced this issue Mar 20, 2016
Secure allocator available for wiping memory.

Resolves #437.
Resolves #152.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants