Add LUT to speed up FindMember / Mujin's SetJsonValueByKey #4
RapidJSON's `AddMember` API is, presumably by design, append-only: if it is called twice with the same key, it produces a document with duplicate members for that key instead of overwriting the previous one. This keeps `AddMember` fast (no check on whether a key already exists), but in reality it just pushes the check further up the chain: Mujin's `SetJsonValueByKey` first looks up whether the key exists, then overwrites the member if it does and creates it if it does not.
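For concreteness, here is a small standalone example against the stock RapidJSON API; the `SetByKey` helper is only a stand-in for what Mujin's `SetJsonValueByKey` does, not its actual implementation:

```cpp
#include <iostream>

#include "rapidjson/document.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/writer.h"

using namespace rapidjson;

// Stand-in for Mujin's SetJsonValueByKey (not the actual implementation):
// overwrite the member if the key already exists, otherwise add it. The
// FindMember call is the linear scan this patch targets.
static void SetByKey(Document& d, const char* key, int value) {
    Value::MemberIterator it = d.FindMember(key);
    if (it != d.MemberEnd()) {
        it->value.SetInt(value);
    } else {
        d.AddMember(Value(key, d.GetAllocator()), Value(value), d.GetAllocator());
    }
}

int main() {
    Document d;
    d.SetObject();

    // AddMember is append-only: calling it twice with the same key yields an
    // object with two "x" members instead of overwriting the first one.
    d.AddMember("x", 1, d.GetAllocator());
    d.AddMember("x", 2, d.GetAllocator());

    SetByKey(d, "y", 3);
    SetByKey(d, "y", 4);  // overwrites the existing "y", no duplicate

    StringBuffer sb;
    Writer<StringBuffer> w(sb);
    d.Accept(w);
    std::cout << sb.GetString() << std::endl;  // {"x":1,"x":2,"y":4}
    return 0;
}
```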
However, `FindMember` (the real implementation behind `HasMember`) uses a linear search: it iterates over every member in the object and performs a key comparison. Constructing a document of N members this way therefore exhibits roughly quadratic time complexity.

To address this, this patch adds a hash table to each object that maps each key to the address of the first member holding that key inside the object. Member lookup becomes constant time, at the cost of additional memory and tracking overhead. Some of that overhead could be removed if duplicate keys were disallowed entirely: because duplicates are allowed, deleting a member requires iterating over all subsequent members in case one of them is a previously shadowed duplicate of the deleted key.
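A rough sketch of the idea (illustrative only; the real patch works on rapidjson's internal member storage and allocators rather than std containers, and the names here are made up):

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

struct Member {
    std::string name;
    std::string value;  // real rapidjson members hold arbitrary Values
};

class ObjectWithIndex {
public:
    // Append-only, like rapidjson's AddMember: duplicates are allowed, but the
    // index only ever records the *first* member seen for a given key.
    void AddMember(std::string name, std::string value) {
        members_.push_back({std::move(name), std::move(value)});
        index_.emplace(members_.back().name, members_.size() - 1);
    }

    // Constant time on average instead of a linear scan over members_.
    Member* FindMember(std::string_view name) {
        auto it = index_.find(std::string(name));
        return it == index_.end() ? nullptr : &members_[it->second];
    }

private:
    std::vector<Member> members_;
    // Key -> position of the first member with that key. The real patch keys
    // this on views over the member's string data and must also repair the
    // table on deletion (hence the shadowed-duplicate iteration noted above).
    std::unordered_map<std::string, std::size_t> index_;
};
```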
Performance was measured using the current large test scene with 6k parts added to the environment.
Without this patch:
With this patch:
The memory overhead is significantly more than I expected. While the cache only uses views over string data, we may simply have vast quantities of objects with small member counts, for which the cache dominates. This could possibly be reduced by gating the optimization at runtime (i.e., only building the cache for objects with more than some threshold member count); a sketch of that idea is below. It is also possible there is a bug somewhere, though the rapidjson test suite does pass with no leaks.
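The runtime gating could look something like this (purely hypothetical, not part of this patch; the threshold value is a made-up tuning knob, not something measured):

```cpp
#include <cstddef>

// Hypothetical threshold: objects with fewer members than this would never
// build the per-object key index and would keep using the existing linear
// FindMember scan, so the many small objects in a scene pay no extra memory.
constexpr std::size_t kMinMembersForIndex = 16;

// Decision point the patch could gate on whenever a member is added.
inline bool ShouldBuildKeyIndex(std::size_t memberCount) {
    return memberCount >= kMinMembersForIndex;
}
```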