Add LUT to speed up FindMember / Mujin's SetJsonValueByKey #4
RapidJSON's `AddMember` API is, presumably by design, append-only: if it is called twice with the same key, it produces a document with duplicate members for that key instead of overwriting the previous one. This keeps `AddMember` fast (no check on whether a key already exists), but in reality it just pushes the check further up the chain: Mujin's `SetJsonValueByKey` first looks up whether the key exists, then overwrites the member if it does and creates it if it does not.
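For concreteness, here is a small standalone example against the stock RapidJSON API; the `SetByKey` helper is only a stand-in for what Mujin's `SetJsonValueByKey` does, not its actual implementation:

```cpp
#include <iostream>

#include "rapidjson/document.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/writer.h"

using namespace rapidjson;

// Stand-in for Mujin's SetJsonValueByKey (not the actual implementation):
// overwrite the member if the key already exists, otherwise add it. The
// FindMember call is the linear scan this patch targets.
static void SetByKey(Document& d, const char* key, int value) {
    Value::MemberIterator it = d.FindMember(key);
    if (it != d.MemberEnd()) {
        it->value.SetInt(value);
    } else {
        d.AddMember(Value(key, d.GetAllocator()), Value(value), d.GetAllocator());
    }
}

int main() {
    Document d;
    d.SetObject();

    // AddMember is append-only: calling it twice with the same key yields an
    // object with two "x" members instead of overwriting the first one.
    d.AddMember("x", 1, d.GetAllocator());
    d.AddMember("x", 2, d.GetAllocator());

    SetByKey(d, "y", 3);
    SetByKey(d, "y", 4);  // overwrites the existing "y", no duplicate

    StringBuffer sb;
    Writer<StringBuffer> w(sb);
    d.Accept(w);
    std::cout << sb.GetString() << std::endl;  // {"x":1,"x":2,"y":4}
    return 0;
}
```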
However, `FindMember` (the real implementation behind `HasMember`) uses a linear search: it iterates over every member in the object and performs a key comparison. Constructing a document of N members this way therefore exhibits roughly quadratic time complexity.

To address this, this patch adds a hash table to each object that maps each key to the address of the first member holding that key inside the object. Member lookup becomes constant time, at the cost of additional memory and tracking overhead. Some of that overhead could be removed if duplicate keys were disallowed entirely: because duplicates are allowed, deleting a member requires iterating over all subsequent members in case one of them is a previously shadowed duplicate of the deleted key.
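A rough sketch of the idea (illustrative only; the real patch works on rapidjson's internal member storage and allocators rather than std containers, and the names here are made up):

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

struct Member {
    std::string name;
    std::string value;  // real rapidjson members hold arbitrary Values
};

class ObjectWithIndex {
public:
    // Append-only, like rapidjson's AddMember: duplicates are allowed, but the
    // index only ever records the *first* member seen for a given key.
    void AddMember(std::string name, std::string value) {
        members_.push_back({std::move(name), std::move(value)});
        index_.emplace(members_.back().name, members_.size() - 1);
    }

    // Constant time on average instead of a linear scan over members_.
    Member* FindMember(std::string_view name) {
        auto it = index_.find(std::string(name));
        return it == index_.end() ? nullptr : &members_[it->second];
    }

private:
    std::vector<Member> members_;
    // Key -> position of the first member with that key. The real patch keys
    // this on views over the member's string data and must also repair the
    // table on deletion (hence the shadowed-duplicate iteration noted above).
    std::unordered_map<std::string, std::size_t> index_;
};
```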
Performance was measured using the current large test scene with 6k parts added to the environment.
Without this patch:
With this patch:
The memory overhead is significantly more than I expected. While the cache only uses views over string data, we may simply have vast quantities of objects with small member counts, for which the cache dominates. This could possibly be reduced by gating the optimization at runtime (i.e., only building the cache for objects with more than some threshold member count); a sketch of that idea is below. It is also possible there is a bug somewhere, though the rapidjson test suite does pass with no leaks.
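The runtime gating could look something like this (purely hypothetical, not part of this patch; the threshold value is a made-up tuning knob, not something measured):

```cpp
#include <cstddef>

// Hypothetical threshold: objects with fewer members than this would never
// build the per-object key index and would keep using the existing linear
// FindMember scan, so the many small objects in a scene pay no extra memory.
constexpr std::size_t kMinMembersForIndex = 16;

// Decision point the patch could gate on whenever a member is added.
inline bool ShouldBuildKeyIndex(std::size_t memberCount) {
    return memberCount >= kMinMembersForIndex;
}
```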