Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LUT to speed up FindMember / Mujin's SetJsonValueByKey #4

Open
wants to merge 3 commits into
base: mujin
Choose a base branch
from

Conversation

rschlaikjer
Copy link

RapidJson's AddMember API, presumably by design, is append-only. That is to say, if ever it is called twice with the same key, it will produce a document with duplicate members for that key instead of overwriting a previous member. This means that AddMember can be fast (no check on whether a key exists), but in reality it just pushes the check further up the chain: Mujin's SetJsonValueByKey method will first look up whether a key exists, and then overwrite if it exists and create if it does not.

However, FindMember (the real implementation behind HasMember) utilizes a linear search approach - it has to iterate every member in the document and perform a key compare. This means that constructing a document of N elements ends up exhibiting roughly polynomial time complexity.

To address this, this patch would add a hashtable to each object that associates the first member with a given key in the map with its actual address inside the object. This means that member lookup is now constant time, but requires additional memory and tracking overhead. Some chunk of this overhead could actually be removed if we disallowed duplicate keys entirely - element deletion requires iteration of all subsequent keys in case they were a previously shadowed duplicate key member.

Performance was measured using the current large test scene with 6k parts added to the environment.

Without this patch:

  • Average call to _SavePublishState: 98'278[us]
  • Total RSS: 442MB

With this patch:

  • Average call to _SavePublishState: 10'622[us]
  • Total RSS: 1'384MB (!)

The memory overhead is significantly more than I expected - while the cache only uses views over
string data, it's possible we simply have vast quantities of small member count objects for which it dominates. It's possible this could be reduced by gating this optimization at runtime (i.e, only build the cache for objects with more than some threshold member count). Also possible there's a bug somewhere, though the rapidjson test suite does pass with no leaks.

ziyan and others added 3 commits March 28, 2018 17:16
RapidJson's `AddMember` API, presumably by design, is append-only. That
is to say, if ever it is called twice with the same key, it will produce
a document with duplicate members for that key instead of overwriting a
previous member. This means that AddMember can be fast (no check on
whether a key exists), but in reality it just pushes the check further
up the chain: Mujin's `SetJsonValueByKey` method will first look up
whether a key exists, and then overwrite if it exists and create if it
does not.

However, `FindMember` (the real implementation behind `HasMember`)
utilizes a linear search approach - it has to iterate every member in
the document and perform a key compare. This means that constructing a
document of N elements ends up exhibiting roughly polynomial time complexity.

To address this, this patch would add a hashtable to each object that
associates the first member with a given key in the map with its actual
address inside the object. This means that member lookup is now
_constant_ time, but requires additional memory and tracking overhead.
The memory overhead is noticeable - while the cache only uses views over
string data, the additional pointer members adds far more weight than
expected. It is possible this could be reduced by gating this
optimization at runtime (i.e, only build the cache for objects with more
than some threshold member count).

Some chunk of this overhead could actually be removed if we disallowed
duplicate keys entirely - element deletion requires iteration of all
subsequent keys in case they were a previously shadowed duplicate key
member.

Performance was measured using the Zozo robotbridges scene with 6k
parts added to the environment.

Without this patch:
- Total RSS: 442MB
- Average call to _SavePublishState: 98'278[us]

With this patch:
- Total RSS: 1'384MB
- Average call to _SavePublishState: 10'622[us]
@ziyan ziyan changed the base branch from master to mujin December 4, 2023 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants