Skip to content

Use map instead of unordered_map for dictionary properties in CesiumGltf and Cesium3DTiles #372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

lilleyse
Copy link
Contributor

std::map will preserve the order of elements in the dictionary so that we don't get different glTF / 3D Tiles serialization results on different compilers.

I first noticed this when working on the 3D Tiles writer where one of the unit tests was failing on windows because the order of "property1" and "property2" happened to be swapped.

{
  "asset": {
    "version": "1.0",
    "tilesetVersion": "1.2.3"
  },
  "properties": {
    "property2": {
      "maximum": 5,
      "minimum": 1
    },
    "property1": {
      "maximum": 10,
      "minimum": 0
    }
  },
{
  "asset": {
    "version": "1.0",
    "tilesetVersion": "1.2.3"
  },
  "properties": {
    "property1": {
      "maximum": 10,
      "minimum": 0
    },
    "property2": {
      "maximum": 5,
      "minimum": 1
    }
  },

… is preserved during the read/write roundtrip and unit tests pass consistently on Linux and Windows
@lilleyse lilleyse requested a review from kring October 25, 2021 16:06
@kring
Copy link
Member

kring commented Oct 26, 2021

But also std::map is a red-black tree and std::unordered_map is a hash table. So std::unordered_map has constant time lookup and insertion (versus logarithmic time for std::map) and is likely to be much more cache-friendly as well. Are we sure maintaining sort order for dictionary keys justifies the cost? I think it might for some properties (e.g. extras) but probably not for others, especially commonly-used ones that really ought to be order-independent like extensions. But maybe that's too much complexity and we don't expect the tree vs hashtable difference to make a tangible difference anyway. What do you think?

@javagl
Copy link
Contributor

javagl commented Oct 26, 2021

About the performance:

  1. Use typedef std::map MyMapType; and use MyMapType everywhere
  2. Run a sensible benchmark in the profiler
  3. Change the typedef to std::unordered_map
  4. Run a sensible benchmark in the profiler
  5. ?
  6. Profit

About the order: One relevant question is whether the order is part of the /* doxygen comment / specification */ of the respective class.

  /**
   * @brief A dictionary object of metadata about per-feature properties.
   *
   * The order of the properties is established with the standard `operator<`. We're 
   * not messing around with any case-insensitive ordering or reversed ordering 
   * or whatnot...
   */
  std::map<std::string, Properties> properties;

If this is not specified, then a unit tests should not make assumptions about the order of the elements.

If the unit test has a pesudocode like this

JSON json = read("{
    "aaa" : 123,
    "ccc" : 234,
    "bbb" : 345
}";

assert(json.propertyNames[0] === "aaa");
assert(json.propertyNames[1] === "bbb");
assert(json.propertyNames[2] === "ccc");

Then it assumes std::map.

If it does

assert(json.propertyNames[0] === "aaa");
assert(json.propertyNames[1] === "ccc");
assert(json.propertyNames[2] === "bbb");

Then it assumes the insertion order (most likely - depending on the internal parser mechanisms 😬 )

If it does

assert(json.propertyNames.size() == 3);
assert(json.propertyNames.contains("aaa"));
assert(json.propertyNames.contains("bbb"));
assert(json.propertyNames.contains("ccc"));

Then it does not make any assumptions.


An aside: When talking about "preserving the order", I'm always wondering about what "preserving" means. But maybe that's a nitpick, rooted in TreeMap vs LinkedHashMap, where the first one uses the "alphabetic" order (like std::map), but the second one is a hash-based map that actually preserves the insertion order - I don't know whether there is an STL class that does this.

@lilleyse
Copy link
Contributor Author

But maybe that's too much complexity and we don't expect the tree vs hashtable difference to make a tangible difference anyway. What do you think?

Generally since we're dealing with such small dictionaries I wouldn't expect the map implementation to make a tangible difference, also I wouldn't expect the dictionary elements to be accessed too frequently. I like how std::map preserves the order during the roundtrip from reader to writer.

For what it's worth, tinygltf also uses std::map

@lilleyse
Copy link
Contributor Author

I like how std::map preserves the order during the roundtrip from reader to writer.

Never mind, this is not true at all, my test code happened to be ordered alphabetically already. Should have looked closer at what @javagl said above.

@baothientran
Copy link
Contributor

I heard long time ago from CppCon (I can't remember where), std::map performs better than std::unordered_map when input is small (roughly less than 40 elements). I naively never confirmed it, so it maybe a good chance to see if that's true.

@kring
Copy link
Member

kring commented Oct 27, 2021

I heard long time ago from CppCon (I can't remember where), std::map performs better than std::unordered_map when input is small (roughly less than 40 elements).

Interesting, I can't really think of why that would be. Maybe std::map does something clever (rather than a heap allocation per tree node) or std::unordered_map does something surprising (like use a large hash table on the first insertion). Considering our dictionaries usually have less than 10 elements, the fastest thing is probably in fact a vector (or, better: two vectors, one with keys, one with values) linearly searched.

Anyway I don't mean to bike shed performance here when it's almost certainly not an area we should spend time optimizing. But maintaining order seemed like a fairly dubious benefit in most cases (and if it's not maintaining order but rather imposing alphabetical order, that benefit is even smaller IMO), so even paying a small cost in terms of extra heap allocations (which can involve locking and therefore have an unintuitive effect on performance under multithreaded load) and pointer chasing for tree traversal (which can have an outsized impact on overall performance outside microbenchmarks due to cache thrashing) might not be worth it.

If we do want to preserve order, though, there's https://github.com/Tessil/ordered-map and probably other options as well.

For what it's worth, tinygltf also uses std::map

I suspect this is more because of its C++03 legacy rather than a considered choice, but I could be wrong.

@lilleyse
Copy link
Contributor Author

The main benefit for preserving order would be to have cross-platform consistent output when writing glTF and 3D Tiles in the tilers, it feels like a weird idiosyncrasy otherwise.

https://github.com/Tessil/ordered-map seems promising and I might give that a try. That would also make us consistent with object property order in JavaScript (ES2015 and pre-ES2015 by convention for strings https://stackoverflow.com/questions/5525795/does-javascript-guarantee-object-property-order/23202095#23202095)

@javagl
Copy link
Contributor

javagl commented Oct 27, 2021

I'll probably repeat the hint for typedef/using in the future, and maybe point to https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rt-alias , because I'd really like to see things like
std::optional<CesiumAsync::Future<std::shared_ptr<CesiumAsync::IAssetRequest>>> requestTileContent(...)
to be
TileContentResponse requestTileContent(...)
at some point. And I think that it could help to abstract away questions like the map/unordered_map nicely, in a way that allows switching the map type transparently for the user in the future.


Regarding the option to pull in a dependency like ordered-map, without knowing its maintenance state, reliability, debug safety behavior or performance characteristics: I'd hesitate about that. (Unless it's hidden in an alias, and can be switched back transparently later...)


The main benefit for preserving order would be to have cross-platform consistent output when writing glTF and 3D Tiles in the tilers, it feels like a weird idiosyncrasy otherwise.

There may be two forms of consistency.

  • Consistency for the read-write-read-write roundtrip: In this case, something like ordered-map could help to achieve consistency in that the aforementioned dictionary aaa: 1, ccc: 2, bbb: 3 would be written out again in that exact order. This assumes a certain internal parsing mechanism, though. I think we're currently using a SAX-like approach in the glTF part, where this would be the case. But if we use a DOM-like approach, and the JSON parser somewhere uses a map or even unordered_map, then it could be impractical to try and achieve this sort of consistency...
  • Consistency for the outputs of different tools: This might be even more consistent with map than with ordered-map, because regardless of whether the glTF generator does gltf.add("buffers",b); gltf.add("accessors",a); or vice versa, the order will always be alphabetically, as per the constraint of std::map. That's a guarantee that could be made on the interface level. The question is rather whether this guarantee should be made.

An aside regarding the performance: I think that there are too many variables for the performance of map vs. unordered_map to make a general statement here. Some variables are obvious:

  • Size of the data structure
  • Operation: Insert vs. remove vs. lookup vs. iteration...?

Some other variables are hidden. The main one is the type of the key. There might be significant differences between a map<string, X> and a map<int, X>.

Eventually, the performance will depend on the operator< (or the hash function). And then one can think about a map with string keys like "a", "b", "c" ...., and a map where all the keys have the pattern "SomeLongStringWithFixedPrefixAnd [...10000 characters that are equal and are only different in the last character]". Talking about that without a sensible benchmark+profiler run is... pretty much speculating. (Maybe profound and informed, but still a form of bike shedding).

Add 3DTILES_content_gltf to Cesium3DTiles and Cesium3DTilesReader
@cesium-concierge
Copy link

Thanks again for your contribution @lilleyse!

No one has commented on this pull request in 30 days. Maintainers, can you review, merge or close to keep things tidy?

I'm going to re-bump this in 30 days. If you'd like me to stop, just comment with @cesium-concierge stop. If you want me to start again, just delete the comment.

3 similar comments
@cesium-concierge
Copy link

Thanks again for your contribution @lilleyse!

No one has commented on this pull request in 30 days. Maintainers, can you review, merge or close to keep things tidy?

I'm going to re-bump this in 30 days. If you'd like me to stop, just comment with @cesium-concierge stop. If you want me to start again, just delete the comment.

@cesium-concierge
Copy link

Thanks again for your contribution @lilleyse!

No one has commented on this pull request in 30 days. Maintainers, can you review, merge or close to keep things tidy?

I'm going to re-bump this in 30 days. If you'd like me to stop, just comment with @cesium-concierge stop. If you want me to start again, just delete the comment.

@cesium-concierge
Copy link

Thanks again for your contribution @lilleyse!

No one has commented on this pull request in 30 days. Maintainers, can you review, merge or close to keep things tidy?

I'm going to re-bump this in 30 days. If you'd like me to stop, just comment with @cesium-concierge stop. If you want me to start again, just delete the comment.

@lilleyse
Copy link
Contributor Author

lilleyse commented Mar 4, 2022

This PR is now well out of date so I'll close it. But it may come up again in the future.

@lilleyse lilleyse closed this Mar 4, 2022
@lilleyse lilleyse deleted the ordered-map branch March 4, 2022 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants