
Conversation

@carterkozak
Contributor

Most benchmarks use the same set of input data for each iteration, which is heavily biased toward caches. This benchmark is meant to stress canonicalization cache misses in a way that we observe in some production systems.

My implementation isn't ideal because I'm doing a fair bit of work to generate randomized data within the measured component of the benchmark. However, I've profiled the results: most time is not spent in the setup portion, and the setup is comparable between configurations, so the results should be comparable between flags.

Initially I began this investigation based on the InternCache, which I expected to be the primary bottleneck for reasons listed here: https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
There is a measurable impact from disabling interning, particularly with smaller heap sizes, but there's a much larger improvement when we opt out of canonicalization entirely, especially in paths which rely on the ByteQuadsCanonicalizer rather than the CharsToNameCanonicalizer.

My initial results running java -Xmx256m -jar target/perf.jar ".*JsonArbitraryFieldNameBenchmark.*" -wi 4 -w 4 -i 4 -r 4 -f 1 -t max -rf json (28 threads):

(note the score is microseconds per operation where lower is better, not operations per second)

Benchmark                                       (mode)                   (shape)        (type)  Mode  Cnt    Score    Error  Units
JsonArbitraryFieldNameBenchmark.parse          DEFAULT            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  245.657 ± 77.651  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT            RANDOM_KEY_MAP        READER  avgt    4   22.317 ±  1.231  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  245.578 ± 54.366  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4   26.413 ±  5.014  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  225.419 ±  4.009  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN            RANDOM_KEY_MAP        READER  avgt    4    9.220 ±  0.082  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  222.956 ± 20.320  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4   10.543 ±  0.051  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4    8.427 ±  0.044  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE            RANDOM_KEY_MAP        READER  avgt    4    8.376 ±  0.024  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4    9.176 ±  0.056  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4    9.086 ±  0.053  us/op
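
For context, these modes presumably map onto the existing jackson-core factory features: JsonFactory.Feature.INTERN_FIELD_NAMES and JsonFactory.Feature.CANONICALIZE_FIELD_NAMES are the real toggles, while the Mode enum below is an illustrative sketch rather than the benchmark's exact code.

import com.fasterxml.jackson.core.JsonFactory;

enum Mode {
    DEFAULT {
        @Override JsonFactory apply(JsonFactory f) {
            return f;
        }
    },
    NO_INTERN {
        @Override JsonFactory apply(JsonFactory f) {
            // Keep canonicalization, but skip String.intern() on names.
            return f.disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
        }
    },
    NO_CANONICALIZE {
        @Override JsonFactory apply(JsonFactory f) {
            // Interning is meaningless without canonicalization, so
            // disable both.
            return f.disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
                    .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);
        }
    };

    abstract JsonFactory apply(JsonFactory f);
}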

@cowtowncoder
Member

Yes, ByteQuadsCanonicalizer definitely takes a heavier hit for decoding in the case of a symbol table miss -- intern()ing is of less concern there (although obviously not helpful either).


public String stringThree;

@JsonCreator(mode = JsonCreator.Mode.PROPERTIES)
Member

@cowtowncoder Apr 18, 2023


Instead of using Creators, maybe use simple setters/getters; that's faster than using Constructors.
Or actually just assign directly into the public fields, since they are there already.

Contributor Author


I considered avoiding databinding altogether, but it's a convenient way to consume the data with some level of validation. Happy to update to getters and setters (or perhaps public property fields?).

Member


Yeah not a huge deal by any measure, either works fine.

@Setup
public void setup() {
    factory = mode.apply(new JsonFactory());
    ObjectMapper mapper = new ObjectMapper(factory)
Member


Similar to JsonFactory, let's use builders here to make the code 3.0 compatible.
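
A minimal sketch of what that would look like (JsonFactory.builder() and JsonMapper.builder() are the 2.10+ entry points that carry over to 3.0):

@Setup
public void setup() {
    factory = mode.apply(JsonFactory.builder().build());
    // JsonMapper.builder(factory) mirrors new ObjectMapper(factory) but
    // follows the builder style that survives the move to the immutable
    // 3.0 mapper design.
    ObjectMapper mapper = JsonMapper.builder(factory).build();
}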

INPUT_STREAM() {
    @Override
    JsonParser create(JsonFactory factory, Supplier<String> jsonSupplier) throws IOException {
        return factory.createParser(new ByteArrayInputStream(jsonSupplier.get().getBytes(StandardCharsets.UTF_8)));
Member


Why not make the supplier provide bytes, to avoid the additional conversion overhead here?

Contributor Author


Any chance you have an example? I reached for the easiest way to build dynamic JSON on the fly. I suspect that String.getBytes(UTF_8) performance isn't far from building the byte array directly with custom key-encoding logic, thanks to the vectorized fast path that converts a JDK 11+ compact string to ASCII-compatible bytes (an array copy without encoding).

Member


The dynamic part may be problematic, but existing tests have a helper class (InputData?) that creates data, extracts bytes, etc. So it's more a question of not doing this conversion for each parse/generation run than of the cost of getting the bytes itself -- that is, keeping it out of the measurement loop. If that makes sense.
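
A sketch of the byte-based variant under discussion (hypothetical shape; with fully dynamic JSON the String-to-byte[] conversion can't be hoisted entirely, but a Supplier<byte[]> would at least keep it out of the parser-creation path being measured):

INPUT_STREAM() {
    @Override
    JsonParser create(JsonFactory factory, Supplier<byte[]> jsonSupplier) throws IOException {
        // The supplier already returns UTF-8 bytes, so no per-call
        // String.getBytes() conversion happens inside the measured region.
        return factory.createParser(new ByteArrayInputStream(jsonSupplier.get()));
    }
}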

@carterkozak
Contributor Author

It's interesting that the UTF8StreamJsonParser implementation seems much more performant at reading randomized string values than ReaderBasedJsonParser, so it would stand to reason that an option to turn off canonicalization in UTF8StreamJsonParser would be more efficient than falling back to ReaderBasedJsonParser when canonicalization is not enabled. Before I investigate/prototype, I'm curious whether this matches your expectation.

No rush, as always I appreciate your help :-)

@cowtowncoder
Member

@carterkozak Yes, UTF8StreamJsonParser in general should be more efficient than the Reader-backed one, since it combines UTF-8 decoding and tokenization. I spent more time optimizing it as well. So having to use the generally slower one (because canonicalization doesn't work with an unbounded key space) is unfortunate. It would of course be possible to design an alternative that was optimized differently (faster decoding for the non-canonicalizing case), but that'd be slower for cases where canonicalization works (going directly from byte[] to String).

@cowtowncoder cowtowncoder merged commit b4d08c1 into FasterXML:2.15 Apr 19, 2023
@cowtowncoder
Member

@carterkozak One other thing: as I implied, decoding for the "miss" case in UTF8StreamJsonParser is not heavily optimized, so another thing that might be useful, with profiling, would be to see if there are wins there. It's quite possible, given the lack of focus so far; and now with a reproduction (the JMH test can be run under a profiler) it'd be relatively easy to get numbers.
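
For reference, JMH's profiler hooks make that easy to check: the bundled gc profiler reports allocation rates alongside the scores, and async-profiler can be attached the same way via -prof async where it is installed. For example:

java -Xmx256m -jar target/perf.jar ".*JsonArbitraryFieldNameBenchmark.*" -prof gc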

@carterkozak
Contributor Author

That makes sense, I’d definitely prefer to improve all configurations rather than exclusively a non-default case. I’ll report back once I’ve done some investigation. Thanks!

@carterkozak
Contributor Author

Initial testing seems to indicate that without canonicalization, UTF8StreamJsonParser (with a very light modification to check _symbols.isCanonicalizing()) outperforms the reader-based alternative in this benchmark (when the reader-based implementation wraps an InputStreamReader, for a like-for-like comparison).
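
Roughly, the modification amounts to an early exit in the field-name decoding path. This is a hand-wavy sketch rather than the actual patch; decodeFieldName() and canonicalizeFieldName() are hypothetical stand-ins for the existing decode and lookup paths:

// Sketch only: bypass the symbol table when canonicalization is off.
private String parseFieldName() throws IOException {
    if (!_symbols.isCanonicalizing()) {
        // Decode a fresh String each time; nothing is stored, so no
        // table growth and no large array copies.
        return decodeFieldName();
    }
    // Existing path: look up / insert via the ByteQuadsCanonicalizer.
    return canonicalizeFieldName();
}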

When canonicalization is enabled, in cases without a bounded set of keys, the performance difference between ByteQuadsCanonicalizer and CharsToNameCanonicalizer appears to come down to buckets: the ByteQuadsCanonicalizer features a flattened hash table, which is fantastic for locality, however any time mutations are required, the array copies are much larger and therefore more expensive.
In the benchmark with canonicalization, the CharsToNameCanonicalizer symbols array grows from 64 elements to 256 elements, and the buckets array grows from 32 to 128 buckets, with a total size of 12000 strings. Copies are individually inexpensive on smaller arrays, and bucket mutations are inexpensive due to the immutable bucket data structure (although traversal of the linked list is fairly expensive).
The ByteQuadsCanonicalizer names array grows from 1024 elements initially up to 16384 before resetting, and the main hash array grows from 4096 elements to 65536, so each time a new key is encountered, we end up copying roughly a third of a megabyte on heap (based on the 16384-length names array times 4 bytes per object reference, assuming compressed references, plus the 65536-length integer array at 4 bytes apiece).
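
Spelling that estimate out: 16384 references × 4 bytes = 64 KiB for the names array, plus 65536 ints × 4 bytes = 256 KiB for the main hash array, i.e. roughly 320 KiB, which matches the "third of a megabyte" figure.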

Aside/pipe dream: I wonder if we could prepopulate the canonicalization table; for example, databind is likely to encounter most property names based on its annotation-scanning models. I suppose in an ideal world we might have contextual canonicalization based on the set of expected fields rather than a global cache.

carterkozak pushed a commit to palantir/conjure-java-runtime that referenced this pull request Apr 20, 2023
We expect unlimited keyspace due to maps keyed by random IDs which
are usually UUID-based. This interacts poorly with string
canonicalization components in Jackson in ways that cause heavy heap
churn. See the discussion here for the details:
FasterXML/jackson-benchmarks#6

The primary risk is that without canonicalization, we allocate
repeated strings much more heavily. Strings used to match setter
methods should be very short-lived and should not escape the
parsing thread, so the JIT may be able to do terribly clever
things to help us out.

Note: it may be possible to canonicalize contextually in jackson
3, which would give us the best of both worlds!
@cowtowncoder
Member

cowtowncoder commented Apr 21, 2023

One thing wrt Jackson 3.0 (I know, I know, far away...) is that known keys are pre-canonicalized per BeanDeserializer. I may have mentioned this. That keeps them away from the "global" ByteQuadsCanonicalizer (and CharsToNameCanonicalizer).

Another smaller possible win is of course considering what symbol table size is big enough for bounded key sets (covering, say, 98% of cases); if that's smaller than the current max, could/should we lower the limit for pathological cases?

But the pre-population idea is not bad either: given that new entries are appended at the end (slower), canonicalizing reused keys is beneficial.
(Or is that how it works? I'd need to double-check.)

@carterkozak
Contributor Author

Ah yes, the PropertyNameMatcher in 3.0 is exactly what I had in mind! I admit I hadn't read through much of the 3.0 code until today.

Another smaller possible win is of course considering what symbol table size is big enough for bounded key sets (covering, say, 98% of cases); if that's smaller than the current max, could/should we lower the limit for pathological cases?

One idea I'm toying with is a compromise between the flat structure and a full copy: allow canonicalizer tables to be stacked. On lookup, the local table is checked first, then the parent on a cache miss; on store, only the child table is updated, without creating a full copy. The two must eventually be merged in some way to avoid stacking more than two tables (degenerating into a linked list), but that update can be done at the end of the parse to avoid concurrent copies, most of which would be thrown away. I need to think a bit more about how this could improve the single-threaded case, since there are no concurrent updates there and the table is always merged back to the parent at the end. A rough sketch of the lookup scheme follows.
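
In pseudo-Java, the stacked lookup would be roughly the following. This is an entirely hypothetical sketch: none of these names exist in jackson-core, plain Maps stand in for the real hash structures, and synchronization of the shared parent is elided.

import java.util.HashMap;
import java.util.Map;

final class StackedSymbolTable {
    private final Map<String, String> local = new HashMap<>(); // per-parser
    private final Map<String, String> parent;                  // shared

    StackedSymbolTable(Map<String, String> parent) {
        this.parent = parent;
    }

    String canonicalize(String name) {
        String hit = local.get(name);      // check the small local table first
        if (hit == null) {
            hit = parent.get(name);        // then the shared table on a miss
        }
        if (hit != null) {
            return hit;
        }
        // Miss in both: store only in the local table, so the hot path
        // never copies the large shared structure.
        local.put(name, name);
        return name;
    }

    // Called once at end-of-parse: merge local entries back so the shared
    // table grows in a single batch instead of per-key copies.
    void release() {
        parent.putAll(local);
    }
}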
