Add benchmarks for JSON deserialization including randomized map keys #6
Conversation
Most benchmarks use the same set of input data for each iteration, which is heavily biased toward caches. This benchmark is meant to stress canonicalization cache misses in a way that we observe in some production systems.

My implementation isn't ideal because I'm doing a fair bit of work to generate randomized data within the measured component of the benchmark; however, I've profiled the results and most time is not spent within the setup portion, and the benchmark setup is comparable between configurations, such that the results should be comparable between flags.

Initially I began this investigation based on the InternCache, which I expected to be the primary bottleneck for the reasons listed here: https://shipilev.net/jvm/anatomy-quarks/10-string-intern/

There is a measurable impact from disabling interning, particularly with smaller heap sizes, but there's a much larger improvement when we opt out of canonicalization entirely, especially in paths which rely on the ByteQuadsCanonicalizer rather than the CharsToNameCanonicalizer.
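For context, here is a minimal, hypothetical sketch (pure JDK; the class and method names are mine, not from the benchmark) of the kind of randomized input that defeats canonicalization caches: every object uses freshly generated UUID field names, so each parse sees keys the symbol tables have never stored before.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: build a JSON object whose field names are fresh UUIDs,
// guaranteeing misses in ByteQuadsCanonicalizer/CharsToNameCanonicalizer.
public final class RandomKeyJson {

    // Build a JSON object string with `size` never-before-seen keys.
    static String randomObject(int size) {
        StringBuilder sb = new StringBuilder().append('{');
        for (int i = 0; i < size; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(UUID.randomUUID()).append("\":")
              .append(ThreadLocalRandom.current().nextInt(1000));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(randomObject(3));
    }
}
```

Generating this inside the measured loop is exactly the compromise described above: some of the measured time is spent on data generation, but profiling showed it is not the dominant cost.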
Yes, [...]
src/main/java/com/fasterxml/jackson/perf/json/JsonArbitraryFieldNameBenchmark.java
```java
public String stringThree;
// ...
@JsonCreator(mode = JsonCreator.Mode.PROPERTIES)
```
Instead of using Creators, maybe simple setters/getters; this is faster than using Constructors.
Or actually just assigning directly into public Fields as they are there already.
I considered avoiding databinding altogether, but it's a convenient way to consume the data with some level of validation. Happy to update to getters and setters (or perhaps public property fields?).
Yeah not a huge deal by any measure, either works fine.
```java
@Setup
public void setup() {
    factory = mode.apply(new JsonFactory());
    ObjectMapper mapper = new ObjectMapper(factory)
```
Similar to JsonFactory, let's use builders here to make code 3.0 compatible.
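For reference, a minimal sketch of the builder style the comment suggests (the builder APIs exist since Jackson 2.10 and carry over to 3.0; this is illustrative configuration, not the PR's actual code):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.json.JsonMapper;

// Builder-based construction instead of `new ObjectMapper(factory)`.
JsonFactory factory = JsonFactory.builder().build();
JsonMapper mapper = JsonMapper.builder(factory).build();
```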
```java
INPUT_STREAM() {
    @Override
    JsonParser create(JsonFactory factory, Supplier<String> jsonSupplier) throws IOException {
        return factory.createParser(new ByteArrayInputStream(jsonSupplier.get().getBytes(StandardCharsets.UTF_8)));
```
Why not make supplier provide bytes, to avoid additional conversion overhead here?
Any chance you have an example? I reached for the easiest way to build dynamic JSON on the fly. I suspect that String.getBytes(UTF_8) performance isn't as far from building the byte array directly with some logic to encode keys, due to the vectorized fast path converting a JDK 11+ compact string to ASCII-compatible bytes (an array copy without encoding).
The dynamic part may be problematic, but existing tests have a helper class (InputData?) that creates data, extracts bytes, etc. So it's more a question of not doing this conversion for each parse/generation run than of the performance of getting the bytes itself. That is, keeping it out of the measurement loop. If that makes sense.
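A pure-JDK sketch of the pattern this comment describes (names are mine, not the actual helper class): encode the payload once, outside the measured loop, and hand each iteration a cheap stream over the shared bytes. As noted, randomized per-iteration keys complicate this, but it applies to any fixed corpus.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative: one-time UTF-8 encoding (analogous to a JMH @Setup method),
// then a fresh InputStream per iteration without re-encoding or copying.
public final class PreEncodedInput {
    private final byte[] encoded;

    PreEncodedInput(String json) {
        // Conversion happens once, outside the measurement loop.
        this.encoded = json.getBytes(StandardCharsets.UTF_8);
    }

    // Cheap per-iteration call: wraps the shared array without copying it.
    InputStream newStream() {
        return new ByteArrayInputStream(encoded);
    }

    public static void main(String[] args) throws Exception {
        PreEncodedInput input = new PreEncodedInput("{\"k\":1}");
        System.out.println(input.newStream().available());
    }
}
```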
It's interesting that the [...] No rush, as always I appreciate your help :-)
@carterkozak Yes, [...]
@carterkozak One other thing: as I implied, decoding for "miss" case for [...]
That makes sense, I’d definitely prefer to improve all configurations rather than exclusively a non-default case. I’ll report back once I’ve done some investigation. Thanks!
Initial testing seems to indicate that without canonicalization, [...]

When canonicalization is enabled, in cases without a bounded set of keys, the performance difference between [...]

Aside/pipedream: I wonder if we could prepopulate the canonicalization table; for example, databind is likely to encounter most property names based on annotation-scanning models. I suppose in an ideal world we might have contextual canonicalization based on the set of expected fields rather than a global cache.
We expect an unlimited keyspace due to maps keyed by random IDs, which are usually UUID-based. This interacts poorly with string canonicalization components in Jackson in ways that cause heavy heap churn. See the discussion here for the details: FasterXML/jackson-benchmarks#6

The primary risk is that without canonicalization, we allocate reused strings much more heavily. Strings used to match setter methods should be very short-lived and should not escape the parsing thread, so the JIT may be able to do terribly clever things to help us out.

Note: it may be possible to canonicalize contextually in Jackson 3, which would give us the best of both worlds!
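Opting out of canonicalization and interning, as discussed throughout this thread, can be expressed with real jackson-core feature flags; a configuration sketch (not the PR's code):

```java
import com.fasterxml.jackson.core.JsonFactory;

// Disable String.intern() on field names, then symbol-table
// canonicalization entirely, for unbounded/random keyspaces.
JsonFactory factory = JsonFactory.builder()
        .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
        .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
        .build();
```

Note that `INTERN_FIELD_NAMES` has no effect once `CANONICALIZE_FIELD_NAMES` is disabled; interning only applies to canonicalized names.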
One thing wrt Jackson 3.0 (I know, I know, far away...) is that per [...]

Another smaller possible win is of course considering what size symbol table is big enough for bounded sets (covering, say, 98% of cases); if that's smaller than the current max. But the pre-population idea is not bad either: given that new entries are appended at the end (slower), canonicalization of reused keys is beneficial.
Ah yes, the [...]
One idea I'm toying with is a compromise between the flat structure and a full copy by allowing canonicalizer tables to be stacked. On lookup, the local table can be checked, then the parent on a cache miss; but on store, only the child table is updated, without creating a full copy. The two must eventually be merged in some way to avoid stacking more than two tables (degenerating into a linked list), but such an update can be done at the end of the process to avoid concurrent copies, most of which would be thrown away. I need to think a bit more about how this could improve the single-threaded case, since there are no concurrent updates, and the table is always merged back to the parent at the end.
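A toy model of that stacking idea (pure JDK, all names mine): reads fall through child to parent, writes touch only the child layer, and a single merge collapses the stack at the end of parsing. This illustrates the lookup/store/merge shape only; it ignores ByteQuadsCanonicalizer's hashing, sizing, and the real concurrency concerns.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of a two-level canonicalization table.
public final class StackedTable {
    private final Map<String, String> parent; // shared; read-only during parse
    private final Map<String, String> child = new HashMap<>(); // local writes

    StackedTable(Map<String, String> parent) {
        this.parent = parent;
    }

    String canonicalize(String name) {
        String hit = child.get(name);
        if (hit == null) {
            hit = parent.get(name); // fall through to parent on local miss
        }
        if (hit == null) {
            child.put(name, name); // store only in the local layer
            hit = name;
        }
        return hit;
    }

    // Merge local additions back once, avoiding per-store full copies.
    void mergeIntoParent() {
        parent.putAll(child);
        child.clear();
    }
}
```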
My initial results running

`java -Xmx256m -jar target/perf.jar ".*JsonArbitraryFieldNameBenchmark.*" -wi 4 -w 4 -i 4 -r 4 -f 1 -t max -rf json` (28 threads):

(note the score is microseconds per operation where lower is better, not operations per second)