Apply recent C optimizations to Java extension #725

headius · 2025-01-09T00:40:26Z

Just catching up with all of @byroot's excellent optimization work.

@byroot

This is new specialized logic to reduce overhead when appending ASCII-only strings to the generated JSON. Original code by @byroot. See ruby#620.

headius · 2025-01-09T00:41:11Z

As part of this I'll be trying to align more of the Java code with the C equivalents, with comments to indicate how they sync up. This should make it easier to keep them in sync in the future.

@byroot

Also includes updated logic for generate (generate_json_string) based on current C code. Original code by @byroot. See ruby#620.

Lots of surrounding state so just take the hit of a Set and Iterator rather than a big visitor object.

This change duplicates some code from JRuby to allow rendering the fixnum value to a shared byte array rather than allocating a new one for each value. Since fixnum dumping is a leaf operation, only one is needed per session.

headius · 2025-01-09T08:39:05Z

I jumped down the optimization well and continued past the recent string optimizations on to the other types of dumpable objects. Strings are now faster than in CRuby, as well as a few other cases in the encoder.rb benchmark, but many cases are still slower... sometimes less than half the performance.

Tracking results here: https://gist.github.com/headius/3e56d80656543bf2343f4b26f00bc446

Anonymous classes show up as unnamed, numbered classes in profiles which makes them difficult to read.

Rather than allocating a buffer to hold N copies of arrayNL, just write it N times. We're buffering into a stream anyway. This makes array dumping zero-alloc other than buffer growth.

Since there's a fixed number of types we have special dumping logic for, this abstraction just introduces overhead we don't need. This patch starts moving away from indirecting all dumps through the Handler abstraction and directly generating from the type switch. This also aligns better with the main loop of the C code and should inline and optimize better.

The byte[] output stream used here extended ByteArrayOutputStream from the JDK, which synchronizes all mutation operations (like writes). Since this is only going to be used once within a given call stack, it needs no synchronization. This change more than triples the performance of a benchmark of dumping an array of empty arrays and should increase performance of all dump forms.
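A minimal sketch of the idea, not the code from this PR: a growable byte sink that extends `OutputStream` directly, so none of its write methods are `synchronized`. The class name `UnsyncByteStream` and its methods are hypothetical, and the real stream in the extension is backed by a ByteList rather than a bare array.

```java
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical single-threaded byte sink; no method here is synchronized. */
final class UnsyncByteStream extends OutputStream {
    private byte[] buf = new byte[64];
    private int len;

    @Override
    public void write(int b) {
        ensure(len + 1);
        buf[len++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int count) {
        ensure(len + count);
        System.arraycopy(src, off, buf, len, count);
        len += count;
    }

    // Grow by doubling, or to the exact size needed if that is larger.
    private void ensure(int needed) {
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
        }
    }

    int size() { return len; }

    byte[] toByteArray() { return Arrays.copyOf(buf, len); }
}
```

This is only safe when a single thread owns the stream for the duration of a dump, which is the situation described above.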

* Return incoming array if only one repeat is needed and array is exact size.
* Only retrieve ByteList fields once for repeat writes.

headius · 2025-01-15T18:17:21Z

A nice discovery: the default ByteArrayOutputStream we extend for our ByteList version uses synchronized on its write methods, and removing that (by avoiding the JDK impl) improves performance of dumping substantially.

The math is much faster here than array access, due to bounds checking and pointer dereferencing.

Java will generate accessor methods for private fields, burning some inlining budget.

headius added 3 commits January 8, 2025 16:23

Make benchmark runnable without oj available

776993a

Port convert_UTF8_to_ASCII_only_JSON to Java

be45c9a

This is new specialized logic to reduce overhead when appending ASCII-only strings to the generated JSON. Original code by @byroot. See ruby#620.

Align string generate method with generate_json_string

4d37e9f

headius added 5 commits January 8, 2025 21:43

Port convert_UTF8_to_JSON from C

38c7831

Also includes updated logic for generate (generate_json_string) based on current C code. Original code by @byroot. See ruby#620.

Use external iteration to reduce alloc

0a5f6e7

Lots of surrounding state so just take the hit of a Set and Iterator rather than a big visitor object.

Remove unused imports

98cb785

Inline ConvertBytes logic for long to byte[]

75cf6fe

This change duplicates some code from JRuby to allow rendering the fixnum value to a shared byte array rather than allocating a new one for each value. Since fixnum dumping is a leaf operation, only one is needed per session.
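As a rough illustration of rendering a long into a reusable buffer (this is not the JRuby ConvertBytes code; `LongFormatter` and its methods are hypothetical names), the digits can be written right to left into a fixed 20-byte scratch array, which is enough for `Long.MIN_VALUE`:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: format longs into one reusable scratch buffer. */
final class LongFormatter {
    // "-9223372036854775808" is 20 bytes, the longest possible output.
    private final byte[] scratch = new byte[20];

    /** Writes the decimal form at the end of the scratch buffer; returns the start index. */
    int format(long value) {
        int pos = scratch.length;
        boolean negative = value < 0;
        // Work on the non-positive form so Long.MIN_VALUE does not overflow.
        long n = negative ? value : -value;
        do {
            scratch[--pos] = (byte) ('0' - (n % 10)); // n % 10 is in -9..0
            n /= 10;
        } while (n != 0);
        if (negative) scratch[--pos] = '-';
        return pos;
    }

    byte[] buffer() { return scratch; }

    /** Convenience for inspection only; a real encoder would copy the bytes to its output. */
    String formatToString(long value) {
        int start = format(value);
        return new String(scratch, start, scratch.length - start, StandardCharsets.US_ASCII);
    }
}
```

A caller would copy `buffer()` from the returned index to the end into the output stream before formatting the next value, since the scratch array is overwritten on every call.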

Eliminate * import

8f4ce51

headius added 3 commits January 9, 2025 02:42

Restructure handlers for easier profiling

845fc46

Anonymous classes show up as unnamed, numbered classes in profiles which makes them difficult to read.
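To illustrate the naming difference (the `Handler` interface and class names below are hypothetical, not the ones in the extension), an anonymous class only has a compiler-assigned numeric suffix, while a named nested class carries its own name into stack traces and profiler output:

```java
/** Hypothetical handler interface, for illustration only. */
interface Handler {
    String describe();
}

final class HandlerNames {
    // Anonymous class: the runtime name is the enclosing class plus "$" and a number.
    static final Handler ANONYMOUS = new Handler() {
        @Override
        public String describe() { return "string"; }
    };

    // Named nested class: the runtime name ends in "$StringHandler".
    static final class StringHandler implements Handler {
        @Override
        public String describe() { return "string"; }
    }

    static final Handler NAMED = new StringHandler();

    public static void main(String[] args) {
        System.out.println(ANONYMOUS.getClass().getName());
        System.out.println(NAMED.getClass().getName());
    }
}
```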

Avoid allocation when writing Array delimiters

9d74a1f

Rather than allocating a buffer to hold N copies of arrayNL, just write it N times. We're buffering into a stream anyway. This makes array dumping zero-alloc other than buffer growth.
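A small sketch of the approach, assuming a plain `ByteArrayOutputStream` as a stand-in for the extension's own stream (`RepeatWriter` and `writeRepeated` are hypothetical names): the delimiter bytes are written once per repetition instead of first being concatenated into a temporary array.

```java
import java.io.ByteArrayOutputStream;

final class RepeatWriter {
    /** Writes unit to out count times without building an intermediate repeated buffer. */
    static void writeRepeated(ByteArrayOutputStream out, byte[] unit, int count) {
        final int length = unit.length; // read once, outside the loop
        for (int i = 0; i < count; i++) {
            out.write(unit, 0, length);
        }
    }
}
```

The only allocation left on this path is whatever the output stream does when it has to grow.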

headius mentioned this pull request Jan 9, 2025

Add a fast path for ASCII strings #620

Merged

headius added 3 commits January 14, 2025 21:18

Match C version of fbuffer_append_long

f7eede3

Minor tweaks to reduce complexity

b11e4f2

headius force-pushed the jruby_optz branch from a128bfa to 97ac36f on January 15, 2025 16:20

Reduce overhead in repeats

bd2007a

* Return incoming array if only one repeat is needed and array is exact size.
* Only retrieve ByteList fields once for repeat writes.

headius mentioned this pull request Jan 16, 2025

Improvements to support json library jruby/jruby#8574

Draft

headius added 9 commits January 16, 2025 14:32

Use equivalent of rb_sym2str

4f7d404

Microoptimizations for ByteList stream

10d752b

Cast to byte not necessary

39d410f

Refactor this for better inlining

7f9b6a3

More tiny tweaks to reduce overhead of generateString

70aadd0

Refactor to avoid repeated boolean checks

0133cbc

Eliminate memory accesses for digits

9de3120

The math is much faster here than array access, due to bounds checking and pointer dereferencing.
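For comparison, here are the two ways of turning a digit value into its ASCII byte. This is a sketch with hypothetical names (`Digits`, `viaTable`, `viaMath`), not the code from the commit; it relies only on ASCII digits being contiguous starting at `'0'`.

```java
final class Digits {
    private static final byte[] TABLE = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};

    /** Table lookup: one bounds-checked array load per digit. */
    static byte viaTable(int digit) {
        return TABLE[digit];
    }

    /** Arithmetic: add the digit to '0', with no memory access. */
    static byte viaMath(int digit) {
        return (byte) ('0' + digit);
    }
}
```

Both produce the same bytes for 0 through 9; the arithmetic form just avoids the load and the bounds check.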

Loosen visibility to avoid accessor methods

d0a718c

Java will generate accessor methods for private fields, burning some inlining budget.
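A sketch of the pattern, with hypothetical class and field names: when a nested class reads a `private` field of its enclosing class, javac releases before Java 11 (before nest-based access control) emit a synthetic `access$NNN` method for that read. Making the field package-private lets the nested class read it directly.

```java
final class Encoder {
    // Package-private rather than private, so Helper reads the field directly
    // instead of going through a compiler-generated accessor on older javac targets.
    int depth;

    final class Helper {
        int nextDepth() {
            return depth + 1;
        }
    }
}
```

The trade-off is slightly weaker encapsulation in exchange for one less call on a hot path.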

Modify parser bench to work without oj or rapidjson

67a00da
