You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While investigating workarounds for jruby/jruby#6265 I realized that all dumping for e.g. to_json is done first to an in-memory buffer (always a Ruby String) even when given an IO object to which the json should be written. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.
This could obviously be more efficient if the json appends were writes directly to the given IO, or if it were possible to provide a String-like object that receives the appends. A rework of the generator subsystem would be necessary to pass any provided IO or String-like through the various dump methods.
This would have several benefits:
No intermediate String to hold the entirety of the dumped json.
No intermediate Strings for components of a dumped collection; Array and Hash currently dump each element or pair to a separate String and then append that String to the result buffer.
Reduced allocation, copying, and GC overhead when dumping directly to IO.
I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.
The text was updated successfully, but these errors were encountered:
Json.dump allows you to pass an IO to which the dump output will
be sent, but it still buffers the entire output in memory before
sending it to the given IO. This leads to issues on JRuby like
jruby/jruby#6265 when it tries to create a byte[] that exceeds the
maximum size of a signed int (JVM's array size limit).
This commit plumbs the IO all the way through the generation logic
so that it can be written to directly without filling a temporary
memory buffer first. This allow JRuby to dump object graphs that
would normally produce more content than the JVM can hold in a
single array, providing a workaround for jruby/jruby#6265.
It is unfortunately a bit slow to dump directly to IO due to the
many small writes that all acquire locks and participate in the
IO encoding subsystem. A more direct path that can skip some of
these pieces could be more competitive with the in-memory version,
but functionally it expands the size of graphs that cana be dumped
when using JRuby.
See ruby#524
Ref: ruby#524
Rather than to buffer everything in memory.
Unfortunately Ruby doesn't provide an API to write into
and IO without first allocating a string, which is a bit
wasteful.
While investigating workarounds for jruby/jruby#6265 I realized that all dumping for e.g.
to_json
is done first to an in-memory buffer (always a Ruby String) even when given an IO object to which the json should be written. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.This could obviously be more efficient if the json appends were writes directly to the given IO, or if it were possible to provide a String-like object that receives the appends. A rework of the generator subsystem would be necessary to pass any provided IO or String-like through the various dump methods.
This would have several benefits:
I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.
The text was updated successfully, but these errors were encountered: