Streaming enhancements for dumping #524

Open
headius opened this issue Mar 31, 2023 · 0 comments

Comments

@headius
Contributor

headius commented Mar 31, 2023

While investigating workarounds for jruby/jruby#6265, I realized that all dumping (for example, via to_json) is done first to an in-memory buffer (always a Ruby String), even when an IO object is given to which the JSON should be written. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.

This could be more efficient if the JSON appends were written directly to the given IO, or if it were possible to provide a String-like object that receives the appends. That would require reworking the generator subsystem to pass any provided IO or String-like object through the various dump methods.
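A minimal sketch of the idea in plain Ruby, not taken from any of the three real generators: the hypothetical helper stream_dump walks a Hash/Array tree and appends each JSON token straight to a receiver that only needs to respond to <<, so the same code serves an IO, a StringIO, or a String-like object. It assumes to_json from the json library for keys and scalars, and it ignores generator options such as pretty-printing and max_nesting; it is not the json gem's API.

```ruby
require "json"
require "stringio"

# Hypothetical helper (not part of the json gem): writes JSON for a Hash/Array
# tree token by token to +out+, which only needs to respond to #<<.
def stream_dump(obj, out)
  case obj
  when Hash
    out << "{"
    obj.each_with_index do |(key, value), i|
      out << "," if i > 0
      out << key.to_s.to_json << ":"  # keys are stringified, as JSON.generate does
      stream_dump(value, out)          # recurse; no per-pair intermediate String
    end
    out << "}"
  when Array
    out << "["
    obj.each_with_index do |value, i|
      out << "," if i > 0
      stream_dump(value, out)          # recurse; no per-element intermediate String
    end
    out << "]"
  else
    out << obj.to_json                 # scalars: String, Integer, Float, true/false/nil
  end
  out
end

data = { "name" => "x", "items" => [1, 2.5, nil, true] }
io = StringIO.new
stream_dump(data, io)                  # io.string now holds the same text as JSON.generate(data)
```

No single String ever has to hold the whole document when the receiver is a real IO; the cost is one small write per token.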

This would have several benefits:

  • No intermediate String to hold the entirety of the dumped json.
  • No intermediate Strings for components of a dumped collection; Array and Hash currently dump each element or pair to a separate String and then append that String to the result buffer.
  • Reduced allocation, copying, and GC overhead when dumping directly to IO.
  • Potential to provide IO-like or String-like receivers of the dumped json, allowing for a workaround to the Java 2GB array limitation (Exception NegativeArraySizeException during JSON.dump of large Hash jruby/jruby#6265).
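The last point can be sketched with a hypothetical String-like receiver. ChunkedSink is an illustration, not part of the json gem or JRuby: it accepts << appends but stores them in a list of bounded chunks, so no single buffer has to grow past a chosen limit such as the JVM's maximum array size. The sketch assumes a generator that streams its output through << instead of building one String first, which is what this issue proposes rather than what the generators currently do.

```ruby
# Hypothetical String-like receiver: appends land in bounded chunks instead of
# one ever-growing String.
class ChunkedSink
  attr_reader :chunks

  def initialize(chunk_limit)
    raise ArgumentError, "chunk_limit must be positive" unless chunk_limit.positive?
    @chunk_limit = chunk_limit   # max bytes per chunk (e.g. well under 2 GiB)
    @chunks = [String.new]       # String.new is an empty binary (ASCII-8BIT) string
  end

  # Append +str+, spilling into a new chunk whenever the current one is full.
  def <<(str)
    rest = str.to_s.b            # work in bytes; .b returns a binary copy
    until rest.empty?
      room = @chunk_limit - @chunks.last.bytesize
      if room.zero?
        @chunks << String.new    # current chunk is full: start a new one
        next
      end
      piece = rest.byteslice(0, room)
      @chunks.last << piece
      rest = rest.byteslice(piece.bytesize..-1) || ""
    end
    self                         # allow chaining, like String#<<
  end

  # Total bytes received across all chunks.
  def bytesize
    @chunks.sum(&:bytesize)
  end
end

sink = ChunkedSink.new(4)
sink << '{"a":' << "[1,2,3]" << "}"   # ends up as four chunks of at most 4 bytes
```

Chunks are split on byte boundaries, so a multibyte character may straddle two chunks; the chunks concatenated in order are still the exact JSON bytes.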

I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.

headius added a commit to headius/json that referenced this issue Aug 15, 2023
JSON.dump allows you to pass an IO to which the dump output will
be sent, but it still buffers the entire output in memory before
sending it to the given IO. This leads to issues on JRuby like
jruby/jruby#6265 when it tries to create a byte[] that exceeds the
maximum size of a signed int (JVM's array size limit).

This commit plumbs the IO all the way through the generation logic
so that it can be written to directly without filling a temporary
memory buffer first. This allows JRuby to dump object graphs that
would normally produce more content than the JVM can hold in a
single array, providing a workaround for jruby/jruby#6265.

It is unfortunately a bit slow to dump directly to IO due to the
many small writes that all acquire locks and participate in the
IO encoding subsystem. A more direct path that can skip some of
these pieces could be more competitive with the in-memory version,
but functionally it expands the size of graphs that can be dumped
when using JRuby.

See ruby#524
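One common way to reduce the cost of many small writes, shown here as a swapped-in technique and not as what the commit above does, is to coalesce appends in a modest buffer and hand the IO one larger write whenever the buffer passes a threshold. Per the commit message, each write acquires locks and passes through the IO encoding subsystem, so fewer writes means fewer of those trips. CoalescingWriter is hypothetical; memory stays bounded by the threshold rather than by the document size.

```ruby
require "stringio"

# Hypothetical wrapper: batches many small << appends into fewer IO#write calls.
class CoalescingWriter
  attr_reader :writes   # number of IO#write calls issued so far

  def initialize(io, flush_at = 64 * 1024)
    @io = io
    @flush_at = flush_at  # bytes to accumulate before writing through
    @buf = +""            # mutable buffer, reused across flushes
    @writes = 0
  end

  def <<(str)
    @buf << str.to_s
    flush if @buf.bytesize >= @flush_at
    self
  end

  # Push any buffered bytes to the IO; call once more when generation finishes.
  def flush
    return self if @buf.empty?
    @io.write(@buf)
    @writes += 1
    @buf.clear
    self
  end
end

io = StringIO.new
w = CoalescingWriter.new(io, 16)
100.times { |i| w << i.to_s << "," }  # 200 small appends
w.flush
```

With a 16-byte threshold, those 200 appends reach the StringIO in fewer than 20 write calls; a real generator would pick a much larger threshold.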
byroot added a commit to byroot/json that referenced this issue Nov 5, 2024
Ref: ruby#524

Rather than buffering everything in memory.

Unfortunately Ruby doesn't provide an API to write into
an IO without first allocating a string, which is a bit
wasteful.
Projects
None yet
Development

No branches or pull requests

2 participants