Exception NegativeArraySizeException during JSON.dump of large Hash #6265
Not too surprising... this is dumping the json output into one of our ByteList instances, which are backed by a Java byte array, and in Java the limit for such buffers is 2GB. The only workaround I can think of at the moment would be to dump to an IO stream, rather than dumping to a >2GB in-memory buffer. Unfortunately fixing this is a much larger challenge, since the result of …
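For context on why this surfaces as NegativeArraySizeException rather than an OutOfMemoryError, here is one plausible mechanism as a minimal sketch; it is not JRuby's actual ByteList growth code, just an illustration of how an int-based capacity calculation goes negative once it passes the array-size limit:

```java
// Illustration only (not JRuby's actual buffer-growth code): a typical
// "double the capacity" step overflows int once the requested size passes
// Integer.MAX_VALUE, and new byte[negativeSize] then throws
// NegativeArraySizeException.
public class GrowthOverflow {
    public static void main(String[] args) {
        int currentCapacity = 1_500_000_000;  // ~1.5 GB, still a legal array size
        int doubled = currentCapacity * 2;    // overflows to -1_294_967_296
        System.out.println(doubled);          // prints a negative number
        byte[] grown = new byte[doubled];     // throws NegativeArraySizeException
    }
}
```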
@enebo @lopex I don't know that we are any closer to a solution on this now than we were ten years ago. One option would be modifying ByteList to either use multiple arrays or a long[], but clearly that will impact a huge amount of code that expects to be able to get a byte[] out. On the other hand, the vast majority of cases will still be under 2GB, so perhaps we could incrementally add support for ranges outside int32 and error for cases that expect to have a real byte[].
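To make the "multiple arrays" idea concrete, here is a rough sketch of a long-addressable buffer backed by fixed-size byte[] chunks. The names and layout are illustrative, not JRuby's ByteList API; the last method shows exactly the compatibility problem described above, since callers that insist on a single contiguous byte[] can only be served below the int limit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a long-addressable buffer built from byte[] chunks.
class ChunkedBytes {
    private static final int CHUNK = 1 << 30;         // 1 GiB chunks, each within the int limit
    private final List<byte[]> chunks = new ArrayList<>();
    private long size;

    void append(byte b) {
        int offset = (int) (size % CHUNK);
        if (offset == 0) chunks.add(new byte[CHUNK]); // grow by whole chunks
        chunks.get(chunks.size() - 1)[offset] = b;
        size++;
    }

    byte get(long index) {
        return chunks.get((int) (index / CHUNK))[(int) (index % CHUNK)];
    }

    long size() { return size; }

    // The compatibility problem: code that expects a real byte[] cannot be
    // served once the contents exceed what a single Java array can hold.
    byte[] toByteArray() {
        if (size > Integer.MAX_VALUE - 8) {
            throw new IllegalStateException("contents no longer fit in a single byte[]");
        }
        byte[] out = new byte[(int) size];
        long copied = 0;
        for (byte[] chunk : chunks) {
            int n = (int) Math.min(chunk.length, size - copied);
            System.arraycopy(chunk, 0, out, (int) copied, n);
            copied += n;
        }
        return out;
    }
}
```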
Hello,
and on the latest version, 9.4.2.0, the same error occurs, with slightly different stack trace formatting:
Java has been given memory larger than 2 GB -
This remains a Java limitation. The buffer into which your large hash is being dumped eventually grows to be larger than 2GB, which is the limit of a Java array. Since we only have one implementation of Ruby's String, and that implementation uses a Java byte[], we cannot grow a string any larger than the 2GB limit. The solutions to this in the Java world are to use multiple arrays or to use a native block of memory via a native ByteBuffer. I can see two paths forward for fixing this with ByteBuffer:
I'm going to poke around the json library and see if the latter option might work in the short term.
Unfortunately the json library is not compatible with the approach I outlined. In all three implementations of the generator (the pure-Ruby version, the C version, and the Java version), all json is written first to a String, and then that String is returned or written to an IO. There's no streaming of json data into an abstract "write" or "append" interface, so there's no way to trick it into using a different type of buffer.
I was also mistaken when I said ByteBuffer could be used to work around this. ByteBuffers can only be constructed with a size specified in a Java int, effectively limiting them to 2GB. So that leaves us to use a native buffer in some other way, such as through Ruby FFI, Java FFI libraries like jnr-ffi, or the new OpenJDK project Panama's support for native memory buffers. So the project is pretty big but also could be pretty valuable.
For IO-heavy json use cases, writing to a String buffer is obviously not going to be the most efficient option; there would be value in enhancing json to write not to a String but to any String-like or IO-like object provided by the caller. That would in turn allow us to pass it a native memory wrapper and avoid the 2GB byte[] limit. The wrapper itself could be implemented today with FFI, but for better efficiency the JIT enhancements that come with Project Panama make it a more attractive target. Panama is only available in preview form as of JDK 19, having incubated in JDK 17 and 18.
I will be presenting on this topic (in part) next week, so I'm looking into the possibilities right now.
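For illustration, the kind of "append target" abstraction described above might look like the following. The names are hypothetical, not part of the json gem or JRuby; the point is that the same generator code could fill either an in-memory buffer or an output stream, and only the streaming backend sidesteps the array-size limit (anything backed by a single Java array, including StringBuilder, still hits it).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sink abstraction: the generator only ever calls append(),
// and the caller decides whether output accumulates in memory or streams out.
interface JsonSink {
    void append(CharSequence chunk) throws IOException;
}

// In-memory backend: still bounded by Java's array limits.
final class StringSink implements JsonSink {
    final StringBuilder buffer = new StringBuilder();
    public void append(CharSequence chunk) { buffer.append(chunk); }
}

// Streaming backend: nothing larger than one chunk is ever held in memory.
final class StreamSink implements JsonSink {
    private final OutputStream out;
    StreamSink(OutputStream out) { this.out = out; }
    public void append(CharSequence chunk) throws IOException {
        out.write(chunk.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```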
JSON.dump allows you to pass an IO to which the dump output will be sent, but it still buffers the entire output in memory before sending it to the given IO. This leads to issues on JRuby like jruby/jruby#6265 when it tries to create a byte[] that exceeds the maximum size of a signed int (the JVM's array size limit).
This commit plumbs the IO all the way through the generation logic so that it can be written to directly without filling a temporary memory buffer first. This allows JRuby to dump object graphs that would normally produce more content than the JVM can hold in a single array, providing a workaround for jruby/jruby#6265.
It is unfortunately a bit slow to dump directly to IO due to the many small writes that all acquire locks and participate in the IO encoding subsystem. A more direct path that can skip some of these pieces could be more competitive with the in-memory version, but functionally this expands the size of graphs that can be dumped when using JRuby.
See ruby#524
See ruby/json#524 for a proof-of-concept streaming dump implementation. This is likely the closest we can get in the near term to defeating the JVM array-size limit, but I could use some help cleaning it up and getting it shipped.
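As a rough sketch of the longer-term Panama option mentioned earlier, native memory segments are sized with a long and so are not bound by the byte[] limit. This uses the Foreign Function & Memory API as later finalized in recent JDKs; at the time of these comments it was still in preview/incubation, so treat the exact API as an assumption.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Illustrative only: allocate a 3 GB native buffer, larger than any byte[] can be.
public class NativeBufferSketch {
    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buffer = arena.allocate(threeGb);             // sized with a long
            buffer.set(ValueLayout.JAVA_BYTE, threeGb - 1, (byte) 42);  // write past the byte[] limit
            System.out.println(buffer.get(ValueLayout.JAVA_BYTE, threeGb - 1));
        } // native memory is freed when the confined arena closes
    }
}
```

Wrapping such a segment in something String-like that the json generator could write to is the larger project described in the earlier comment.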
Environment Information
Provide at least:
- JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc.)
Originally detected on 9.2.9.0:
jruby 9.2.9.0 (2.5.7) 2019-10-30 458ad3e OpenJDK 64-Bit Server VM 11.0.5+10 on 11.0.5+10 [darwin-x86_64]
but can also be reproduced on the latest 9.2.11.1:
jruby 9.2.11.1 (2.5.7) 2020-03-25 b1f55b1 OpenJDK 64-Bit Server VM 11.0.5+10 on 11.0.5+10 [darwin-x86_64]
- Operating system and platform (uname -a):
Linux staging-app1 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
also on Dev box:
Darwin JBA-MacBook-Pro.local 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64
Other relevant info you may wish to add:
Expected Behavior
JSON.dump completes without an error for a large hash. On MRI 2.5.1 the provided sample code completes without an error.
Actual Behavior
An error is thrown:
It seems that 2 GB is the limit at which the error starts to occur.