When dumping to IO, dump directly #538

Closed
wants to merge 24 commits into from

Conversation

headius
Contributor

@headius headius commented Aug 15, 2023

Note, this PR is a proof-of-concept of direct IO dumping in the json library. It works, but is not as fast as it could be, and there's no CRuby implementation yet.

`JSON.dump` allows you to pass an IO to which the dump output will be sent, but it still buffers the entire output in memory before sending it to the given IO. This leads to issues on JRuby like jruby/jruby#6265 when it tries to create a byte[] that exceeds the maximum size of a signed int (the JVM's array size limit).

This commit plumbs the IO all the way through the generation logic so that it can be written to directly without filling a temporary memory buffer first. This allows JRuby to dump object graphs that would normally produce more content than the JVM can hold in a single array, providing a workaround for jruby/jruby#6265.
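For reference, a minimal sketch of the public entry point being discussed. It only shows the call shape of `JSON.dump` with and without an IO; whether the output is buffered internally before reaching the IO depends on the json version in use, and the `payload` value is made up for illustration.

```ruby
require "json"
require "stringio"

# Illustrative data only; any JSON-serializable object works.
payload = { "name" => "json", "tags" => ["io", "dump"] }

# Without an IO, JSON.dump returns the document as a String.
as_string = JSON.dump(payload)

# With an IO (a StringIO stands in for a File or socket here),
# the document is written to that IO instead.
io = StringIO.new
JSON.dump(payload, io)
```

The caller-facing behavior is the same in both cases, which is why the change can stay an internal implementation detail.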

It is unfortunately a bit slow to dump directly to IO due to the many small writes that all acquire locks and participate in the IO encoding subsystem. A more direct path that can skip some of these pieces could be more competitive with the in-memory version, but functionally it expands the size of graphs that can be dumped when using JRuby.

See #524

@headius headius marked this pull request as draft August 15, 2023 00:31
@headius
Contributor Author

headius commented Aug 15, 2023

Note that prior to this patch, the script provided in jruby/jruby#6265 required 2-3GB of memory to run. Afterwards it varies between 200-600MB and completes successfully.

@segiddins

For performance, it might help to have a buffered proxy to the underlying IO? That way, you only incur the locking/encoding overhead every buffer block size, vs every substring that gets written?
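A rough sketch of what such a buffered proxy could look like in Ruby. `BufferedProxy` and `CountingIO` are hypothetical names invented for this illustration, not part of the json gem, and the capacity is arbitrary.

```ruby
require "stringio"

# Hypothetical helper: collects small writes and forwards them to the
# wrapped IO only once roughly `capacity` bytes have accumulated.
class BufferedProxy
  def initialize(io, capacity = 16 * 1024)
    @io = io
    @capacity = capacity
    @buffer = +""
  end

  # Appends the chunks to the buffer; returns the number of bytes accepted.
  def write(*chunks)
    written = 0
    chunks.each do |chunk|
      str = chunk.to_s
      @buffer << str
      written += str.bytesize
    end
    flush if @buffer.bytesize >= @capacity
    written
  end

  def <<(chunk)
    write(chunk)
    self
  end

  # Pushes any pending bytes to the underlying IO in a single write.
  def flush
    return self if @buffer.empty?
    @io.write(@buffer)
    @buffer.clear
    self
  end
end

# Test double that counts how often the underlying IO is written to.
class CountingIO < StringIO
  attr_reader :write_calls

  def write(*args)
    @write_calls = (@write_calls || 0) + 1
    super
  end
end

sink = CountingIO.new
proxy = BufferedProxy.new(sink, 8)
%w[{ "a" : 1 }].each { |piece| proxy << piece } # five tiny writes, 7 bytes total
proxy.flush
```

In this run the five small writes reach the underlying IO as one call, so any per-write locking or encoding cost would be paid once per block rather than once per fragment. The caller has to remember the final `flush`.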

@headius
Contributor Author

headius commented Aug 19, 2023

@segiddins Yeah, I hoped that internal buffering in IO would be sufficient for this, but that buffer may simply be too small, or there's enough overhead with character encoding checks that we lose the benefit. I'd like to play with some other buffering strategies, and some of the larger methods that do many tiny writes could do those writes into a temporary string that gets dumped in one go.

@headius
Contributor Author

headius commented Aug 19, 2023

Oh, I did run a profile of this on JRuby, and the other cost we have using IO as the buffer is all the locking that's required to keep it thread safe. So if you're writing individual characters, that's a lot of lock acquisition and releasing. Batching up those writes into coarser chunks would make a big difference.
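To make the locking cost concrete, here is a toy model, not JRuby's actual IO implementation. `LockedSink` is a hypothetical class that takes a mutex on every write and counts the acquisitions, so per-character writes can be compared with one batched write.

```ruby
# Hypothetical stand-in for a thread-safe IO: every write takes the lock once.
class LockedSink
  attr_reader :lock_count, :data

  def initialize
    @mutex = Mutex.new
    @lock_count = 0
    @data = +""
  end

  def write(str)
    @mutex.synchronize do
      @lock_count += 1
      @data << str
    end
    str.bytesize
  end
end

json = '{"name":"json","ok":true}'

# One lock acquisition per character.
per_char = LockedSink.new
json.each_char { |char| per_char.write(char) }

# One lock acquisition for the whole document.
batched = LockedSink.new
batched.write(json)
```

Both sinks end up with identical data; only the number of lock round trips differs, which is the cost the profile pointed at.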

@byroot
Member

byroot commented Nov 5, 2024

I'm assuming you're not actively working on this, so I'll close.

@byroot byroot closed this Nov 5, 2024
@headius
Contributor Author

headius commented Nov 5, 2024

@byroot I never stopped working on this but never got feedback from any stakeholders. I still believe it should be done.

@headius
Contributor Author

headius commented Nov 5, 2024

FWIW we have a customer that needed this, and I assume they still do. Dumping directly to an in-memory buffer is prohibitive for very large json streams.

@byroot
Member

byroot commented Nov 5, 2024

Yeah, it makes sense, and looking at the C implementation, I think it would be relatively easy to do the C implementation.

I'm just trying to keep the repo tidy, if you feel strongly about keeping that draft open feel free to re-open.

> but never got feedback from any stakeholders.

Well, as mentioned previously, from my point of view you are the Java implementation maintainer, so if the feature doesn't require changing the public API, feel free to implement and merge whatever you want.

@headius
Contributor Author

headius commented Nov 5, 2024

@byroot I was not a maintainer at the time, so I was not enthusiastic to do much with it. That's why it sat for a year+ without any work being done; if the maintainer was not on board, there would not be much point in me doing it.

I also did not hear from a single maintainer of the C extension and I do not have the knowledge of that codebase to make a similar change, so that further dampened my enthusiasm.

I will re-open. I believe we should do this for both extensions and json-pure.

@headius headius reopened this Nov 5, 2024
@byroot
Member

byroot commented Nov 5, 2024

> I believe we should do this for both extensions and json-pure.

json_pure is gone as of a few minutes ago.

As for the C extension, I can take care of it, but the nice thing about this feature is that AFAICT it doesn't change anything about the public API; it's just an internal implementation detail. So IMO we can perfectly well merge the Java side of it and see about the C side later.

Ref: ruby#524

Rather than to buffer everything in memory.

Unfortunately Ruby doesn't provide an API to write into
an IO without first allocating a string, which is a bit
wasteful.
@headius
Contributor Author

headius commented Nov 20, 2024

@byroot That makes sense. I will return to this and get it working well for the Java extension. If it seems like a useful optimization at that point, we'll proceed with a C impl.

@byroot
Member

byroot commented Nov 20, 2024

Most of the C implementation is in #686, just need to take the time to polish it.

y-yagi and others added 11 commits November 20, 2024 16:27
Ignore platforms where `CHAR_BITS` > 8, given that `ch` indexes
`escape_table`, which is hard-coded to 256 elements.

```
../../../../src/ext/json/generator/generator.c(121): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(122): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(243): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(244): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(291): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(292): warning C4333: '>>': right shift by too large amount, data loss
```
If we assume most strings don't contain any escape sequences, we can avoid
a lot of costly operations when that holds true.

Before:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   884.000 i/100ms
                  oj   789.000 i/100ms
          Oj::Parser   943.000 i/100ms
           rapidjson   584.000 i/100ms
Calculating -------------------------------------
                json      8.897k (± 1.3%) i/s  (112.40 μs/i) -     45.084k in   5.068520s
                  oj      7.967k (± 1.5%) i/s  (125.52 μs/i) -     40.239k in   5.051985s
          Oj::Parser      9.564k (± 1.4%) i/s  (104.56 μs/i) -     48.093k in   5.029626s
           rapidjson      5.947k (± 1.4%) i/s  (168.16 μs/i) -     29.784k in   5.009437s

Comparison:
                json:     8896.5 i/s
          Oj::Parser:     9563.8 i/s - 1.08x  faster
                  oj:     7966.8 i/s - 1.12x  slower
           rapidjson:     5946.7 i/s - 1.50x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    83.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    77.000 i/100ms
           rapidjson    54.000 i/100ms
Calculating -------------------------------------
                json    823.083 (± 1.8%) i/s    (1.21 ms/i) -      4.150k in   5.043805s
                  oj    632.538 (± 1.4%) i/s    (1.58 ms/i) -      3.200k in   5.060073s
          Oj::Parser    769.122 (± 1.8%) i/s    (1.30 ms/i) -      3.850k in   5.007501s
           rapidjson    548.494 (± 1.5%) i/s    (1.82 ms/i) -      2.754k in   5.022153s

Comparison:
                json:      823.1 i/s
          Oj::Parser:      769.1 i/s - 1.07x  slower
                  oj:      632.5 i/s - 1.30x  slower
           rapidjson:      548.5 i/s - 1.50x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    34.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    427.162 (± 1.2%) i/s    (2.34 ms/i) -      2.173k in   5.087666s
                  oj    351.463 (± 2.8%) i/s    (2.85 ms/i) -      1.768k in   5.035149s
          Oj::Parser    461.849 (± 3.7%) i/s    (2.17 ms/i) -      2.340k in   5.074461s
           rapidjson    395.155 (± 1.8%) i/s    (2.53 ms/i) -      1.989k in   5.034927s

Comparison:
                json:      427.2 i/s
          Oj::Parser:      461.8 i/s - 1.08x  faster
           rapidjson:      395.2 i/s - 1.08x  slower
                  oj:      351.5 i/s - 1.22x  slower
```

After:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   953.000 i/100ms
                  oj   813.000 i/100ms
          Oj::Parser   956.000 i/100ms
           rapidjson   563.000 i/100ms
Calculating -------------------------------------
                json      9.525k (± 1.2%) i/s  (104.98 μs/i) -     47.650k in   5.003252s
                  oj      8.117k (± 0.5%) i/s  (123.20 μs/i) -     40.650k in   5.008283s
          Oj::Parser      9.590k (± 3.2%) i/s  (104.27 μs/i) -     48.756k in   5.089794s
           rapidjson      6.020k (± 0.9%) i/s  (166.10 μs/i) -     30.402k in   5.050155s

Comparison:
                json:     9525.3 i/s
          Oj::Parser:     9590.1 i/s - same-ish: difference falls within error
                  oj:     8116.7 i/s - 1.17x  slower
           rapidjson:     6020.5 i/s - 1.58x  slower

== Parsing twitter.json (567916 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    87.000 i/100ms
                  oj    64.000 i/100ms
          Oj::Parser    75.000 i/100ms
           rapidjson    55.000 i/100ms
Calculating -------------------------------------
                json    866.563 (± 0.8%) i/s    (1.15 ms/i) -      4.350k in   5.020138s
                  oj    643.567 (± 0.8%) i/s    (1.55 ms/i) -      3.264k in   5.072101s
          Oj::Parser    777.346 (± 3.5%) i/s    (1.29 ms/i) -      3.900k in   5.023933s
           rapidjson    557.158 (± 0.7%) i/s    (1.79 ms/i) -      2.805k in   5.034731s

Comparison:
                json:      866.6 i/s
          Oj::Parser:      777.3 i/s - 1.11x  slower
                  oj:      643.6 i/s - 1.35x  slower
           rapidjson:      557.2 i/s - 1.56x  slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    41.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    40.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    429.216 (± 1.2%) i/s    (2.33 ms/i) -      2.173k in   5.063351s
                  oj    354.755 (± 1.1%) i/s    (2.82 ms/i) -      1.785k in   5.032374s
          Oj::Parser    465.114 (± 3.7%) i/s    (2.15 ms/i) -      2.360k in   5.081634s
           rapidjson    387.135 (± 1.3%) i/s    (2.58 ms/i) -      1.950k in   5.037787s

Comparison:
                json:      429.2 i/s
          Oj::Parser:      465.1 i/s - 1.08x  faster
           rapidjson:      387.1 i/s - 1.11x  slower
                  oj:      354.8 i/s - 1.21x  slower
```
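The fast path for strings without escape sequences, described in the commit message above, can be illustrated in Ruby. This is a simplified sketch, not the parser's C code: `unescape_json_string` and `SIMPLE_ESCAPES` are hypothetical names, and surrogate pairs in `\u` escapes are deliberately not handled.

```ruby
# Single-character JSON escapes and the characters they stand for.
SIMPLE_ESCAPES = {
  '"' => '"', "\\" => "\\", "/" => "/",
  "b" => "\b", "f" => "\f", "n" => "\n", "r" => "\r", "t" => "\t"
}.freeze

# `raw` is the text between the quotes of a JSON string literal.
def unescape_json_string(raw)
  # Fast path: no backslash means no escape sequence, so nothing to rewrite.
  return raw unless raw.include?("\\")

  # Slow path: rewrite each escape sequence (surrogate pairs not handled).
  raw.gsub(/\\(?:u([0-9a-fA-F]{4})|(.))/) do
    $1 ? [$1.hex].pack("U") : SIMPLE_ESCAPES.fetch($2)
  end
end

plain = "no escapes here"
fast = unescape_json_string(plain)        # returned untouched
slow = unescape_json_string('line\\nbreak \\u0041') # goes through gsub
```

When the assumption holds, the only cost is a single scan for a backslash; the substitution machinery runs only for the minority of strings that need it.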
`rb_cstr2inum` isn't very fast because it handles tons of
different scenarios, and it also requires a NULL-terminated string,
which forces us to copy the number into a secondary buffer.

But since the parser already computed the length, we can much more
cheaply do this with a very simple function as long as the number
is small enough to fit into a native type (`long long`).

If the number is too long, we can fall back to the `rb_cstr2inum`
slow path.

Before:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    40.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    425.941 (± 1.9%) i/s    (2.35 ms/i) -      2.160k in   5.072833s
                  oj    349.617 (± 1.7%) i/s    (2.86 ms/i) -      1.750k in   5.006953s
          Oj::Parser    464.767 (± 1.7%) i/s    (2.15 ms/i) -      2.340k in   5.036381s
           rapidjson    382.413 (± 2.4%) i/s    (2.61 ms/i) -      1.938k in   5.070757s

Comparison:
                json:      425.9 i/s
          Oj::Parser:      464.8 i/s - 1.09x  faster
           rapidjson:      382.4 i/s - 1.11x  slower
                  oj:      349.6 i/s - 1.22x  slower
```

After:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    46.000 i/100ms
                  oj    33.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    462.332 (± 3.2%) i/s    (2.16 ms/i) -      2.346k in   5.080504s
                  oj    351.140 (± 1.1%) i/s    (2.85 ms/i) -      1.782k in   5.075616s
          Oj::Parser    473.500 (± 1.3%) i/s    (2.11 ms/i) -      2.385k in   5.037695s
           rapidjson    395.052 (± 3.5%) i/s    (2.53 ms/i) -      1.989k in   5.042275s

Comparison:
                json:      462.3 i/s
          Oj::Parser:      473.5 i/s - same-ish: difference falls within error
           rapidjson:      395.1 i/s - 1.17x  slower
                  oj:      351.1 i/s - 1.32x  slower
```
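The idea behind the integer fast path can be sketched in Ruby, with the caveat that the real change is in C and works on a native `long long`. `parse_integer` and `MAX_FAST_DIGITS` are hypothetical names, and `Integer()` stands in for the general-purpose slow path.

```ruby
# Any decimal number of up to 18 digits fits in a signed 64-bit integer.
MAX_FAST_DIGITS = 18

# Parses the integer occupying `length` bytes of `buffer` starting at `start`.
# The caller (the parser) is assumed to have already validated the digits.
def parse_integer(buffer, start, length)
  negative = buffer.getbyte(start) == 45 # "-"
  first_digit = negative ? start + 1 : start
  digit_count = length - (first_digit - start)

  # Slow path: too many digits, defer to the general-purpose conversion,
  # which needs its own copy of the number text.
  if digit_count > MAX_FAST_DIGITS
    return Integer(buffer.byteslice(start, length), 10)
  end

  # Fast path: accumulate digits in place, no copy and no terminator needed.
  value = 0
  first_digit.upto(start + length - 1) do |index|
    value = value * 10 + (buffer.getbyte(index) - 48) # 48 is "0"
  end
  negative ? -value : value
end

document = "[12345,-42]"
small = parse_integer(document, 1, 5)
negative = parse_integer(document, 7, 3)
huge = parse_integer("123456789012345678901234567890", 0, 30)
```

Because the length is already known, the fast path never needs to copy the digits into a separate terminated buffer.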
And stop benchmarking the Generator::State re-use, as it
no longer makes a sizeable difference.
Fix: ruby#694

This was lost during the .gemspec merge and not noticed
because it was falling back to loading the jars from the
stdlib.
byroot and others added 10 commits November 20, 2024 16:27
This way they're hidden in diffs.

It would be good to enforce on CI that the generated files match
the source change; however, ragel's output isn't consistent
across versions and systems, so we'll have to rely on changes
being noticed by further contributions.
Before this commit, we would try to scan for a float, then if that
failed, scan for an integer.  But floats and integers have many bytes in
common, so we would end up scanning the same bytes multiple times.

This patch combines integer and float scanning machines so that we only
have to scan bytes once.  If the machine finds "float parts", then it
executes the "isFloat" transition in the machine, which sets a boolean
letting us know that the parser found a float.

If we didn't find a float, but we did match, then we know it's an int.
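A hand-written Ruby sketch of the same single-pass idea. The actual change is a Ragel state machine; `scan_number` and `digit?` are hypothetical names, and the `is_float` flag plays the role of the "isFloat" transition described above.

```ruby
def digit?(byte)
  !byte.nil? && byte >= 48 && byte <= 57 # "0".."9"
end

# Scans a JSON number starting at `pos` in one pass.
# Returns [value, next_position], or nil if no number starts there.
def scan_number(src, pos = 0)
  i = pos
  i += 1 if src.getbyte(i) == 45 # optional "-"
  int_start = i
  i += 1 while digit?(src.getbyte(i))
  return nil if i == int_start # no digits at all

  is_float = false

  # Fraction part: "." followed by at least one digit.
  if src.getbyte(i) == 46 && digit?(src.getbyte(i + 1))
    is_float = true
    i += 1
    i += 1 while digit?(src.getbyte(i))
  end

  # Exponent part: "e" or "E", optional sign, at least one digit.
  if [69, 101].include?(src.getbyte(i))
    j = i + 1
    j += 1 if [43, 45].include?(src.getbyte(j))
    if digit?(src.getbyte(j))
      is_float = true
      j += 1 while digit?(src.getbyte(j))
      i = j
    end
  end

  text = src.byteslice(pos, i - pos)
  [is_float ? Float(text) : Integer(text, 10), i]
end

int_result = scan_number("42,")
float_result = scan_number("-1.5]")
exp_result = scan_number("1e3")
```

Each byte is inspected once; the integer-versus-float decision falls out of whether a fraction or exponent was seen along the way.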
Fix: ruby#697

This way even if `Encoding.default_external` is set to a weird value
the document will be parsed just fine.
The documentation states that `Oj::Parser.usual` isn't thread safe:
https://github.com/ohler55/oj/blob/c70bf4125b546bc7146840b15de36460d42b4dff/ext/oj/parser.c#L1507-L1513

As such we shouldn't benchmark it this way, but instantiate a new
parser every time. Technically in real world scenarios you could
create a pool of parsers and re-use them, but if it's not provided
by the gem, I'm not sure we should go out of our way to do it.
Otherwise the likelihood of seeing that key again is really low, and looking up
the cache is just a waste.

Before:

```
== Parsing small hash (65 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. 9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   343.049k i/100ms
                  oj   213.943k i/100ms
          Oj::Parser    31.583k i/100ms
           rapidjson   303.433k i/100ms
Calculating -------------------------------------
                json      3.704M (± 1.5%) i/s  (270.01 ns/i) -     18.525M in   5.003078s
                  oj      2.200M (± 1.1%) i/s  (454.46 ns/i) -     11.125M in   5.056526s
          Oj::Parser    285.369k (± 4.8%) i/s    (3.50 μs/i) -      1.453M in   5.103866s
           rapidjson      3.216M (± 1.6%) i/s  (310.95 ns/i) -     16.082M in   5.001973s

Comparison:
                json:  3703517.4 i/s
           rapidjson:  3215983.0 i/s - 1.15x  slower
                  oj:  2200417.1 i/s - 1.68x  slower
          Oj::Parser:   285369.1 i/s - 12.98x  slower

== Parsing test from oj (258 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. 9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    54.539k i/100ms
                  oj    41.473k i/100ms
          Oj::Parser    24.064k i/100ms
           rapidjson    51.466k i/100ms
Calculating -------------------------------------
                json    549.386k (± 1.6%) i/s    (1.82 μs/i) -      2.781M in   5.064316s
                  oj    417.003k (± 1.3%) i/s    (2.40 μs/i) -      2.115M in   5.073047s
          Oj::Parser    226.500k (± 4.7%) i/s    (4.42 μs/i) -      1.131M in   5.005466s
           rapidjson    526.124k (± 1.0%) i/s    (1.90 μs/i) -      2.676M in   5.087176s

Comparison:
                json:   549385.6 i/s
           rapidjson:   526124.3 i/s - 1.04x  slower
                  oj:   417003.4 i/s - 1.32x  slower
          Oj::Parser:   226500.4 i/s - 2.43x  slower
```

After:

```
== Parsing small hash (65 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. 9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   361.394k i/100ms
                  oj   217.203k i/100ms
          Oj::Parser    28.855k i/100ms
           rapidjson   303.404k i/100ms
Calculating -------------------------------------
                json      3.859M (± 2.9%) i/s  (259.13 ns/i) -     19.515M in   5.061302s
                  oj      2.191M (± 1.6%) i/s  (456.49 ns/i) -     11.077M in   5.058043s
          Oj::Parser    315.132k (± 7.1%) i/s    (3.17 μs/i) -      1.587M in   5.065707s
           rapidjson      3.156M (± 4.0%) i/s  (316.88 ns/i) -     15.777M in   5.008949s

Comparison:
                json:  3859046.5 i/s
           rapidjson:  3155778.5 i/s - 1.22x  slower
                  oj:  2190616.0 i/s - 1.76x  slower
          Oj::Parser:   315132.4 i/s - 12.25x  slower

== Parsing test from oj (258 bytes)
ruby 3.4.0dev (2024-11-13T12:32:57Z fstr-update-callba.. 9b44b455b3) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    55.682k i/100ms
                  oj    40.343k i/100ms
          Oj::Parser    25.119k i/100ms
           rapidjson    51.500k i/100ms
Calculating -------------------------------------
                json    555.808k (± 1.4%) i/s    (1.80 μs/i) -      2.784M in   5.010092s
                  oj    412.283k (± 1.7%) i/s    (2.43 μs/i) -      2.098M in   5.089900s
          Oj::Parser    279.306k (±13.3%) i/s    (3.58 μs/i) -      1.356M in   5.022079s
           rapidjson    517.177k (± 2.7%) i/s    (1.93 μs/i) -      2.626M in   5.082352s

Comparison:
                json:   555808.3 i/s
           rapidjson:   517177.1 i/s - 1.07x  slower
                  oj:   412283.2 i/s - 1.35x  slower
          Oj::Parser:   279306.5 i/s - 1.99x  slower
```
@headius
Contributor Author

headius commented Nov 20, 2024

@byroot Your approach is pretty close to the new attempt I'm making. I will rebase my work on this PR.

@headius
Contributor Author

headius commented Nov 20, 2024

Correction, I will rebase my PR on your PR #686 once I update it for the recent changes.

When an IO is given, we should try to write directly to it. This
patch moves that direction by always doing JSON dumping into an
OutputStream, which can be implemented on top of a given IO or by
producing a ByteList via a ByteArrayOutputStream.
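The design in the commit message above is Java-side (`OutputStream`, `ByteArrayOutputStream`, `ByteList`). A rough Ruby analogue, offered only as an illustration of the shape: the generator always appends to a sink via `<<`, and the sink is either the caller's IO or an in-memory String. `write_value` and `toy_dump` are hypothetical names, and leaf values are delegated to `to_json` from the json library.

```ruby
require "json"
require "stringio"

# Appends the JSON form of `value` to `sink`, which only needs to support `<<`.
def write_value(sink, value)
  case value
  when Hash
    sink << "{"
    value.each_with_index do |(key, val), index|
      sink << "," if index > 0
      sink << key.to_s.to_json << ":"
      write_value(sink, val)
    end
    sink << "}"
  when Array
    sink << "["
    value.each_with_index do |val, index|
      sink << "," if index > 0
      write_value(sink, val)
    end
    sink << "]"
  else
    # Scalars (strings, numbers, true, false, nil) use the library's encoding.
    sink << value.to_json
  end
  sink
end

# With an IO, output goes straight to it; without one, a String is built.
def toy_dump(obj, io = nil)
  sink = io || +""
  write_value(sink, obj)
end

in_memory = toy_dump({ "a" => [1, "x", nil] })

stream = StringIO.new
toy_dump({ "a" => [1, "x", nil] }, stream)
```

The generation code has a single path; only the choice of sink decides whether the whole document ever exists in memory at once.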
This connects up the OutputStream-based generator logic to the
incoming IO parameter (an IO or nil). Also included here are some
small changes to support the new state.rb:

* Require the state.rb file at the end of ext init.
* Move State#configure to _configure.
* Add State#strict? as an alias for strict.
@headius
Contributor Author

headius commented Nov 20, 2024

@byroot GitHub seems to be confused about how this has been rebased. The two commits I made here are the only two for JRuby, and the others should already be on master somewhere. Perhaps my rebase from your branch has confused it too much.

10 participants