
Optimize JSON parsing a bit #14366

Merged
straight-shoota merged 7 commits into master from optimize-json on May 14, 2024

Conversation

@asterite
Member

I saw this thread, so...

Two things here:

  1. Read strings/IOs byte by byte instead of char by char
  2. When parsing a string from an IO, try to use the peek buffer if one is available
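
The second idea can be sketched roughly like this (plain Ruby with illustrative names; Crystal's actual buffered `IO` API differs): if the peek buffer already contains the rest of an escape-free string literal, the closing quote can be found in a single scan and the string sliced out directly, instead of reading byte by byte:

```ruby
# Hypothetical illustration of the peek-buffer optimization. `peek` is the
# buffered window as an array of bytes, `start` the position just past the
# opening quote. If the whole literal (with no escapes) is in the window,
# slice it out; otherwise return nil to signal the byte-by-byte slow path.
def read_json_string_from_peek(peek, start)
  i = start
  while i < peek.size
    b = peek[i]
    return nil if b == '\\'.ord                    # escape found: slow path
    return peek[start...i].pack("C*") if b == '"'.ord
    i += 1
  end
  nil # closing quote not inside the peek window: slow path
end
```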

I used this Ruby/Crystal file to generate a big JSON:

json = %({"foo": 1, "bar": 2, "hello": "this will have an escape in it\\\"oh well", "banana": true, "array": [1, 2, 3, "a string that is a bit long"], "hash": {"a": 1, "b": 2, "c": 3}})
file = "[" + ([json] * 100000).join(",") + "]"
File.write("json.json", file)

Here's the benchmark:

require "json"
require "benchmark"

file_io = File.open("json.json")
file_string = File.read("json.json")

Benchmark.ips do |x|
  x.report("JSON.parse (string)") do
    JSON.parse(file_string)
  end
  x.report("JSON.parse (IO)") do
    file_io.pos = 0
    JSON.parse(file_io)
  end
end

Before:

JSON.parse (string)   8.41  (118.97ms) (± 0.86%)  117MB/op        fastest
    JSON.parse (IO)   3.64  (274.72ms) (± 1.99%)  117MB/op   2.31× slower

After:

JSON.parse (string)  11.98  ( 83.50ms) (± 7.70%)  117MB/op        fastest
    JSON.parse (IO)   9.06  (110.41ms) (± 6.26%)  117MB/op   1.32× slower

I think the difference between before and after will be bigger if there are more strings to parse.

A note for the forum thread: I tried parsing that same big file with Ruby 3.1, and Ruby was (slightly) slower: 16 seconds in Crystal vs. 19 seconds in Ruby. This is on a Mac, so I don't know why it was slower in Crystal for the OP (maybe it's different on Linux?).

Regarding memory: Crystal requires 117MB to load all that data into memory, but it's the same in Ruby, so I'm not sure how memory usage can be further optimized...

Please review carefully! I think tests for when the peek buffer is incomplete or unavailable might not exist right now. Feel free to continue working on top of this PR (pushing commits to this branch or creating another PR from this code).

@asterite
Member Author

I found another optimization that improves both parsing speed and the amount of memory allocated.

An earlier PR made parsing slower by always allocating a string to parse integers and floats from, and it also made parsing consume more memory. I was concerned about that at the time for exactly this reason, but I agreed that correctness is more important than performance. However, we can have both! For "small" integers (fewer than 19 digits, never floats) we can compute the int value directly and always know that we are doing it correctly.

Before the last commit:

JSON.parse (string)  10.76  ( 92.92ms) (± 8.00%)  125MB/op        fastest
    JSON.parse (IO)   7.51  (133.21ms) (± 5.99%)  125MB/op   1.43× slower

After the last commit:

JSON.parse (string)  12.09  ( 82.69ms) (± 1.28%)  105MB/op        fastest
    JSON.parse (IO)   8.06  (124.08ms) (± 2.37%)  105MB/op   1.50× slower

So a bit faster, and 20MB less memory: about 16% less in this case.

But the memory allocated here depends on the amount of numbers and how long they are. For example, using this file:

json = "[1234234, 2982374, 3982734, 49827344, 592834, 65825, 723498, 82348, 9239847324, 1082348, 1123498, 122348, 132348, 142347, 152348, 16234283]"
file = "[" + ([json] * 10000).join(",") + "]"
File.write("json.json", file)

running the benchmark, before this PR:

JSON.parse (string)  95.86  ( 10.43ms) (± 1.44%)  13.9MB/op        fastest
    JSON.parse (IO)  42.03  ( 23.79ms) (± 3.17%)  13.9MB/op   2.28× slower

after this PR:

JSON.parse (string) 205.35  (  4.87ms) (± 1.11%)  9.06MB/op        fastest
    JSON.parse (IO)  76.83  ( 13.02ms) (± 1.18%)  9.06MB/op   2.67× slower

So 5MB less than before, out of a total of 14MB: that's about a third less memory!
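
The fast path described above can be sketched like this (plain Ruby with hypothetical names; the real implementation lives in the Crystal lexer): accumulate digit bytes into an integer directly, and bail out to the string-allocating slow path when the digit count could overflow Int64 (Int64::MAX has 19 digits, so up to 18 digits is always safe) or when the number turns out to be a float:

```ruby
# Hypothetical sketch of the small-integer fast path: build the value from
# digit bytes directly instead of allocating an intermediate string.
# Returns [value, new_pos] on success, or nil to signal the slow path.
def parse_small_int(bytes, pos)
  value = 0
  digits = 0
  while pos < bytes.size && bytes[pos] >= 0x30 && bytes[pos] <= 0x39
    value = value * 10 + (bytes[pos] - 0x30)
    digits += 1
    pos += 1
  end
  # Fewer than 19 digits always fits in Int64; otherwise fall back.
  return nil if digits == 0 || digits >= 19
  # A '.' or exponent marker means this is a float: also fall back.
  return nil if pos < bytes.size && [0x2e, 0x45, 0x65].include?(bytes[pos])
  [value, pos]
end
```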

@crysbot
Collaborator

crysbot commented Mar 16, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/17

@asterite
Member Author

I think one of the "peek" branches is wrong. I'll see if I can fix it. I don't have a test to reproduce it yet, just one scenario I ran into.

@asterite asterite marked this pull request as draft March 16, 2024 17:51
@asterite asterite marked this pull request as ready for review March 16, 2024 18:16
@philipp-kempgen

A note for the forum thread: I tried parsing that same big file with Ruby 3.1 and Ruby was (slightly) slower: 16 seconds in Crystal vs. 19 seconds in Ruby. This is on a Mac. So I don't know why it was slower in Crystal for OP

Let's make the JSON data contain some long strings. They are Base64-encoded in my case, but that doesn't matter.

gen-json.rb:

json = %|{"a_base64":#{("a" * 5000).inspect},"b_base64":#{("a" * 10000).inspect}}|
json = "[" + ([json] * 2500).join(",") + "]"
File.write("json.json", json)

benchmark-pk.cr:

require "json"
require "benchmark"

file_io = File.open("json.json")
file_string = File.read("json.json")

Benchmark.bm do |x|
  x.report("JSON.parse (string)") do
    JSON.parse(file_string)
  end
end

benchmark-pk.rb:

require "json"
require "benchmark"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
end
$ crystal build --release benchmark-pk.cr
$ /usr/bin/time -l ./benchmark-pk
                          user     system      total        real
JSON.parse (string)   0.174718   0.005743   0.180461 (  0.180668)
        0,20 real         0,18 user         0,02 sys
            94814208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                5877  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  39  involuntary context switches
          1868924524  instructions retired
           591412605  cycles elapsed
            93389632  peak memory footprint

That's before your changes, on Apple aarch64/arm64.

$ /usr/bin/time -l ruby ./benchmark-pk.rb
                          user     system      total        real
JSON.parse(str)       0.063432   0.003553   0.066985 (  0.067092)
        0,12 real         0,10 user         0,01 sys
            90554368  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                5642  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  43  involuntary context switches
          1390183537  instructions retired
           367126927  cycles elapsed
            86574272  peak memory footprint

That's using Ruby 3.3.0.
The Ruby version is faster by a factor of 0.181/0.067 = 2.70.

@crysbot
Collaborator

crysbot commented Mar 17, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/20

@jzakiya

jzakiya commented Mar 17, 2024

Starting with Ruby 3.3 you can enable YJIT at runtime.
It would be nice to see the w/wo YJIT times for this.

I now put the following Ruby snippet at the top of Ruby files to
automatically enable YJIT when using Ruby >=3.3.

# Enable YJIT if using CRuby >= 3.3
RubyVM::YJIT.enable if RUBY_ENGINE == 'ruby' and RUBY_VERSION.to_f >= 3.3

@philipp-kempgen

Starting with Ruby 3.3 you can enable YJIT at runtime. It would be nice to see the w/wo YJIT times for this.

I get the same times with or without YJIT.

@philipp-kempgen

And just for the record, here's my benchmark in Ruby with Oj:

require "benchmark"
require "json"
require "oj"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
  x.report("Oj.load(str)") do
    Oj.load(str)
  end
end
                          user     system      total        real
JSON.parse(str)       0.063371   0.003690   0.067061 (  0.067182)
Oj.load(str)          0.028192   0.003248   0.031440 (  0.031523)

i.e. Oj is faster than the Crystal version by a factor of 0.181/0.032 = 5.66.

@asterite
Member Author

There are more improvements to be made here. I'd like to send them one by one in small PRs; if I put them all together here, the chances of this being merged are very small.

@asterite
Member Author

That said, I don't think we'll reach Oj's level of optimization. The C code is pretty hand-crafted. We could try to do the same, but that file is copyrighted...

@crysbot
Collaborator

crysbot commented Mar 21, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/25

# :nodoc:
class JSON::Lexer::StringBased < JSON::Lexer
-  def initialize(string)
+  def initialize(string : String)
Member

polish: This could be @string : String.


pos = 0

while true
Member

thought: I'm wondering if the strategy here could instead be based on a byte search (peek.index('"'))? The implementation for that can be more efficient than iterating over each byte: Slice#index on a byte buffer is backed by memchr, so it depends on how optimized the libc implementation is.

We would still need to sanity-check for escape sequences and disallowed characters in the potential string, though, which would reduce the effectiveness. I think it could be better overall that way, but I'm not sure; just wanted to leave this thought here. The current implementation is fine to take for now.


Might be true. When I needed to find the end of a JSON string in a Bytes, I came up with something like this:

def json_bytes_end_of_string_index(haystack : Bytes, offset : Int32 = 0) : Int32?
  index = haystack.index('"'.ord.to_u8, offset) # memchr()
  return nil unless index
  # Guard index == 0 so `index - 1` is never accessed out of bounds.
  return index if index == 0 || haystack[index - 1] != '\\'.ord.to_u8
  # Fall back to the more complete bytewise implementation below.
  # ...
end

I did not check if it makes sense to use strpbrk() to find the next "interesting" byte, such as quote, backslash, ...
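
For reference, the strpbrk() idea amounts to scanning for the first of a small set of stop bytes. A rough Ruby sketch (illustrative names only, not how the Crystal lexer is written):

```ruby
# Hypothetical sketch: find the index of the next "interesting" byte in a
# JSON string body: the closing quote, a backslash, or a control character
# (control characters must appear escaped in valid JSON). Returns nil if
# no such byte exists past `offset`.
def next_interesting_byte(bytes, offset = 0)
  (offset...bytes.size).find do |i|
    b = bytes[i]
    b == 0x22 || b == 0x5c || b < 0x20 # '"', '\\', control chars
  end
end
```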

Member Author

That's the other optimization I was thinking of. It makes string parsing much faster. I didn't want to include it in this PR, though!

@crysbot
Collaborator

crysbot commented Apr 17, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/stringpool-make-the-hash-key-part-of-the-public-api/6766/5

straight-shoota added a commit that referenced this pull request Jul 10, 2024
straight-shoota added a commit that referenced this pull request Jul 11, 2024
