|
I found another optimization that improves both parsing speed and the memory it allocates. A previous PR made parsing slower by always allocating a string to parse integers and floats from, and it also made parsing consume more memory. I was concerned about it at the time for exactly that reason, but I agreed that correctness is more important than performance. However, we can have both! For "small" integers (fewer than 19 digits, never floats) we can compute the int value directly and always know that we are doing it right.

Before the last commit:

After the last commit:

So a bit faster and 20MB less, about 16% less in this case. But the memory allocated here depends on how many numbers there are and how long they are. For example, using this file:

running the benchmark, before this PR:

after this PR:

so 5MB less than before, from a total of 14MB, that's about 30% less memory! |
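The fast path can be sketched roughly like this (a hypothetical Ruby illustration, not the actual Crystal lexer code; `parse_small_int` is an invented name): since `Int64::MAX` has 19 digits, any integer with fewer than 19 digits can be accumulated digit by digit with no risk of overflow and no intermediate string allocation.

```ruby
# Hypothetical sketch of the "small integer" fast path: accumulate the
# value digit by digit. With fewer than 19 digits an Int64 cannot
# overflow (Int64::MAX has 19 digits), so no intermediate string is needed.
def parse_small_int(input, pos)
  value = 0
  digits = 0
  negative = input[pos] == "-"
  pos += 1 if negative
  while pos < input.size && ("0".."9").cover?(input[pos])
    value = value * 10 + (input[pos].ord - "0".ord)
    digits += 1
    pos += 1
  end
  # Floats, and integers long enough to risk overflow, must take the
  # slower string-allocating path instead (signalled here by nil).
  return nil if digits == 0 || digits >= 19
  return nil if pos < input.size && ".eE".include?(input[pos])
  [negative ? -value : value, pos]
end

parse_small_int("12345,", 0) # => [12345, 5]
parse_small_int("3.14", 0)   # => nil (a float: falls back)
```

The real implementation works on the lexer's byte stream rather than a Ruby string, but the overflow reasoning is the same.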
|
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/17 |
|
I think one of the "peek" branches is wrong. I'll see if I can merge it. I don't have a test to reproduce it yet, just one scenario I ran into. |
Let's make the JSON data contain some long strings. They are Base64-encoded in my case, but that doesn't matter.

```crystal
json = %|{"a_base64":#{("a" * 5000).inspect},"b_base64":#{("a" * 10000).inspect}}|
json = "[" + ([json] * 2500).join(",") + "]"
File.write("json.json", json)
```

The Crystal benchmark (benchmark-pk.cr):

```crystal
require "json"
require "benchmark"

file_io = File.open("json.json")
file_string = File.read("json.json")

Benchmark.bm do |x|
  x.report("JSON.parse (string)") do
    JSON.parse(file_string)
  end
end
```

The Ruby equivalent (benchmark-pk.rb):

```ruby
require "json"
require "benchmark"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
end
```

```console
$ crystal build --release benchmark-pk.cr
```
```console
$ /usr/bin/time -l ./benchmark-pk
                          user     system      total        real
JSON.parse (string)   0.174718   0.005743   0.180461 (  0.180668)
        0,20 real         0,18 user         0,02 sys
  94814208  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      5877  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
        39  involuntary context switches
1868924524  instructions retired
 591412605  cycles elapsed
  93389632  peak memory footprint
```

That's before your changes, on Apple aarch64/arm64.

```console
$ /usr/bin/time -l ruby ./benchmark-pk.rb
                          user     system      total        real
JSON.parse(str)       0.063432   0.003553   0.066985 (  0.067092)
        0,12 real         0,10 user         0,01 sys
  90554368  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      5642  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
        43  involuntary context switches
1390183537  instructions retired
 367126927  cycles elapsed
  86574272  peak memory footprint
```

That's using Ruby 3.3.0. |
|
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/20 |
|
Starting with Ruby 3.3 you can enable YJIT at runtime. I now put a small Ruby snippet at the top of my Ruby files to enable it. |
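The exact snippet isn't shown above; a common pattern for opting into YJIT at runtime on Ruby 3.3+ (while staying harmless on older Rubies or builds compiled without YJIT) looks like this:

```ruby
# Enable YJIT at runtime when the running Ruby supports it (3.3+ with a
# YJIT-enabled build); on anything else this line is a no-op.
RubyVM::YJIT.enable if defined?(RubyVM::YJIT.enable)
```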
I get the same times with or without YJIT. |
|
And just for the record, here's my benchmark in Ruby with Oj:

```ruby
require "benchmark"
require "json"
require "oj"

file_io = File.open("json.json")
str = File.read("json.json")

Benchmark.bm(19) do |x|
  x.report("JSON.parse(str)") do
    JSON.parse(str)
  end
  x.report("Oj.load(str)") do
    Oj.load(str)
  end
end
```

i.e. Oj is faster than the Crystal version by a factor of 0.181/0.032 = 5.66. |
|
There are more improvements to be made here. I'd like to send them one by one in small PRs; if I put them all together here, the chances of this being merged are very small. |
|
That said, I don't think we'll reach Oj's level of optimization. Its C code is pretty hand-crafted. We could try to do the same, but that file is copyrighted... |
|
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/performance-issues-with-the-json-parser/6678/25 |
```diff
 # :nodoc:
 class JSON::Lexer::StringBased < JSON::Lexer
-  def initialize(string)
+  def initialize(string : String)
```
polish: This could be `@string : String`.
|
|
```crystal
pos = 0
```

```crystal
while true
```
thought: I'm wondering if the strategy here could be based on byte search (`peek.index('"')`) instead? The implementation for that can be more efficient than iterating over each byte: `Slice#index` on a byte buffer is backed by `memchr`, so it depends on how optimized the libc implementation is.
We would still need to sanity-check for escape sequences and disallowed characters in the potential string, though, which would reduce the effectiveness. I think it could be better overall that way, but I'm not sure. Just wanted to leave this thought here. It should be fine to take the current implementation for now.
Might be true. When I needed to find the end of a JSON string in a `Bytes`, I came up with something like this:

```crystal
def json_bytes_end_of_string_index(haystack : Bytes, offset : Int32 = 0) : Int32?
  index = haystack.index('"'.ord.to_u8, offset) # memchr()
  return nil if !index
  return index unless haystack[index - 1] == '\\'.ord.to_u8
  # Fall back to the more complete bytewise implementation below.
  # ...
end
```

I did not check if it makes sense to use strpbrk() to find the next "interesting" byte, such as quote, backslash, ...
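The strpbrk()-style idea can be sketched as follows (a hypothetical Ruby illustration, not the lexer's actual code; `end_of_json_string` is an invented helper): jump to the next "interesting" byte, either a closing quote or a backslash, skipping over escape sequences. This avoids the false positive that a bare memchr for `"` can hit when the quote is escaped.

```ruby
# Hypothetical sketch: find the index of the closing quote of a JSON
# string body by jumping between "interesting" characters (quote or
# backslash) instead of inspecting every byte one by one.
INTERESTING = /["\\]/

def end_of_json_string(str, offset)
  pos = offset
  while (idx = str.index(INTERESTING, pos))
    return idx if str[idx] == '"'
    pos = idx + 2 # a backslash: skip it and the escaped character
  end
  nil # unterminated string
end

end_of_json_string('abc\\"def"rest', 0) # => 8 (the escaped quote at 4 is skipped)
```

A real scanner would additionally have to reject disallowed bytes (e.g. raw control characters) inside the string, which is part of why the commenters above note the gain may be reduced.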
That's the other optimization I was thinking of. It makes string parsing much faster. I didn't want to include it in this PR, though!
|
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/stringpool-make-the-hash-key-part-of-the-public-api/6766/5 |
This reverts commit 9ef6366.
I saw this thread, so...
Two things here:
I used this Ruby/Crystal file to generate a big JSON:
Here's the benchmark:
Before:
After:
I think the difference between before and after will be bigger if there are more strings to parse.
A note for the forum thread: I tried parsing that same big file with Ruby 3.1 and Ruby was (slightly) slower: 16 seconds in Crystal vs. 19 seconds in Ruby. This is on a Mac, so I don't know why it was slower in Crystal for the OP (maybe it's different on Linux?).
Regarding memory: Crystal needs 117MB to load that entire data into memory, but it's the same in Ruby, so I'm not sure how memory can be further optimized...
Please review carefully! I think tests for when the peek buffer is incomplete or unavailable might not exist right now. Feel free to continue working on top of this PR (pushing commits to this branch or creating another PR from this code).