JSON serialization/deserialization performance #254
Comments
In addition to ffjson, I came across https://github.com/mailru/easyjson recently which is a similar concept, using code generation, but purports to be faster. Haven't checked it out in detail though. |
Just tested draining the body in |
This is just an experiment. I don't have an opinion yet if this is a good or bad idea, really. Feedback welcome... I just tested creating the JSON manually, hard-coded via a This currently only covers the For reference, here are the benchmark results (run via
Applications could further speed up the process and remove JSON serialization altogether if they supply a |
Hi, |
@r--w Yes, escaping is required and it is not bulletproof for sure. It's just an experiment to find out if it's worth the effort. I will never implement this for all code paths: Probably only Bulk/Search is critical enough. |
In my opinion the option to create JSON manually (during indexing) should be sufficient for all "hardcore" users. I just didn't know that it's available (I should have read the source code first ;) |
No. Documentation should be better ;-) |
@olivere Is this something that you are actively pursuing? |
Not at the moment. There is some performance testing ahead in my day-to-day job, so we'll see how that goes and if I can get the results back into Elastic. |
I have used https://github.com/buger/jsonparser; it's about 10x faster than the stock json package |
@rami-dabain Do you have any stats on how much time is actually spent in JSON encoding/decoding for your use case? Can you post your code somewhere? |
A pattern I've been using recently is to split work across several goroutines via a pipeline. The excellent JSON performance might not be perfect in stdlib. However, if you use all cores, you might not need I'm very interested in getting feedback. Does that work for you as well? |
I haven't used jsonparser directly with ES. However, I used it in an HTTP receiver service: CPU was maxed out at around 12k requests/second (as the receiver was decoding the JSON request), and that went up to ~45k requests/second when I switched to parsing the JSON with the mentioned library. Most gains were on fetching values for a known key path. I suggest giving it a try, as it doesn't require any "compilation"/code generation |
I'm not going to use |
That would be great, and more flexible |
TBH I think you could actually do that already by implementing your own I don't have the time to play around with it now, but I'll keep this issue open for a while as a way to gather feedback. |
I use the following:
As I need only one result, then:
If no json.Unmarshal is called within elastic then all is fine! |
In that case, What would probably work is to implement your own

    type AwesomeDecoder struct{}

    // Decode handles *elastic.SearchResult with a custom parser and
    // falls back to encoding/json for every other type.
    func (d *AwesomeDecoder) Decode(data []byte, v interface{}) error {
        switch t := v.(type) {
        case *elastic.SearchResult:
            // Use your favorite JSON parser here to fill in the req'd fields of t (*SearchResult)
            // ...
            _ = t // placeholder until the custom parser fills in t
            return nil
        default:
            return json.Unmarshal(data, v)
        }
    }

Yeah, not nice, but that probably works without changing a single line of code in Elastic. |
I'm closing this for two reasons: One, I will keep using Second, based on my experience with production load, the problem of serializing/deserializing becomes less of an issue once you do things concurrently. E.g. deserialization has been an issue for me when scrolling; but scrolling in parallel makes this an I/O problem and JSON serialization/deserialization simply is not a problem any more. If you still have performance issues regarding JSON serialization/deserialization, let me know in a new issue. |
JSON serialization/deserialization seems to be an issue with many users, especially running Bulk / Search operations at scale.
Things we could do:
sync.Pool to reduce GC pressure. Something like this.
Related issues #253, #208.
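As a minimal sketch of the sync.Pool idea (not the code the issue links to), the following reuses `bytes.Buffer` values across encode calls. `bufPool` and `encodeJSON` are hypothetical names; only the standard library is assumed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that each encode call does not
// have to grow a fresh one from scratch.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeJSON (hypothetical helper) serializes v into a pooled buffer and
// returns a copy of the bytes, so the buffer can safely go back to the pool.
func encodeJSON(v interface{}) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a newline; trim it, then copy before the buffer is reused.
	out := bytes.TrimRight(buf.Bytes(), "\n")
	return append([]byte(nil), out...), nil
}

func main() {
	data, err := encodeJSON(map[string]interface{}{"user": "olivere", "retweets": 42})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

The final copy still allocates once per call; writing the pooled buffer straight into the request body would avoid that, at the cost of having to return the buffer to the pool only after the request has been sent.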