JSON serialization/deserialization performance #254

olivere · 2016-03-29T10:20:49Z

JSON serialization/deserialization seems to be an issue for many users, especially when running Bulk / Search operations at scale.

Things we could do:

1. Use ffjson for the hot paths like Search/Bulk.
2. Re-use buffers via sync.Pool to reduce GC pressure. Something like this.
3. Keep an eye on #317 and/or use go-drainclose before this gets fixed in Go. Of course, we need to check if this has an impact for Elastic as well.
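
As a rough illustration of the sync.Pool idea in the list above, here is a minimal sketch using only the standard library. The helper name encodeJSON and the pool variable bufPool are hypothetical and not part of Elastic; this only shows the general pattern, not how Elastic does or would implement it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that each encode call does not
// have to allocate (and later garbage-collect) a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeJSON is a hypothetical helper (not an Elastic API). It serializes v
// into a pooled buffer and returns a copy of the encoded bytes.
func encodeJSON(v interface{}) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	// Encoder.Encode appends a trailing newline after the JSON value.
	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	// Copy out, because buf goes back to the pool and will be overwritten.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	b, err := encodeJSON(map[string]string{"user": "olivere"})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(b))
}
```

Note that the final copy is still an allocation; the saving comes from not growing a new buffer on every call. Writing the pooled buffer straight to the request body would avoid the copy as well.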

Related issues #253, #208.


dimfeld · 2016-03-29T13:47:44Z

In addition to ffjson, I came across https://github.com/mailru/easyjson recently which is a similar concept, using code generation, but purports to be faster. Haven't checked it out in detail though.

olivere · 2016-03-29T14:54:38Z

Just tested draining the body in PerformRequest (3.) but that doesn't make a difference. If it's a regression in stdlib, it'll be fixed. If it's not, it's safe to keep it as is. Either way, it won't make sense to fix it in Elastic.

olivere · 2016-03-30T11:15:05Z

This is just an experiment. I don't have an opinion yet on whether this is a good or bad idea, really. Feedback welcome...

I just tested creating the JSON manually, hard-coded via a bytes.Buffer, in the json-serialization branch. Compare before and after.

This currently only covers the action_and_metadata line (see bulk docs), where JSON serialization can be hard-coded because we know the fields in advance (in contrast to the document data).

For reference, here are the benchmark results (run via go test -run=Bulk -bench=Bulk -benchmem, then compare before/after via benchcmp):

benchmark                                              old ns/op     new ns/op     delta
BenchmarkBulkDeleteRequestSerialization-4              3324          3112          -6.38%
BenchmarkBulkIndexRequestSerialization-4               5754          5710          -0.76%
BenchmarkBulkEstimatedSizeInBytesWith1Request-4        13423         12547         -6.53%
BenchmarkBulkEstimatedSizeInBytesWith100Requests-4     1328849       1213027       -8.72%
BenchmarkBulkUpdateRequestSerialization-4              5009          4641          -7.35%

benchmark                                              old allocs     new allocs     delta
BenchmarkBulkDeleteRequestSerialization-4              29             34             +17.24%
BenchmarkBulkIndexRequestSerialization-4               37             43             +16.22%
BenchmarkBulkEstimatedSizeInBytesWith1Request-4        109            125            +14.68%
BenchmarkBulkEstimatedSizeInBytesWith100Requests-4     10612          12212          +15.08%
BenchmarkBulkUpdateRequestSerialization-4              42             47             +11.90%

benchmark                                              old bytes     new bytes     delta
BenchmarkBulkDeleteRequestSerialization-4              1512          640           -57.67%
BenchmarkBulkIndexRequestSerialization-4               2384          1528          -35.91%
BenchmarkBulkEstimatedSizeInBytesWith1Request-4        6259          3660          -41.52%
BenchmarkBulkEstimatedSizeInBytesWith100Requests-4     631082        371193        -41.18%
BenchmarkBulkUpdateRequestSerialization-4              2440          1520          -37.70%

Applications could further speed up the process and remove JSON serialization altogether if they supply a string or json.RawMessage, as Elastic only runs the JSON serializer if the data structure cannot be transformed into a string directly (see e.g. bulk_index_request.go).
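
To make that pass-through idea concrete, here is a minimal standalone sketch. The function docToString and the tweet type are hypothetical and only mimic the behavior described above; this is not the code in bulk_index_request.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tweet is a hypothetical document type used only for this example.
type tweet struct {
	User string `json:"user"`
}

// docToString is a hypothetical stand-in for the behavior described above:
// pre-encoded JSON is passed through untouched, and json.Marshal is only
// used as a fallback for other values.
func docToString(doc interface{}) (string, error) {
	switch t := doc.(type) {
	case string:
		return t, nil
	case json.RawMessage:
		return string(t), nil
	default:
		b, err := json.Marshal(doc)
		if err != nil {
			return "", err
		}
		return string(b), nil
	}
}

func main() {
	// Pre-encoded document: no serializer runs.
	raw, _ := docToString(json.RawMessage(`{"user":"olivere"}`))
	// Struct document: falls back to encoding/json.
	enc, _ := docToString(tweet{User: "olivere"})
	fmt.Println(raw == enc)
}
```

The caller is then responsible for producing valid JSON, since nothing validates the string or RawMessage on the way through.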

r--w · 2016-03-31T09:13:47Z

Hi,
I just tried creating JSON documents manually and passing them as strings. This reduced memory usage significantly. One thing worth mentioning is that you need to escape values as in https://github.com/golang/go/blob/5fea2ccc77eb50a9704fa04b7c61755fe34e1d95/src/encoding/json/encode.go#L788 and use sync.Pool for bytes.Buffer in order to avoid allocations.
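
A minimal sketch of what hand-written JSON with escaping could look like for the bulk action_and_metadata line. The function names writeJSONString and indexActionLine are hypothetical, and the field names (_index, _type, _id) are assumed from the bulk docs of that time. The escaper covers only what JSON strictly requires; unlike encoding/json it does not escape HTML characters or repair invalid UTF-8.

```go
package main

import (
	"bytes"
	"fmt"
)

const hexDigits = "0123456789abcdef"

// writeJSONString writes s as a quoted JSON string, escaping the quote,
// the backslash, and control characters below 0x20.
func writeJSONString(buf *bytes.Buffer, s string) {
	buf.WriteByte('"')
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '"' || c == '\\':
			buf.WriteByte('\\')
			buf.WriteByte(c)
		case c == '\n':
			buf.WriteString(`\n`)
		case c < 0x20:
			buf.WriteString(`\u00`)
			buf.WriteByte(hexDigits[c>>4])
			buf.WriteByte(hexDigits[c&0xf])
		default:
			// Bytes of multi-byte UTF-8 sequences pass through unchanged.
			buf.WriteByte(c)
		}
	}
	buf.WriteByte('"')
}

// indexActionLine builds a newline-terminated action_and_metadata line
// for an index operation without going through encoding/json.
func indexActionLine(index, typ, id string) string {
	var buf bytes.Buffer
	buf.WriteString(`{"index":{"_index":`)
	writeJSONString(&buf, index)
	buf.WriteString(`,"_type":`)
	writeJSONString(&buf, typ)
	buf.WriteString(`,"_id":`)
	writeJSONString(&buf, id)
	buf.WriteString("}}\n")
	return buf.String()
}

func main() {
	fmt.Print(indexActionLine("tweets", "tweet", `a"b`))
}
```

For brevity this allocates a new bytes.Buffer per call; in a hot path the buffer would come from a sync.Pool as suggested above.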

olivere · 2016-03-31T09:37:12Z

@r--w Yes, escaping is required and it is not bulletproof for sure. It's just an experiment to find out if it's worth the effort. I will never implement this for all code paths: Probably only Bulk/Search is critical enough.

r--w · 2016-03-31T09:45:32Z

In my opinion the option to create JSON manually (during indexing) should be sufficient for all "hardcore" users. I just didn't know that it's available (I should have read the source code before ;)

olivere · 2016-03-31T09:49:51Z

No. Documentation should be better ;-)

sidazhang · 2016-06-29T17:12:27Z

@olivere Is this something that you are actively pursuing?

olivere · 2016-06-29T17:16:06Z

Not at the moment.

There is some performance testing ahead in my day-to-day job, so we'll see how that goes and if I can get the results back into Elastic.

rami-dabain · 2016-08-29T11:37:06Z

I have used https://github.com/buger/jsonparser, it's about 10X faster than stock json package

olivere · 2016-08-29T12:29:39Z

@rami-dabain Do you have any stats on how much time is actually spent in JSON encoding/decoding for your use case? Can you post your code somewhere?

olivere · 2016-08-30T17:40:31Z

A pattern I've been using recently is to split work across several goroutines via a pipeline. The excellent golang.org/x/sync/errgroup makes that dead simple. Here's an example of that pattern.

JSON performance might not be perfect in stdlib. However, if you use all cores, you might not need to squeeze out the last nanoseconds. With the example above, I was able to saturate our Elasticsearch cluster, and suddenly JSON decoding performance seems like a minor issue. ;-)

I'm very interested in getting feedback. Does that work for you as well?
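
For readers without access to the linked example, here is a rough sketch of the fan-out idea: several goroutines decode documents concurrently. It deliberately uses sync.WaitGroup and channels instead of errgroup so that it needs only the standard library; the doc type and the decodeAll function are hypothetical and not taken from the original example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"sync"
)

// doc is a hypothetical document type used only for this example.
type doc struct {
	ID int `json:"id"`
}

// decodeAll fans raw JSON documents out to a number of worker goroutines
// and collects the decoded results. Result order is not preserved.
// Documents that fail to decode are skipped; the first error is returned.
func decodeAll(raws []string, workers int) ([]doc, error) {
	in := make(chan string)
	out := make(chan doc)

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for raw := range in {
				var d doc
				if err := json.Unmarshal([]byte(raw), &d); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
					continue
				}
				out <- d
			}
		}()
	}

	// Producer: feed the workers, then signal that no more input follows.
	go func() {
		for _, raw := range raws {
			in <- raw
		}
		close(in)
	}()
	// Close out once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var docs []doc
	for d := range out {
		docs = append(docs, d)
	}
	return docs, firstErr
}

func main() {
	raws := []string{`{"id":1}`, `{"id":2}`, `{"id":3}`}
	docs, err := decodeAll(raws, runtime.NumCPU())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(docs))
}
```

With errgroup the error bookkeeping (mutex and firstErr) collapses into g.Go and g.Wait, which is a large part of why it makes the pattern simple.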

rami-dabain · 2016-09-02T08:50:13Z

I haven't used jsonparser directly with ES. However, I used it in an HTTP receiver service where the CPU was maxed out at around 12k requests/second (as the receiver was decoding the JSON request). That went up to ~45k requests/second when I switched to parsing the JSON with the mentioned library; most gains were in fetching values for a known key path. I suggest giving it a try, as it doesn't require any "compilation"/code generation.

olivere · 2016-09-02T09:24:41Z

I'm not going to use jsonparser in Elastic by default. But what could possibly work is to return structures (for performance-critical services) in a way that lets users plug in alternative JSON parsers like jsonparser.

rami-dabain · 2016-09-02T09:27:29Z

That would be great, and more flexible

olivere · 2016-09-02T09:35:26Z

TBH I think you could actually do that already by implementing your own Decoder as described in the wiki. However, you must probably return a matching response, i.e. a BulkResponse for bulk requests or a SearchResponse for search requests.

I don't have the time to play around with it now, but I'll keep this issue open for a while as a way to gather feedback.

rami-dabain · 2016-09-02T10:01:41Z

I use the following:

.Search(index).
    Type(type).
    Query(query).
    From(0).Size(1).
    Do()

As I need only one result, then:

for _, hit := range searchResult.Hits.Hits {
    jsonparser.ArrayEach(*hit.Source, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
        v, _ := jsonparser.ParseString(value)
        println(v)
    }, "clients", result.ClientId, "filters")

    break // only one result is enough
}

If no json.Unmarshal is called within elastic then all is fine!

olivere · 2016-09-02T10:23:47Z

In that case, json.Unmarshal is called (see here). However, the _source field decoding is deferred by using json.RawMessage (see here). So I guess your code doesn't save that much time.

What would probably work is to implement your own Decoder and do a type switch in its Decode(...) func on the desired result (e.g. *SearchResult), then do the decoding with whatever you choose as your favorite JSON decoder. You absolutely have to make sure that all the fields of e.g. *SearchResult that your application requires are set by your Decoder implementation, otherwise you'd shoot yourself in the foot. Something like this (untested!):

type AwesomeDecoder struct{}

func (d *AwesomeDecoder) Decode(data []byte, v interface{}) error {
  switch t := v.(type) {
  case *elastic.SearchResult:
    // Use your favorite JSON parser here to fill in the req'd fields of t (*SearchResult).
    // Until that is written, fall back to encoding/json so the func compiles and returns.
    return json.Unmarshal(data, t)
  default:
    return json.Unmarshal(data, v)
  }
}

Yeah, not nice, but that probably works without changing a single line of code in Elastic.

olivere · 2016-09-22T14:31:41Z

I'm closing this for two reasons:

First, I will keep using encoding/json for Elastic as the sole mechanism to serialize/deserialize JSON. There are alternative ways to do so, but these require workarounds such as the Decoder approach described above.

Second, based on my experience with production load, the problem of serializing/deserializing becomes less of an issue once you do things concurrently. E.g. deserialization has been an issue for me when scrolling; but scrolling in parallel makes this an I/O problem and JSON serialization/deserialization simply is not a problem any more.

If you still have performance issues regarding JSON serialization/deserialization, let me know in a new issue.

olivere changed the title ~~JSON performance~~ JSON serialization/deserialization performance Mar 29, 2016

olivere closed this as completed Sep 22, 2016
