Introduce bulk routing optimization #62020

howardhuanghua wants to merge 3 commits into elastic:main
Conversation
Interesting optimization! But it breaks GET. Also, assigning routing will increase indexing time and storage. I wonder if we could instead append a few letters to the generated id, or repeatedly generate a new id, until all sub-requests are partitioned to a single shard.
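A minimal sketch of that regenerate-the-id idea, with a hypothetical `shardFor` helper standing in for Elasticsearch's Murmur3-based operation routing (this is not the PR's implementation, just an illustration of the suggestion above):

```java
import java.util.UUID;

// Hypothetical sketch: keep drawing auto-generated ids until the id hashes
// onto one chosen shard, so every sub-request of a bulk lands on the same
// shard without storing an explicit _routing value. shardFor() is a stand-in
// for Elasticsearch's operation routing (a Murmur3 hash of the id modulo the
// number of shards); the real hash function differs.
public final class SameShardIds {

    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards); // stand-in hash
    }

    static String idForShard(int targetShard, int numShards) {
        while (true) {
            String id = UUID.randomUUID().toString(); // ES would use its own auto-id generator
            if (shardFor(id, numShards) == targetShard) {
                return id;
            }
        }
    }
}
```

Note that with N shards this needs roughly N draws per id on average, so it gets expensive for an index with 150 shards; appending a few letters instead has the same expected number of attempts but keeps the FST-friendly time-based prefix intact, which relates to the id-format point raised later in this thread.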
Pinging @elastic/es-distributed (:Distributed/CRUD)
@dnhatn Thanks for checking. Yes.
If _source is disabled, then users can't retrieve the auto-generated routing. |
Isn't _routing a meta field? I could retrieve the _routing field with _source disabled in version 6.4.3. Or has the latest version moved _routing into _source?
Ah, you're right.
Here, the performance gain is not just from avoiding the long-tail sick shard. This change also lets you increase the bulk size per shard, which has a huge impact on performance, and it reduces the transport-layer coordination overhead as well. It would be good to keep an id format similar to the current one, since it is optimized for FST storage and some guarantees are maintained there too.
Do you mean the currently generated random bulk routing?
Check this.
We discussed this as a team and have the following feedback:
Thanks @henningandersen. More and more customers now run large-scale clusters; 100+ nodes with 50-100+ shards per index is a normal case. Tencent operates many of these huge clusters for logging scenarios. We introduced this bulk routing optimization mainly to solve the fan-out issue in such large-scale clusters, where network issues and garbage collections cause long-tail shard performance problems. Even if we increase the bulk queue size, the pending operations still consume a lot of resources. Could you please share more material about the coordinator-level batching feature?
Issue
In a large-scale cluster, to keep any single shard from growing too big, users have to configure a large number of shards in a single index.
For example, one of our users' production clusters has 100 data nodes, and each index (such as the downlink index) has around 150 shards to keep the size of a single shard around 30-50 GB. This cluster ingests 1 million+ docs per second in total, and we found it had a 0.1%+ bulk reject rate per minute.
The main reason is that each bulk request (around 2 MB) is split into roughly 100 sub-requests, one per data node. If one or several data nodes hit old-generation GC, unstable networking, or hardware failures, we call the affected shards long-tail sick shards (they occur randomly and temporarily); they delay the whole bulk request and ultimately cause bulk reject exceptions. With a large number of shards in a single index, this bulk reject issue can be significant.
Solution
Users can already use routing to force a bulk request to go to only a single shard and avoid the effect of long-tail sick shards, as sketched below. However, this requires extra development work for each business index; especially in logging-platform scenarios, users don't care about routing, and high bulk throughput is what matters.
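For reference, the manual workaround looks roughly like this with the Java client (a minimal sketch; the index name `downlink` and routing value `batch-42` are made up for illustration, and `client`/`jsonDocs` are assumed to exist):

```java
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

// Manual workaround: give every sub-request the same routing value so the
// whole bulk executes on a single shard. A real client would pick a fresh
// routing value for each bulk to spread load across shards over time.
BulkRequest bulk = new BulkRequest();
for (String json : jsonDocs) {              // jsonDocs: the JSON documents of this batch
    bulk.add(new IndexRequest("downlink")
        .routing("batch-42")                // same routing => same shard
        .source(json, XContentType.JSON));
}
client.bulk(bulk).actionGet();              // client: an org.elasticsearch.client.Client
```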
This PR introduces a bulk routing optimization to speed up bulk performance. It is controlled by an index-level setting, `index.bulk_routing.enabled`, which is `false` by default. With the setting enabled, if a bulk request contains no user-defined `_id` field and no user-defined `_routing`, all sub-requests targeting the same index are automatically given the same routing, so that index's writes go to only one of its shards. The setting can be applied as sketched below. A user with a large cluster and many shards per index can then simply enable this optimization on the index without any extra development.
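A minimal sketch of enabling it with the Java client via the standard update-index-settings API (the setting name comes from this PR; the index name `downlink` is illustrative, and it is an assumption here that the setting is dynamically updatable):

```java
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.common.settings.Settings;

// Turn the optimization on for one index. Assumes the setting is dynamic;
// it could equally be supplied at index-creation time.
UpdateSettingsRequest request = new UpdateSettingsRequest("downlink")
    .settings(Settings.builder()
        .put("index.bulk_routing.enabled", true));
client.admin().indices().updateSettings(request).actionGet();  // client: an org.elasticsearch.client.Client
```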
Performance improvement
In the customer production cluster described above, after enabling this bulk routing optimization, the original rejection rate dropped to 0, CPU usage dropped by 25%, and write throughput increased by 10%.