
Introduce bulk routing optimization. #62020

Open
howardhuanghua wants to merge 3 commits into elastic:main from TencentCloudES:bulk_routing

Conversation

@howardhuanghua (Contributor)

Issue

In a large-scale cluster, to keep individual shards from growing too big, users need to create many shards in a single index.
For example, one of our users' production clusters has 100 data nodes, and each downlink index has around 150 shards to keep single-shard size in the 30-50 GB range:

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   downlink-20200904.3  gT_4J_-dQHuOq_G8cE00kg 153   0 8098910441            0      4.4tb          4.4tb
green  open   downlink-20200904    _ruilUSwQMq7aEYfXwBykQ 150   0 7078752021            0      3.8tb          3.8tb
green  open   downlink-20200904.9  EzVpJ4PTS1ae5E3SBYAXiA 159   0 8201595897            0      4.5tb          4.5tb
green  open   downlink-20200903.4  mRGqgjXXRN2rQQ6uj8Womw 154   0 1892298596            0        1tb            1tb
green  open   downlink-20200905.6  Z4TsjPjvQhO-APWo5aoPWw 156   0 6674856189            0      3.6tb          3.6tb
green  open   downlink-20200904.8  TGn8-xCcRzikNb2ZejkfxQ 158   0 7345715637            0        4tb            4tb
green  open   downlink-20200903.3  lmB-kjw9Rt6g2sdEEzjnuA 153   0 7452545046            0      4.1tb          4.1tb
green  open   downlink-20200905.8  GmECfHqESi6n3VxADWP8eg 158   0 4064526603            0      2.5tb          2.5tb

This cluster handles 1 million+ docs TPS in total, and we observed a bulk reject rate above 0.1% per minute.
The main reason is that each bulk request (around 2 MB) is split into roughly 100 sub-requests, one per data node. If one or several data nodes hit old-generation GC, an unstable network, or hardware failures, the shards on them become what we call long-tail sick shards (the affected shards are random and temporary). They delay the whole bulk request and eventually trigger bulk reject exceptions. In scenarios with a large number of shards in a single index, this bulk reject issue can be significant.
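To make the fan-out concrete, below is a minimal Java sketch (illustrative only, with hypothetical names, not Elasticsearch's actual TransportBulkAction code) of how a coordinating node partitions one bulk request into per-shard sub-requests; the bulk only completes when the slowest shard responds:

import java.util.*;

class BulkFanOutSketch {
    // Hypothetical bulk item: target index plus document id.
    record Item(String index, String id) {}

    static Map<Integer, List<Item>> groupByShard(List<Item> items, int numShards) {
        Map<Integer, List<Item>> byShard = new HashMap<>();
        for (Item item : items) {
            // Simplified shard resolution; ES really hashes the routing value
            // (the id, by default) with Murmur3.
            int shard = Math.floorMod(item.id().hashCode(), numShards);
            byShard.computeIfAbsent(shard, s -> new ArrayList<>()).add(item);
        }
        // One shard-level request per map entry: with ~150 shards across 100
        // nodes, a single 2 MB bulk touches almost every node, and its latency
        // is the maximum over all those shard responses.
        return byShard;
    }
}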

Solution

Users can set a routing value so that a bulk request only goes to a single shard, avoiding the effect of long-tail sick shards. However, this requires extra development work for each business index. In particular, in logging-platform scenarios, users don't care about routing; high bulk throughput is what matters.

This PR introduces a bulk routing optimization to speed up bulk indexing. An index-level setting, index.bulk_routing.enabled (false by default), controls it. When the setting is enabled and a document has neither a user-defined _id nor a user-defined _routing, all sub-requests for the same index in a bulk request are automatically assigned the same routing, so that index's writes go to just one of its shards. Here is the setting:

// set it on index creation
PUT /my-index
{
    "settings" : {
        "index" : {
            "bulk_routing.enabled" : true
        }
    }
}

// update it dynamically
PUT /my-index/_settings
{
  "index.bulk_routing.enabled": true
}

// use template to control some of the indices
PUT /_template/bulk_routing_template
{
  "index_patterns": ["indices-prefix*"],
  "settings": {
    "bulk_routing.enabled": true
  }
}

Then, if a user has a large cluster with lots of shards per index, they can simply enable this optimization on the index without any extra development.
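As an illustration of the rule just described, here is a rough Java sketch (assumed method and variable names, not the PR's exact diff) of the coordinating-node logic: one random routing is drawn per index per bulk, and it is applied only when an item carries neither a user _id nor a user _routing:

import java.util.Map;

import org.elasticsearch.common.UUIDs; // assumes the elasticsearch core jar on the classpath

class BulkRoutingSketch {
    // perIndexRouting caches one shared routing per index for the current bulk request.
    static String resolveRouting(String index, String userId, String userRouting,
                                 boolean bulkRoutingEnabled,
                                 Map<String, String> perIndexRouting) {
        if (!bulkRoutingEnabled || userId != null || userRouting != null) {
            return userRouting; // optimization does not apply; keep user-supplied values
        }
        // Same routing for every such item of this index in this bulk, truncated
        // to 10 characters to limit the storage overhead (as in the PR).
        return perIndexRouting.computeIfAbsent(index,
                i -> UUIDs.randomBase64UUID().substring(12));
    }
}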

Performance improvement

In the customer production cluster described above, after enabling this bulk routing optimization, the original rejection rate drops to 0, CPU usage drops 25%, and write speed increases 10%:

[screenshot: rejection rate, CPU usage, and write-speed metrics]

@dnhatn (Member) left a comment

Interesting optimization! But it breaks GET. Also, assigning a routing value will increase indexing time and storage. I wonder if we could instead append a few letters to the auto-generated id, or repeatedly generate new ids until all sub-requests are partitioned to a single shard.
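A rough sketch of this id-based alternative, under assumed helper names (not a real Elasticsearch API): keep drawing auto-generated ids until the id hashes onto the one shard chosen for the bulk, so GET by id keeps working with no stored routing:

import org.elasticsearch.common.UUIDs; // assumes the elasticsearch core jar on the classpath

class IdRetrySketch {
    static String idForTargetShard(int targetShard, int numShards) {
        while (true) {
            String id = UUIDs.base64UUID(); // time-based auto id, as in normal indexing
            // Simplified shard resolution; ES really applies Murmur3 to the id.
            if (Math.floorMod(id.hashCode(), numShards) == targetShard) {
                return id;
            }
        }
    }
    // Caveat: each doc needs ~numShards draws on average (about 150 here), and
    // the retried ids lose the mostly-sequential prefix pattern that compresses
    // well, which is the FST concern raised later in this thread.
}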

@dnhatn added the :Distributed/CRUD label Sep 6, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/CRUD)

@elasticmachine added the Team:Distributed label Sep 6, 2020
@dnhatn added team-discuss and removed Team:Distributed labels Sep 6, 2020
@howardhuanghua (Contributor, Author)

@dnhatn Thanks for checking. Yes, GET by id would be broken, but this is the same as the user-defined routing case: when users define their own routing on write requests, GET breaks in the same way, and they have to search for the doc to obtain its routing first, then pass that routing to the GET operation. The extra routing also increases storage, which is why I limit it to 10 bytes. As you suggested, it would be better to make the auto-generated ids route all sub-requests to a single shard; I will continue to investigate that.

@dnhatn added the Team:Distributed label Sep 7, 2020
@dnhatn (Member) commented Sep 7, 2020

GET by id would be broken, but this is the same as the user-defined routing case: when users define their own routing on write requests, GET breaks in the same way.

If _source is disabled, then users can't retrieve the auto-generated routing.

@howardhuanghua (Contributor, Author)

Isn't _routing a meta field? I can retrieve the _routing field with _source disabled in version 6.4.3:

"hits": [
      {
        "_index": "live_stream_stat_data-60@1597075200000_7",
        "_type": "doc",
        "_id": "qf24oKYRwwzLA8g06P3QgI0A-gAAF_QD9o",
        "_score": 1,
        "_routing": "a0a611c3"
      }
]

Or has the latest version moved _routing into _source?

@dnhatn (Member) commented Sep 7, 2020

Ah, you're right.

@itiyamas commented Sep 7, 2020

Here, the performance gain is not just from avoiding long-tail sick shards. This change also increases the bulk size per shard, which has a huge impact on performance, and it reduces the transport-layer coordination overhead.
It also reduces the bulk-size experimentation one has to redo whenever the shard count is scaled.

It would be good to keep an id format similar to the current one, as it is optimized for FST storage and some guarantees are maintained there too.

@howardhuanghua (Contributor, Author)

It would be good to keep an id format similar to the current one, as it is optimized for FST storage and some guarantees are maintained there too.

Do you mean the currently generated random bulk routing?
tmpRouting = UUIDs.randomBase64UUID().substring(12);
The main reasons I use randomBase64UUID and limit the result to 10 bytes:

  1. It's much faster than base64UUID, since base64UUID is synchronized internally.
  2. It reduces the storage impact of the routing value (see the sketch after this list for how the routing maps to a shard).
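For context, a simplified sketch of how a routing string picks a shard (Elasticsearch's OperationRouting additionally accounts for routing_num_shards and the routing factor, which this sketch skips): every document in the bulk that shares tmpRouting therefore lands on the same shard.

import org.elasticsearch.cluster.routing.Murmur3HashFunction; // assumes the elasticsearch core jar on the classpath

class RoutingToShardSketch {
    static int shardFor(String routing, int numberOfShards) {
        // Murmur3 over the routing string, modulo the shard count (simplified).
        return Math.floorMod(Murmur3HashFunction.hash(routing), numberOfShards);
    }
}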

@itiyamas commented Sep 7, 2020

Check this.

@henningandersen (Contributor)

We discussed this as a team and have the following feedback:

  1. It seems very targeted at one specific situation.
  2. We wonder whether "Increase default write queue size" (#59559) has already helped the reject rate.
  3. The 10 percent performance increase might be achievable in other ways that we want to explore, for instance smarter shard-level batching.
  4. Future coordinator-level batching may also help this scenario and be more widely applicable.

@howardhuanghua (Contributor, Author)

Thanks @henningandersen. More and more customers run large-scale clusters these days; 100+ nodes and 50-100+ shards per index are a normal case, and Tencent has lots of these huge clusters in logging scenarios. We introduced this bulk routing optimization mainly to solve the fan-out issue in such large-scale clusters, where network issues and garbage collection cause long-tail shard operation latency. Even if we increase the bulk queue size, the pending operations still consume a lot of resources. Could you please share more material about the coordinator-level batching feature?

@elasticsearchmachine changed the base branch from master to main July 22, 2022 23:13