Add ability to limit fields added during Mapper#merge#102936
Add ability to limit fields added during Mapper#merge#102936elasticsearchmachine merged 50 commits intoelastic:mainfrom
Conversation
This is extracting a piece of independent functionality from elastic#96235
|
Pinging @elastic/es-search (Team:Search) |
|
@elasticmachine update branch |
…he cause of the segfault
…hey're the cause of the segfault" This reverts commit 9c49a87.
As an additional optimization, try to add the full object first. Only merge sub-mappers one by one in case there wasn't enough budget for the full object mappper.
javanna
left a comment
There was a problem hiding this comment.
I did another round. Thanks for all the progress you made. Don't get discouraged by the amount of comments, these are all details around refining, polishing and adding some tests. We are very close!
Could you also add docs to clarify design choices: why the field limit is checked as part of merge. I could see future readers getting confused about that. And describe how it works, meaning it is already applied in the data node, and if certain fields are allowed there, they are going to be allowed when the master does the merge of the mappings, but there are scenarios where data nodes allowed addition of fields that the master will reject. Am I getting this right? Could you expand on the mechanics of this, perhaps in the javadocs of MapperMergeContext?
I've documented most of that in the other PR. See also https://github.com/elastic/elasticsearch/pull/96235/files#diff-32c1e36544e0c0ef775f9e26576ef7df0504cc05466e513ba7b143947c234784R590 |
javanna
left a comment
There was a problem hiding this comment.
I did a last round, this look really good. My final comments are mainly on strengthening the tests further and a tiny bit more docs. Thanks for all the work. No need for another review round, merge when ready.
) Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting. When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field. Relates to #89911 To make this easier to review, this is split into the following PRs: - [x] #102915 - [x] #102936 - [x] #104769 Related but not a prerequisite: - [ ] #102885
This is extracting a piece of independent functionality from #96235
This is the result of this discussion: #96235 (comment)
The mapping merge executed on the master node is the only authoritative place where we have a consistent view of the cluster state as it's guaranteed to be executed on a single thread on the master node. Therefore, there can be no race conditions and we exactly know how many fields are left before hitting the field limit.
Therefore, this is the best place to add as many fields as possible without going over the limit if
index.mapping.total_fields.ignore_dynamic_beyond_limitis set totruein the index settings.This change makes it possible to limit the number of fields added during mapping merger. However, that feature isn't actually used outside of tests.
In #96235, the feature to limit fields during the merge process is used to guarantee that the cluster state update never fails due to dynamic mapping updates that exceed the field limit in case
ignore_dynamic_beyond_limitis enabled.This works because after each dynamic mapping update, the indexing request gets retried (there's no change to that logic in this or related PRs) so that the document parsing takes the new mapping into account.
This only applies to dynamic mapping updates aka auto-updates. When explicitly trying to update the mapping with too many fields, the put mapping operation will still fail, even if
ignore_dynamic_beyond_limitis enabled.Depends on:
MergeResult#buildMergedMappers#104578