Simplify MergeResult#buildMergedMappers (#104578)
Conversation
Pinging @elastic/es-search (Team:Search)
MergeResult#buildMergedMappers
```java
if (mergedMappers == null) {
    mergedMappers = new HashMap<>(existing.mappers);
}
mergedMappers.put(merged.simpleName(), merged);
```
Could the put stay where it is? In every loop iteration we only ever put a single field, so we should be OK leaving a single put call that adds the current merged instance to the map. That put call would need to be wrapped in a check that ensures we don't go beyond the limit.
I don't think we can do that. See also #102936 (comment)
It's important to decrement the field budget before calling merge on the sub fields. I think we should have the put calls as close as possible to the place where we decrement the field budget.
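To illustrate the concern, here's a minimal, self-contained sketch of keeping the put next to the budget decrement. All names (`FieldBudgetSketch`, `tryPut`, `remainingFieldBudget`) are hypothetical and are not the actual Elasticsearch API; this only shows the shape of the invariant being discussed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the budget decrement and the put happen together,
// so a field is only ever added to the map if capacity was reserved for it.
class FieldBudgetSketch {
    private int remainingFieldBudget;
    final Map<String, String> mergedMappers = new HashMap<>();

    FieldBudgetSketch(int budget) {
        this.remainingFieldBudget = budget;
    }

    // Decrement the budget and add the mapper only if capacity remains.
    boolean tryPut(String name, String mapper) {
        if (remainingFieldBudget <= 0) {
            return false; // beyond the limit: drop the field instead of failing
        }
        remainingFieldBudget--;
        mergedMappers.put(name, mapper);
        return true;
    }

    public static void main(String[] args) {
        FieldBudgetSketch sketch = new FieldBudgetSketch(2);
        System.out.println(sketch.tryPut("a", "keyword")); // fits
        System.out.println(sketch.tryPut("b", "long"));    // fits
        System.out.println(sketch.tryPut("c", "text"));    // budget exhausted
        System.out.println(sketch.mergedMappers.size());
    }
}
```

If the put were moved away from the decrement, recursive merging of sub fields could consume budget that the already-put field was assumed to have reserved.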
Looking at the latest changes in #102936, which no longer touch these lines, I still don't follow :)
The put is still within the loop, even if it's now after the conditional instead of within it. The only merge call I see in this method is performed within the conditional, before the put (both before and after your change). What am I missing that makes this part of the change necessary?
I've actually reverted this in #102936 now (see c1b4258). Now that I've removed the Consumer methods from MapperMergeContext and there's only decrementFieldBudgetIfPossible, which doesn't directly add the merged mapper to the map, I think it's viable to not have the put in the if/else.
What's still important, though, is that we decrement the budget for the shallow object mapper in org.elasticsearch.index.mapper.ObjectMapper.MergeResult#truncateObjectMapper before calling shallowObjectMapper.merge. Otherwise, if we merged the object before decrementing the budget, we wouldn't have capacity to add the object itself.
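The ordering argument can be sketched as follows. This is not the real ObjectMapper code; `mergeObject`, `budget`, and `childCount` are made-up names that only model why the object's own slot must be reserved before merging its children.

```java
// Hypothetical sketch of the ordering in truncateObjectMapper: the object
// mapper itself costs one field, so its budget slot is reserved *before*
// merging children. Reversing the order could spend the whole budget on
// children and leave no capacity for the object itself.
class TruncateOrderSketch {
    // Returns how many mappers (object + children) fit into the budget.
    static int mergeObject(int budget, int childCount) {
        if (budget <= 0) {
            return 0; // no room even for the shallow object itself
        }
        budget--; // reserve capacity for the object mapper first
        // then merge as many children as the remaining budget allows
        int mergedChildren = Math.min(budget, childCount);
        return 1 + mergedChildren;
    }

    public static void main(String[] args) {
        System.out.println(mergeObject(3, 5)); // object plus two children
        System.out.println(mergeObject(0, 5)); // no capacity at all
    }
}
```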
javanna left a comment
I left a question about moving the put call; I may just not see why it needs moving. LGTM otherwise.
This extracts a piece of independent functionality from #96235. It is the result of this discussion: #96235 (comment)

The mapping merge executed on the master node is the only authoritative place where we have a consistent view of the cluster state, as it's guaranteed to be executed on a single thread on the master node. Therefore, there can be no race conditions, and we know exactly how many fields are left before hitting the field limit. This makes it the best place to add as many fields as possible without going over the limit if `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` in the index settings.

This change makes it possible to limit the number of fields added during a mapping merge. However, that feature isn't actually used outside of tests yet. In #96235, limiting fields during the merge process is used to guarantee that the cluster state update never fails due to dynamic mapping updates that exceed the field limit when `ignore_dynamic_beyond_limit` is enabled. This works because after each dynamic mapping update, the indexing request gets retried (there's no change to that logic in this or related PRs), so that document parsing takes the new mapping into account.

This only applies to dynamic mapping updates, aka auto-updates. When explicitly trying to update the mapping with too many fields, the put mapping operation will still fail, even if `ignore_dynamic_beyond_limit` is enabled.

Depends on:
- [x] #104512
- [x] #104578
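For context, the setting the description refers to is configured per index alongside the field limit. A minimal example request (the index name `my-index` and the limit value are placeholders):

```console
PUT my-index
{
  "settings": {
    "index.mapping.total_fields.limit": 100,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}
```

With this configuration, dynamically mapped fields beyond the 100-field limit are ignored rather than causing the indexing request to fail.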
Encapsulates some mechanical changes from #102936