Skip to content

Revert change (#90674)#103865

Merged
elasticsearchmachine merged 8 commits intoelastic:mainfrom
benwtrent:feature/revert-90674
Jan 4, 2024
Merged

Revert change (#90674)#103865
elasticsearchmachine merged 8 commits intoelastic:mainfrom
benwtrent:feature/revert-90674

Conversation

@benwtrent
Copy link
Copy Markdown
Member

Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to account for later changes.

This is in contrast with: #103858

closes: #103011

@benwtrent benwtrent added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types v8.12.1 v8.13.0 v8.11.4 labels Jan 3, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jan 3, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Copy Markdown
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this and compared line by line with the PR that it reverts. It looks good to me. I think it is wise to revert at this point, and reconsider what we want to do next. You mentioned it did not revert cleanly: did the problems only come from the float handling in postProcessDynamicArrayMapping for dynamic mapping of vector fields?

Thanks for jumping on this!

@benwtrent
Copy link
Copy Markdown
Member Author

did the problems only come from the float handling in postProcessDynamicArrayMapping for dynamic mapping of vector fields?

Correct, the logic there changed from a separate bug fix.

@benwtrent benwtrent added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge labels Jan 4, 2024
@benwtrent
Copy link
Copy Markdown
Member Author

@elasticmachine update branch

@benwtrent benwtrent removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 4, 2024
@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 4, 2024
@benwtrent
Copy link
Copy Markdown
Member Author

@elasticmachine update branch

new HashMap<>(),
new HashSet<>(),
new LinkedHashMap<>(),
new HashMap<>(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ it makes sense to go back to the original hashmap, which we had before.

@benwtrent
Copy link
Copy Markdown
Member Author

run elasticsearch-ci/part-2

@benwtrent
Copy link
Copy Markdown
Member Author

run elasticsearch-ci/part-1
run elasticsearch-ci/part-3
run elasticsearch-ci/part-4
run elasticsearch-ci/bwc-snapshots
run elasticsearch-ci/packaging-tests-unix-sample
run elasticsearch-ci/packaging-tests-windows-sample

@felixbarny
Copy link
Copy Markdown
Member

FWIW, I'm also happy about this change as it will make #102936 and #96235 much easier as we don't need separate implementations for mapperSize in both Mapper and Mapper.Builder that need to be kept in sync.

Before we re-introduce this change (in case we want to do that at some point), we should really merge the two PRs mentioned above first.

@benwtrent
Copy link
Copy Markdown
Member Author

@felixbarny

Before we re-introduce this change (in case we want to do that at some point), we should really merge the two PRs mentioned above first.

By "re-introduce" you mean adding back "keep builders" change that this PR reverts?

@benwtrent
Copy link
Copy Markdown
Member Author

@elasticmachine update branch

@felixbarny
Copy link
Copy Markdown
Member

By "re-introduce" you mean adding back "keep builders" change that this PR reverts?

Yes, that's what I meant. Aka revert the revert.

So before we're considering to re-introduce to "Store dynamic mapping updates as builders", we should first merge #102936 and #96235. We can then weigh in if the additional complexity of maintaining a mapperSize implementation in both Mapper and Mapper.Builder is worth the optimization.

@elasticsearchmachine elasticsearchmachine merged commit a74ae22 into elastic:main Jan 4, 2024
@benwtrent benwtrent deleted the feature/revert-90674 branch January 4, 2024 17:19
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 4, 2024
Reverts elastic#90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
elastic#103858

closes: elastic#103011
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.12
8.11

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 4, 2024
Reverts elastic#90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
elastic#103858

closes: elastic#103011
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2024
Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
#103858

closes: #103011
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2024
Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
#103858

closes: #103011
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Jan 5, 2024
Because of elastic#103865,
DocumentParserContext#addDynamicMapper receives a Mapper, not a Mapper.Builder again.
Therefore, we don't need a mapperSize method for the builder.

This simplifies things a lot.
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Jan 5, 2024
Because of elastic#103865,
DocumentParserContext#addDynamicMapper receives a Mapper, not a Mapper.Builder again.
Therefore, we don't need a mapperSize method for the builder.

This simplifies things a lot.
elasticsearchmachine pushed a commit that referenced this pull request Feb 6, 2024
Today, we're counting all mappers, including mappers for subfields that
aren't explicitly added to the mapping towards the field limit.

This means that some field types, such as `search_as_you_type` or
`percolator` count as more than one field even though that's not
apparent to users as they're just defining them as a single field in the
mapping.

This change makes it so that each field mapper only counts as one. We're
still counting multi-fields.

This makes it easier to understand for users why the field limit is hit.

~In addition to that, it also simplifies
#96235 as it makes the
implementation of `Mapper.Builder#getTotalFieldsCount` much easier and
easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk
of over- or under-estimating the field count of a `Mapper.Builder` in
`DocumentParserContext#addDynamicMapper`, which in turn reduces the risk
of data loss due to the issue described here:
#96235 (comment)

*Edit: due to #103865, we
don't need an implementation of `getTotalFieldsCount` or `mapperSize` in
`Mapper.Builder`. Still, this PR more closely aligns
`Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`,
which  `DocumentParserContext#addDynamicMapper` uses to determine
whether the field limit is hit*

A potential risk of this is that we're now effectively allowing more
fields in the mapping. It may be surprising to users that more fields
can be added to a mapping. Although, I'd not expect negative
consequences from that. Generally, I'd  expect users to be happy about
any change that reduces the risk of data loss.

We could also think about whether to apply the new counting logic only
to new indices (depending on the `IndexVersion`). However, that would
add more complexity and I'm not convinced about the value. We'd then
need to maintain two different ways of counting fields and also require
passing in the `IndexVersion` to `MappingLookup` which previously didn't
require the `IndexVersion`.

This PR is meant as a conversation starter. It would also simplify
#96235 but I don't think
this blocks that PR in any way.

I'm curious about the opinion of @javanna and @jpountz on this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.11.4 v8.12.1 v8.13.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Significant ingest performance regression while handling parsing dynamically added objects

5 participants