Skip to content

Check diffMappings after WAIT_FOR_YELLOW_SOURCE#148980

Merged
gsoldevila merged 3 commits intoelastic:mainfrom
gsoldevila:kbn-147371-wait-for-yellow-before-diff-mappings
Jan 17, 2023
Merged

Check diffMappings after WAIT_FOR_YELLOW_SOURCE#148980
gsoldevila merged 3 commits intoelastic:mainfrom
gsoldevila:kbn-147371-wait-for-yellow-before-diff-mappings

Conversation

@gsoldevila
Copy link
Contributor

Fixes a bug introduced by #147371.

The goal of the aforementioned PR was to avoid reindexing if the SO mappings didn't change.
But in that scenario we are still going to perform some operations on the index, and thus we must await for the index to be ready (aka WAIT_FOR_YELLOW_SOURCE).

This PR moves the diffMappings check (and the corresponding "shortcircuit") after the WAIT_FOR_YELLOW_SOURCE state + operation.

image

@gsoldevila gsoldevila added bug Fixes for quality problems that affect the customer experience Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting v8.7.0 labels Jan 16, 2023
@gsoldevila gsoldevila requested a review from a team as a code owner January 16, 2023 15:16
Copy link
Contributor

@jloleysens jloleysens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @gsoldevila ! This logic makes sense to me! I added one comment/question that is not a blocker.

I expect we will also need to update src/core/server/integration_tests/saved_objects/migrations/skip_reindex.test.ts. It would be nice to also assert the full expected flow of INIT -> WAIT... -> PREPARE... there.

// CHECKPOINT here we decide to go for yellow source
return {
...stateP,
aliases,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to get your thoughts:

By passing aliases here it seems more likely that this object could be stale if other Kibana's see yellow source before us. We would reach the PREPARE_COMPATIBLE_MIGRATION step, get an alias_not_found exception and still go to OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. So all good?

Copy link
Contributor Author

@gsoldevila gsoldevila Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point!

The only way I see to get fresher aliases information would be to perform a GET .kibana again (like the INIT step does), and even in that case, there's no way to guarantee that another instance will not delete the aliases before us, so I'm not sure it is worth it.

  • If the other instance is the same version, and it performs the same cleanup of aliases before us, it's alright to proceed to OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT on alias_not_found. As you said, odds of this happening are increasing with this fix, but the behavior / handling is still correct IMHO.

  • If the other instance is a more recent version, it will obtain the aliases in the INIT step:

    • if it obtains them before we update / delete them in the PREPARE_COMPATIBLE_MIGRATION step, it will proceed to CHECK_UNKNOWN_DOCUMENTS + all the reindex steps, which don't use the aliases anymore.
    • if it obtains them after we update / delete them in the PREPARE_COMPATIBLE_MIGRATION step, it will reindex the source index into a new index without using the alias for anything. Then, it would likely override our .kibana alias and make it point to the newly created index. In this case we'd have some garbage version aliases pointing to an older index version, but that's already the case without this fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ there's always a potential race condition between the time we get the fetchAliases respose and performing the preTransformDocsActions. Before it was probably in the order of < 100ms and now it could potentially be many minutes. So this makes any potential race conditions more likely to occur but doesn't introduce anything new.

Like you both said, any race conditions should be handled in PREPARE_COMPATIBLE_MIGRATION already.

@gsoldevila
Copy link
Contributor Author

Great work @gsoldevila ! This logic makes sense to me! I added one comment/question that is not a blocker.

I expect we will also need to update src/core/server/integration_tests/saved_objects/migrations/skip_reindex.test.ts. It would be nice to also assert the full expected flow of INIT -> WAIT... -> PREPARE... there.

I was waiting for CI feedback to see which integration tests need to be updated, thanks for the heads up!

@kibana-ci
Copy link

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

// CHECKPOINT here we decide to go for yellow source
return {
...stateP,
aliases,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ there's always a potential race condition between the time we get the fetchAliases respose and performing the preTransformDocsActions. Before it was probably in the order of < 100ms and now it could potentially be many minutes. So this makes any potential race conditions more likely to occur but doesn't introduce anything new.

Like you both said, any race conditions should be handled in PREPARE_COMPATIBLE_MIGRATION already.

@gsoldevila gsoldevila merged commit 5d33289 into elastic:main Jan 17, 2023
@rudolf rudolf added the Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects label Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting bug Fixes for quality problems that affect the customer experience Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects release_note:skip Skip the PR/issue when compiling release notes Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// v8.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants