Conversation

@rudolf rudolf commented Nov 21, 2019

Summary

Fixes #49785 #14480

Testing notes:

  1. When starting Kibana against a cluster with an unsupported ES node, Kibana should log:

    Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...

    It should not start saved object migrations until all the nodes are compatible, at which point it should log:

    Starting saved objects migrations

  2. If Kibana has successfully started with a compatible ES cluster, but then an incompatible ES node joins the cluster, Kibana's "elasticsearch" plugin should go into a red state and reloading Kibana in a browser should render the status page.

Release note:

This fix addresses a regression where Kibana would not check that all Elasticsearch nodes are compatible before starting Saved Object migrations.
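The gating behaviour described above can be sketched without Kibana's RxJS machinery as a poll-until-compatible loop. This is a minimal sketch: `CompatibilityCheck`, `waitUntilCompatible`, and the polling interval are illustrative names, not Kibana's actual API.

```typescript
// Hypothetical stand-in for Kibana's esNodesCompatibility$ poller:
// a check that reports whether every ES node is compatible.
type CompatibilityCheck = () => Promise<{ isCompatible: boolean; message?: string }>;

// Block until the cluster reports all nodes compatible, then allow
// saved object migrations to proceed. Mirrors the PR's behaviour of
// waiting indefinitely rather than failing fast.
async function waitUntilCompatible(
  check: CompatibilityCheck,
  intervalMs: number
): Promise<void> {
  console.log(
    'Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...'
  );
  for (;;) {
    const { isCompatible } = await check();
    if (isCompatible) break;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  console.log('Starting saved objects migrations');
}
```

In the PR itself this wait is expressed as an RxJS pipeline over `esNodesCompatibility$` rather than an explicit loop, but the observable behaviour is the same: startup blocks until the first compatible status.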

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

For maintainers

@rudolf rudolf added Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:New Platform v8.0.0 v7.5.0 v7.6.0 labels Nov 21, 2019
@elasticmachine

Pinging @elastic/kibana-platform (Team:Platform)

@elasticmachine

💔 Build Failed

@pgayvallet pgayvallet left a comment

I saw too late that this was only a draft! I've kept only my nits on the current progress.

@rudolf rudolf removed the v7.5.0 label Nov 25, 2019
@joshdover

@rudolf What's the status here?

@rudolf rudolf added v7.7.0 and removed v7.6.0 labels Jan 15, 2020
@rudolf rudolf force-pushed the healthcheck-before-migrations branch from 5c91f29 to 07829f8 Compare January 22, 2020 15:19
@rudolf rudolf force-pushed the healthcheck-before-migrations branch from 07829f8 to bd49618 Compare January 24, 2020 14:21
@rudolf rudolf force-pushed the healthcheck-before-migrations branch from bd49618 to e2d6157 Compare January 24, 2020 14:53
this.logger.debug(
  'Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...'
);
await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
Contributor Author (@rudolf):

This behaviour isn't 100% the same as what it was in legacy. In legacy we would start the status service so even though migrations wouldn't run, there would be a running server which showed that the Elasticsearch plugin was red with the reason which helps surface the underlying problem. Once we have a status service in NP we should aim to create similar behaviour.

@rudolf rudolf marked this pull request as ready for review January 26, 2020 21:18
@rudolf rudolf requested a review from a team as a code owner January 26, 2020 21:18
@rudolf rudolf added the v7.6.1 label Jan 26, 2020
@pgayvallet pgayvallet left a comment

#49785 states

Although the reasons for abandoning the health check still stand, we will have to keep polling to do the version check since new Elasticsearch nodes can join an existing cluster after Kibana has started up.

In the current PR we only wait once for ES to be ready before triggering some actions; further state changes do nothing. Do we know what we plan to do in a scenario like:

  • red -> green (trigger start of SO + legacy's waitUntilReady) -> red (atm do nothing) -> green (atm do nothing)

Comment on lines 24 to 36
esNodesCompatibility$.subscribe(({ isCompatible, message, kibanaVersion, warningNodes }) => {
  if (!isCompatible) {
    esPlugin.status.red(message);
  } else {
    if (message && message.length > 0) {
      logWithMetadata(['warning'], message, {
        kibanaVersion,
        nodes: warningNodes,
      });
    }
    esPlugin.status.green('Ready');
    resolve();
  }
Contributor:

Should we unsubscribe after the first resolve call to avoid wrongly recalling resolve in case of green->red->green ?

Contributor Author (@rudolf):

We want to keep updating the status so we need the subscription. Although resolve() should only be called once, calling it multiple times is a no-op so it won't cause any problems.
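The point about repeated resolve() calls being a no-op can be verified in isolation. This is standard Promise semantics, not Kibana code; `makeGate` is a hypothetical helper.

```typescript
// A promise settles exactly once; later resolve() calls are silently ignored,
// so a long-lived subscription that calls resolve() on every green status is safe.
function makeGate(): { promise: Promise<string>; resolve: (value: string) => void } {
  let resolve!: (value: string) => void;
  const promise = new Promise<string>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

const gate = makeGate();
gate.resolve('first green');
gate.resolve('second green'); // no-op: the promise already settled
gate.promise.then((value) => console.log(value)); // logs "first green"
```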

Comment on lines +294 to +297
await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
  filter(nodes => nodes.isCompatible),
  take(1)
).toPromise();
Contributor:

  1. Except in the legacy code you adapted, we are not displaying any info message for the user about the fact that we are waiting (maybe indefinitely) for ES to be ready?
  2. Maybe we should add a timeout and throw a fatal after some time? Or are we expecting Kibana to hang indefinitely waiting for this condition?
  3. Should this check be done at a higher level (thinking in the Server)? It seems to me that waiting for ES to be ready is higher responsibility than the SOService should handle.

Contributor Author (@rudolf):

  1. Except in the legacy code you adapted, we are not displaying any info message for the user about the fact that we are waiting (maybe indefinitely) for ES to be ready?

I've changed the log message to an info to indicate that we're waiting for ES and when we're starting migrations.

  2. Maybe we should add a timeout and throw a fatal after some time? Or are we expecting Kibana to hang indefinitely waiting for this condition?

The existing behaviour is to wait indefinitely. It could take a day before a faulty cluster is fixed; in such a case I think it's nice if Kibana just starts working again automatically.

  3. Should this check be done at a higher level (thinking in the Server)? It seems to me that waiting for ES to be ready is a higher responsibility than the SOService should handle.

I don't have a strong opinion, but I think if the SO Service has some dependency on an external condition then the logic to wait for that condition belongs in the SO Service. This is a minor point, but when it comes to the logging tags it might be easier to see that these logs are related if they all have the same tags, rather than some being tagged server and others savedobjects-service.

Contributor:

I would say it might be good to repeat this message on an interval but I wouldn't consider that a blocker to this PR.
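The suggestion above (repeating the waiting message on an interval, which did not land in this PR) could be sketched as follows; `waitWithRepeatedLog` is a hypothetical helper, not Kibana's API.

```typescript
// Re-log a "still waiting" message every `everyMs` until the awaited
// promise settles, then stop the timer.
async function waitWithRepeatedLog<T>(
  wait: Promise<T>,
  everyMs: number,
  log: (message: string) => void
): Promise<T> {
  const timer = setInterval(
    () => log('Still waiting until all Elasticsearch nodes are compatible with Kibana...'),
    everyMs
  );
  try {
    return await wait;
  } finally {
    // Always stop re-logging, whether the wait resolved or rejected.
    clearInterval(timer);
  }
}
```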

Contributor:

I don't have a strong opinion, but I think if the SO Service has some dependency on an external condition then the logic to wait for that condition belongs in the SO Service. This is a minor point, but when it comes to the logging tags it might be easier to see that these logs are related if they all have the same tags, rather than some being tagged server and others savedobjects-service.

I think it's fine we put this in SO service until/if there are other Core services that require this as well.

@LeeDr LeeDr added the blocker label Jan 29, 2020
@legrego legrego left a comment

Spaces changes LGTM - code review only

@joshdover joshdover left a comment

LGTM after a couple of minor changes (and green CI)

const { http, elasticsearch } = await root.setup();

// Mock esNodesCompatibility$ to prevent `root.start()` from blocking on ES version check
elasticsearch.esNodesCompatibility$ = elasticsearchServiceMock.createInternalSetup().esNodesCompatibility$;
Contributor:

Any reason not to do this in KbnTestUtils? Seems like this could break other tests in confusing ways in the future

Contributor Author (@rudolf):

Yes, I find this rather ugly. KbnTestUtils is sometimes used with an ES server, in which case we don't need to fake esNodesCompatibility$, but plugins should never have to run tests against an incompatible ES node, so it's probably safe to always skip this during testing.

Contributor Author (@rudolf):

To avoid a potentially long build/test cycle from changing integration tests, I'd rather do this in a separate PR.

Comment on lines +294 to +297
await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
  filter(nodes => nodes.isCompatible),
  take(1)
).toPromise();
Contributor:

I don't have a strong opinion, but I think if the SO Service has some dependency on an external condition then the logic to wait for that condition belongs in the SO Service. This is a minor point, but when it comes to the logging tags it might be easier to see that these logs are related if they all have the same tags, rather than some being tagged server and others savedobjects-service.

I think it's fine we put this in SO service until/if there are other Core services that require this as well.


rudolf commented Jan 31, 2020

After further testing I realised there were two incorrect behaviours:

  1. pollEsNodesVersion would not start polling immediately; the first poll was scheduled to happen only after esVersionCheckInterval. For a long interval like 30s this delays Kibana's startup by an additional 30s.
  2. Because pollEsNodesVersion used a switchMap, there's a risk that we'll bombard ES with queries if the nodes.info response time > esVersionCheckInterval. Since this is an expensive API call, there's a risk that we'll suddenly make a slow cluster much slower. I've changed the implementation to use an exhaustMap so we won't schedule new requests until the previous request has resolved.
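The switchMap-vs-exhaustMap concern in point 2 can be illustrated without RxJS: an exhaust-style poller ignores scheduled ticks while a request is still in flight, so a slow cluster never accumulates overlapping nodes.info calls. This is a minimal sketch; `pollExhaust` and its parameters are illustrative, not Kibana's implementation.

```typescript
// Poll on a fixed interval, but skip ticks while the previous request is
// still pending (exhaustMap semantics). Resolves with how many requests
// actually started during the window.
function pollExhaust(
  request: () => Promise<void>,
  intervalMs: number,
  stopAfterMs: number
): Promise<number> {
  return new Promise((resolve) => {
    let inFlight = false;
    let started = 0;
    const timer = setInterval(() => {
      if (inFlight) return; // exhaust: drop this tick, a request is pending
      inFlight = true;
      started++;
      request().finally(() => {
        inFlight = false;
      });
    }, intervalMs);
    setTimeout(() => {
      clearInterval(timer);
      resolve(started);
    }, stopAfterMs);
  });
}
```

With switchMap-like behaviour every tick would start a new request regardless of whether the previous one had resolved; the exhaust variant bounds the number of in-flight requests to one.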

@kibanamachine

💚 Build Succeeded


@pgayvallet pgayvallet left a comment

LGTM

let root: Root;
beforeAll(async () => {
-  root = kbnTestServer.createRoot();
+  root = kbnTestServer.createRoot({ migrations: { skip: true } });
Contributor:

NIT: It seems you adapted every call to createRoot to add this. Should we set migrations: { skip: true } as a default in kbnTestServer.createRoot ?

Contributor Author (@rudolf):

Some tests run with esArchiver, and those need migrations to be applied. Ideally we shouldn't disable migrations; instead we should disable the ES version check itself. There is elasticsearch.ignoreVersionMismatch, but it's only available in development and our integration tests run in production mode. We could just make this option available in production, but I think it warrants a bigger discussion, so I created #56505

@tylersmalley tylersmalley merged commit f1068cd into elastic:master Jan 31, 2020
tylersmalley pushed a commit to tylersmalley/kibana that referenced this pull request Jan 31, 2020
…stic#51311)

* Convert parts of Elasticsearch version check to ts
* Move ES version check to NP
* Improve types
* Wait till for compatible ES nodes before SO migrations
* Don't wait for ES compatibility if skipMigrations=true
* Legacy Elasticsearch plugin integration test
* Make ES compatibility check and migrations logging more visible
* Test for isCompatible=false when ES version check throws
* Start pollEsNodesVersion immediately
* Refactor pollEsNodesVersion
@tylersmalley
Copy link
Contributor

@rudolf I have backported to 7.x, but 7.6 is a bit more challenging as the tests rely on saved_objects_service.test.mocks which was added here #55012, which relies on #55156 - both not present in 7.6. I think it would be best for you to handle the conflict resolution here to ensure there are no errors.

rudolf added a commit that referenced this pull request Feb 3, 2020
) (#56600)

* Convert parts of Elasticsearch version check to ts
* Move ES version check to NP
* Improve types
* Wait till for compatible ES nodes before SO migrations
* Don't wait for ES compatibility if skipMigrations=true
* Legacy Elasticsearch plugin integration test
* Make ES compatibility check and migrations logging more visible
* Test for isCompatible=false when ES version check throws
* Start pollEsNodesVersion immediately
* Refactor pollEsNodesVersion
rudolf added a commit that referenced this pull request Feb 3, 2020
) (#56629)

* Convert parts of Elasticsearch version check to ts
* Move ES version check to NP
* Improve types
* Wait till for compatible ES nodes before SO migrations
* Don't wait for ES compatibility if skipMigrations=true
* Legacy Elasticsearch plugin integration test
* Make ES compatibility check and migrations logging more visible
* Test for isCompatible=false when ES version check throws
* Start pollEsNodesVersion immediately
* Refactor pollEsNodesVersion
@rudolf rudolf deleted the healthcheck-before-migrations branch February 4, 2020 10:24

Labels

blocker bug Fixes for quality problems that affect the customer experience Feature:New Platform release_note:fix Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.6.0 v7.6.1 v7.7.0 v8.0.0


Development

Successfully merging this pull request may close these issues.

Add Elasticsearch version check to New Platform
